This is my Scrapy spider configuration:
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.selector import Selector
from play.items import PlayItem


class PlaySpider(CrawlSpider):
    name = 'play'
    allowed_domains = ['lo.lesko.pl']
    start_urls = ['http://www.lo.lesko.pl/']
    rules = [Rule(SgmlLinkExtractor(allow=[]), follow=True,
                  callback='parse_play')]

    def parse_play(self, response):
        sel = Selector(response)
        play = PlayItem()
        play['url'] = response.url[0].strip()
        # play['title'] = sel.xpath("//title/text()").extract()
        play['body'] = sel.select("//body").extract()[0].strip()
        return play
I use the strip() function because I would like to get the text without HTML
tags, but I must be doing something wrong: there are still HTML tags in my XML
output file.
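
A minimal illustration of the behaviour described above, using a made-up HTML string in place of the real page (the `raw_body` value is purely hypothetical, not taken from the site). It only shows that Python's `str.strip()` trims leading and trailing whitespace and leaves the tags in place:

```python
# Hypothetical stand-in for what extracting "//body" might return.
raw_body = "  <body><p>Hello</p></body>\n"

# str.strip() with no arguments removes surrounding whitespace only;
# it does not touch the markup inside the string.
stripped = raw_body.strip()

print(stripped)
```

So the tags in the output file appear to come from the extracted `//body` markup itself rather than from anything `strip()` fails to do.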