strip() only removes leading and trailing whitespace from a string; it does not remove HTML tags. I'd advise using BeautifulSoup4 (maybe this <http://beautiful-soup-4.readthedocs.org/en/latest/#strings-and-stripped-strings> will help). It should cover what you need and will simplify working with the HTML DOM.
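
To illustrate the difference, here is a minimal sketch, assuming BeautifulSoup4 is installed as the bs4 package. The sample markup and the helper name body_text are made up for the example; they are not from your spider.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, standing in for a downloaded page.
html = "  <body><h1>Title</h1><p>Some <b>bold</b> text.</p></body>  "

# str.strip() removes only leading/trailing whitespace; the tags stay.
stripped = html.strip()


def body_text(markup):
    """Return the text of `markup` with all tags removed (illustrative helper)."""
    soup = BeautifulSoup(markup, "html.parser")
    # stripped_strings yields every non-empty text node, whitespace-trimmed.
    return " ".join(soup.stripped_strings)


text = body_text(html)
print(stripped)  # still contains <body>, <h1>, <p>, <b>
print(text)      # text only, no tags
```

In your spider, something like play['body'] = body_text(response.body) could presumably replace the sel.select("//body") line, but I have not tested that against your site.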

On Monday, 3 March 2014 at 17:47:31 UTC+2, user [email protected] wrote:
>
> This is my Scrapy configuration.
>
>
> from scrapy.contrib.spiders import CrawlSpider, Rule
> from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
> from scrapy.selector import Selector
>
> from play.items import PlayItem
>
> class PlaySpider(CrawlSpider):
>     name = 'play'
>     allowed_domains = ['lo.lesko.pl']
>     start_urls = ['http://www.lo.lesko.pl/']
>     rules = [Rule(SgmlLinkExtractor(allow=[]), follow=True,
>                   callback='parse_play')]
>
>     def parse_play(self, response):
>         sel = Selector(response)
>         play = PlayItem()
>         play['url'] = response.url[0].strip()
>         # play['title'] = sel.xpath("//title/text()").extract()
>         play['body'] = sel.select("//body").extract()[0].strip()
>         return play
>
>
> I use the strip function because I would like to have text without HTML
> tags, but I am doing something wrong: there are HTML tags in my XML file.
>
