How can I get only the text from the body?

krzysiel00 Mon, 03 Mar 2014 07:49:36 -0800

This is my Scrapy spider configuration.


from scrapy.contrib.spiders import CrawlSpider, Rule 
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor 
from scrapy.selector import Selector 

from play.items import PlayItem 

class PlaySpider(CrawlSpider): 
    name = 'play' 
    allowed_domains = ['lo.lesko.pl'] 
    start_urls = ['http://www.lo.lesko.pl/'] 
    rules = [Rule(SgmlLinkExtractor(allow=[]), follow=True,
                  callback='parse_play')]

    def parse_play(self, response): 
        sel = Selector(response) 
        play = PlayItem() 
        play['url'] = response.url[0].strip() 
        # play['title'] = sel.xpath("//title/text()").extract()
        play['body'] = sel.select("//body").extract()[0].strip() 
        return play


I use the strip function because I want the text without HTML tags, but I
must be doing something wrong: there are still HTML tags in my XML output file.
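
A note on the likely cause: str.strip() only removes leading and trailing
whitespace, so the markup returned by "//body" is left untouched. Selecting the
text nodes instead, for example with the XPath "//body//text()", would probably
give the result wanted here. The sketch below illustrates the same idea with
only the Python 3 standard library so it can be run without Scrapy; the class
BodyTextExtractor, the function body_text, and the sample HTML are hypothetical
names made up for this illustration, not part of the spider above.

```python
from html.parser import HTMLParser


class BodyTextExtractor(HTMLParser):
    """Collect text nodes inside <body>, skipping <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.in_body = False   # True while we are between <body> and </body>
        self.skip_depth = 0    # > 0 while inside <script> or <style>
        self.chunks = []       # non-empty pieces of text, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag in self.SKIPPED_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag in self.SKIPPED_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep only visible text that belongs to the body.
        if self.in_body and self.skip_depth == 0:
            text = data.strip()
            if text:
                self.chunks.append(text)


def body_text(html):
    """Return the body text of an HTML string, joined by single spaces."""
    parser = BodyTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical sample page, used only for this illustration.
sample = (
    "<html><head><title>T</title></head>"
    "<body><h1> Hello </h1><p>world <b>again</b></p>"
    "<script>var x = 1;</script></body></html>"
)

# strip() alone leaves every tag in place.
print(sample.strip().startswith("<html>"))

# Collecting text nodes drops the markup.
print(body_text(sample))

# In the spider, the rough equivalent (an assumption, not tested here) would be:
#   texts = sel.xpath("//body//text()").extract()
#   play['body'] = " ".join(t.strip() for t in texts if t.strip())
```

Separately, response.url[0] keeps only the first character of the URL;
response.url is probably what was intended for play['url'].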

