Re: How i can get only text from body

Svyatoslav Sydorenko Mon, 03 Mar 2014 15:12:12 -0800

strip() only removes leading and trailing whitespace from a string; it does 
not touch HTML tags.
I'd advise using BeautifulSoup4 (maybe this will help: 
http://beautiful-soup-4.readthedocs.org/en/latest/#strings-and-stripped-strings).
It should cover your needs and simplify working with the HTML DOM.
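
To illustrate the difference, here is a minimal sketch that uses only the 
standard library's html.parser, swapped in so that it runs without extra 
packages. BeautifulSoup4's get_text() and stripped_strings cover the same 
ground with less code. The names TextExtractor, html_to_text and the sample 
markup are hypothetical, made up for this example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside the skipped elements.
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def html_to_text(html):
    """Return the text content of an HTML fragment, joined by single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical sample standing in for sel.xpath("//body").extract()[0]
sample = "  <body><h1>Title</h1><p>Some <b>bold</b> text.</p></body>  "

stripped = sample.strip()         # outer spaces removed, tags still present
text_only = html_to_text(sample)  # tags removed, text kept

print(stripped)
print(text_only)
```

In the spider, the same function could be applied to the extracted body 
string before it is stored in the item.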


On Monday, 3 March 2014 at 17:47:31 UTC+2, [email protected] wrote:
>
> This is my Scrapy configuration.
>
>
> from scrapy.contrib.spiders import CrawlSpider, Rule 
> from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor 
> from scrapy.selector import Selector 
>
> from play.items import PlayItem 
>
> class PlaySpider(CrawlSpider): 
>     name = 'play' 
>     allowed_domains = ['lo.lesko.pl'] 
>     start_urls = ['http://www.lo.lesko.pl/'] 
>     rules = [Rule(SgmlLinkExtractor(allow=[]), follow=True,
>                   callback='parse_play')]
>
>     def parse_play(self, response): 
>         sel = Selector(response) 
>         play = PlayItem() 
>         play['url'] = response.url[0].strip() 
>         # play['title'] = sel.xpath("//title/text()").extract()
>         play['body'] = sel.select("//body").extract()[0].strip() 
>         return play
>
>
> I use the strip function because I would like to get the text without HTML 
> tags, but I must be doing something wrong: there are still HTML tags in my 
> XML file.
>
>
