Hi, I am not sure I follow. Here is my spider code: https://gist.github.com/balderman1/4079fe01ebdf1990bdf5a6c4e1e08691 The interesting lines are around line 61.
BTW: I was not able to find the extract_first() method on the selector.

Thanks,
Avishay

On Monday, February 20, 2017 at 7:53:52 PM UTC+2, Arnaud Knobloch wrote:
> Maybe try something like
> xpath("div[@class='x']/text()").extract_first().encode('utf8').strip()?
>
> On Monday, February 20, 2017 at 11:51:30 AM UTC+1, Avishay Balderman wrote:
>> I have a spider that runs on a site and extracts the text from specific
>> table cells. Markup example below.
>>
>> <td class="x">I want this text</td>
>>
>> The real text I am looking for is in Hebrew and contains the XML > character.
>> Example: לנגר שרונה <- לוי שולמית ט5 417
>>
>> My xpath expression works fine and I am able to find the relevant table
>> cells. The problem is that when I extract the text I get only *לנגר שרונה*,
>> which is only part of the text.
>> Is it possible that the '>' inside the text causes the problem?
>> If it is, is there a workaround?
>>
>> Thanks
>>
>> Avishay
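
One possible explanation for getting only part of the text, offered as a guess since the real page markup is not shown in this thread: the cell may hold more than one text node (for example, if the '>' sits inside a child element), and taking only the first text node drops the rest. The sketch below illustrates that effect with Python's standard xml.etree.ElementTree rather than Scrapy, so it runs without extra packages. The sample markup, the <span> wrapper, and the helper name cell_text are assumptions made for illustration. In a Scrapy selector the analogous approach would be to select all descendant text nodes, e.g. xpath(".//text()").extract(), and join the resulting list.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the '>' is wrapped in a child element, so the cell
# contains three text nodes instead of one. The real page may differ.
SAMPLE = '<tr><td class="x">I want <span>&gt;</span> this text</td></tr>'


def cell_text(cell):
    """Join every text node under the cell and collapse runs of whitespace."""
    return " ".join("".join(cell.itertext()).split())


row = ET.fromstring(SAMPLE)
td = row.find("td[@class='x']")

# Only the text that precedes the first child element (the first text node).
first_only = td.text

# All text nodes, in document order, joined into a single string.
full_text = cell_text(td)

print(repr(first_only))
print(repr(full_text))
```

If the joined string is still truncated on the real page, the cause is more likely in how the page source itself is parsed than in the xpath expression.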