I tried scraping the text you posted on this page and ran into the same issue. If you use '.extract()' rather than '.extract_first()', you get a list with the text split at the '<-'. You can then join the two parts back together on the '<-' if you want, but I didn't find a way to extract the whole text, arrow included, in a single step.
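A rough sketch of that join step, in case it helps. The helper name `join_text_nodes`, the XPath in the comment, and the sample list are my own assumptions for illustration; I have not checked exactly what strings Scrapy returns for the real page, so the separator is left as a parameter rather than hard-coded.

```python
def join_text_nodes(parts, separator=""):
    """Join the list of strings returned by .extract() into one string.

    Whitespace-only fragments are dropped and each part is stripped,
    so the separator alone controls the spacing between parts.
    """
    cleaned = [part.strip() for part in parts if part.strip()]
    return separator.join(cleaned)


# Inside a Scrapy callback this might be used as follows
# (the XPath is assumed from the markup example in the question):
#   parts = response.xpath('//td[@class="x"]//text()').extract()
#   text = join_text_nodes(parts, separator=" <- ")

# Hypothetical sample of what .extract() could return for the cell.
parts = ["לנגר שרונה ", " לוי שולמית ט5 417"]
print(join_text_nodes(parts, separator=" <- "))
```

If the arrow turns out to survive inside one of the extracted parts, calling the helper with a single space as the separator should be enough.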
I would think there's some solution that involves encoding the response body before extracting, but I don't know of one offhand.

On Monday, February 20, 2017 at 5:51:30 AM UTC-5, Avishay Balderman wrote:
> I have a spider that runs on a site and extracts the text from specific
> table cells.
> Markup example below.
>
> <td class="x">I want this text</td>
>
> The real text I am looking for is in Hebrew and contains the XML '>' char.
> Example: לנגר שרונה <- לוי שולמית ט5 417
>
> My xpath expression works fine and I am able to find the relevant table
> cells. The problem is that when I extract the text I get only *לנגר שרונה*,
> which is only part of the text.
> Is it possible that the '>' inside the text causes the problem?
> If it is, is there a workaround?
>
> Thanks
>
> Avishay