Hi
I am not sure I am able to follow ...
Here is my spider 
code: https://gist.github.com/balderman1/4079fe01ebdf1990bdf5a6c4e1e08691
The interesting lines are around line 61.

BTW: I was not able to find the extract_first() method on the selector.
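
For reference, here is a rough sketch of the extract()-and-join approach suggested in the quoted reply below. It assumes extract() returns a plain list of strings, one per text node. The helper names (join_text_nodes, first_or_default), the sample strings, and the XPath in the comment are made up for illustration and are not taken from the spider in the gist. first_or_default is only a stand-in for the case where extract_first() is not available.

```python
def join_text_nodes(parts, sep=""):
    """Join the strings returned by .extract() into a single cell text.

    `sep` is an assumption: leave it empty if the '>' character is still
    present in one of the parts, or pass it explicitly if it was dropped.
    """
    return sep.join(parts).strip()


def first_or_default(parts, default=None):
    """Return the first extracted string, or `default` if nothing matched."""
    return parts[0] if parts else default


# Hypothetical use inside a spider callback (XPath based on the sample markup):
#   parts = response.xpath('//td[@class="x"]//text()').extract()
#   cell_text = join_text_nodes(parts)

# Stand-in for what extract() might return when the cell text is split in two.
sample_parts = ["first part ", "> second part"]
cell_text = join_text_nodes(sample_parts)
first_only = first_or_default(sample_parts)
```

Taking only the first element is what gives the truncated text, so joining all parts seems the safer choice whenever a cell can contain more than one text node.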

Thanks

Avishay


On Monday, February 20, 2017 at 9:20:42 PM UTC+2, bbur...@fxcompared.com 
wrote:
>
> I tried scraping the text you posted on this page and found the same 
> issue. If you use '.extract()' rather than '.extract_first()' you get an 
> array with the text split on the <-. You can then join the two parts of 
> the text on the '->' character if you want, but I didn't find a way to 
> extract the character and all in a single action. 
>
> I would think there's some solution that involves encoding the response 
> body before extracting, but don't know any offhand.
>
> On Monday, February 20, 2017 at 5:51:30 AM UTC-5, Avishay Balderman wrote:
>>
>> I have a spider that runs on a site and extracts the text from specific 
>> table cells.
>> Markup example below.
>>
>> <td class="x">I want this text</td>
>>
>> The real text I am looking for is in Hebrew and contains the XML-escaped > character (&gt;).
>> Example: לנגר שרונה <- לוי שולמית ט5 417
>>
>> My xpath expression works fine and I am able to find the relevant table 
>> cells. The problem is that when I extract the text I get only *לנגר 
>> שרונה*
>> which is only part of the text.
>> Is it possible that the '&gt;' inside the text causes the problem?
>> If it is - is there a workaround?
>>
>> Thanks
>>
>> Avishay
>>

-- 
You received this message because you are subscribed to the Google Groups 
"scrapy-users" group.