Re: retrieving url from based on text inside tag with different encoding

Raf Roger Sun, 18 Sep 2016 12:20:44 -0700

OK, I found it: I needed to write u before the text, so that it is a unicode string :(
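
For anyone hitting the same ValueError: as far as I can tell, under Python 2 an unprefixed "Ďalšie" is a byte string, and lxml only accepts unicode or plain ASCII, hence the u prefix. Below is a minimal sketch of the same idea using only the Python 3 standard library instead of Scrapy. The names LinkTextFinder and find_next_page are made up for illustration, and the windows-1250 default is an assumption taken from the page discussed in this thread. It decodes the raw body first and then compares unicode with unicode.

```python
# -*- coding: utf-8 -*-
# Sketch only: stdlib stand-in for the Scrapy selector, not the Scrapy API.
# In the original Scrapy script (Python 2) the fix was just the u prefix:
#   Selector(response).xpath(u'//*[text()[contains(., "Ďalšie")]]/@href').extract()
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkTextFinder(HTMLParser):
    """Collect href values of <a> tags whose text contains a given string."""

    def __init__(self, needle):
        super().__init__()
        self.needle = needle
        self.matches = []
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            if self.needle in "".join(self._text):
                self.matches.append(self._href)
            self._href = None


def find_next_page(body, base_url, encoding="windows-1250", needle=u"Ďalšie"):
    """Return absolute URLs of links whose text contains `needle`.

    `body` is the raw page as bytes; it is decoded with `encoding` first,
    so the comparison is always unicode against unicode.
    """
    text = body.decode(encoding)
    finder = LinkTextFinder(needle)
    finder.feed(text)
    finder.close()
    return [urljoin(base_url, href) for href in finder.matches]
```

Called with the bytes of the first page and https://www.vsetkyfirmy.sk/autoskoly as base_url, this should return a list with the strana_2.html link, provided the pagination markup is a plain <a> whose text contains the word. In Python 3 every string literal is already unicode, so the u prefix is optional there and kept only to mirror the fix.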

On Sunday, September 18, 2016 at 8:01:24 PM UTC+2, Raf Roger wrote:
>
> Hi,
>
> For testing purposes, I wrote a small Scrapy script that should find the next 
> page based on the <a> text.
> I was able to do it on a utf-8 page, but I encounter an issue with a web 
> page encoded in "windows-1250", while my Scrapy script, and by default its 
> text, is written in utf-8.
>
> Let's have a look at: https://www.vsetkyfirmy.sk/autoskoly
>
> The pagination at the bottom displays "Next page" in the local language, 
> "Ďalšie >> <https://www.vsetkyfirmy.sk/autoskoly/strana_2.html>", and I 
> would like to retrieve the complete URL of this <a>. So if we are on the 
> first page: https://www.vsetkyfirmy.sk/autoskoly/strana_2.html, if we are 
> on page 2: https://www.vsetkyfirmy.sk/autoskoly/strana_3.html, etc.
>
> However, this web page is encoded in "windows-1250", and this is where I'm 
> confused: my Scrapy script uses utf-8 and the following code to retrieve 
> the <a> URL:
> t = Selector(response).xpath('//*[text()[contains(., "Ďalšie")]]/@href').extract()
>
> But when I run it, Scrapy says:
> ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters
>
> So what should I do to achieve what I want?
>
> Thanks
>


