OK, I found the problem: I needed to write the u prefix before the text, so that the XPath expression is a unicode string :(

On Sunday, September 18, 2016 at 8:01:24 PM UTC+2, Raf Roger wrote:
>
> Hi,
>
> For testing purposes, I wrote a small Scrapy script that should find the
> next page based on the <a> text.
> I was able to do this on UTF-8 pages, but I encountered an issue with a web
> page encoded in "windows-1250", while my Scrapy script, and by default its
> text, is written in UTF-8.
>
> Let's have a look at: https://www.vsetkyfirmy.sk/autoskoly
>
> The bottom pagination displays "Next page" in the local language, "Ďalšie >>"
> <https://www.vsetkyfirmy.sk/autoskoly/strana_2.html>, and I would like to
> retrieve the complete URL of this <a>. So if we are on the first page:
> https://www.vsetkyfirmy.sk/autoskoly/strana_2.html, if we are on page 2:
> https://www.vsetkyfirmy.sk/autoskoly/strana_3.html, etc.
>
> However, this web page is encoded in "windows-1250", and I'm confused
> because my Scrapy script uses UTF-8 and the following code to retrieve the
> <a> URL:
> t = Selector(response).xpath('//*[text()[contains(., "Ďalšie")]]/@href').extract()
>
> But once that is done, Scrapy says:
> ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL
> bytes or control characters
>
> So what should I do to achieve what I want?
>
> Thanks
>
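For anyone hitting the same error, here is a minimal sketch of why the u prefix matters. It deliberately uses only the Python 3 standard library instead of Scrapy, so it is an illustration of the encoding issue and not the script from this thread. RAW_BODY is a hypothetical stand-in for the pagination markup (the real page's HTML may differ), and NextLinkFinder and find_next_page are made-up helper names. The idea: the page bytes are windows-1250, so the link text has to be compared as Unicode text, not as UTF-8 bytes.

```python
# -*- coding: utf-8 -*-
from html.parser import HTMLParser
from urllib.parse import urljoin

PAGE_URL = "https://www.vsetkyfirmy.sk/autoskoly"

# Hypothetical stand-in for the pagination markup, encoded the way the
# site serves it (windows-1250, which Python calls cp1250).
RAW_BODY = (
    u'<div><a href="https://www.vsetkyfirmy.sk/autoskoly/strana_2.html">'
    u"Ďalšie >></a></div>"
).encode("cp1250")

# Unicode string. On Python 2 the u prefix is required; on Python 3 it is
# accepted but redundant, because every str is already Unicode.
NEEDLE = u"Ďalšie"


class NextLinkFinder(HTMLParser):
    """Collect the href of every <a> whose text contains NEEDLE."""

    def __init__(self):
        super().__init__()
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            if NEEDLE in "".join(self._text):
                self.matches.append(self._href)
            self._href = None


def find_next_page(raw_body, page_url, encoding="cp1250"):
    """Decode the bytes first, then search the Unicode text."""
    finder = NextLinkFinder()
    finder.feed(raw_body.decode(encoding))
    finder.close()
    if not finder.matches:
        return None
    # urljoin leaves absolute links alone and resolves relative ones.
    return urljoin(page_url, finder.matches[0])


next_url = find_next_page(RAW_BODY, PAGE_URL)
print(next_url)
```

The same text has different byte representations in UTF-8 and windows-1250, which is why a plain byte string in a UTF-8 source file cannot be matched against this page. In the Scrapy script itself, the equivalent fix is the one described above: pass the XPath expression as a unicode string, e.g. u'//*[text()[contains(., "Ďalšie")]]/@href'.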