Hi, for testing purposes, I wrote a small Scrapy script that should find the next page based on the text of an <a> element. I was able to do this on UTF-8 pages, but I run into an issue with a web page encoded in "windows-1250", while my Scrapy script and its text are written in UTF-8 by default.
Let's have a look at https://www.vsetkyfirmy.sk/autoskoly: the pagination at the bottom displays "Next page" in the local language, "Ďalšie >> <https://www.vsetkyfirmy.sk/autoskoly/strana_2.html>", and I would like to retrieve the complete URL of this <a>. On the first page that is https://www.vsetkyfirmy.sk/autoskoly/strana_2.html, on page 2 it is https://www.vsetkyfirmy.sk/autoskoly/strana_3.html, and so on.

However, this web page is encoded in "windows-1250" while my script is in UTF-8, which confuses me. I use the following code to retrieve the <a> URL:

t = Selector(response).xpath('//*[text()[contains(., "Ďalšie")]]/@href').extract()

When I run it, Scrapy says:

ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters

So what should I do to achieve what I want?

Thanks
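To make the goal concrete, here is a minimal sketch of the behaviour I am after. It is not Scrapy code: it uses the standard library's html.parser in place of Selector so it runs on its own, and RAW_BODY, NextLinkFinder and find_next_page are hypothetical names made up for the illustration. The idea is to decode the windows-1250 bytes to Unicode text first, then compare against a Unicode label. My guess (an assumption, not confirmed) is that the ValueError comes from the XPath string reaching the parser as non-ASCII bytes instead of Unicode text.

```python
# -*- coding: utf-8 -*-
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical stand-in for the downloaded page: a tiny HTML fragment,
# encoded as windows-1250 the way the server is assumed to send it.
PAGE_URL = "https://www.vsetkyfirmy.sk/autoskoly"
RAW_BODY = (
    '<div><a href="/autoskoly/strana_2.html">Ďalšie &gt;&gt;</a></div>'
).encode("windows-1250")


class NextLinkFinder(HTMLParser):
    """Collect the href of every <a> whose text contains `label`."""

    def __init__(self, label):
        super().__init__()
        self.label = label
        self.hrefs = []
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text chunks seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            if self.label in "".join(self._text):
                self.hrefs.append(self._href)
            self._href = None


def find_next_page(raw_body, page_url, encoding="windows-1250", label="Ďalšie"):
    # Decode the page bytes to Unicode text before any comparison, so the
    # page content and the label are both Unicode strings.
    text = raw_body.decode(encoding)
    finder = NextLinkFinder(label)
    finder.feed(text)
    finder.close()
    # Turn relative hrefs into complete URLs.
    return [urljoin(page_url, href) for href in finder.hrefs]
```

Calling find_next_page(RAW_BODY, PAGE_URL) on the fragment above gives a one-element list holding the complete strana_2.html URL. The same label encodes to different bytes in UTF-8 and in windows-1250, which is why comparing raw bytes across the two encodings cannot match.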