Hello Scrapy developers,

I'm really pleased with the Python 3 support in the upcoming Scrapy 1.1 release. I'm thinking about introducing this great release in a blog article and in a book I am currently writing.

I have a question about a limitation in handling non-ASCII URLs. The release notes for 1.1 (http://doc.scrapy.org/en/master/news.html#news-betapy3) say:

> * Scrapy has problems handling non-ASCII URLs in Python 3

This limitation seems big enough to make Japanese people like me hesitate to use Scrapy 1.1 on Python 3. However, when I tested simple spiders that crawl non-ASCII URLs (https://gist.github.com/orangain/3724b86a5dc5b2a279f9), I didn't have any problems.

So my question is:

* What exactly does the limitation mean?

More specifically:

* In my understanding, "non-ASCII URLs" means URLs that contain percent-encoded non-ASCII characters. Is this right? Or does it mean URLs that contain non-ASCII characters without percent-encoding?
* What kinds of problems will occur?
* In which component will the problems occur?
* Under what conditions will the problems occur?

I've looked through the following issues, but I couldn't find a clear answer to my question.

HTML entity causes UnicodeEncodeError in LxmlLinkExtractor · Issue #998 · scrapy/scrapy
https://github.com/scrapy/scrapy/issues/998

Speedup & fix URL parsing · Issue #1306 · scrapy/scrapy
https://github.com/scrapy/scrapy/issues/1306

Exception in LxmLinkExtractor.extract_links 'charmap' codec can't encode character · Issue #1403 · scrapy/scrapy
https://github.com/scrapy/scrapy/issues/1403

Exception in LxmLinkExtractor.extract_links 'ascii' codec can't encode character · Issue #1405 · scrapy/scrapy
https://github.com/scrapy/scrapy/issues/1405

PY3: add back 3 URL normalization tests by redapple · Pull Request #1664 · scrapy/scrapy
https://github.com/scrapy/scrapy/pull/1664

get_base_url fails for non-ascii URLs in Python 3 · Issue #1783 · scrapy/scrapy
https://github.com/scrapy/scrapy/issues/1783

Best,
orangain
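
P.S. To make the two readings of "non-ASCII URL" in my question concrete, here is a minimal sketch that uses only the Python 3 standard library, not Scrapy itself. The URL is a made-up example, and the choice of `safe` characters is my own assumption about which delimiters to leave alone; it is not meant to describe how Scrapy normalizes URLs internally.

```python
from urllib.parse import quote, unquote

# Hypothetical URL with raw (not percent-encoded) Japanese characters.
raw_url = "http://example.com/検索?q=東京"

# Percent-encode the UTF-8 bytes of each non-ASCII character while
# leaving common URL delimiters untouched (assumed set of safe chars).
encoded_url = quote(raw_url, safe=":/?=&")

# The raw form contains non-ASCII characters; the encoded form does not.
raw_is_ascii = all(ord(c) < 128 for c in raw_url)
encoded_is_ascii = all(ord(c) < 128 for c in encoded_url)

print(raw_url, raw_is_ascii)
print(encoded_url, encoded_is_ascii)

# Decoding the percent-encoded form gives back the raw form.
print(unquote(encoded_url) == raw_url)
```

My test spiders may well have exercised only one of these two forms, which is part of why I would like to know which one the release note refers to.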
