Oleg Broytmann <phd <at> phd.pp.ru> writes:
> On Fri, May 30, 2008 at 02:19:23PM +0200, Georg Brandl wrote:
> > Python 3.0's urllib.quote() and unquote() handle non-ASCII data strangely.
> > quote() encodes characters with codepoint < 256 using latin-1, but others
> > using utf-8. unquote() decodes everything using latin-1.
> >
> > Is the correct behavior to always use utf-8?
>
>    Always UTF-8. See
> http://en.wikipedia.org/wiki/Percent-encoding#Current_standard

Well, according to your link, things are not that simple:

"""
This requirement was introduced in January 2005 with the publication of
RFC 3986. URI schemes introduced before this date are not affected.
"""

In practice, in the particular case of HTTP, you probably have to
distinguish between the file path part (before the ? sign) and the query
string part (after the ? sign). The file path's percent-encoding may depend
on the actual filesystem encoding or on the Web server configuration. The
query string's percent-encoding may depend on the Web application being
queried, on the programming language it is written in, or on anything else
altogether :-)

Regards

Antoine.
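
To make the encoding question concrete, here is a minimal sketch. It assumes a current Python 3 release, where quote() and unquote() live in urllib.parse and accept an explicit encoding argument. The helper decode_url_parts() and the example.com URL are hypothetical, made up only for illustration.

```python
from urllib.parse import quote, unquote, urlsplit, parse_qs

text = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE, a codepoint below 256

# The same character percent-encodes differently depending on the encoding.
utf8_quoted = quote(text, encoding="utf-8")      # two bytes: %C3%A9
latin1_quoted = quote(text, encoding="latin-1")  # one byte: %E9

# Decoding with a different encoding than the one used for quoting
# silently produces the wrong characters instead of raising an error.
mismatched = unquote(utf8_quoted, encoding="latin-1")


def decode_url_parts(url, path_encoding="utf-8", query_encoding="utf-8"):
    """Hypothetical helper: decode path and query with separate encodings.

    The caller must already know which encoding each part uses;
    nothing in the URL itself says so.
    """
    parts = urlsplit(url)
    path = unquote(parts.path, encoding=path_encoding)
    query = parse_qs(parts.query, encoding=query_encoding)
    return path, query


# Assumed example: UTF-8 in the path, latin-1 in the query string.
path, query = decode_url_parts(
    "http://example.com/caf%C3%A9?q=caf%E9",
    path_encoding="utf-8",
    query_encoding="latin-1",
)
```

With these inputs, both the path and the query value should decode to the same word, even though the bytes on the wire differ. Defaulting to UTF-8 matches RFC 3986, but the per-part arguments leave room for servers and applications that do something else.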