On Wed, Aug 12, 2009 at 12:05:40AM -0500, Ian Bicking wrote:
> Correct -- you can write any set of % encodings, and I don't think it even
> has to be able to validly url-decode (e.g., /foo%zzz will work). It
> definitely doesn't have to be a valid encoding. However, if you actually
> include unicode characters, they will always be encoded as UTF-8 (as goes
> with the IRI standard). This is in a case like <a href="/some page">, the
> browser will request /some%20page, because it escapes unsafe characters.
> Similarly if you request <a href="/français"> it will encode that ç in
> UTF-8, then url-encode it, even if the page itself is ISO-8859-1. Well, at
> least on Firefox. I used this to test:
> http://svn.colorstudy.com/home/ianb/wsgi-unicode-test.py
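
The escaping described in the quoted text can be sketched as follows. This is only an illustration using Python 3's urllib.parse (an assumption, it is not what Firefox runs internally); the names path, utf8_form and latin1_form are made up for the example.

```python
from urllib.parse import quote, unquote_to_bytes

path = "/français"

# What the quoted text says the browser does: encode to UTF-8 first,
# then percent-encode the resulting bytes.
utf8_form = quote(path, encoding="utf-8")

# For comparison only: the same path percent-encoded from latin-1 bytes.
latin1_form = quote(path, encoding="latin-1")

# Unsafe ASCII characters such as a space are escaped as well.
space_form = quote("/some page")

# An invalid escape like %zz is not an error when decoding; it is left as-is.
invalid_form = unquote_to_bytes("/foo%zzz")

print(utf8_form)     # /fran%C3%A7ais
print(latin1_form)   # /fran%E7ais
print(space_form)    # /some%20page
print(invalid_form)  # b'/foo%zzz'
```

The two percent-encoded forms of the same path differ, so a server cannot tell from the request alone which encoding the client started from.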

I have run some tests regarding the encoding issue:

curl doesn't 'url-encode' its URLs:

  curl 'http://hostname/français'
                            ^
                            <e7> latin-1 character

The latin-1 character is sent to the server as is. Lighttpd accepts the URL
and even returns a file if it exists. Of course, if I try with the same
character in UTF-8, it doesn't work.

AFAIK RFC 2396 forbids non-ASCII characters in URLs. The problem is that
libcurl is quite popular (it used to be the transport library of
WebKit/GTK+, for example). It's hard to dismiss it as an utterly broken and
obscure tool. Many 'simplistic' HTTP clients may have the same problem.

Now let's talk a little bit about cookies...

Cookies can contain whatever 'binary junk' the server sends. RFC 2965 says
(http://tools.ietf.org/html/rfc2965#page-5):

> The VALUE is opaque to the user agent and may be anything the origin
> server chooses to send, possibly in a server-selected printable ASCII
> encoding.

Also, cookies can contain 'comments', which contain UTF-8 strings
(http://tools.ietf.org/html/rfc2965#page-6):

> Characters in value MUST be in UTF-8 encoding.

Firefox has no problem with cookies containing non-ASCII characters. It
looks like it assumes cookies are encoded using latin-1, since latin-1
characters are displayed correctly in Firebug, but not UTF-8 ones.

Cheers,

-- 
Henry Prêcheur
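
P.S. A minimal sketch of the byte-level difference behind both observations above (the raw 0xE7 byte in the curl request, and Firebug showing latin-1 cookies correctly but not UTF-8 ones). It assumes Python 3; try_decode is a hypothetical helper, not part of any server or browser.

```python
# The same word as raw bytes in the two encodings discussed above.
raw_latin1 = "français".encode("latin-1")  # b'fran\xe7ais'
raw_utf8 = "français".encode("utf-8")      # b'fran\xc3\xa7ais'


def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# A lone 0xE7 byte is not valid UTF-8, so a strict UTF-8 decoder rejects it.
latin1_as_utf8 = try_decode(raw_latin1, "utf-8")

# Every byte is valid latin-1, so UTF-8 bytes decode "successfully" but garbled.
utf8_as_latin1 = try_decode(raw_utf8, "latin-1")

# Decoding with the matching encoding gives the original text back.
latin1_as_latin1 = try_decode(raw_latin1, "latin-1")

print(latin1_as_utf8)    # None
print(utf8_as_latin1)    # franÃ§ais
print(latin1_as_latin1)  # français
```

Decoding as latin-1 never fails, which would be consistent with a client that always displays something for a cookie value, but only displays it correctly when the bytes really were latin-1.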