Re: [Quixote-users] urllib.quote() and cgi.escape()

David Binger Fri, 27 Jan 2006 07:05:08 -0800


On Jan 26, 2006, at 12:12 PM, Patrik Simons wrote:

And here quixote.html.url_quote does it wrong, imho. If you set
quixote.DEFAULT_CHARSET to 'utf-8' and then url_quote a unicode string,
url_quote should first encode the string as utf-8 and then quote it.


This is interesting.  For a unicode argument, the url_quote in quixote
is really the same as urllib.quote.

It doesn't and quixote breaks with a UnicodeDecodeError on urls like
this one: u'/component?test=\xc4'


On Python 2.3.5, urllib.quote(u'\xc4') returns '%C4'.
On Python 2.4.2, urllib.quote(u'\xc4') raises KeyError.
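
Given that difference between versions, one way a caller could sidestep it is
to encode the unicode string to bytes before quoting, as Patrik suggests. The
following is only a sketch: quote_text is a hypothetical helper, not part of
quixote or urllib, and its charset argument stands in for whatever
quixote.DEFAULT_CHARSET is set to. The import fallback is an assumption made so
the same code also runs on Python 3, where quote lives in urllib.parse.

```python
try:
    from urllib import quote          # Python 2
except ImportError:
    from urllib.parse import quote    # Python 3 (assumed fallback)


def quote_text(text, charset="utf-8"):
    """Hypothetical helper: encode text to bytes first, then percent-quote.

    quote() only ever sees a byte string here, so the result does not
    depend on how a given version treats unicode arguments.
    """
    return quote(text.encode(charset))


# '/' is kept because it is quote()'s default safe character;
# '?' and '=' are percent-encoded along with the UTF-8 octets of u'\xc4'.
print(quote_text(u"/component?test=\xc4"))
```

Note that quoting a whole URL this way also escapes '?' and '=', so in practice
the path and the query values would probably be quoted separately.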

From rfc3986:

   When a new URI scheme defines a component that represents textual
   data consisting of characters from the Universal Character Set [UCS],
   the data should first be encoded as octets according to the UTF-8
   character encoding [STD63]; then only those octets that do not
   correspond to characters in the unreserved set should be percent-
   encoded.  For example, the character A would be represented as "A",
   the character LATIN CAPITAL LETTER A WITH GRAVE would be represented
   as "%C3%80", and the character KATAKANA LETTER A would be represented
   as "%E3%82%A2".

This suggests to me that urllib.quote should *always* encode unicode
arguments to 'utf8' first.
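
To make the rule in that paragraph concrete, here is a minimal sketch of it
written by hand rather than through urllib. The names UNRESERVED and
utf8_percent_encode are made up for illustration, and the safe argument is an
assumption modelled loosely on urllib.quote's parameter of the same name; it
is not how urllib.quote is actually implemented.

```python
# Unreserved characters from RFC 3986 section 2.3, stored as octet values.
UNRESERVED = frozenset(bytearray(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"abcdefghijklmnopqrstuvwxyz"
    b"0123456789-._~"
))


def utf8_percent_encode(text, safe=""):
    """Hypothetical helper following the RFC 3986 paragraph quoted above.

    Step 1: encode the unicode text as UTF-8 octets.
    Step 2: percent-encode every octet that is not unreserved
            (or listed in the ASCII-only 'safe' string).
    """
    allowed = UNRESERVED | frozenset(bytearray(safe.encode("ascii")))
    octets = bytearray(text.encode("utf-8"))
    return "".join(
        chr(o) if o in allowed else "%%%02X" % o
        for o in octets
    )


# The three examples given in the RFC text.
print(utf8_percent_encode(u"A"))        # A
print(utf8_percent_encode(u"\u00c0"))   # %C3%80
print(utf8_percent_encode(u"\u30a2"))   # %E3%82%A2
```

Under this rule u'\xc4' would come out as '%C3%84' rather than the '%C4' that
2.3.5 returns.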

Is this a bug in urllib.quote?


_______________________________________________
Quixote-users mailing list
[email protected]
http://mail.mems-exchange.org/mailman/listinfo/quixote-users
