I may be missing something, but it seems that RFC 3987 (which is about IRIs) basically says:

1) IRIs are identical to URIs except that they may contain Unicode characters
2) IRIs must be converted to URIs before being used in HTTP
3) The way to convert an IRI to a URI is to UTF-8 encode the Unicode characters in the IRI and then percent-encode the resulting octets that are unsafe to have in a URI
4) There's some ambiguity over what to do with the hostname portion of the URI if it has one (IDN, replace non-ASCII characters with dashes, etc.)
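
As a rough sketch of step 3 only, written against current Python (where urllib.parse.quote is the successor of urllib.quote): the function name iri_to_uri_path and the exact set of characters left untouched are my own assumptions, and the hostname question from point 4 is deliberately ignored.

```python
from urllib.parse import quote

# Characters left alone: the RFC 3986 reserved set, plus "%" so that
# escapes already present are not encoded twice, plus "~".
# (This particular choice is an assumption made for illustration.)
_SAFE = "/:?#[]@!$&'()*+,;=%~"

def iri_to_uri_path(iri_part):
    """Hypothetical helper: UTF-8 encode, then percent-encode unsafe octets."""
    octets = iri_part.encode("utf-8")   # Unicode -> UTF-8 octets
    return quote(octets, safe=_SAFE)    # octets -> %XX where needed

print(iri_to_uri_path("/caf\u00e9?q=na\u00efve"))
```

The helper is meant for the path and query portions; a hostname would need separate handling.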

If this is indeed the case, it sounds perfectly legal (according to the RFC) and perfectly practical (as required by numerous popular websites) to have urllib.quote and urllib.quote_plus do an automatic UTF-8 encoding of Unicode strings before percent-encoding them.
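
To make the proposed behaviour concrete, here is a minimal sketch in current Python. The names quote_utf8 and quote_plus_utf8 are hypothetical wrappers, not existing urllib functions; they only illustrate "encode Unicode text as UTF-8 first, then percent-encode".

```python
from urllib.parse import quote_from_bytes

def quote_utf8(value, safe="/"):
    """Hypothetical wrapper: UTF-8 encode text, then percent-encode."""
    if isinstance(value, str):
        value = value.encode("utf-8")   # the automatic encoding step
    return quote_from_bytes(value, safe=safe)

def quote_plus_utf8(value, safe=""):
    """Same idea as quote_utf8, but spaces become '+' (form style)."""
    # Keep spaces unescaped during quoting, then turn them into '+'.
    return quote_utf8(value, safe + " ").replace(" ", "+")

print(quote_utf8("caf\u00e9 au lait"))
print(quote_plus_utf8("caf\u00e9 au lait"))
```

Byte strings pass through the same path without the encoding step, so callers who already hold octets are unaffected.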

It's not entirely clear to me whether people should be calling urllib.quote on hostnames and expecting them to be encoded properly if the hostname contains non-ASCII characters. Perhaps the docs should be clarified on this matter?
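
To illustrate why the hostname case differs, the sketch below compares percent-encoding a hostname with converting it via Python's built-in "idna" codec. Treating IDNA as the right answer is an assumption on my part (it is only one of the options mentioned above), and host_to_ascii is a hypothetical name.

```python
from urllib.parse import quote

def host_to_ascii(host):
    """Hypothetical helper: convert an internationalized hostname with IDNA."""
    return host.encode("idna").decode("ascii")

host = "b\u00fccher.example"

# Percent-encoding treats the hostname like any other text ...
print(quote(host))
# ... while IDNA produces a Punycode label instead.
print(host_to_ascii(host))
```

The two results are different strings, so quoting a hostname does not give the IDN form.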

Similarly, urllib.unquote should percent-decode characters and then attempt to convert the resulting octets from UTF-8 to Unicode. If that conversion fails, we can assume the octets should be returned as a byte string rather than a Unicode string.
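
A minimal sketch of that fallback, again in current Python: unquote_utf8 is a hypothetical name, and returning two different types from one function is just the behaviour described above, not something urllib does today.

```python
from urllib.parse import unquote_to_bytes

def unquote_utf8(value):
    """Hypothetical helper: percent-decode, then try UTF-8."""
    octets = unquote_to_bytes(value)      # %XX -> raw octets
    try:
        return octets.decode("utf-8")     # valid UTF-8 -> text
    except UnicodeDecodeError:
        return octets                     # otherwise hand back the bytes

print(repr(unquote_utf8("caf%C3%A9")))   # UTF-8 escapes
print(repr(unquote_utf8("caf%E9")))      # Latin-1 style escape, not valid UTF-8
```

Callers would have to check the type of the result, which is the main cost of this design.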

On May 7, 2008, at 8:12 AM, Armin Ronacher wrote:

Hi,

Jeroen Ruigrok van der Werven <asmodai <at> in-nomine.org> writes:

Would people object if such functionality got added to urllib?
I would ;-) There are IRIs, just that nobody wrote a useful module for that. There are algorithms in the RFC that can convert URIs to IRIs and the other way round. IMO that's the way to go.

Regards,
Armin

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/thomaspinckney3%40gmail.com

