I may be missing something, but it seems that RFC 3987 (which is about
IRIs) basically says:
1) IRIs are identical to URIs except that they may contain non-ASCII
Unicode characters
2) IRIs must be converted to URIs before being used in HTTP
3) The way to convert IRIs to URIs is to UTF-8 encode the Unicode
characters in the IRI and then percent-encode the resulting octets
that are unsafe to have in a URI
4) There's some ambiguity over what to do with the hostname portion of
the URI if it has one (IDN, replace non-ASCII characters with dashes,
etc.)
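
To make point 3 concrete, here is a rough sketch of that conversion as I
read it. The function name iri_to_uri is my own invention, not something
in urllib, and it deliberately ignores point 4: a non-ASCII hostname is
simply percent-encoded like everything else, which is only one of the
possible treatments.

```python
def iri_to_uri(iri):
    """Hypothetical helper: percent-encode the UTF-8 octets of every
    non-ASCII character and leave all ASCII characters untouched."""
    pieces = []
    for ch in iri:
        if ord(ch) < 128:
            # ASCII characters are assumed to be valid URI text already.
            pieces.append(ch)
        else:
            # One %XX escape per UTF-8 octet of the character.
            pieces.extend("%%%02X" % octet for octet in ch.encode("utf-8"))
    return "".join(pieces)


print(iri_to_uri("http://example.com/caf\u00e9?q=\u65e5"))
```

Already-escaped sequences such as %20 pass through unchanged, since '%'
is ASCII.
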
If this is indeed the case, it sounds perfectly legal (according to
the RFC) and perfectly practical (as required by numerous popular
websites) to have urllib.quote and urllib.quote_plus do an automatic
UTF-8 encoding of Unicode strings before percent-encoding them.
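
A minimal sketch of the behaviour I have in mind follows. It is written
against Python 3's urllib.parse (where quote and quote_plus live there),
and the wrapper names quote_utf8 and quote_plus_utf8 are hypothetical,
used only to make the encode-then-quote order explicit.

```python
from urllib.parse import quote, quote_plus


def quote_utf8(text, safe="/"):
    """Hypothetical wrapper: UTF-8 encode text strings, then percent-encode."""
    if isinstance(text, str):
        text = text.encode("utf-8")
    return quote(text, safe=safe)


def quote_plus_utf8(text, safe=""):
    """Same idea for quote_plus, which also turns spaces into '+'."""
    if isinstance(text, str):
        text = text.encode("utf-8")
    return quote_plus(text, safe=safe)


print(quote_utf8("caf\u00e9"))
print(quote_plus_utf8("na\u00efve caf\u00e9"))
```

Byte strings are passed through to quote as they are, so callers that
have already picked an encoding keep control of it.
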
It's not entirely clear to me whether people should be calling
urllib.quote on hostnames and expecting them to be encoded properly
when the hostname contains non-ASCII characters. Perhaps the docs
should be clarified on this matter?
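
To illustrate why the hostname case is different, here is a small
comparison, again using Python 3's urllib.parse as a stand-in. The
hostname is made up for the example, and the "idna" codec is Python's
built-in one; whether it is the right choice for any given application
is an assumption on my part.

```python
from urllib.parse import quote

host = "b\u00fccher.example"  # made-up hostname with one non-ASCII letter

# Percent-encoding the UTF-8 octets, as quote would do for a path segment.
percent_encoded = quote(host)

# IDN-style encoding of the same name via the built-in "idna" codec.
idna_encoded = host.encode("idna").decode("ascii")

print(percent_encoded)
print(idna_encoded)
```

The two results are different strings, so code that quotes a hostname
and code that applies IDN encoding will not agree on the resulting URI.
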
Similarly, urllib.unquote should percent-decode characters and then
attempt to convert the resulting octets from UTF-8 to Unicode. If
that conversion fails, we can assume the octets should be returned as
a byte string rather than a Unicode string.
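
The decode-with-fallback idea could look roughly like this. The sketch
uses Python 3's urllib.parse.unquote_to_bytes to get at the raw octets;
the function name unquote_utf8 is hypothetical.

```python
from urllib.parse import unquote_to_bytes


def unquote_utf8(text):
    """Hypothetical helper: percent-decode, then try to read the octets
    as UTF-8; return the raw byte string if they are not valid UTF-8."""
    octets = unquote_to_bytes(text)
    try:
        return octets.decode("utf-8")
    except UnicodeDecodeError:
        return octets


print(repr(unquote_utf8("caf%C3%A9")))  # valid UTF-8
print(repr(unquote_utf8("caf%E9")))     # Latin-1 octet, not valid UTF-8
```

One consequence of this design is that the return type depends on the
data, so callers have to be prepared for either a text string or a byte
string.
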
On May 7, 2008, at 8:12 AM, Armin Ronacher wrote:
Hi,
Jeroen Ruigrok van der Werven <asmodai <at> in-nomine.org> writes:
Would people object if such functionality got added to urllib?
I would ;-) There are IRIs, just that nobody wrote a useful module
for that.
There are algorithms in the RFC that can convert URIs to IRIs and
the other way round. IMO that's the way to go.
Regards,
Armin
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe:
http://mail.python.org/mailman/options/python-dev/thomaspinckney3%40gmail.com