Re: [Python-Dev] urllib unicode handling

Tom Pinckney Wed, 07 May 2008 13:04:50 -0700

I was assuming urllib.quote/unquote would only be called on text intended to be used in non-hostname portions of the URIs. I'm not sure if this is the actual intent of urllib.quote, and perhaps the documentation should be updated to specify precisely what it does, so that people can decide which parts of URIs it is appropriate to quote/unquote. I don't believe quote/unquote does anything sensible today with hostnames that contain non-printable ASCII, so this is no loss of existing functionality.
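
To illustrate the split between hostname and non-hostname portions, here is a minimal sketch in present-day Python 3 (urllib.parse, not the Python 2 urllib under discussion). The helper name iri_to_uri and the example URL are made up for illustration. It assumes the hostname goes through the standard "idna" codec while path and query are UTF-8 encoded and then percent-encoded, which is one plausible reading of the proposal rather than a settled design.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def iri_to_uri(iri):
    """Hypothetical helper: percent-encode only the non-hostname parts."""
    parts = urlsplit(iri)
    # The hostname is not percent-encoded; it goes through IDNA instead,
    # so that the DNS lookup sees a plain ASCII name.
    host = parts.hostname.encode("idna").decode("ascii")
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    # Path and query: UTF-8 encode, then percent-encode (quote's default).
    # "%" is kept safe so already-escaped input is not escaped twice.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    # Userinfo is dropped and the fragment is passed through unchanged
    # in this sketch.
    return urlunsplit((parts.scheme, host, path, query, parts.fragment))


uri = iri_to_uri("http://köln.de/straße?q=ü")
print(uri)
```

With this split, the non-ASCII hostname never reaches quote at all, so the percent-encoding question only arises for the path and query.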

Re your suggestion that IRIs should be a separate module: I guess my thought is that urllib out of the box should just work with the way websites on the web today actually work. Thus, we should make urllib do the UTF-8 encode/decode rather than make users switch to a different module for certain URLs and another library for other URLs.

Re the specific issue of how urllib.unquote should work: Perhaps there could be an optional second argument that specifies a content encoding to use when decoding escaped characters? I would propose that this parameter have a default value of utf-8, since that is what most websites seem to do, but if the author knew that the website they were using encoded URLs in an ISO-8859 encoding, then they could unquote using that scheme.
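
For a concrete picture of such a parameter: Python 3's urllib.parse.unquote accepts an encoding argument that defaults to utf-8, and quote accepts one as well. The sketch below uses that present-day API, not the Python 2 urllib.unquote discussed in this thread, and the example path is made up.

```python
from urllib.parse import quote, unquote

path = "/wiki/Köln"

# Default behaviour: UTF-8 encode, then percent-encode.
utf8_quoted = quote(path)
print(utf8_quoted)             # /wiki/K%C3%B6ln
print(unquote(utf8_quoted))    # /wiki/Köln

# A site known to use ISO-8859-1 in its URLs: pass the encoding explicitly.
latin1_quoted = quote(path, encoding="iso-8859-1")
print(latin1_quoted)           # /wiki/K%F6ln
print(unquote(latin1_quoted, encoding="iso-8859-1"))   # /wiki/Köln

# Decoding with the wrong encoding does not raise by default
# (errors="replace"); the undecodable byte becomes U+FFFD.
mismatched = unquote(latin1_quoted)
```

The default covers the common case, and the explicit argument covers sites that are known to use something else.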


On May 7, 2008, at 3:10 PM, Martin v. Löwis wrote:

If this is indeed the case, it sounds perfectly legal (according to the RFC) and perfectly practical (as required by numerous popular websites)
to have urllib.quote and urllib.quote_plus do an automatic UTF-8
encoding of unicode strings before percent encoding them.
It's probably legal, but I don't understand why you think it's
practical. The DNS lookup then will certainly fail, no?

Regards,
Martin

