Re: [Python-Dev] bytes / unicode

James Y Knight Tue, 22 Jun 2010 11:09:32 -0700


On Jun 22, 2010, at 1:03 PM, Ian Bicking wrote:

> Similarly I'd expect (from experience) that a programmer using Python to want to take the same approach, sticking with unencoded data in nearly all situations.

Yeah. This is a real issue I have with the direction Python3 went: it pushes you into decoding everything to unicode early, even when you don't care -- all you really wanted to do is pass it from one API to another, with some well-defined transformations, which don't actually depend on it having been decoded properly. (For example, extracting the path from the URL and attempting to open it as a file on the filesystem.)
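As a rough illustration of the URL-to-path case, here is a minimal sketch that keeps the data as bytes the whole way. It assumes a Python 3 release in which urllib.parse accepts bytes (3.2 or later, so newer than this thread); the URL and the helper name url_to_path_bytes are made up for the example.

```python
from urllib.parse import urlsplit, unquote_to_bytes


def url_to_path_bytes(url: bytes) -> bytes:
    """Hypothetical helper: return the percent-decoded URL path as raw bytes."""
    # Given bytes input, urlsplit returns bytes components (Python 3.2+).
    raw_path = urlsplit(url).path
    # unquote_to_bytes expands %XX escapes into single bytes, with no text decoding.
    return unquote_to_bytes(raw_path)


# %E9 is a Latin-1 "e acute"; on its own it is not valid UTF-8.
url = b"http://example.com/files/caf%E9.txt"
path = url_to_path_bytes(url)

# A strict decode of the same data fails, which is the fragility described above.
try:
    path.decode("utf-8")
    decodes_cleanly = True
except UnicodeDecodeError:
    decodes_cleanly = False
```

On POSIX systems the resulting bytes object can be handed straight to open() or os.stat(), so nothing in this pipeline ever needs to know the encoding.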

This means that Python3 programs can become *more* fragile in the face of random data you encounter out in the real world, rather than less fragile, which was the goal of the whole exercise.

The surrogateescape method is a nice workaround for this, but I can't help thinking that it might've been better to just treat stuff as possibly-invalid-but-probably-utf8 byte-strings from input, through processing, to output. It seems kinda too late for that, though: next time someone designs a language, they can try that. :)
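For reference, a small sketch of how the surrogateescape error handler (PEP 383) lets undecodable bytes survive a trip through str. The sample path is invented for illustration.

```python
# A byte string that is "probably UTF-8" but contains one invalid byte (0xE9).
raw = b"/files/caf\xe9.txt"

# Decoding with surrogateescape maps the bad byte to the lone surrogate U+DCE9
# instead of raising UnicodeDecodeError.
text = raw.decode("utf-8", errors="surrogateescape")

# Ordinary str operations work without caring about the escaped byte.
basename = text.rsplit("/", 1)[-1]

# Encoding with the same handler restores the original byte exactly.
restored = basename.encode("utf-8", errors="surrogateescape")

# A strict encode refuses the lone surrogate, so the escaped data cannot
# leak silently into output that is supposed to be valid UTF-8.
try:
    basename.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
```

The round trip is lossless only if the same codec and error handler are used on the way in and on the way out.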


James
