On Jun 22, 2010, at 1:03 PM, Ian Bicking wrote:
> Similarly I'd expect (from experience) that a programmer using
> Python to want to take the same approach, sticking with unencoded
> data in nearly all situations.

Yeah. This is a real issue I have with the direction Python3 went: it
pushes you into decoding everything to unicode early, even when you
don't care -- all you really wanted to do is pass it from one API to
another, with some well-defined transformations, which don't actually
depend on it having been decoded properly. (For example, extracting
the path from the URL and attempting to open it as a file on the
filesystem.)
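
To make that example concrete, here is a rough sketch of the
bytes-only approach. It assumes a Python 3 recent enough that
urllib.parse accepts bytes; the URL and the path_from_url helper are
made up for illustration.

```python
from urllib.parse import urlsplit, unquote_to_bytes


def path_from_url(raw_url: bytes) -> bytes:
    """Hypothetical helper: extract the path without decoding to str."""
    # Given bytes input, urlsplit returns bytes components.
    quoted_path = urlsplit(raw_url).path
    # Undo the %XX escapes, still at the byte level.
    return unquote_to_bytes(quoted_path)


# %E9 is a Latin-1 "e acute"; on its own it is not valid UTF-8.
raw_url = b"http://example.com/files/caf%E9.txt"
path = path_from_url(raw_url)

# Strict decoding is the step that breaks on this kind of input.
try:
    path.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

# open() and the os.path functions accept bytes paths, so `path`
# could be handed to the filesystem as-is, with no decode step.
```

Nothing in the transformation itself needed to know the encoding;
only the explicit decode() at the end does.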
This means that Python3 programs can become *more* fragile in the face
of random data you encounter out in the real world, rather than less
fragile, which was the goal of the whole exercise.
The surrogateescape method is a nice workaround for this, but I can't
help thinking that it might've been better to just treat stuff as
possibly-invalid-but-probably-utf8 byte-strings from input, through
processing, to output. It seems kinda too late for that, though: next
time someone designs a language, they can try that. :)
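
For anyone who hasn't played with it, a small sketch of what
surrogateescape does (the byte string is an invented example):
undecodable bytes are smuggled through as lone surrogates and come
back out unchanged when you encode with the same error handler.

```python
raw = b"caf\xe9.txt"  # invented example: not valid UTF-8

# Each undecodable byte 0xNN becomes the lone surrogate U+DCNN.
text = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
roundtrip = text.encode("utf-8", errors="surrogateescape")

# A strict encode refuses the smuggled surrogate, so the problem is
# deferred to the output boundary rather than removed.
try:
    text.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
```

So the data survives the trip, but every output path has to remember
to use the same error handler.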
James