On 10/8/2010 9:45 AM, Hallvard B Furuseth wrote:
Actually, the implicit contract of __str__ is that it never fails, so
that everything can be printed out (for debugging purposes, etc.).
Nope:
$ python2 -c 'str(u"\u1000")'
Traceback (most recent call last):
File "<string>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\u1000' in position
0: ordinal not in range(128)
This could be considered a design bug due to 'str' being used both to
produce readable string representations of objects (perhaps one that
could be eval'ed) and to convert unicode objects to equivalent string
objects, which is not the same operation!
The above really should have produced '\u1000'! (the equivalent of what
str(bytes) does today). The 'conversion to equivalent str object' option
should have required an explicit encoding arg rather than defaulting to
the ascii codec. This mistake has been corrected in 3.x, so Yep.
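A minimal Python 3 sketch of that distinction: str() of a bytes object
produces a readable representation and never raises, while conversion to
an equivalent str requires naming the encoding explicitly.

```python
b = b'\xa0'

# Representation: always succeeds, yields the repr of the bytes object.
print(str(b))               # b'\xa0'

# Conversion: an explicit encoding argument is required.
print(str(b, 'latin-1'))    # the NO-BREAK SPACE character
```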
And the equivalent:
$ python2 -c 'unicode("\xA0")'
Traceback (most recent call last):
File "<string>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 0: ordinal
not in range(128)
This is an application bug: either bad string or missing decoding arg.
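The Python 3 counterpart of that fix, as a sketch: decode the byte string
with the encoding the data actually uses, instead of relying on a default
codec (UTF-8 in 3.x, ascii in 2.x).

```python
raw = b'\xa0'

# Decoding with the default codec (UTF-8) fails on this byte:
try:
    raw.decode()
except UnicodeDecodeError as e:
    print('default codec failed:', e)

# Naming the real encoding of the data succeeds:
text = raw.decode('latin-1')
print(text == '\xa0')   # True
```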
In Python 2, these two Unicode errors made our data safe from code
which used str and unicode objects without checking too carefully which
was which. Code which sorted the types out carefully enough would not fail.
In Python 3, that safety only exists for bytes(str), not str(bytes).
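That asymmetry can be demonstrated in a few lines of Python 3:
bytes(str) without an encoding still raises, while str(bytes) silently
returns the repr.

```python
# bytes(str) still guards against a missing encoding:
try:
    bytes('abc')
except TypeError as e:
    print('bytes(str) raised:', e)

# str(bytes) does not raise; it gives the repr instead:
print(str(b'abc'))   # b'abc'
```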
If you prefer the buggy 2.x design (and there are *many* tracker bug
reports that were fixed by the 3.x change), stick with it.
--
Terry Jan Reedy
--
http://mail.python.org/mailman/listinfo/python-list