On 10/8/2010 9:45 AM, Hallvard B Furuseth wrote:
Actually, the implicit contract of __str__ is that it never fails, so
that everything can be printed out (for debugging purposes, etc.).
Nope:
$ python2 -c 'str(u"\u1000")'
Traceback (most recent call last):
File "<string>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\u1000' in position
0: ordinal not in range(128)
This could be considered a design bug due to 'str' being used both to
produce readable string representations of objects (perhaps one that
could be eval'ed) and to convert unicode objects to equivalent string
objects, which is not the same operation!
The above really should have produced '\u1000'! (the equivalent of what
str(bytes) does today). The 'conversion to equivalent str object' option
should have required an explicit encoding arg rather than defaulting to
the ascii codec. This mistake has been corrected in 3.x, so Yep.
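A minimal Python 3 sketch of that distinction: str() of a bytes object
produces a readable representation and never raises, while conversion to
an equivalent str requires naming the encoding explicitly.

```python
b = b'\xa0'

# Representation: always succeeds, yields the repr of the bytes object.
print(str(b))               # b'\xa0'

# Conversion: an explicit encoding argument is required.
print(str(b, 'latin-1'))    # the NO-BREAK SPACE character
```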
And the equivalent:
$ python2 -c 'unicode("\xA0")'
Traceback (most recent call last):
File "<string>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 0: ordinal
not in range(128)
This is an application bug: either bad string or missing decoding arg.
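The Python 3 counterpart of that fix, as a sketch: decode the byte string
with the encoding the data actually uses, instead of relying on a default
codec (UTF-8 in 3.x, ascii in 2.x).

```python
raw = b'\xa0'

# Decoding with the default codec (UTF-8) fails on this byte:
try:
    raw.decode()
except UnicodeDecodeError as e:
    print('default codec failed:', e)

# Naming the real encoding of the data succeeds:
text = raw.decode('latin-1')
print(text == '\xa0')   # True
```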
In Python 2, these two Unicode errors made our data safe from code
which used str and unicode objects without checking too carefully which
was which. Code which sorted the types out carefully enough would not fail.
In Python 3, that safety only exists for bytes(str), not str(bytes).
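That asymmetry can be demonstrated in a few lines of Python 3:
bytes(str) without an encoding still raises, while str(bytes) silently
returns the repr.

```python
# bytes(str) still guards against a missing encoding:
try:
    bytes('abc')
except TypeError as e:
    print('bytes(str) raised:', e)

# str(bytes) does not raise; it gives the repr instead:
print(str(b'abc'))   # b'abc'
```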
If you prefer the buggy 2.x design (and there are *many* tracker bug
reports that were fixed by the 3.x change), stick with it.
--
Terry Jan Reedy
--
http://mail.python.org/mailman/listinfo/python-list