On 2/13/06, Michael Foord <[EMAIL PROTECTED]> wrote:
> Phillip J. Eby wrote:
> [snip..]
> >
> > In fact, the 'encoding' argument seems useless in the case of str objects,
> > and it seems it should default to latin-1 for unicode objects. The only
> >
> -1 for having an implicit encode that behaves differently to other
> implicit encodes/decodes that happen in Python. Life is confusing enough
> already.
But adding an encoding doesn't help. The str.encode() method always
assumes that the string itself is ASCII-encoded, and that's not good
enough:
>>> "abc".encode("latin-1")
'abc'
>>> "abc".decode("latin-1")
u'abc'
>>> "abc\xf0".decode("latin-1")
u'abc\xf0'
>>> "abc\xf0".encode("latin-1")
Traceback (most recent call last):
File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position
3: ordinal not in range(128)
>>>
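For reference, what str.encode() does here is roughly an implicit decode
with the default codec (ASCII) followed by the requested encode; the
two-step sketch below reproduces the failure explicitly and is only an
approximation of what happens at the C level:

>>> s = "abc\xf0"
>>> u = s.decode("ascii")          # the implicit step that actually fails
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 3: ordinal not in range(128)
>>> s.decode("latin-1").encode("latin-1")   # spelling out both steps works
'abc\xf0'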
The right way to look at this is, as Phillip says, to treat conversion
between str and bytes not as an encoding but as a data type change
*only*.
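Here is a minimal sketch of what such a codec-free conversion amounts
to, using a pair of hypothetical helpers (they don't exist in any
current Python release; they only illustrate the idea) that copy each
8-bit character's ordinal value unchanged:

def str_to_byte_values(s):
    # pure type change: each 8-bit character becomes its ordinal, 0-255
    return [ord(c) for c in s]

def byte_values_to_str(values):
    # the inverse: each integer 0-255 becomes one 8-bit character
    return ''.join([chr(v) for v in values])

>>> str_to_byte_values("abc\xf0")
[97, 98, 99, 240]
>>> byte_values_to_str([97, 98, 99, 240])
'abc\xf0'

No codec and no default encoding are involved; the bytes come out
exactly as they went in.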
--
--Guido van Rossum (home page: http://www.python.org/~guido/)