[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

Fredrik Lundh Thu, 11 Mar 2010 11:03:22 -0800

Fredrik Lundh <fred...@effbot.org> added the comment:

> if I don't specify an encoding, I get unicode.  If I do specify an encoding, 
> I get encoded bytes.


You're confusing the XML document encoding with character set encoding.

A serialized (unparsed) XML document is a byte stream, not a string of Unicode 
characters.  And the character set encoding is both embedded in that byte 
stream and affects how it's generated in more than one way; you cannot just 
recode XML documents nilly willy and expect things to work.

A parsed XML document (an infoset) -- for ET, that's the tree of Element 
objects -- does indeed contain Unicode strings, but the transformation from the 
byte stream to the Unicode string doesn't just involve character set decoding; 
there are several other constructs that are handled by the XML parser.

> Ha. There has been a very long temporal window

You should have had plenty of time to fix it, then, right?

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue8047>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

Reply via email to