Florent Xicluna florent.xicl...@gmail.com added the comment:
I propose to close this as won't fix.
The upgrade to ElementTree 1.3 brought some consistency when dealing with
Unicode and encodings.
The reported behavior was only seen in Python 2.7, when using bytes improperly.
Ulrich Seidl ulrich.se...@muneda.com added the comment:
I would suggest adding an additional except branch to (at least) the following
functions of ElementTree.py:
* _encode,
* _escape_attrib, and
* _escape_cdata
The except branch could look like:
except (UnicodeDecodeError):
    return
New submission from Ulrich Seidl ulrich.se...@muneda.com:
The following code leads to a UnicodeError in Python 2.7, while it works fine
in 2.6 and 2.5:
# -*- coding: latin-1 -*-
import xml.etree.cElementTree as ElementTree
oDoc = ElementTree.fromstring(
'?xml version=1.0
Changes by Brian Curtin cur...@acm.org:
--
nosy: +flox
stage: - needs patch
type: - behavior
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue9692
___
Amaury Forgeot d'Arc amaur...@gmail.com added the comment:
IMO the code is not correct: how does ElementTree know which encoding is used
for the attribute value? Even 2.5 prints a different content when the script
is saved with a different encoding.
The line should look like:
oDoc.set(
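
The point about unknown encodings can be illustrated with a minimal sketch. It is written for Python 3, where str is always Unicode text, so it is not the 2.x code from the report; the <ROOT/> document and the variable names are assumptions made for illustration. It sets the attribute from a text value and checks that the value survives a write/read round trip.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the document used in the report.
root = ET.fromstring("<ROOT/>")

# A text (Unicode) value. The escapes spell the three umlauts A, O, U,
# so the result does not depend on the encoding of this source file.
value = "\u00c4\u00d6\u00dc"
root.set("ATTR", value)

# Serialize to UTF-8 bytes, then parse the bytes back in.
serialized = ET.tostring(root, encoding="utf-8")
restored = ET.fromstring(serialized).get("ATTR")

print(restored == value)
```

Because the value is text rather than bytes, ElementTree never has to guess an encoding; the encoding is chosen only at serialization time.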
Ulrich Seidl ulrich.se...@muneda.com added the comment:
Of course, if you use a unicode string it works, and of course it would be easy
to switch to unicode for this demo code. Unfortunately, the affected
application is a little bit more complex and it is not that easy to switch to
unicode. I
Amaury Forgeot d'Arc amaur...@gmail.com added the comment:
Testing with Python 2.5: oDoc.set("ATTR", "ÄÖÜ") uses the encoding used by the
source code (with # -*- coding:;). If I use utf-8 instead, the output is:
<ROOT ATTR="&#195;&#132;&#195;&#150;&#195;&#156;" />
which contains the numbers of the 3 pairs
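
A rough Python 3 sketch of where those six numbers come from, assuming the UTF-8 bytes were treated as one character per byte (the Latin-1 reading is an assumption about the mechanism, and the names below are illustrative only):

```python
import xml.etree.ElementTree as ET

# The three umlauts, written as escapes so the source encoding is irrelevant.
text = "\u00c4\u00d6\u00dc"

# Each of these characters takes two bytes in UTF-8.
utf8_bytes = text.encode("utf-8")
print(list(utf8_bytes))

# Assumption: every byte is misread as a separate Latin-1 character.
misread = utf8_bytes.decode("latin-1")

root = ET.Element("ROOT")
root.set("ATTR", misread)

# The default serialization is ASCII, so each non-ASCII character
# is written as a numeric character reference.
output = ET.tostring(root).decode("ascii")
print(output)
```

The byte values are 195, 132, 195, 150, 195, 156, which matches the six character references in the output quoted above: three characters, two bytes each.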
Ulrich Seidl ulrich.se...@muneda.com added the comment:
Well, the output of the print is not that interesting as long as ElementTree is
able to restore the former attribute's value when reading it in again. The
print was just used to illustrate that a UnicodeDecodeError appears. Think
about