Eli Bendersky <eli...@gmail.com> added the comment:

Stefan,

Thanks a lot for taking the time to review the patch. As you correctly say, the 
current patch's goal is just to align with existing behavior in the Python 
implementation of ET.

I understand the problem you are describing, but at least it's not a regression 
vs. previous behavior, while the original problem this issue complains about 
*is* a regression.

I propose to commit this to fix the regression and open a separate issue with 
the insight you provided. One easy solution could be to just require the 
encoding to be UTF-8 when passing unicode to the module, and to document it 
explicitly. Another solution would be to actually fix it in the module itself.
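
To make the first option concrete, here is a rough sketch of what such a check could look like. The helper name require_utf8_declaration and its regular expression are purely illustrative assumptions; nothing like this exists in ET or pyexpat today, and a real fix might look quite different.

```python
import re

# Hypothetical helper for illustration only; not part of
# xml.etree.ElementTree or pyexpat.
_DECL_ENCODING = re.compile(
    r'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']'
)


def require_utf8_declaration(text):
    """Return text unchanged if it is safe to hand over as UTF-8.

    Raises ValueError when the XML declaration of a unicode string names
    an encoding other than UTF-8. Input without a declaration, or without
    an encoding pseudo-attribute, is accepted.
    """
    if not isinstance(text, str):
        raise TypeError("expected a unicode string")
    match = _DECL_ENCODING.match(text)
    if match is not None:
        declared = match.group(1).lower().replace("_", "-")
        if declared not in ("utf-8", "utf8"):
            raise ValueError(
                "unicode XML input must declare UTF-8, got %r" % match.group(1)
            )
    return text
```

A caller would run the check before passing the string on, for example require_utf8_declaration(source) ahead of the actual parse call.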

If there is a decision to fix it, the fix should then cover both the C and 
Python implementations, in all possible places. All functions reading XML from 
strings will suffer from the same problem, since those strings get passed to 
xmlparse_Parse in pyexpat, which just uses PyArg_ParseTuple with the "s#" 
format, encoding unicode in UTF-8 without looking at the XML encoding itself.
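
The mismatch can be illustrated in pure Python, without involving the parser at all. This is only a sketch of the effect described above, under the assumption that the bytes are later interpreted according to the declared encoding; the variable names are made up for the example.

```python
import re

# A unicode string whose XML declaration names a non-UTF-8 encoding.
text = '<?xml version="1.0" encoding="iso-8859-1"?><root>\u00e9</root>'

# Roughly what the "s#" conversion does: encode as UTF-8, ignoring
# whatever the declaration says.
raw = text.encode("utf-8")

# A consumer that trusts the declaration decodes those bytes as
# ISO-8859-1, so the two UTF-8 bytes of the accented letter are read
# as two separate characters.
declared = re.search(r'encoding="([^"]+)"', text).group(1)
misread = raw.decode(declared)

print(declared)          # iso-8859-1
print(misread == text)   # False
```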

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue14246>
_______________________________________