Eli Bendersky <eli...@gmail.com> added the comment: Stefan,
Thanks a lot for taking the time to review the patch. As you correctly say, the current pathch's goal is just to align with existing behavior in the Python implementation of ET. I understand the problem you are describing, but at least it's not a regression vs. previous behavior, while the original problem this issue complains about *is* a regression. I propose to commit this to fix the regression and open a separate issue with the insight you provided. One easy solution could be to just require the encoding to be UTF-8 when passing unicode to the module, and to document it explicitly. Another solution would be to actually fix it in the module itself. If there is a decision to fix it, the fix should then cover both the C and Python implementations, in all possible places (all functions reading XML from strings will also suffer from the same problem, since they get passed to xmlparse_Parse in pyexpat, which just uses PyArg_ParseTuple with the "s#" format - encoding unicode in utf-8 without looking at the XML encoding itself). ---------- _______________________________________ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue14246> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com