Stefan Behnel added the comment: I don't think I have my head deep enough in the encodings implementation to say that this is the correct/best way to do it, but the patch looks mostly reasonable to me and would be a helpful addition.
I have two comments on the pyexpat_encoding_convert() function: 1) I can't see a safe-guard against reading beyond the data buffer. What if s already points to the last byte and we are trying to read two or three bytes to decode them? I wouldn't be surprised to see that this kind of input can be crafted. 2) Creating a throw-away Unicode object through a named decoder looks like a huge overhead for decoding two bytes. It might be considered an optimisation to change that, but if you are really trying to parse a longer XML document with lots of Japanese text in it (i.e. if you actually *need* this feature), it will most likely end up being way too slow to make any real use of it. I think that both points should be addressed before this gets added. ---------- _______________________________________ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue18059> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com