Stefan Behnel added the comment:

I don't think I have my head deep enough in the encodings implementation to say 
that this is the correct/best way to do it, but the patch looks mostly 
reasonable to me and would be a helpful addition.

I have two comments on the pyexpat_encoding_convert() function:

1) I can't see a safeguard against reading beyond the data buffer. What if s 
already points to the last byte and we are trying to read two or three bytes to 
decode them? I wouldn't be surprised to see that this kind of input can be 

2) Creating a throw-away Unicode object through a named decoder looks like a 
huge overhead for decoding two bytes. It might be considered an optimisation to 
change that, but if you are really trying to parse a longer XML document with 
lots of Japanese text in it (i.e. if you actually *need* this feature), it will 
most likely end up being way too slow to make any real use of it.
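
To illustrate the kind of check I mean in 1), here is a minimal sketch. It is 
not taken from the patch: decode_checked(), seq_len() and the explicit end 
pointer are hypothetical names, and the "lead byte 0x81..0xFE takes one trail 
byte" rule is a made-up stand-in for whatever the real encoding requires.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical rule for illustration only: a lead byte in 0x81..0xFE
 * starts a two-byte sequence, anything else is a single byte. */
static size_t seq_len(unsigned char lead)
{
    return (lead >= 0x81 && lead <= 0xFE) ? 2 : 1;
}

/* Decode one character from the half-open range [s, end).
 * Returns the raw value of the sequence, or -1 if the sequence
 * would extend past the end of the buffer. */
static int decode_checked(const unsigned char *s, const unsigned char *end)
{
    size_t need;

    assert(s != NULL && end != NULL);
    if (s >= end)
        return -1;                      /* nothing left to read */

    need = seq_len(*s);
    if ((size_t)(end - s) < need)
        return -1;                      /* truncated multi-byte sequence */

    if (need == 1)
        return s[0];
    return (s[0] << 8) | s[1];          /* combine lead and trail byte */
}
```

The point is only that the number of remaining bytes is compared against the 
number of bytes the sequence needs before any of them is read.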

I think that both points should be addressed before this gets added.
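
For the second point, one possible direction (a sketch under assumptions, not 
a statement about how the patch should do it) is to pay the decoding cost once 
per lead/trail pair when the encoding is set up, and to do a plain table lookup 
per character afterwards. All names below are hypothetical, and 
slow_decode_pair() is a made-up placeholder for the expensive decoder call.

```c
#include <stdint.h>

#define LEAD_MIN   0x81
#define LEAD_MAX   0xFE
#define LEAD_COUNT (LEAD_MAX - LEAD_MIN + 1)

/* Placeholder for the expensive per-call decoder. The mapping is
 * invented for illustration and does not match any real encoding. */
static uint16_t slow_decode_pair(unsigned char lead, unsigned char trail)
{
    return (uint16_t)(((unsigned)(lead - LEAD_MIN) << 8) | trail);
}

/* One entry per (lead, trail) pair, filled once at setup time. */
static uint16_t pair_table[LEAD_COUNT][256];

static void build_pair_table(void)
{
    for (int lead = LEAD_MIN; lead <= LEAD_MAX; lead++)
        for (int trail = 0; trail < 256; trail++)
            pair_table[lead - LEAD_MIN][trail] =
                slow_decode_pair((unsigned char)lead, (unsigned char)trail);
}

/* Per-character path: no object creation, just an index operation.
 * Assumes the caller has already verified that s[1] is readable.
 * Returns -1 if s[0] is not a valid lead byte. */
static int fast_decode_pair(const unsigned char *s)
{
    if (s[0] < LEAD_MIN || s[0] > LEAD_MAX)
        return -1;
    return pair_table[s[0] - LEAD_MIN][s[1]];
}
```

The table here costs about 63 KB per encoding, which seems an acceptable price 
if the alternative is one temporary object per decoded character.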


Python tracker <>
Python-bugs-list mailing list
