STINNER Victor added the comment:

The attached patch fixes the UTF-8 decoder so that incremental decoding works
correctly with the surrogatepass error handler.
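For context, here is a minimal sketch of what surrogatepass changes in one-shot (non-incremental) UTF-8 decoding. The strict handler rejects the bytes of an encoded lone surrogate. surrogatepass lets them through as the code point itself (here U+D900). The helper name `decode_both` is hypothetical, used only for illustration.

```python
from typing import Optional, Tuple


def decode_both(data: bytes) -> Tuple[Optional[str], str]:
    """Decode data with the strict and the surrogatepass error handlers.

    Returns (strict_result, surrogatepass_result). strict_result is None
    when the strict handler rejects the input.
    """
    try:
        strict = data.decode("utf-8")  # default errors="strict"
    except UnicodeDecodeError:
        strict = None
    return strict, data.decode("utf-8", "surrogatepass")


# b'\xed\xa4\x80' is the UTF-8 form of the lone high surrogate U+D900.
strict, passed = decode_both(b"\xed\xa4\x80")
print(strict)         # None: strict refuses surrogates
print(ascii(passed))  # ascii() avoids printing an unencodable surrogate
```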

The bug occurs when b'\xed\xa4\x80' is decoded in two parts: the first two 
bytes (b'\xed\xa4'), and then the last byte (b'\x80').

It works as expected if we decode the first byte (b'\xed') and then the last
two bytes (b'\xa4\x80').
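The two splits described above can be reproduced with a small sketch built on the standard `codecs.getincrementaldecoder()`. The helper `decode_split` is hypothetical. It feeds the input in two chunks: `final=False` first, so an incomplete trailing sequence must be buffered, then `final=True` to flush. With the fix in place, both splits should decode to U+D900. Split 2 is the case reported as buggy here.

```python
import codecs


def decode_split(data: bytes, split: int, errors: str = "surrogatepass") -> str:
    """Feed data to an incremental UTF-8 decoder in two chunks.

    The first call passes final=False, so an incomplete trailing sequence
    must be kept in the decoder's buffer instead of being reported as an
    error. The second call passes final=True to flush the decoder.
    """
    decoder = codecs.getincrementaldecoder("utf-8")(errors)
    first = decoder.decode(data[:split])
    return first + decoder.decode(data[split:], final=True)


data = b"\xed\xa4\x80"  # lone surrogate U+D900 encoded as UTF-8
for split in range(1, len(data)):
    # split=1: b'\xed' + b'\xa4\x80'; split=2: b'\xed\xa4' + b'\x80'
    print(split, ascii(decode_split(data, split)))
```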

My patch tries to preserve the best possible performance of the UTF-8 decoder
with the strict error handler.

@Serhiy: Would you mind reviewing my patch, since you helped design the fast
UTF-8 decoder?

----------
keywords: +patch
Added file: http://bugs.python.org/file43911/surrogatepass.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue24214>
_______________________________________
_______________________________________________
Python-bugs-list mailing list