STINNER Victor <victor.stin...@haypocalc.com> added the comment:

This issue is a regression introduced by r72208, the fix for issue #3672.

The attached patch fixes PyUnicode_EncodeUTF8() when 
unicode_encode_call_errorhandler() returns a unicode string (e.g. with the 
backslashreplace error handler). I don't know the unicodeobject.c code very 
well, so my patch is probably far from perfect.
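
For illustration, here is a minimal sketch of an error handler that returns a 
unicode (str) replacement, which is the case the patch deals with. It is written 
for a current Python 3 interpreter, not the tree the patch targets; the handler 
name "example_ascii_marker" and the function ascii_marker() are made up for this 
example.

```python
import codecs


def ascii_marker(exc):
    """Hypothetical handler: replace unencodable characters with an ASCII marker."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    # The replacement is a str (not bytes) and contains only ASCII characters.
    replacement = "".join("<U+%04X>" % ord(ch) for ch in bad)
    # Resume encoding right after the offending range.
    return replacement, exc.end


# "example_ascii_marker" is an illustrative name, not a built-in handler.
codecs.register_error("example_ascii_marker", ascii_marker)

text = "a\udc80b"  # contains one lone surrogate
encoded = text.encode("utf-8", "example_ascii_marker")
builtin = text.encode("utf-8", "backslashreplace")
print(encoded, builtin)
```

Both calls go through the same path: the encoder hits the lone surrogate, calls 
the handler, and has to copy the returned string into the output buffer.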

I assume that the maximum length of an escaped character is 8 bytes 
(xmlcharrefreplace error handler for U+DFFF, which gives "&#57343;"). When the 
first lone surrogate is found, the buffer is reallocated to size*8 bytes. The 
escaped characters have to be ASCII, otherwise a UnicodeEncodeError is raised.
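
The 8-byte figure can be checked from Python for the two built-in handlers that 
return a unicode string. This is only a sketch run on a current Python 3 
interpreter; it covers lone surrogates (U+D800..U+DFFF) and says nothing about 
custom handlers, which may return longer replacements.

```python
# Sample lone surrogates: first, one in the middle, and last of the range.
lone_surrogates = [chr(cp) for cp in (0xD800, 0xDC80, 0xDFFF)]
handlers = ("backslashreplace", "xmlcharrefreplace")

# Length in bytes of the escaped form of each surrogate, per handler.
lengths = {
    (hex(ord(ch)), handler): len(ch.encode("utf-8", handler))
    for ch in lone_surrogates
    for handler in handlers
}
longest = max(lengths.values())

for key, size in sorted(lengths.items()):
    print(key, size)
print("longest escape:", longest)
```

backslashreplace yields 6 bytes per surrogate and xmlcharrefreplace yields 8, 
because every surrogate code point has five decimal digits.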

Note: unicode_encode_ucs1() doesn't hardcode a maximum length for the escaped 
string. Its code might be reused in PyUnicode_EncodeUTF8() to remove the 
hardcoded limit.

----------
keywords: +patch
Added file: http://bugs.python.org/file16503/utf8_surrogate_error.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue8092>
_______________________________________