New submission from Xiang Zhang: In utf8_encoder, when a codecs returns a string with non-ascii characters, it raises encodeerror but the start and end position are not perfect. This seems like an oversight during evolution. Before, utf8_encoder only recognize one surrogate character a time. After 2b5357b38366, it tries to recognize as much as possible a time. Patch also includes some cleanup.
---------- files: utf8_encoder.patch keywords: patch messages: 279712 nosy: haypo, serhiy.storchaka, xiang.zhang priority: normal severity: normal stage: patch review status: open title: Report surrogate characters range in utf8_encoder type: behavior Added file: http://bugs.python.org/file45271/utf8_encoder.patch _______________________________________ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue28561> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com