Ezio Melotti <ezio.melo...@gmail.com> added the comment:

Here's a new patch. It should be complete, but I want to test it some more 
before committing.
I decided to follow RFC 3629, putting 0 instead of 5/6 for bytes in the range 
F5-FD (we can always put them back in the unlikely case that the Unicode 
Consortium changes its mind) and also for other invalid ranges (e.g. C0-C1). 
This led to some simplification in the code.
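
To illustrate the idea, here is a minimal Python sketch of a lead-byte length 
table that follows RFC 3629. It is not the code from the patch: the names 
utf8_sequence_length and UTF8_CODE_LENGTH are hypothetical, and the sketch only 
shows which start bytes map to 0 (invalid) under the RFC.

```python
def utf8_sequence_length(lead_byte):
    """Return the UTF-8 sequence length announced by a lead byte.

    Returns 0 for bytes that can never start a sequence under RFC 3629.
    (Hypothetical helper for illustration, not taken from the patch.)
    """
    if not 0 <= lead_byte <= 0xFF:
        raise ValueError("lead_byte must be in range(256)")
    if lead_byte <= 0x7F:
        return 1  # ASCII
    if 0xC2 <= lead_byte <= 0xDF:
        return 2  # C0-C1 are excluded: they could only start overlong forms
    if 0xE0 <= lead_byte <= 0xEF:
        return 3
    if 0xF0 <= lead_byte <= 0xF4:
        return 4  # F5 and above would encode values beyond U+10FFFF
    # 80-BF (continuation bytes), C0-C1 and F5-FF are all invalid start bytes.
    return 0


# Precomputed 256-entry table, indexed by the byte value.
UTF8_CODE_LENGTH = [utf8_sequence_length(b) for b in range(256)]
```

With a table like this, a decoder can reject an invalid start byte with a single 
lookup, without separate branches for the old 5- and 6-byte forms.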

I also found out that, according to RFC 3629, surrogates are considered invalid 
and can't be encoded/decoded, but the UTF-8 codec currently does so anyway. I 
included tests and a fix, but I left them commented out because this is outside 
the scope of this patch and probably needs a discussion on python-dev.
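
As a rough sketch of the stricter behaviour RFC 3629 asks for, assuming 
Python 3 string semantics: the helpers is_surrogate and encode_utf8_strict 
below are hypothetical names, not part of the patch or of the codec.

```python
def is_surrogate(code_point):
    """Return True if code_point lies in the surrogate range D800-DFFF."""
    return 0xD800 <= code_point <= 0xDFFF


def encode_utf8_strict(text):
    """Encode text as UTF-8, refusing surrogate code points.

    Hypothetical helper: it checks explicitly, so it does not rely on
    what the built-in codec happens to do with surrogates.
    """
    for index, char in enumerate(text):
        if is_surrogate(ord(char)):
            raise ValueError(
                "surrogate U+%04X at position %d cannot be encoded as UTF-8"
                % (ord(char), index)
            )
    return text.encode("utf-8")
```

For example, encode_utf8_strict("\u20ac") returns the three bytes E2 82 AC, 
while encode_utf8_strict("\ud800") raises ValueError.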

----------
stage: test needed -> patch review
versions: +Python 2.6
Added file: http://bugs.python.org/file16741/issue8271v2.diff

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue8271>
_______________________________________