> I qualify with a). I believe I understand c) but, as explained in my
> other post, I do not think your reason applies. In fact, I think
> concern for naming rights might suggest that you *not* reuse the name
> for something different. I would have to learn more about the existing
> 'surrogates' handler to judge Antoine's suggestion 'surrogates-pass'.
> 'Surrogates-escape' is pretty good for the new handler since, to my
> understanding, it 'escapes' 'bad bytes' by prefixing them with bits that
> push them to the surrogates plane.
See issue 3672. In essence, in Python 2.5:
py> u"\ud800".encode("utf-8")
'\xed\xa0\x80'
py> '\xed\xa0\x80'.decode("utf-8")
u'\ud800'
In 3.1,
py> "\ud800".encode("utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'utf-8' codec can't encode character '\ud800' in
position 0: surrogates not allowed
py> "\ud800".encode("utf-8","surrogates")
b'\xed\xa0\x80'
py> b'\xed\xa0\x80'.decode("utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-2:
illegal encoding
py> b'\xed\xa0\x80'.decode("utf-8","surrogates")
'\ud800'
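
For anyone who wants to reproduce this as a script, here is a minimal
sketch. It assumes a released Python 3 interpreter, where (as far as I
know) the handler spelled "surrogates" above is available under the name
"surrogatepass" and the new handler under "surrogateescape". The variable
names are only illustrative.

```python
# Assumption: released Python 3 spells the handlers "surrogatepass"
# and "surrogateescape"; the sessions above use the earlier name "surrogates".

lone = "\ud800"  # a lone surrogate code point

# Strict UTF-8 refuses to encode a lone surrogate.
try:
    lone.encode("utf-8")
    strict_encode_ok = True
except UnicodeEncodeError:
    strict_encode_ok = False

# "surrogatepass" lets the surrogate through, as Python 2.5 did.
passed = lone.encode("utf-8", "surrogatepass")

# Strict decoding rejects those bytes again ...
try:
    passed.decode("utf-8")
    strict_decode_ok = True
except UnicodeDecodeError:
    strict_decode_ok = False

# ... while "surrogatepass" restores the original string.
restored = passed.decode("utf-8", "surrogatepass")

# "surrogateescape" is the different handler: an undecodable byte 0xNN
# is mapped to the lone surrogate U+DCNN and mapped back on encoding.
escaped = b"\x80abc".decode("utf-8", "surrogateescape")
roundtrip = escaped.encode("utf-8", "surrogateescape")
```

The first handler passes existing surrogates through the codec, while
the second one creates surrogates to stand in for undecodable bytes,
which is why the two should not share a name.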
Regards,
Martin