Marc-Andre Lemburg <[email protected]> added the comment:

John Machin wrote:
>
> John Machin <[email protected]> added the comment:
>
> Unicode has been frozen at 0x10FFFF. That's it. There is no such thing as a
> valid 5-byte or 6-byte UTF-8 string.
The UTF-8 codec was written at a time when UTF-8 still allowed 5- and 6-byte sequences:

http://www.rfc-editor.org/rfc/rfc2279.txt

Using those encodings has always raised an error, though. For error-handling purposes, the codec still has to support those possibilities.

----------

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue8271>
_______________________________________
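As an illustration of the 5- and 6-byte forms discussed in this comment, here is a minimal sketch, assuming Python 3. The helper names `rfc2279_sequence_length` and `decodes_as_utf8` are hypothetical and are not part of the codec; the lead-byte ranges follow the table in RFC 2279. The sketch only shows that a lead byte can still announce a 5- or 6-byte sequence while the decoder rejects it.

```python
def rfc2279_sequence_length(lead_byte):
    """Return the sequence length a lead byte announces under RFC 2279.

    Hypothetical helper for illustration. Returns 0 for bytes that cannot
    start a sequence (continuation bytes 0x80-0xBF, and 0xFE/0xFF).
    """
    if lead_byte < 0x80:
        return 1
    if 0xC0 <= lead_byte <= 0xDF:
        return 2
    if 0xE0 <= lead_byte <= 0xEF:
        return 3
    if 0xF0 <= lead_byte <= 0xF7:
        return 4
    if 0xF8 <= lead_byte <= 0xFB:
        return 5  # allowed by RFC 2279, beyond U+10FFFF
    if 0xFC <= lead_byte <= 0xFD:
        return 6  # allowed by RFC 2279, beyond U+10FFFF
    return 0


def decodes_as_utf8(data):
    """Return True if Python's strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


highest_code_point = b"\xf4\x8f\xbf\xbf"   # U+10FFFF, a 4-byte sequence
five_byte_form = b"\xf8\x88\x80\x80\x80"   # 5-byte form from RFC 2279
six_byte_form = b"\xfc\x84\x80\x80\x80\x80"  # 6-byte form from RFC 2279

for sample in (highest_code_point, five_byte_form, six_byte_form):
    announced = rfc2279_sequence_length(sample[0])
    print(len(sample), announced, decodes_as_utf8(sample))
```

Only the 4-byte sample decodes. How many bytes the decoder consumes or replaces when it meets the longer forms is exactly the error-handling question this issue is about, so the sketch deliberately does not assert anything about that.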
