Adam Olsen <[EMAIL PROTECTED]> added the comment: Marc, I don't understand what you're saying. UTF-16's surrogates are not optional. Unicode 2.0 and later require them, and Python is supposed to support it.
Likewise, UCS-4 originally allowed a much larger range of code points, but it no longer does; allowing them would mean supporting only old, archaic versions of the standards (which is clearly not desirable.) You are right in that I shouldn't have said "a pair of ill-formed code units". I should have said "a pair of unassigned code points", which is how UCS-2 always have and always will classify them. Although python may allow ill-formed sequences to be created internally (primarily lone surrogates on UTF-16 builds), it cannot encode or decode them. The standard is clear that these are to be treated as errors, which the .decode()'s "errors" argument controls. You could add a new value for "errors" to pass-through the garbage, but I fail to see a use case for it. _______________________________________ Python tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue3297> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com