MRAB wrote:
> Martin v. Löwis wrote:
> [snip]
>> To convert non-decodable bytes, a new error handler "python-escape" is
>> introduced, which decodes non-decodable bytes using into a private-use
>> character U+F01xx, which is believed to not conflict with private-use
>> characters that currently exist in Python codecs.
>>
>> The error handler interface is extended to allow the encode error
>> handler to return byte strings immediately, in addition to returning
>> Unicode strings which then get encoded again.
>>
>> If the locale's encoding is UTF-8, the file system encoding is set to
>> a new encoding "utf-8b". The UTF-8b codec decodes non-decodable bytes
>> (which must be >= 0x80) into half surrogate codes U+DC80..U+DCFF.
>>
> If the byte stream happens to include a sequence which decodes to
> U+F01xx, shouldn't that raise an exception?
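
For readers following along, the two mechanisms quoted above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the proposed implementation: the handler name "pua-escape-sketch", the function pua_escape and the constant PUA_BASE are hypothetical, "U+F01xx" is read as U+F0100 plus the byte value, and the standard library's "surrogateescape" error handler stands in for the "utf-8b" codec because it maps undecodable bytes to U+DC80..U+DCFF in the same way.

```python
import codecs

# Assumption: "U+F01xx" means U+F0100 + byte value (a plane-15 private-use code point).
PUA_BASE = 0xF0100


def pua_escape(exc):
    """Hypothetical stand-in for the proposed "python-escape" error handler."""
    if isinstance(exc, UnicodeDecodeError):
        # Map every undecodable byte to its private-use character.
        bad = exc.object[exc.start:exc.end]
        return "".join(chr(PUA_BASE + b) for b in bad), exc.end
    if isinstance(exc, UnicodeEncodeError):
        # Map private-use characters back to the original bytes.
        # Returning bytes from an encode handler is the interface extension
        # described in the quoted text.
        chunk = exc.object[exc.start:exc.end]
        codes = [ord(c) - PUA_BASE for c in chunk]
        if all(0x80 <= c <= 0xFF for c in codes):
            return bytes(codes), exc.end
    raise exc


codecs.register_error("pua-escape-sketch", pua_escape)

# Non-UTF-8 environment encoding (ASCII here): use the error handler.
raw = b"caf\xe9"
text = raw.decode("ascii", "pua-escape-sketch")
restored = text.encode("ascii", "pua-escape-sketch")

# UTF-8 environment encoding: undecodable bytes become half surrogates.
utf8b_text = b"\xff\xfe".decode("utf-8", "surrogateescape")
utf8b_raw = utf8b_text.encode("utf-8", "surrogateescape")
```

In both cases the original bytes are recovered on encoding, which is the property the proposal is after.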

I apparently have not expressed it clearly, so please help me improve
the text. What I mean is this:

- if the environment encoding (for lack of a better name) is UTF-8,
  Python stops using the utf-8 codec under this PEP, and switches
  to the utf-8b codec.
- otherwise (env encoding is not utf-8), undecodable bytes get decoded
  with the error handler. In this case, U+F01xx will not occur
  in the byte stream, since no other codec ever produces this PUA
  character (this is not fully true: UTF-16 may also produce PUA
  characters, but it can't appear as an env encoding).

So the case you are referring to should not happen.

Regards,
Martin
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com