"Martin v. Löwis" writes:

 > > Now, with Python's file system encoding == UTF-8 or any packed EUC,
 > > and more than a handful of Shift JIS or Big5 characters in file names,
 > > one is *almost certain* to encounter ASCII as the second byte of a
 > > multibyte sequence. PEP 383 can't handle this
Ah, I see. Of course, the algorithm has to handle not only the ASCII octet, which is erroneous because it can't be a trailing byte, but *also the leading byte that signalled to expect a trailing byte >127*. So the algorithm backs up to the character boundary (which is well-defined for all the "sane" encodings), encodes the high byte(s) in the character as lone surrogates, and encodes the ASCII octet as itself (promoted to a Unicode code point).

Sorry, you're right, I was just confused. I withdraw the objection as completely mistaken, and apologize for not thinking more carefully in the first place.
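
For concreteness, here is a minimal sketch of the behaviour described above, assuming a Python 3 interpreter where the PEP 383 handler is available as the "surrogateescape" codec error handler. The helper names escape_decode and escape_encode are hypothetical, chosen only for this illustration; the UTF-8 codec is named explicitly so the result does not depend on the platform's file system encoding.

```python
# Sketch only: shows how the "surrogateescape" error handler treats a high
# byte followed by an ASCII byte when file-name bytes are decoded as UTF-8.
# escape_decode / escape_encode are hypothetical helper names.

def escape_decode(raw: bytes) -> str:
    """Decode as UTF-8; each undecodable byte 0xNN becomes lone surrogate U+DCNN."""
    return raw.decode("utf-8", errors="surrogateescape")


def escape_encode(text: str) -> bytes:
    """Inverse direction: lone surrogates U+DCNN turn back into byte 0xNN."""
    return text.encode("utf-8", errors="surrogateescape")


# Katakana "ソ" is the byte pair 0x83 0x5C in Shift JIS, so its second
# byte is the ASCII backslash.
sjis_name = "ソ".encode("shift_jis")
decoded_sjis = escape_decode(sjis_name)

# A byte that UTF-8 treats as the start of a three-byte sequence, followed
# by ASCII "A" where a trailing byte >127 was expected.
lead_then_ascii = b"\xe3A"
decoded_lead = escape_decode(lead_then_ascii)

# Same idea, but with one valid continuation byte before the ASCII octet,
# so two high bytes have to be escaped.
partial_then_ascii = b"\xe3\x81A"
decoded_partial = escape_decode(partial_then_ascii)

for raw, text in [
    (sjis_name, decoded_sjis),
    (lead_then_ascii, decoded_lead),
    (partial_then_ascii, decoded_partial),
]:
    # The ASCII octet survives as itself, and the original bytes round-trip.
    print(raw, "->", ascii(text), "->", escape_encode(text))
```

In each case the high byte(s) come out as lone surrogates, the ASCII octet comes out as the corresponding code point, and encoding with the same handler restores the original bytes.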