> The UTF-8b representation suffers from the same potential ambiguities as
> the PUA characters...
Not at all the same ambiguities. Here, again, are the two choices:

A. Use PUA characters to represent undecodable bytes, in particular for UTF-8 (the PEP never actually proposed this). This introduces an ambiguity: two different files in the same directory may decode to the same string name, if one has the PUA character in its name and the other has a non-decodable byte that gets decoded to that same PUA character.

B. Use UTF-8b, representing the bytes with ill-formed surrogate codes. The same ambiguity does *NOT* exist. If a file on disk already contains an invalid surrogate code in its file name, then the UTF-8b decoder will recognize this as invalid and decode it byte-for-byte, into three surrogate codes. Hence, file names that are different on disk are also different in memory. No ambiguity.

Regards,
Martin
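
The difference between choices A and B can be sketched in Python 3, assuming its `surrogateescape` error handler as the concrete form of the UTF-8b scheme. The `pua_escape` handler, its `PUA_BASE` offset of U+E000, and the sample file names are hypothetical and exist only for illustration; nobody proposed that exact mapping.

```python
import codecs

# Hypothetical scheme A: map each undecodable byte to a PUA character.
# PUA_BASE is an arbitrary choice made for this sketch.
PUA_BASE = 0xE000


def pua_escape(exc):
    """Decode-error handler that replaces bad bytes with PUA characters."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    return "".join(chr(PUA_BASE + b) for b in bad), exc.end


codecs.register_error("pua_escape", pua_escape)

# Scheme A: two different byte strings on disk ...
pua_undecodable = b"file\x80"                      # stray byte 0x80
pua_literal = "file\ue080".encode("utf-8")         # a real U+E080 in the name
# ... decode to the same in-memory name.
pua_name_1 = pua_undecodable.decode("utf-8", "pua_escape")
pua_name_2 = pua_literal.decode("utf-8", "pua_escape")
pua_collision = pua_name_1 == pua_name_2

# Scheme B: the same stray byte, and a name that already contains the
# three-byte sequence that would spell the lone surrogate U+DC80.
raw_a = b"file\x80"
raw_b = b"file\xed\xb2\x80"
name_a = raw_a.decode("utf-8", "surrogateescape")
name_b = raw_b.decode("utf-8", "surrogateescape")

# The decoder rejects the encoded surrogate and escapes it byte by byte,
# so the two names stay distinct and both round-trip to their bytes.
utf8b_collision = name_a == name_b
roundtrip_a = name_a.encode("utf-8", "surrogateescape")
roundtrip_b = name_b.encode("utf-8", "surrogateescape")
```

Under these assumptions `pua_collision` is true while `utf8b_collision` is false: `name_b` ends in three separate surrogate codes, one per original byte, which is why distinct names on disk stay distinct in memory.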