On Tue, Sep 30, 2008 at 2:28 AM, Antoine Pitrou <[EMAIL PROTECTED]> wrote:
> Adam Olsen <rhamph <at> gmail.com> writes:
>>
>> The only way to display that file would be to transform it into some
>> other valid unicode string. However, as that string is already valid,
>> you've just made any files named after it impossible to open.
>
> Not if those valid sequences are also properly escaped to avoid collisions.
> That's what utf-8b claims to do.
>
> My view of utf-8b is that it is not really a new codec, but an escaping phase
> added in front of utf-8, such that illegal byte sequences get converted to
> legal byte sequences. This is how e.g. XML-escaping works ("&" -> "&amp;", etc.).
> The only difficulty being in choosing sufficiently rare escaping sequences, so
> that readability is not impacted.

The problem is that there's no way (at least nobody has proposed one
AFAICT) to tell whether the escaping has been applied. When reading
XML, you *know* that you are expected to unescape exactly one level of
& escaping. You would never find XML with the unescaping already done
for you. But the output of utf-8b is indistinguishable from regular
utf-8 so you don't know whether you need to unescape things.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)
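
To make the ambiguity concrete, here is a minimal sketch in Python. It is not utf-8b itself: the marker character ESC and the helpers escape() and unescape() are hypothetical, made up for illustration, and the "surrogateescape" error handler is used only as a convenient way to locate the undecodable bytes. Doubling the marker avoids collisions between names, but a consumer handed only the resulting string still cannot tell whether it has been escaped.

```python
# Hypothetical marker character (U+241B SYMBOL FOR ESCAPE); an assumption
# made for this sketch, not something any real codec uses.
ESC = "\u241b"


def escape(raw: bytes) -> str:
    """Turn arbitrary bytes into valid text.

    Undecodable bytes become ESC + two hex digits; a literal ESC in the
    input is doubled so that it cannot collide with an escaped byte.
    """
    out = []
    # surrogateescape maps each undecodable byte 0xNN to U+DCNN.
    for ch in raw.decode("utf-8", "surrogateescape"):
        code = ord(ch)
        if 0xDC80 <= code <= 0xDCFF:
            out.append(f"{ESC}{code - 0xDC00:02x}")
        elif ch == ESC:
            out.append(ESC + ESC)
        else:
            out.append(ch)
    return "".join(out)


def unescape(text: str) -> bytes:
    """Undo exactly one level of escape()."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] != ESC:
            out += text[i].encode("utf-8")
            i += 1
        elif text[i + 1 : i + 2] == ESC:
            out += ESC.encode("utf-8")  # doubled marker -> literal marker
            i += 2
        else:
            out.append(int(text[i + 1 : i + 3], 16))  # ESC + hex -> raw byte
            i += 3
    return bytes(out)


raw_name = b"caf\xe9"                    # Latin-1 file name, invalid as UTF-8
shown = escape(raw_name)                 # valid text, safe to display
lookalike = shown.encode("utf-8")        # a *valid* UTF-8 name with the same look

# Collisions are avoided, because the literal marker gets doubled:
assert escape(lookalike) != shown
assert unescape(escape(raw_name)) == raw_name
assert unescape(escape(lookalike)) == lookalike

# But given only `shown`, both readings are plausible and they differ:
already_escaped = unescape(shown)        # reader assumes escaping was applied
never_escaped = shown.encode("utf-8")    # reader assumes plain utf-8
assert already_escaped != never_escaped
```

With XML the reader always knows that one level of unescaping is due; here nothing in the string itself says which of the last two interpretations is the right one.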