On Tue, Sep 30, 2008 at 2:28 AM, Antoine Pitrou <[EMAIL PROTECTED]> wrote:
> Adam Olsen <rhamph <at> gmail.com> writes:
>>
>> The only way to display that file would be to transform it into some
>> other valid unicode string.  However, as that string is already valid,
>> you've just made any files named after it impossible to open.
>
> Not if those valid sequences are also properly escaped to avoid collisions.
> That's what utf-8b claims to do.
>
> My view of utf-8b is that it is not really a new codec, but an escaping phase
> added in front of utf-8, such that illegal byte sequences get converted to
> legal byte sequences. This is how e.g. XML-escaping works ("&" -> "&amp;",
> etc.). The only difficulty is in choosing sufficiently rare escaping
> sequences, so that readability is not impacted.

The problem is that there's no way (at least nobody has proposed one
AFAICT) to tell whether the escaping has been applied. When reading
XML, you *know* that you are expected to unescape exactly one level of
& escaping. You would never find XML with the unescaping already done
for you. But the output of utf-8b is indistinguishable from regular
utf-8 so you don't know whether you need to unescape things.
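
To make the ambiguity concrete, here is a rough sketch of a toy escaping
phase. It is not the utf-8b proposal itself: the "%XX" scheme and the names
escape() and unescape() are made up for illustration, and Python 3's
"surrogateescape" error handler is used only as a convenient way to find
the undecodable bytes.

```python
import re


def escape(raw: bytes) -> bytes:
    """Toy escaping phase: turn arbitrary bytes into valid UTF-8.

    Bytes that are not valid UTF-8 become ASCII "%XX"; a literal "%"
    becomes "%25" so that escaped and literal text cannot collide.
    """
    out = []
    # surrogateescape maps each undecodable byte 0xNN to U+DCNN.
    for ch in raw.decode("utf-8", "surrogateescape"):
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            out.append("%%%02X" % (cp - 0xDC00))
        elif ch == "%":
            out.append("%25")
        else:
            out.append(ch)
    return "".join(out).encode("utf-8")


def unescape(data: bytes) -> bytes:
    """Undo exactly one level of the toy escaping."""
    return re.sub(rb"%([0-9A-F]{2})",
                  lambda m: bytes([int(m.group(1), 16)]),
                  data)


raw = b"caf\xe9 100%"          # not valid UTF-8 (lone 0xE9 byte)
escaped = escape(raw)          # plain, valid UTF-8
round_trip = unescape(escaped)

# A name that was never escaped, but happens to look escaped:
literal = b"100%25"
mangled = unescape(literal)

# Escaping something that is already escaped changes it again:
double = escape(escaped)
```

The round trip works only if the reader knows that exactly one level of
escaping was applied. Nothing in the bytes of "escaped" or "literal" says
which case you are in; both are ordinary valid UTF-8.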

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)
_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000