Re: [Python-3000] Pre-PEP: Easy Text File Decoding

Marcin 'Qrczak' Kowalczyk Sat, 14 Oct 2006 11:46:13 -0700

"Martin v. Löwis" <[EMAIL PROTECTED]> writes:

> Marcin 'Qrczak' Kowalczyk schrieb:
>> I've implemented a hack which allows simple programs to "just work" in
>> case of UTF-8. It's a modified encoder/decoder which escapes malformed
>> UTF-8 sequences with '\0' bytes, and thus allows arbitrary byte
>> sequences to round-trip UTF-8 decoding and encoding. It's not used by
>> default and it's never used when "UTF-8" is specified explicitly,
>> because it's not the true UTF-8, but I have an environment variable
>> which says "if the locale is UTF-8, use the modified UTF-8 as the
>> default encoding".
>
> Actually, I think there is a "better" (i.e. more Unicode-like) way:
> use the private-use area.


That changes the interpretation of some filenames which are valid UTF-8
(or generally of texts known to not contain '\0'). My hack is a pure
extension, since standard UTF-8 decoding of such texts can never
produce U+0000.
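
A minimal sketch of such an escaping scheme in Python. The exact escape
format is not spelled out in this message, so the layout here is an
assumption: each undecodable byte b becomes U+0000 followed by chr(b),
and a literal NUL byte becomes U+0000 U+0000 so that the mapping stays
reversible. The function names are hypothetical.

```python
def decode_nul_escaped(data: bytes) -> str:
    """Decode UTF-8, escaping each malformed byte b as '\\0' + chr(b)."""
    out = []
    pos = 0
    while pos < len(data):
        rest = data[pos:]
        try:
            # Well-formed tail: escape literal NULs and finish.
            out.append(rest.decode("utf-8").replace("\0", "\0\0"))
            break
        except UnicodeDecodeError as exc:
            # exc.start and exc.end are offsets into `rest`.
            good = rest[:exc.start].decode("utf-8")
            out.append(good.replace("\0", "\0\0"))
            out.extend("\0" + chr(b) for b in rest[exc.start:exc.end])
            pos += exc.end
    return "".join(out)


def encode_nul_escaped(text: str) -> bytes:
    """Inverse of decode_nul_escaped (assumed escape layout)."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == "\0":
            # An escape pair: the next character carries the raw byte.
            if i + 1 >= len(text) or ord(text[i + 1]) > 0xFF:
                raise ValueError("dangling or invalid '\\0' escape")
            out.append(ord(text[i + 1]))
            i += 2
        else:
            # Plain run up to the next escape: ordinary UTF-8.
            j = text.find("\0", i)
            if j == -1:
                j = len(text)
            out += text[i:j].encode("utf-8")
            i = j
    return bytes(out)
```

Under these assumptions, NUL-free valid UTF-8 decodes exactly as with
the standard codec, and any byte string survives
encode_nul_escaped(decode_nul_escaped(raw)) unchanged.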

> For Py3k, I would like to propose a standard "binary" codec,
> which is an ASCII superset and decodes bytes 00..7F to ASCII,
> and bytes 80..FF to U+EFxx. This would allow bytes to round-trip
> through text.

It's simpler to use the existing ISO-8859-1 encoding.
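
For comparison, a small Python sketch of both options. ISO-8859-1 maps
byte N to code point U+00NN, so it round-trips every byte with a codec
that already exists. The two *_binary_pua functions are hypothetical
and only illustrate one reading of the quoted proposal, namely that
byte 0xNN in 80..FF maps to U+EFNN; no such codec exists under that
name.

```python
# ISO-8859-1: every byte value maps to the code point of the same number.
raw = bytes(range(256))
text = raw.decode("iso-8859-1")
roundtrip = text.encode("iso-8859-1")


def decode_binary_pua(data: bytes) -> str:
    """Hypothetical 'binary' codec: ASCII as-is, 80..FF to U+EF80..U+EFFF."""
    return "".join(chr(b) if b < 0x80 else chr(0xEF00 + b) for b in data)


def encode_binary_pua(s: str) -> bytes:
    """Inverse of decode_binary_pua; rejects anything outside its range."""
    out = bytearray()
    for ch in s:
        cp = ord(ch)
        if cp < 0x80:
            out.append(cp)
        elif 0xEF80 <= cp <= 0xEFFF:
            out.append(cp - 0xEF00)
        else:
            raise ValueError(f"U+{cp:04X} is not representable")
    return bytes(out)
```

Both mappings are lossless for arbitrary bytes. They differ only in
which code points the high bytes occupy: Latin-1 letters in one case,
private-use characters in the other.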

-- 
   __("<         Marcin Kowalczyk
   \__/       [EMAIL PROTECTED]
    ^^     http://qrnik.knm.org.pl/~qrczak/
