"Martin v. Löwis" <[EMAIL PROTECTED]> writes:
> Marcin 'Qrczak' Kowalczyk schrieb:
>> I've implemented a hack which allows simple programs to "just work" in
>> case of UTF-8. It's a modified encoder/decoder which escapes malformed
>> UTF-8 sequences with '\0' bytes, and thus allows arbitrary byte
>> sequences to round-trip UTF-8 decoding and encoding. It's not used by
>> default and it's never used when "UTF-8" is specified explicitly,
>> because it's not the true UTF-8, but I have an environment variable
>> which says "if the locale is UTF-8, use the modified UTF-8 as the
>> default encoding".
>
> Actually, I think there is a "better" (i.e. more Unicode-like) way:
> use the private-use area.

That changes the interpretation of some filenames which are valid UTF-8,
namely those that already contain private-use characters (or, more
generally, of texts known not to contain '\0'). My hack is a pure
extension, since decoding such text as standard UTF-8 can never
produce U+0000.
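
A minimal sketch of the idea in Python, not the actual implementation.
The exact escape format is an assumption: each malformed byte becomes
U+0000 followed by the character whose code point equals that byte. The
names nul_escape, decode_modified and encode_modified are hypothetical,
and the input is assumed to contain no 0x00 bytes (as with filenames).

```python
import codecs


def _escape_malformed(exc):
    """Decode error handler: escape each undecodable byte as U+0000 + chr(byte)."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    return "".join("\x00" + chr(b) for b in bad), exc.end


# "nul_escape" is a made-up handler name used only for this sketch.
codecs.register_error("nul_escape", _escape_malformed)


def decode_modified(data: bytes) -> str:
    """Decode UTF-8, keeping malformed bytes as NUL-prefixed escapes."""
    return data.decode("utf-8", errors="nul_escape")


def encode_modified(text: str) -> bytes:
    """Inverse of decode_modified for input that had no 0x00 bytes."""
    out = bytearray()
    i = 0
    while i < len(text):
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if text[i] == "\x00" and nxt and 0x80 <= ord(nxt) <= 0xFF:
            out.append(ord(nxt))  # restore the original raw byte
            i += 2
        else:
            out += text[i].encode("utf-8")
            i += 1
    return bytes(out)
```

Text that is already valid UTF-8 and has no NUL decodes exactly as it
would with the standard codec; only malformed bytes take the escape path.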

> For Py3k, I would like to propose a standard "binary" codec,
> which is an ASCII superset and decodes bytes 00..7F to ASCII,
> and bytes 80..FF to U+EFxx. This would allow bytes to
> round-trip through text.

It's simpler to use the existing ISO-8859-1 encoding, which already maps
every byte 00..FF to U+0000..U+00FF and so round-trips arbitrary bytes.
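
To make the comparison concrete, here is a rough sketch of the proposed
mapping next to ISO-8859-1. binary_decode and binary_encode are
hypothetical names, and reading "U+EFxx" as U+EF00 plus the byte value
is an assumption about the proposal.

```python
def binary_decode(data: bytes) -> str:
    """Assumed mapping: 00..7F -> ASCII, 80..FF -> U+EF80..U+EFFF."""
    return "".join(chr(b) if b < 0x80 else chr(0xEF00 + b) for b in data)


def binary_encode(text: str) -> bytes:
    """Inverse of binary_decode; rejects characters outside the mapping."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(cp)
        elif 0xEF80 <= cp <= 0xEFFF:
            out.append(cp - 0xEF00)
        else:
            raise ValueError(f"cannot encode U+{cp:04X}")
    return bytes(out)


raw = bytes(range(256))

# Both approaches round-trip every possible byte value.
assert binary_encode(binary_decode(raw)) == raw
assert raw.decode("iso-8859-1").encode("iso-8859-1") == raw
```

Both round-trip; the difference is only which code points the high
bytes land on, and ISO-8859-1 needs no new codec.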
--
__("< Marcin Kowalczyk
\__/ [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
_______________________________________________
Python-3000 mailing list