Re: [Python-3000] Pre-PEP: Easy Text File Decoding

Paul Prescod Mon, 11 Sep 2006 07:19:00 -0700

On 9/11/06, Marcin 'Qrczak' Kowalczyk <[EMAIL PROTECTED]> wrote:

> "Paul Prescod" <[EMAIL PROTECTED]> writes:
>
> > Guido's goal was that quick and dirty text processing should "just
> > work" for newbies and encoding-disinterested expert programmers.

> What does 'guess' mean for creating files?

I wasn't sure about this one. But on Windows and the Mac it seems safe to generate UTF-8 with a BOM: TextEdit, Vim and Notepad all auto-detect the UTF-8 BOM and do the right thing.
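For illustration only: Python's standard `utf-8-sig` codec is one way to produce the UTF-8-with-BOM output described above. The minimal sketch below writes a file with that codec and checks its first bytes against `codecs.BOM_UTF8`. The helper names and the temporary file are hypothetical, and the sketch does not verify how any particular editor treats the BOM.

```python
import codecs
import os
import tempfile


def write_utf8_with_bom(path, text):
    # The 'utf-8-sig' codec emits the BOM (EF BB BF) before the encoded text.
    with open(path, "w", encoding="utf-8-sig") as f:
        f.write(text)


def has_utf8_bom(path):
    # Compare the raw leading bytes with the UTF-8 byte order mark.
    with open(path, "rb") as f:
        return f.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8


path = os.path.join(tempfile.mkdtemp(), "example.txt")  # hypothetical file
write_utf8_with_bom(path, "caf\u00e9")
print(has_utf8_bom(path))  # True

# Reading back with 'utf-8-sig' strips the BOM; it also accepts files without one.
with open(path, encoding="utf-8-sig") as f:
    print(f.read() == "caf\u00e9")  # True
```

Reading with `utf-8-sig` rather than plain `utf-8` keeps the BOM from showing up as a stray U+FEFF at the start of the text.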

> 2. Files are created in UTF-8.
>
>    Then files encoded with the locale encoding will be silently
>    recoded to UTF-8, causing trouble for further work with the file
>    (it can't even be typed to the terminal).
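To make the recoding problem concrete, here is a small sketch under one assumed setup: the program writes UTF-8, but the user's locale (and so the terminal and other tools) expects ISO-8859-1. The variable names are illustrative only.

```python
# Assumed setup: the program writes UTF-8, but the locale expects ISO-8859-1.
text = "caf\u00e9"
written = text.encode("utf-8")       # bytes the program puts on disk
seen = written.decode("iso-8859-1")  # how a Latin-1 tool interprets those bytes

print(ascii(seen))                   # 'caf\xc3\xa9'
print(seen == "caf\u00c3\u00a9")     # True: displayed as "cafÃ©"
```

The two-byte UTF-8 sequence for "é" is read as two separate Latin-1 characters, which is the kind of garbling seen when such a file is typed to a non-UTF-8 terminal.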

It can be typed to the terminal on the Mac, and on the increasing number of Linux distributions that default to UTF-8. Perhaps it should use the Unix locale encoding for output by default, but only on Unix, not on the Mac or Windows.
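One way to express the policy floated here is sketched below, combining it with the earlier BOM suggestion. `default_output_encoding` is a hypothetical helper, not part of any PEP or library; it only uses the standard `sys.platform` and `locale.getpreferredencoding` calls.

```python
import locale
import sys


def default_output_encoding(platform=None):
    """Hypothetical policy: UTF-8 with a BOM on Windows and macOS,
    the user's locale encoding on other Unix-like systems."""
    platform = sys.platform if platform is None else platform
    if platform.startswith("win") or platform == "darwin":
        return "utf-8-sig"
    # On other POSIX systems, follow the locale (often UTF-8 on newer distributions).
    return locale.getpreferredencoding(False)


print(default_output_encoding("win32"))   # utf-8-sig
print(default_output_encoding("darwin"))  # utf-8-sig
```

A caller would then write `open(path, "w", encoding=default_output_encoding())`. The `platform` parameter exists only so the branches can be exercised on any machine.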

> I've implemented a hack which allows simple programs to "just work" in
> case of UTF-8. It's a modified encoder/decoder which escapes malformed
> UTF-8 sequences with '\0' bytes, and thus allows arbitrary byte
> sequences to round-trip UTF-8 decoding and encoding. It's not used by
> default and it's never used when "UTF-8" is specified explicitly,
> because it's not the true UTF-8, but I have an environment variable
> which says "if the locale is UTF-8, use the modified UTF-8 as the
> default encoding".
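The modified codec itself is not shown in the thread. As a swapped-in illustration of the same round-trip property, Python 3 later gained the standard `surrogateescape` error handler (PEP 383), which maps each undecodable byte to a lone surrogate in U+DC80..U+DCFF rather than to a '\0' escape. The sketch below shows that mechanism, not Marcin's encoding; `roundtrip` is a hypothetical helper name.

```python
def roundtrip(data: bytes) -> bytes:
    # On decode, each undecodable byte becomes a lone surrogate U+DC80..U+DCFF ...
    text = data.decode("utf-8", errors="surrogateescape")
    # ... and on encode, those surrogates turn back into the original bytes.
    return text.encode("utf-8", errors="surrogateescape")


raw = b"valid \xc3\xa9, invalid \xff\xfe"
text = raw.decode("utf-8", errors="surrogateescape")
print(ascii(text))            # 'valid \xe9, invalid \udcff\udcfe'
print(roundtrip(raw) == raw)  # True
```

As with the hack described above, the escaped text is not "true" UTF-8: encoding it with plain `"utf-8"` (strict errors) raises `UnicodeEncodeError`, so the escape only applies when the handler is requested.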

That's an interesting idea. I'm not sure whether you are proposing it for this PEP or not...

Paul Prescod

_______________________________________________
Python-3000 mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-3000
