On 9/5/06, David Hopwood <[EMAIL PROTECTED]> wrote:
> Guido van Rossum wrote:
> > On 9/5/06, Paul Prescod <[EMAIL PROTECTED]> wrote:
> >
> >> Beyond all of that: It just seems wrong to me that I could send someone a
> >> bunch of files and a Python program and their results processing them
> >> would be different from mine, despite the fact that we run the same version of
> >> Python on the same operating system.
> >
> > And it seems just as wrong if Python doesn't do what the user expects.
> > If I were a beginning Python user, I'd hate it if I had prepared a
> > simple data file in vi or notepad and my Python program wouldn't read
> > it right because Python's idea of encoding differs from my editor's.
>
> I don't know about vi, but notepad will open and save files that are not in
> the system ("ANSI") encoding just fine. On opening it checks for a BOM and
> auto-detects UTF-8 and UTF-16; on saving it will write a BOM if you choose
> "Unicode" (UTF-16LE), "Unicode big-endian" (UTF-16BE), or UTF-8 in the
> Encoding drop-down box.
>
> This is exactly the behaviour that most users would expect of a well-behaved
> Unicode-aware app. It should be as easy as possible to match this behaviour
> in a Python program.
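
For illustration, the BOM check described above can be sketched in a few lines of present-day Python 3. This is only a sketch, not anything proposed in the thread: the helper names `sniff_encoding` and `read_text` are hypothetical, and the fallback used when no BOM is found is an assumption the caller has to supply.

```python
import codecs

# BOM byte sequences mapped to a codec that consumes the BOM while decoding.
# (UTF-32 BOMs are deliberately ignored in this sketch.)
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

def sniff_encoding(raw, fallback):
    """Return a codec name based on a leading BOM, else the given fallback."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    return fallback

def read_text(path, fallback="ascii"):
    """Read a whole file as text, honouring a BOM if one is present."""
    with open(path, "rb") as f:
        raw = f.read()
    return raw.decode(sniff_encoding(raw, fallback))
```

A file without a BOM still needs some fallback encoding, which is exactly the choice this thread is arguing about.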
And this is exactly why I want the determination of the default
encoding (i.e. the encoding to be used when opening a file when no
explicit encoding is specified by the Python code that does the
opening) to be open-ended, rather than picking some standard default
like UTF-8 and saying (like Paul seems to want to say) "this is it".
I never suggested that UTF-8 should be the default. In fact, I think it was very wise of Python 2.x to make ASCII the default and I'm astounded to hear that you regret that decision. "In the face of ambiguity, refuse the temptation to guess."
Python 2.x provided an option to allow users to change the default system-wide and ever since then we've (almost unanimously) counselled users against changing it.
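
To make the "refuse to guess" point concrete, here is a small sketch in present-day Python 3 (the helper name `decode_strict_ascii` is hypothetical). A strict ASCII decode fails loudly on non-ASCII bytes, whereas guessing the wrong 8-bit encoding succeeds silently and produces garbage.

```python
def decode_strict_ascii(raw):
    """Return the decoded text, or None if raw contains a non-ASCII byte."""
    try:
        return raw.decode("ascii")
    except UnicodeDecodeError:
        return None

utf8_bytes = "café".encode("utf-8")

print(decode_strict_ascii(b"cafe"))      # plain ASCII decodes fine
print(decode_strict_ascii(utf8_bytes))   # None: the decode refused to guess
print(repr(utf8_bytes.decode("cp1252"))) # a wrong guess "works" but yields mojibake
```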
> > Sorry Paul, I appreciate your standards-driven perspective, but in
> > this area I'd rather build in more flexibility than strictly needed,
> > than too little. If it turns out that on a particular platform all
> > files are in UTF-8, making Python *on that platform* always choose
> > UTF-8 is simple enough.
>
> The problem is not the systems where all files are UTF-8, or all files are
> another known charset. The problem is the platforms where half of the files
> are UTF-8 and half are in some other charset, determined either by type or by
> presence of a UTF-8 BOM. This is a *very* common situation, especially for
> European users.
Right. (And Paul appears to be ignorant of this.)
I don't see how the fact that an individual system can have half of the files in one encoding and half in another could argue IN FAVOUR of a system-global default. I would have thought it strengthens my argument AGAINST trying to apply a random encoding to files.
You said:
"If on a particular box
most files are encoded in encoding X, and the user did whatever is
necessary to tell the tools that that's their preferred encoding, I
want Python to honor that encoding when opening text files, unless the
program makes other arrangements explicitly (such as specifying an
explicit encoding as a parameter to open())."
But there is no such thing that "most users do" to tell tools what their preferred encoding is. Most users use some random (to them) operating system default, which on Windows is usually wrong and which is different (for no particular reason) on the Macintosh than on Linux. Long-time Windows users in this thread cannot even agree on what the default is for US English Windows, because there is no single default. There are two.
Can we at least agree that if LC_CHARSET is demonstrably wrong most of the time on Windows, we should not use it (at least on Windows)?
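
As a rough illustration of the two positions, using present-day Python 3 rather than anything that existed when this was written: `locale.getpreferredencoding()` reports the platform-derived default, while passing `encoding=` to `open()` takes the platform out of the picture. The helper names below are hypothetical.

```python
import locale
import os
import tempfile

def describe_default():
    """Return the encoding the platform locale would suggest for text files."""
    return locale.getpreferredencoding(False)

def round_trip(text, encoding):
    """Write and re-read text with an explicit encoding, ignoring the locale."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        with open(path, "w", encoding=encoding) as f:
            f.write(text)
        with open(path, "r", encoding=encoding) as f:
            return f.read()
    finally:
        os.remove(path)

print(describe_default())            # varies by machine and locale settings
print(round_trip("naïve", "utf-8"))  # same result on every machine
```

The first value depends on the machine; the second call behaves the same everywhere, which is the portability property argued for above.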
Paul Prescod
_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000