On 9/5/06, Marcin 'Qrczak' Kowalczyk <[EMAIL PROTECTED]> wrote:
> David Hopwood <[EMAIL PROTECTED]> writes:
>
> > The whole idea of a default encoding is flawed. Ideally there would be
> > no default; programmers should be forced to think about the issue
> > on a case-by-case basis. In some cases they might choose to open a file
> > with the system encoding, but that should be an explicit decision.
>
> Perhaps this shows a difference between Unix and Windows culture.
>
> On Unix there is definitely a default encoding; this is what most good
> programs operating on text files assume by default. It would be insane
> to have to tell each program separately about the encoding. Locale is
> the OS mechanism used to provide this information in a uniform way.

Windows users do not "tell each program separately about the encoding."
The encoding varies by file type. It makes no more sense to have a global
variable that says "all of my files are Shift-JIS" than it does to say
"all of my files are PowerPoint files." Someday somebody is going to
email you a Big-5 file (or a zipfile) and that setting will be wrong.

Once you know that a file is of type Zip, you know that the "encoding"
is zipped binary. Once you know that it is an Office 2007 file, you know
that the encoding is zipped XML and that the XML will have its own
encoding declaration. Once you know that it is HTML, you look for meta
tags. This is how real-world programs work. They shouldn't guess based on
system global variables.

May I ask an empirical question? In your experience, what percentage of
Macintosh users change the default encoding from US-ASCII to something
specific to their culture? What percentage of Ubuntu users change it from
UTF-8 to something specific?

If the answers are "few", then we are talking about a feature that will
break Windows programs and offer little value to Unix and Macintosh
users.

If "many" users change the global system encoding on their modern Unix
distributions, then I propose the following. There should be a property
called something like "encodings.recommendedEncoding". On Windows it
should be ASCII. On Unix-like platforms it can be inferred from the
locale. Programmers who know what it means and want to take advantage of
it can do so like this:

    opentext(filename, "r", encoding=encodings.recommendedEncoding)

This is almost exactly how C# does it, though it uses the confusing term
"default encoding", which implies a default behaviour.

The lack of an encoding argument should default to ASCII or perhaps UTF-8.
(either one is relatively safe about not processing data incorrectly by
accident)

 Paul Prescod
_______________________________________________
Python-3000 mailing list
http://mail.python.org/mailman/listinfo/python-3000
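
A minimal sketch of the proposal in this message, in Python. Everything named here is an assumption for illustration: `recommended_encoding()` stands in for the proposed "encodings.recommendedEncoding" property, and `opentext()` is a hypothetical wrapper around the built-in `open()`. Neither exists in the standard library. The sketch picks ASCII as the no-argument default; UTF-8 would be the other candidate mentioned above.

```python
import locale
import sys


def recommended_encoding():
    """Hypothetical stand-in for the proposed encodings.recommendedEncoding."""
    if sys.platform.startswith("win"):
        # Per the proposal: on Windows the recommended encoding is ASCII.
        return "ascii"
    # On Unix-like platforms, infer the encoding from the locale.
    # Fall back to ASCII if the locale does not name one.
    return locale.getpreferredencoding(False) or "ascii"


def opentext(filename, mode="r", encoding="ascii"):
    """Hypothetical opentext(): with no encoding argument it uses ASCII,
    never a system-wide setting."""
    return open(filename, mode, encoding=encoding)
```

With these definitions, a caller who wants the locale's encoding has to ask for it explicitly, for example `opentext(path, "r", encoding=recommended_encoding())`. A caller who passes no encoding gets ASCII, so non-ASCII bytes raise a `UnicodeDecodeError` instead of being silently decoded with the wrong codec.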
