On 9/6/06, Marcin 'Qrczak' Kowalczyk <[EMAIL PROTECTED]> wrote:
> "Paul Prescod" <[EMAIL PROTECTED]> writes:
>
> > Windows users do not "tell each program separately about the
> > encoding." The encoding varies by file type.
>
> There are lots of Unix file types which are based on text files
> and their encoding is not specified explicitly.
Of course. But you asserted that the Windows world was insane, and I made the point that it is not. They've just consciously and explicitly moved away from the situation where the encoding is inferred from the environment instead of from the file's context.

I'm not starting a Windows versus Unix debate. I'm talking about the direction in which the world is moving. Python need not move forward in that direction, but it should not move backwards.

Today, Python does not use the locale in inferring a file's type. Python also explicitly chose not to use the locale in inferring string encodings when Unicode was added. I'm not saying that Python programmers should be disallowed from using the system locale. I'm saying that Python itself should "resist the urge to guess" encodings. Python programmers who want to guess could have an easy, one-line way, as C# programmers do.

> But they do. It's a fact which is impossible to change with a
> decree.

I'm not trying to change tools. I'm asking that Python not emulate their broken behaviour. If a Python programmer wants to do so, then they should add one line of code.

> > What percentage of Ubuntu users change it from UTF-8 to something
> > specific?
>
> Why would it matter?

I said explicitly why it matters in my first post. If most Unix users just accept system defaults, then the feature is of no value to them. And the feature actively hurts Windows programmers. So you have decreasing value on one side and a steady amount of pain on the other.

> If a program can't read my text files or filenames or environment
> variables or program invocation arguments, while they are encoded
> according to the locale, then the program is broken.

Either you are saying that Python is broken today, or you are saying that Python should allow people to write programs that are "not broken" according to your definition. In the former case, I disagree. In the latter case, I agree.
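To make the "one line" concrete, here is a minimal sketch in present-day Python 3 (which postdates this thread), assuming the `encoding=` parameter of the built-in `open()`. The default path names its encoding explicitly, and guessing from the locale is a single, visible opt-in. The helper names `read_text_explicit` and `read_text_locale`, and the choice of UTF-8 as the explicit default, are hypothetical illustrations, not part of any proposal in this thread.

```python
import locale
import os
import tempfile


def read_text_explicit(path, encoding="utf-8"):
    # Hypothetical helper: the caller (or the file format) states the
    # encoding; nothing is inferred from the user's environment.
    with open(path, encoding=encoding) as f:
        return f.read()


def read_text_locale(path):
    # Hypothetical helper: the explicit, one-line opt-in to guessing
    # the encoding from the locale settings.
    encoding = locale.getpreferredencoding(False)
    with open(path, encoding=encoding) as f:
        return f.read()
```

With this split, a program reading a UTF-8 file gets the same text on every machine unless its author deliberately calls the locale-based variant.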
The only thing we could disagree on is whether Python's default behaviour should be to guess the encodings based upon locale, despite Python's long history of avoiding guessing in general and guessing encodings in particular.

>...
> If a language requires extra steps in order to make the locale
> encoding work, then it's unhelpful. Most programmers won't bother,
> and their programs will work most of the time when they test it,
> assuming they use it with English texts. Such programs suddenly break
> when used in a non-English speaking country.

Loudly and suddenly breaking is better than silently munging data. There are large classes of applications where using the system encoding is the wrong thing: for example, an FTP server, an application working with data from a remote socket, an application working with a file from a remote server, or an application working with incoming email. Python cannot know whether you are building a client/server application or a script for working with local files. It can't even really know whether a file that it opens is truly local. So it shouldn't guess.

> > If the answers are "few", then we are talking about a feature that
> > will break Windows programs and offer little value to Unix and
> > Macintosh users.
>
> How does it break more programs than assuming ASCII does? All
> encodings suitable as a system encoding are ASCII supersets, so if
> a file can't be read using the locale encoding, it can't be read
> in ASCII either.

If a program expecting ASCII sees an unknown character, then it can throw an exception and say: "You haven't thought through the internationalization aspects properly. Read the Python docs for more information." Silently munging data is worse. "In the face of ambiguity, refuse the temptation to guess."
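The trade-off argued above can be sketched in a few lines of present-day Python 3. This assumes UTF-8 input and uses Latin-1 purely as a stand-in for a wrong locale guess; the helper `decode_ascii_or_fail` and the list of encodings are hypothetical illustrations. It shows that common locale encodings agree on pure ASCII (Marcin's point), that strict ASCII decoding fails loudly on anything else, and that a wrong ASCII-superset guess succeeds silently with garbled text (Paul's point).

```python
# A few encodings commonly used as locale encodings; all are ASCII
# supersets, so pure-ASCII bytes decode identically under each of them.
ASCII_SUPERSETS = ["utf-8", "iso-8859-1", "cp1252", "koi8-r", "euc-jp"]


def decode_ascii_or_fail(raw: bytes) -> str:
    # Hypothetical helper: refuse to guess. Non-ASCII input raises an
    # error that points the programmer at the missing encoding decision.
    try:
        return raw.decode("ascii")
    except UnicodeDecodeError as exc:
        raise ValueError(
            "input is not ASCII; specify its encoding explicitly"
        ) from exc


ascii_bytes = b"plain text"
utf8_bytes = "naïve café".encode("utf-8")

# Every ASCII superset reads pure ASCII the same way.
same_everywhere = all(
    ascii_bytes.decode(enc) == "plain text" for enc in ASCII_SUPERSETS
)

# A wrong guess (Latin-1 for UTF-8 data) raises nothing: every byte maps
# to some character, so the text is silently munged.
munged = utf8_bytes.decode("iso-8859-1")
```

Here `munged` comes out as `"naÃ¯ve cafÃ©"` with no error, whereas `decode_ascii_or_fail(utf8_bytes)` stops the program at the point where an encoding decision was skipped.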
Paul Prescod

_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000