On 2/15/06, Fuzzyman <[EMAIL PROTECTED]> wrote:
> Forcing the programmer to be aware of encodings, also pushes the same
> requirement onto the user (who is often the source of the text in question).

The programmer shouldn't have to be aware of encodings most of the
time -- it's the job of the I/O library to determine the end user's
(as opposed to the language's) default encoding dynamically and act
accordingly. Users who use non-ASCII characters without informing the
OS of their encoding are in a world of pain, *unless* they use the OS
default encoding (which may vary per locale). If the OS can figure out
the default encoding, so can the Python I/O library. Many apps won't
have to go beyond this at all.

Note that I don't want to use this OS/user default encoding as the
default encoding between bytes and strings; once you are reading bytes
you are writing "grown-up" code and you will have to be explicit. It's
only the I/O library that should automatically encode on write and
decode on read.

> Currently you can read a text file and process it - making sure that any
> changes/requirements only use ascii characters. It therefore doesn't matter
> what 8 bit ascii-superset encoding is used in the original. If you force the
> programmer to specify the encoding in order to read the file, they would
> have to pass that requirement onto their user. Their user is even less
> likely to be encoding aware than the programmer.

I disagree -- the user most likely has set or received a default
encoding when they first got the computer, and that's all they are
using. If other tools (notepad, wordpad, emacs, vi etc.) can figure
out the encoding, so can Python's I/O library.

> What this means, is that for simple programs where the programmer doesn't
> want to have to worry about encoding, or can't force the user to be aware,
> they will read in the file as bytes.

Of course not!

> Modules will quickly and inevitably be
> created implementing all the 'string methods' for bytes. New programmers
> will gravitate to these and the old mess will continue, but with a more
> awkward hybrid than before.
> (String manipulations of byte sequences will no
> longer be a core part of the language - and so be harder to use.)

This seems an unlikely development if we do the conversions in the I/O
library.

> Not sure what we can do to obviate this of course... but is this change
> actually going to improve the situation or make it worse ?

I'm not worried about this scenario. "What if all the programmers in
the world suddenly became dumb?"

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
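
A rough illustration of the split argued for in the message: text I/O picks up the user's default encoding by itself, while anything that touches bytes has to name an encoding explicitly. This is only a sketch in present-day Python 3, not something the message specifies. It assumes that locale.getpreferredencoding() is a fair stand-in for "the OS/user default encoding", and the helper names (user_default_encoding, write_text, read_text, read_bytes) are hypothetical.

```python
import locale


def user_default_encoding():
    # Assumption: the locale's preferred encoding stands in for the
    # "OS/user default encoding" discussed in the message.
    return locale.getpreferredencoding(False)


def write_text(path, text, encoding=None):
    # The I/O layer encodes on write; the caller only handles str.
    enc = encoding or user_default_encoding()
    with open(path, "w", encoding=enc) as f:
        f.write(text)


def read_text(path, encoding=None):
    # The I/O layer decodes on read; the caller gets str back.
    enc = encoding or user_default_encoding()
    with open(path, "r", encoding=enc) as f:
        return f.read()


def read_bytes(path):
    # "Grown-up" code: no implicit decoding happens here. The caller
    # must call .decode() with an explicit encoding to get a str.
    with open(path, "rb") as f:
        return f.read()
```

With helpers like these, a simple program would call read_text(path) and never mention an encoding, while code that calls read_bytes(path) has to write something like raw.decode("utf-8") itself before it can treat the data as text.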