On Sep 18, 2007, at 11:11 AM, Guido van Rossum wrote:

> If they contain non-ASCII bytes I am currently in favor of doing a
> best-effort decoding using the default locale encoding, replacing
> errors with '?' rather than throwing an exception.
One of the more common things to do with command-line arguments is open them. So it'd really be nice if:

    python -c 'import sys; open(sys.argv[1])' [some filename]

always worked, regardless of the current system encoding and what characters make up the filename. Note that filenames are essentially random binary gunk on most Unix systems; the encoding is unspecified, and there can in fact be multiple encodings, even for the different directories making up a single file's path.

I'd like to propose that Python simply assume the external world is likely to be UTF-8, and always decode command-line arguments (and environment vars), and encode for filesystem operations, using the round-trippable UTF-8b, even if the system says its encoding is iso-2022 or some other abomination. This has upsides (simple, doesn't trample on PUA codepoints, only needs one new codec, never throws an exception in the above example, and really is correct much of the time) and downsides (if the system locale is iso-2022, and all the filenames you're dealing with really are properly encoded in iso-2022, it might be nice if they decoded into the sensible unicode string, instead of a nonsensical, but still round-trippable, one).

I think the advantages outweigh the disadvantages, but in the world I live in, using anything other than UTF-8 or ASCII is grounds for entry into an insane asylum. ;)

James
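(Editor's note: the round-trippable UTF-8b decoding proposed here is essentially the mechanism Python 3 later adopted as the "surrogateescape" error handler in PEP 383; a minimal sketch of the round-trip, which is not the exact codec James describes but illustrates the same property:)

```python
# Round-trippable decoding of arbitrary filename bytes. Each byte that
# is invalid as UTF-8 maps to a lone surrogate U+DC80..U+DCFF, and
# encoding with the same handler recovers the exact original bytes.
raw = b"caf\xe9/data.txt"  # Latin-1-ish bytes, invalid as UTF-8

name = raw.decode("utf-8", "surrogateescape")
assert name == "caf\udce9/data.txt"  # 0xE9 became lone surrogate U+DCE9

# Encoding round-trips to the original bytes, so open(name) can still
# reach the file whatever encoding its name was really in.
assert name.encode("utf-8", "surrogateescape") == raw
```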