On 9/14/07, Greg Ewing <[EMAIL PROTECTED]> wrote:
> Hagen Fürstenau wrote:
> > sys.argv could be of type bytes and sys.arguments (or whatever) could be
> > a function taking an encoding parameter (which defaults to UTF-8) and
> > returning strings.
> >
> > Of course that's backwards incompatible and I'm not sure if it's too
> > late for something like this now.
>
> It would be pretty disruptive to ask everyone to change
> their habit of thinking of sys.argv as a list of strings.

Would it? We're already asking them to convert between bytes and
unicode strings anywhere else I/O is done. I see the command line and
environment as merely more forms of input.

The only way to parse them into data structures automatically is to
keep them as bytes. They are C concepts and can't imply an encoding.
As it is, it's entirely possible to have -multiple- encodings on a
command line at once, as well as in environment variables. They're all
context sensitive. This isn't going to change.

> I would suggest doing it the other way around -- have
> sys.argv be an object that automatically converts to
> unicode on access, and something else, such as
> sys.argbytes, for getting the raw bytes if that fails.

I'd leave sys.argv as bytes and make sys.args/arguments/argstrs do some
best-effort parsing. argv is the C/C++ name for the raw bytes; let's
not confuse people.

Similarly for the environment: the os.environ dict should have bytes
objects as keys and values (or perhaps a bytes subclass that refuses
null bytes). The os.getenv and os.putenv functions should take care of
any best-effort decoding/encoding, and getenv should take an optional
encoding= parameter to specify the encoding explicitly.

-gps
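
P.S. To make the proposal concrete, here is a rough sketch of what the best-effort layer could look like if the raw values stayed as bytes. The names decode_args and getenv_decoded are made up for illustration, as are the errors= parameter and the sample data; none of this is an existing or agreed API.

```python
# Hypothetical helpers: raw argv and environment stay as bytes, and
# decoding to str is an explicit, per-call decision.


def decode_args(raw_args, encoding="utf-8", errors="strict"):
    """Decode a list of bytes arguments into a list of str."""
    return [arg.decode(encoding, errors) for arg in raw_args]


def getenv_decoded(raw_environ, name, default=None, encoding="utf-8"):
    """Look up a str name in a bytes-keyed mapping and decode the value."""
    value = raw_environ.get(name.encode(encoding))
    if value is None:
        return default
    return value.decode(encoding)


# Sample data standing in for a bytes sys.argv and a bytes os.environ.
# The last argument is Latin-1, the one before it UTF-8: two encodings
# on a single command line.
raw_argv = [b"prog", b"--name", "Fürstenau".encode("utf-8"), b"caf\xe9"]
raw_environ = {b"HOME": b"/home/gps"}

# The UTF-8 default works for the arguments that really are UTF-8.
utf8_args = decode_args(raw_argv[:3])

# The caller knows from context that the last argument is Latin-1.
latin1_arg = raw_argv[3].decode("latin-1")

# Best effort: replace undecodable bytes instead of raising.
lossy_args = decode_args(raw_argv, errors="replace")

home = getenv_decoded(raw_environ, "HOME")
```

With strict decoding, decode_args(raw_argv) raises UnicodeDecodeError on the Latin-1 argument. That is the point: only the caller knows which encoding applies to which argument, so the raw bytes have to stay available.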