On 9/14/07, Greg Ewing <[EMAIL PROTECTED]> wrote:
> Hagen Fürstenau wrote:
> > sys.argv could be of type bytes and sys.arguments (or whatever) could be
> > a function taking an encoding parameter (which defaults to UTF-8) and
> > returning strings.
> >
> > Of course that's backwards incompatible and I'm not sure if it's too
> > late for something like this now.
>
> It would be pretty disruptive to ask everyone to change
> their habit of thinking of sys.argv as a list of strings.

Would it?  We're already asking them to convert between bytes and
unicode strings anywhere else I/O is done.  I see the command line and
environment as merely more forms of input.  The only way to parse them
into data structures automatically is to keep them as bytes.  They are
C concepts and can't imply an encoding.  As it is, it's entirely
possible to have -multiple- encodings on a command line at once, as
well as in environment variables.  They're all context sensitive.
This isn't going to change.
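
To make the mixed-encoding case concrete, here is a small sketch.  The
argument values and the try_decode helper are made up for illustration;
the point is only that one argument vector can carry a UTF-8 filename
next to a Latin-1 one, and no single codec gets both right:

```python
# Hypothetical argv as the C runtime would hand it over: raw bytes.
# The same filename appears once encoded as UTF-8 and once as Latin-1.
argv = [
    b"prog",
    "caf\u00e9.txt".encode("utf-8"),
    "caf\u00e9.txt".encode("latin-1"),
]

def try_decode(raw, encoding):
    """Return the decoded string, or None if raw is not valid in encoding."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None

# UTF-8 fails outright on the Latin-1 argument.
utf8_view = [try_decode(arg, "utf-8") for arg in argv]

# Latin-1 never fails, but silently garbles the UTF-8 argument.
latin1_view = [arg.decode("latin-1") for arg in argv]
```

Either choice of a single encoding loses information for one of the
arguments, which is why the raw bytes have to stay reachable.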

> I would suggest doing it the other way around -- have
> sys.argv be an object that automatically converts to
> unicode on access, and something else, such as
> sys.argbytes, for getting the raw bytes if that fails.

I'd leave sys.argv as bytes and make sys.args/arguments/argstrs do some
best-effort decoding.  argv is the C/C++ name for the raw bytes; let's
not confuse people.  Similarly for the environment: the os.environ dict
should have bytes objects as keys and values (or perhaps a bytes
subclass that refuses null bytes).  The os.getenv and os.putenv
functions should take care of any best-effort decoding/encoding, and
getenv should take an optional encoding= parameter to specify the
encoding explicitly.
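
A rough sketch of that shape, under loud assumptions: EnvBytes, argstrs
and this getenv are hypothetical names, not existing sys or os APIs, the
environment is passed in as a plain dict so the example stands alone,
and "best effort" is taken to mean decoding with the "replace" error
handler:

```python
class EnvBytes(bytes):
    """Hypothetical bytes subclass that refuses embedded null bytes."""

    def __new__(cls, value=b""):
        if b"\0" in value:
            raise ValueError("embedded null byte")
        return super().__new__(cls, value)


def argstrs(argv_bytes, encoding="utf-8"):
    """Best-effort decode of a bytes argv; bad bytes become U+FFFD."""
    return [arg.decode(encoding, "replace") for arg in argv_bytes]


def getenv(environ, key, default=None, encoding="utf-8"):
    """Look up a str key in a bytes-keyed mapping and decode the value."""
    raw = environ.get(EnvBytes(key.encode(encoding)))
    if raw is None:
        return default
    return raw.decode(encoding, "replace")


# Stand-in for a bytes-keyed os.environ (assumed, not the real one).
environ = {
    EnvBytes(b"LANG"): EnvBytes(b"C"),
    EnvBytes(b"NAME"): EnvBytes(b"caf\xe9"),  # Latin-1 encoded value
}

name_default = getenv(environ, "NAME")                     # UTF-8 guess
name_latin1 = getenv(environ, "NAME", encoding="latin-1")  # explicit
```

The default guess degrades to a replacement character instead of
raising, while a caller who knows the context can pass encoding= and get
the exact string back.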

-gps
_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000