Guido van Rossum:

> In some sense the safest approach from this POV would be to return
> Unicode as soon as it can't be encoded using the global default
> encoding. IOW normally this would return Unicode for all names
> containing non-ASCII characters.

   On unicode versions of Windows, for attributes like os.listdir,
os.getcwd, sys.argv, and os.environ, which can usefully return unicode
strings, there are 4 options I see:

1) Always return unicode. This is the option I'd be happiest to use,
myself, but expect this choice would change the behaviour of existing
code too much and so produce much unhappiness.

2) Return unicode when the text can not be represented in ASCII. This
will cause a change of behaviour for existing code which deals with
non-ASCII data.

3) Return unicode when the text can not be represented in the default
code page. While this change can lead to breakage because of combining
byte string and unicode strings, it is reasonably safe from the point
of view of data integrity as current code is returning garbage strings
that look like '?????'.

4) Provide two versions of the attribute, one with the current name
returning byte strings and a second with a "u" suffix returning
unicode. This is the least intrusive, requiring explicit changes to
code to receive unicode data. For patch #1231336 I chose this approach
producing sys.argvu and os.environu.

    For os.listdir the current behaviour of returning unicode when its
argument is unicode can be retained but that is not extensible to, for
example, sys.argv.

   Since this issue may affect many attributes a common approach
should be chosen.

   For experimenting with os.listdir, there is a patch for
posixmodule.c at http://www.scintilla.org/difft.txt which implements
(2). To specify the US-ASCII code page, the number 20127 is used as
there is no definition for this in the system headers. To change to
(3) comment out the line with 20127 and uncomment the line with
CP_ACP. Unicode arguments produce unicode results.

   Neil
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Reply via email to