On 09Feb2016 1801, Andrew Barnert wrote:
On Feb 9, 2016, at 17:37, Steve Dower <pyt...@stevedower.id.au> wrote:
Could we perhaps redefine bytes paths on Windows as utf8 and use
Unicode everywhere internally?
When you receive bytes from argv, stdin, a text file, a GUI, a named
pipe, etc., and then use them as a path, Python treating them as UTF-8
would break everything.
Sure, but that's already broken today if you're communicating bytes via
some protocol without manually managing the encoding. In that case you
should be decoding them (and potentially re-encoding to
sys.getfilesystemencoding()).
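
As a rough illustration of that decode-then-re-encode step, here is a
minimal sketch. The helper name reencode_for_fs and the Latin-1 "wire"
encoding are assumptions made up for the example, not anything Python or
a particular protocol defines.

```python
import sys


def reencode_for_fs(raw, wire_encoding, fs_encoding=None):
    """Decode bytes from a protocol, then re-encode them for the filesystem.

    Hypothetical helper: the caller has to know `wire_encoding`, because
    the bytes themselves do not carry that information.
    """
    text = raw.decode(wire_encoding)
    # Fall back to the interpreter's declared filesystem encoding.
    target = fs_encoding or sys.getfilesystemencoding()
    return text, text.encode(target)


# Bytes as a peer using Latin-1 might send them (assumed for illustration).
raw = "r\u00e9sum\u00e9.txt".encode("latin-1")

# Passing "utf-8" explicitly keeps the example independent of the local
# configuration; omit it to use sys.getfilesystemencoding() instead.
text, fs_bytes = reencode_for_fs(raw, "latin-1", "utf-8")
print(repr(text), fs_bytes)
```

Treating the same raw bytes as UTF-8 without decoding them first would
fail, since they are not valid UTF-8.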
The problem here is the protocol that Python uses to return bytes paths:
it is inconsistent between APIs, and information is lost. Fixing it
really requires going through all the OS calls and either (a) making
them consistently decode bytes to str using the declared FS encoding
(currently 'mbcs', but I see no reason we can't make it 'utf_8'), or (b)
making them consistently use the user's current system locale setting by
always using the *A Win32 APIs rather than the *W ones.
I really don't like the idea of not being able to use bytes in cross
platform code. Unless it's become feasible to use Unicode for lossless
filenames on Linux - last I heard it wasn't.
It is, and has been for years. Surrogate escaping solved the Linux
problem. That doesn't help for Python 2, but again, it's too late for
Python 2.
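
For reference, a minimal sketch of the surrogate-escape round trip in
Python 3. The helper names are made up for the example, and the codec is
spelled out as UTF-8 so the result does not depend on the local
configuration; on POSIX, os.fsdecode() and os.fsencode() apply the same
error handler with the filesystem encoding.

```python
def bytes_to_name(raw):
    """Decode a bytes filename to str without losing undecodable bytes."""
    # Each undecodable byte 0xNN becomes the lone surrogate U+DCNN.
    return raw.decode("utf-8", "surrogateescape")


def name_to_bytes(name):
    """Reverse bytes_to_name, restoring the original bytes exactly."""
    return name.encode("utf-8", "surrogateescape")


raw = b"caf\xe9.txt"          # Latin-1 bytes, not valid UTF-8
name = bytes_to_name(raw)     # the str contains U+DCE9 in place of 0xE9
assert name_to_bytes(name) == raw
print(ascii(name))
```

The resulting str can be passed around and handed back to the OS intact,
though it cannot be encoded with a strict error handler.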
Okay, I guess I was working from outdated information. Thanks (and
thanks Chris for the same answer).