On 09Feb2016 1801, Andrew Barnert wrote:
On Feb 9, 2016, at 17:37, Steve Dower <pyt...@stevedower.id.au
<mailto:pyt...@stevedower.id.au>> wrote:

Could we perhaps redefine bytes paths on Windows as utf8 and use
Unicode everywhere internally?

When you receive bytes from argv, stdin, a text file, a GUI, a named
pipe, etc., and then use them as a path, Python treating them as UTF-8
would break everything.
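To make the concern concrete, assume the bytes came from a program running under a Western European ANSI code page. cp1252 stands in for that code page here, which is an assumption for illustration. Those bytes are generally not valid UTF-8, so reading them as UTF-8 fails outright or garbles the name. The helper name `interpret_as_utf8` is hypothetical:

```python
from typing import Optional

def interpret_as_utf8(raw: bytes) -> Optional[str]:
    """Return raw decoded as UTF-8, or None if it is not valid UTF-8."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None

# "café.txt" as a cp1252 program would produce it: 'é' is the single byte 0xE9
legacy_bytes = "caf\u00e9.txt".encode("cp1252")
print(legacy_bytes)                     # b'caf\xe9.txt'
# 0xE9 starts a 3-byte UTF-8 sequence, but '.' is not a continuation byte
print(interpret_as_utf8(legacy_bytes))  # None
```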

Sure, but that's already broken today if you're communicating bytes via some protocol without manually managing the encoding, in which case you should be decoding it (and potentially re-encoding to sys.getfilesystemencoding()).
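Here is a minimal sketch of the "decode, then re-encode" discipline. It assumes the protocol documents its own encoding, taken to be UTF-8 here, and the helper names are hypothetical. `os.fsencode()` encodes a str with `sys.getfilesystemencoding()` and the matching error handler:

```python
import os
import sys

def path_from_wire(data: bytes, wire_encoding: str = "utf-8") -> str:
    """Decode bytes received over a protocol using the encoding that protocol declares."""
    return data.decode(wire_encoding)

def path_to_os_bytes(path: str) -> bytes:
    """Re-encode a str path for byte-oriented OS APIs.

    os.fsencode uses sys.getfilesystemencoding() and sys.getfilesystemencodeerrors().
    """
    return os.fsencode(path)

name = path_from_wire(b"caf\xc3\xa9.txt")   # the str 'café.txt'
print(sys.getfilesystemencoding())          # platform-dependent, e.g. 'utf-8'
print(path_to_os_bytes("report.txt"))       # b'report.txt'
```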

The problem here is the protocol that Python uses to return bytes paths: that protocol is inconsistent between APIs, and information is lost. It really requires going through all the OS calls and either (a) making them consistently decode bytes to str using the declared FS encoding (currently 'mbcs', but I see no reason we can't make it 'utf_8'), or (b) making them consistently use the user's current system locale setting by always using the *A Win32 APIs rather than the *W ones.
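The information loss can be simulated portably with the `cp1252` codec standing in for a Western European ANSI code page. This is an assumption: the real 'mbcs' codec exists only on Windows. An *A-style conversion is modelled here with `errors="replace"`, which turns unrepresentable characters into '?'. The original name cannot be recovered afterwards, whereas UTF-8 round-trips any str that contains no lone surrogates. The helper name `round_trip` is hypothetical:

```python
def round_trip(name: str, encoding: str) -> str:
    """Encode a str path to bytes and back, replacing characters the codec cannot represent."""
    return name.encode(encoding, errors="replace").decode(encoding)

name = "\u65e5\u672c\u8a9e.txt"             # a Japanese filename
print(round_trip(name, "cp1252") == name)   # False: the stem becomes '???'
print(round_trip(name, "utf-8") == name)    # True
```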

I really don't like the idea of not being able to use bytes in cross-platform code. Unless it's become feasible to use Unicode for lossless filenames on Linux - last I heard it wasn't.

It is, and has been for years. Surrogate escaping solved the Linux
problem. That doesn't help for Python 2, but again, it's too late for
Python 2.
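Surrogate escaping here refers to the `surrogateescape` error handler from PEP 383. Undecodable bytes become lone surrogates in the range U+DC80..U+DCFF and are turned back into the same bytes on encoding. On POSIX, `os.fsdecode()` and `os.fsencode()` use this handler. The sketch calls the codec directly so that it behaves identically on any platform, and the filename is an illustrative assumption:

```python
# A Linux filename that is not valid UTF-8 (Latin-1 'é' stored as a raw byte)
raw = b"caf\xe9.txt"

# Decode losslessly: the undecodable byte 0xE9 becomes the lone surrogate U+DCE9
name = raw.decode("utf-8", errors="surrogateescape")
print(ascii(name))       # 'caf\udce9.txt'

# Encode back: the surrogate is turned into the original byte
restored = name.encode("utf-8", errors="surrogateescape")
print(restored == raw)   # True
```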

Okay, guess I was operating out of old information. Thanks (and thanks Chris for the same answer).
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev