Re: [Python-Dev] Windows: Remove support of bytes filenames in the os module?

Steve Dower Tue, 09 Feb 2016 20:42:08 -0800

On 09Feb2016 2017, Stephen J. Turnbull wrote:

  > The problem here is the protocol that Python uses to return bytes paths,
  > and that protocol is inconsistent between APIs and information is lost.


No, the problem is that the necessary information simply isn't always
available.  Not even today: think removable media, especially archival
content.  Also network file systems: I don't know if it still happens,
but I've seen Shift JIS, GB2312, and KOI8-R all in the same directory,
and sometimes two of those in the *same path*.  (Don't ask me how
non-malicious users managed to do the latter!)

But if we return bytes paths and the user passes them back in
unchanged, that should be irrelevant. The earlier issue was that that
doesn't work (e.g. a bytes path from os.scandir couldn't be passed back
into open()).
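
A minimal sketch of the round trip in question, assuming a current
Python 3 where os.scandir and open() both accept bytes paths. The
helper name roundtrip_bytes_paths and the file name example.txt are
made up for illustration:

```python
import os
import tempfile


def roundtrip_bytes_paths(directory: bytes) -> list:
    """List `directory` using a bytes path and reopen every file
    with exactly the bytes path the OS layer handed back."""
    reopened = []
    with os.scandir(directory) as entries:
        for entry in entries:
            # entry.name and entry.path are bytes because `directory` is bytes.
            if entry.is_file():
                # The path is passed back unchanged, never decoded.
                with open(entry.path, "rb") as fh:
                    reopened.append((entry.name, fh.read()))
    return reopened


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file used only for this demonstration.
    with open(os.path.join(tmp, "example.txt"), "wb") as fh:
        fh.write(b"payload")
    result = roundtrip_bytes_paths(os.fsencode(tmp))
```

The point is that the caller never interprets the bytes; it only hands
back what it was given.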

  > It really requires going through all the OS calls and either (a) making
  > them consistently decode bytes to str using the declared FS encoding
  > (currently 'mbcs', but I see no reason we can't make it 'utf_8'),

If it were that easy, it would have been done two decades ago.  I'm no
fan of Windows[1], but it's obvious that Microsoft has devoted
enormous amounts of brainpower to the problem of encoding
rationalization since the early 90s.  I don't think they would have
missed this idea.

I meant with Python's calls into the API. Anywhere Python does the
conversion from bytes to LPCWSTR (the UTF-16 type) there's a chance
it'll be wrong.

Your earlier comments (regarding encoding/decoding to/from Unicode,
which I didn't have anything valuable to add to) basically reflect the
fact that developers need to treat bytes paths as blobs on all
platforms and the core Python runtime needs to obtain and use them
consistently. Which means *always* using the Win32 *A APIs and never
doing a conversion ourselves.
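
To illustrate why a conversion layer that guesses can go wrong, here is
a rough sketch in plain Python. It does not call any Win32 API; the
helper lossy_roundtrip and the sample file name are hypothetical, and
the codecs are picked only to show the effect:

```python
def lossy_roundtrip(raw: bytes, assumed_encoding: str) -> bytes:
    """Decode with a guessed encoding and encode back again, the way a
    naive bytes -> str -> bytes conversion layer might."""
    text = raw.decode(assumed_encoding, errors="replace")
    return text.encode(assumed_encoding, errors="replace")


# A file name as it might be stored by a Shift JIS system.
original = "ファイル.txt".encode("shift_jis")

# Guessing the right encoding preserves the bytes.
via_sjis = lossy_roundtrip(original, "shift_jis")

# Guessing wrong silently replaces bytes, so the result no longer
# names the same file.
via_ascii = lossy_roundtrip(original, "ascii")
```

Passing `original` through untouched avoids the question of which
encoding to guess altogether.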

Microsoft's solution here is the user's active code page, much like
*nix's solution as I understand it, except that where *nix will convert
_to_ the encoding as a normalized form, Windows will convert _from_ the
encoding to its UTF-16 "normalized" form. Back-compat concerns have
prevented any significant changes being made here, otherwise there
wouldn't be a 'bytes' interface at all. (Or more likely, everything
would be UTF-8 based, but back-compat is king in Windows-land.)
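
A rough emulation of that "from the code page to UTF-16" direction,
using Python codecs rather than the real *A entry points. The function
name ansi_to_wide and the choice of cp1252 and cp1251 as example code
pages are assumptions for illustration only:

```python
def ansi_to_wide(raw: bytes, code_page: str = "cp1252") -> bytes:
    """Emulate the conversion described above: interpret `raw` in the
    given code page and produce a NUL-terminated UTF-16-LE buffer."""
    text = raw.decode(code_page)
    return text.encode("utf-16-le") + b"\x00\x00"


raw_name = b"caf\xe9.txt"

# The same bytes become different wide strings depending on which
# code page is assumed to be active.
western = ansi_to_wide(raw_name, "cp1252")
cyrillic = ansi_to_wide(raw_name, "cp1251")
```

The bytes are identical in both calls; only the assumed code page
differs, which is why the result depends on the user's configuration.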


Cheers,
Steve
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
