Eryk Sun <eryk...@gmail.com> added the comment:

> lets not claim that bytes cannot represent everything on a filesystem 
> with an encoding.

Gregory, before the filesystem encoding was changed to UTF-8 in Python 3.6, the 
[A]NSI file API (e.g. CreateFileA) was used for bytes paths and the [W]ide 
character file API (e.g. CreateFileW) was used for str paths. The ANSI API is a 
set of wrapper functions that automatically translate strings between the ANSI 
code page of the current process and the system's native UTF-16 encoding, 
before and after calling the wide-character function (or a common internal 
function). Starting with Windows 10, the ANSI and OEM code pages of a process 
are finally allowed to be UTF-8 (code page 65001), but this is still considered 
beta and is barely used. Usually the ANSI code page is a legacy single-byte or 
double-byte code page such as 1252 (Western Europe) or 932 (Japanese), and 
such a code page cannot represent every filename that the filesystem can store.
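
As a rough illustration of that limitation, here is a minimal sketch that uses 
Python's own "cp1252" and "cp932" codecs as stand-ins for the ANSI code page. 
The helper name representable_in_code_page and the sample filenames are 
hypothetical. Python's strict codecs raise an error for unmappable characters, 
whereas the real ANSI API may instead substitute a best-fit or placeholder 
character, so treat this as an approximation only.

```python
def representable_in_code_page(name: str, code_page: str) -> bool:
    """Return True if `name` survives a round trip through `code_page`.

    Hypothetical helper: approximates what a legacy ANSI code page can
    represent by using Python's strict codec of the same name.
    """
    try:
        return name.encode(code_page).decode(code_page) == name
    except UnicodeEncodeError:
        # At least one character has no mapping in this code page.
        return False


# Made-up sample filenames for illustration.
ascii_name = "report.txt"
western_name = "caf\u00e9.txt"              # contains U+00E9
japanese_name = "\u65e5\u672c\u8a9e.txt"    # three kanji

for name in (ascii_name, western_name, japanese_name):
    print(ascii(name),
          "cp1252:", representable_in_code_page(name, "cp1252"),
          "cp932:", representable_in_code_page(name, "cp932"))
```

A name that only one code page can represent is not reachable as a bytes path 
under the other, which is the sense in which bytes with a legacy encoding 
cannot represent everything on the filesystem.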

Natively, Windows is UTF-16, and native Windows filesystems store filenames on 
disk as sequences of 16-bit code units. The system doesn't check for valid 
Unicode, so lone surrogate codes are allowed. This is sometimes called a 
"Wobbly" format. In Python, encoding or decoding such names requires the 
"surrogatepass" error handler.
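
For example, the following sketch shows a made-up filename containing a lone 
high surrogate. The filename is hypothetical. The strict UTF-16 and UTF-8 
codecs reject it, while "surrogatepass" lets it round-trip.

```python
# Hypothetical filename with a lone high surrogate (U+D800), which is not
# valid Unicode text but can be stored as a 16-bit code unit.
name = "bad\ud800name.txt"

# The strict codecs refuse to encode a lone surrogate.
try:
    name.encode("utf-16-le")
    strict_utf16_ok = True
except UnicodeEncodeError:
    strict_utf16_ok = False

try:
    name.encode("utf-8")
    strict_utf8_ok = True
except UnicodeEncodeError:
    strict_utf8_ok = False

# With "surrogatepass" the lone surrogate is encoded like any other code point.
raw_utf16 = name.encode("utf-16-le", "surrogatepass")
raw_utf8 = name.encode("utf-8", "surrogatepass")

# Decoding needs the same handler to get the original str back.
from_utf16 = raw_utf16.decode("utf-16-le", "surrogatepass")
from_utf8 = raw_utf8.decode("utf-8", "surrogatepass")

print(strict_utf16_ok, strict_utf8_ok)
print(from_utf16 == name, from_utf8 == name)
```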

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue43403>
_______________________________________