Eryk Sun <eryk...@gmail.com> added the comment:
> lets not claim that bytes cannot represent everything on a filesystem > with an encoding. Gregory, before changing the filesystem encoding to UTF-8 in Python 3.6, the [A]NSI file API (e.g. CreateFileA) was used for bytes paths and the [W]ide character file API was used for str paths (e.g. CreateFileW). The ANSI API is a set of wrapper functions that automatically translate strings between the ANSI code page of the current process and the system's native UTF-16 encoding, before and after calling the wide-character function (or a common internal function). Starting with Windows 10, the ANSI and OEM code pages of a process are finally allowed to be UTF-8 (code page 65001), but it's still considered beta and barely used. Usually the ANSI API is set to a legacy single-byte or double-byte code page such as 1252 (Western Europe) or 932 (Japanese). Natively, Windows is UTF-16, and native Windows filesystems store filenames on disk using 16-bit characters. The system doesn't check for valid Unicode, so lone surrogate codes are allowed. This is sometimes called a "Wobbly" format. In Python it requires the "surrogatepass" error handler. ---------- _______________________________________ Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue43403> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com