Eryk Sun <[email protected]> added the comment:
> FYI, I expect cp65001 will be used more widely in the near future,
[...]
> It seems to use `SetConsoleOutputCP(65001)` and `SetConsoleCP(65001)`.
Unless PYTHONLEGACYWINDOWSSTDIO is set, Python 3.6+ doesn't use the console's
codepage-based interface (except for the low-level os.read and os.write).
Console files use the wide-character console API internally and have a
"utf-8" encoding, so "cp65001" isn't a factor in this context.
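A minimal sketch for checking this from Python, assuming CPython 3.6+ on Windows: when stdout is attached to a console, its raw layer is expected to be the internal `_WindowsConsoleIO` class and `sys.stdout.encoding` to be "utf-8", whatever the console codepages are. The helper name `describe_stdout` is hypothetical, and the raw-type check relies on CPython implementation details.

```python
import os
import sys


def describe_stdout(stream=None):
    """Hypothetical helper: report how a text stream is wired to the OS.

    On CPython 3.6+ attached to a Windows console, "raw_type" is expected to be
    "_WindowsConsoleIO" (the wide-character console API) and "encoding" to be
    "utf-8", independent of the console's input/output codepages.
    """
    if stream is None:
        stream = sys.stdout
    if stream is None:  # e.g. pythonw.exe, which has no console streams
        return None
    buffer = getattr(stream, "buffer", None)  # binary layer under the text layer
    raw = getattr(buffer, "raw", None)        # unbuffered layer, if any
    return {
        "encoding": getattr(stream, "encoding", None),
        "raw_type": type(raw).__name__ if raw is not None else None,
        # A non-empty value selects the legacy codepage-based console I/O.
        "legacy_stdio_requested": bool(os.environ.get("PYTHONLEGACYWINDOWSSTDIO")),
    }


if __name__ == "__main__":
    print(describe_stdout())
```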
This issue is probably caused by the encoding returned by
locale.getpreferredencoding(). On Windows this calls _locale._getdefaultlocale,
which returns a tuple that mixes the user locale with the system ANSI codepage.
For example, with the ANSI codepage set to UTF-8 (Windows 10):
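The encoding half of that tuple can be approximated as below. This is a sketch, assuming (consistent with the 'cp65001' output shown here) that _locale._getdefaultlocale formats the result of the Win32 GetACP() call as "cp<N>"; `ansi_codepage_encoding` is a hypothetical name. The user's locale plays no part in this value, which is why the two halves of the tuple can disagree.

```python
import sys


def ansi_codepage_encoding():
    """Hypothetical helper: approximate the encoding half of
    _locale._getdefaultlocale() on Windows.

    Returns None on other platforms.
    """
    if sys.platform != "win32":
        return None
    import ctypes  # imported lazily; ctypes.windll exists only on Windows

    # GetACP() returns the *system* ANSI codepage (e.g. 1252, or 65001 when the
    # system is configured for UTF-8), not anything derived from the user locale.
    acp = ctypes.windll.kernel32.GetACP()
    return "cp%d" % acp


if __name__ == "__main__":
    print(ansi_codepage_encoding())
```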
>>> _locale._getdefaultlocale()
('en_GB', 'cp65001')
The Universal CRT special-cases CP_UTF8 (codepage 65001) as "utf8" and accepts
"utf-8" as an alias. For example, after setting the ANSI codepage to UTF-8:
>>> locale.setlocale(locale.LC_CTYPE, '')
'English_United Kingdom.utf8'
Python could similarly special-case CP_UTF8 as "utf-8" in
_locale._getdefaultlocale.
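A minimal Python-level sketch of that special case, assuming the fix only needs to rename the codepage: `normalize_ansi_encoding` and `getdefaultlocale_utf8` are hypothetical names, not existing APIs, and a real change would presumably go into the C implementation of _locale._getdefaultlocale rather than a wrapper.

```python
import sys


def normalize_ansi_encoding(encoding):
    """Hypothetical helper: report CP_UTF8 (codepage 65001) as "utf-8",
    mirroring how the Universal CRT names it "utf8"; other names pass through."""
    if encoding is not None and encoding.lower() == "cp65001":
        return "utf-8"
    return encoding


def getdefaultlocale_utf8():
    """Hypothetical wrapper around the private _locale._getdefaultlocale
    (Windows-only); returns None where that function is unavailable."""
    import _locale

    getdefault = getattr(_locale, "_getdefaultlocale", None)
    if sys.platform != "win32" or getdefault is None:
        return None
    language, encoding = getdefault()
    return language, normalize_ansi_encoding(encoding)
```

With the ANSI codepage set to UTF-8, this would turn ('en_GB', 'cp65001') into ('en_GB', 'utf-8').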
----------
_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue36778>
_______________________________________