Eryk Sun <[email protected]> added the comment:
> Even some well known locale names still use the utf-8 code page. Most
> seem to be uncommon, but at least es-BR (Brazil) does and would still
> fall victim to these UCRT bugs.
es-BR is a custom locale for the Spanish language in Brazil, as opposed to the
common Portuguese locale (pt-BR). It's a Unicode-only locale, which means its
ANSI codepage is 0. Since 0 is CP_ACP, its effective ANSI codepage is the
system or process ANSI codepage.
For example, querying LOCALE_IDEFAULTANSICODEPAGE (0x1004):
>>> import ctypes, locale
>>> kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)
>>> buf = (ctypes.c_wchar * 10)()
Portuguese in Brazil uses codepage 1252 as its ANSI codepage:
>>> n = kernel32.GetLocaleInfoEx('pt-BR', 0x1004, buf, 10)
>>> buf.value
'1252'
Spanish in Brazil uses CP_ACP:
>>> n = kernel32.GetLocaleInfoEx('es-BR', 0x1004, buf, 10)
>>> buf.value
'0'
hi-IN (Hindi, India) is a common Unicode-only locale:
>>> n = kernel32.GetLocaleInfoEx('hi-IN', 0x1004, buf, 10)
>>> buf.value
'0'
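
The queries above can be wrapped in a small helper. This is only a sketch: the
function names effective_ansi_codepage and locale_ansi_codepage are made up for
illustration, and the Windows calls are guarded so the module still imports on
other platforms. It assumes the process ANSI codepage is what GetACP() returns.

```python
import ctypes
import sys

LOCALE_IDEFAULTANSICODEPAGE = 0x1004
CP_ACP = 0  # a locale reporting 0 has no ANSI codepage of its own


def effective_ansi_codepage(locale_acp, process_acp):
    """Resolve CP_ACP (0) to the process ANSI codepage (hypothetical helper)."""
    return process_acp if locale_acp == CP_ACP else locale_acp


def locale_ansi_codepage(name):
    """Return the default ANSI codepage of a named locale (Windows only)."""
    kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)
    buf = ctypes.create_unicode_buffer(10)
    n = kernel32.GetLocaleInfoEx(name, LOCALE_IDEFAULTANSICODEPAGE,
                                 buf, len(buf))
    if n == 0:
        # GetLocaleInfoEx returns 0 on failure, e.g. for an unknown name.
        raise ctypes.WinError(ctypes.get_last_error())
    return int(buf.value)


if sys.platform == 'win32':
    acp = ctypes.windll.kernel32.GetACP()
    for name in ('pt-BR', 'es-BR', 'hi-IN'):
        print(name, effective_ansi_codepage(locale_ansi_codepage(name), acp))
```

On a system whose ANSI codepage is 1252, es-BR and hi-IN would resolve to 1252
here, the same as pt-BR, even though they report 0.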
ucrt has switched to using UTF-8 for Unicode-only locales:
>>> locale.setlocale(locale.LC_CTYPE, 'hi_IN')
'hi_IN'
>>> ucrt = ctypes.CDLL('ucrtbase', use_errno=True)
>>> ucrt.___lc_codepage_func()
65001
Note that ucrt uses UTF-8 for Unicode-only locales only when using an
explicitly named locale such as "hi_IN", "Hindi_India" or even just "Hindi". On
the other hand, if a Unicode-only locale is used implicitly, ucrt instead uses
the system ANSI codepage:
>>> locale.setlocale(locale.LC_CTYPE, '')
'Hindi_India.1252'
>>> ucrt.___lc_codepage_func()
1252
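
To summarize the two cases, here is a rough model of the behavior observed
above. It is not ucrt's actual logic: modeled_ucrt_codepage is a hypothetical
name, and the branch for locales that have their own ANSI codepage is an
assumption that is not demonstrated in this message.

```python
CP_UTF8 = 65001


def modeled_ucrt_codepage(locale_acp, explicit_name, process_acp):
    """Model which LC_CTYPE codepage ucrt appears to pick.

    locale_acp    -- the locale's default ANSI codepage (0 if Unicode-only)
    explicit_name -- True for setlocale(..., 'hi_IN'), False for ''
    process_acp   -- the system or process ANSI codepage
    """
    if locale_acp != 0:
        # Assumption: a locale with its own ANSI codepage keeps using it.
        return locale_acp
    # Unicode-only locale: UTF-8 when named explicitly, else the ANSI codepage.
    return CP_UTF8 if explicit_name else process_acp


# The two cases shown above for hi-IN on a system with ANSI codepage 1252.
print(modeled_ucrt_codepage(0, True, 1252))
print(modeled_ucrt_codepage(0, False, 1252))
```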
I suppose this is for backwards compatibility. Windows 10 at least supports
setting the system ANSI codepage to UTF-8, or overriding the process ANSI
codepage to UTF-8 via the application manifest "activeCodePage" setting. For the
latter, I modified the manifest in a "python_utf8.exe" copy of the normal
"python.exe" binary, which is simpler than having to reboot to change the
system locale:
C:\>python_utf8 -q
>>> import locale
>>> locale.setlocale(locale.LC_CTYPE, '')
'Hindi_India.utf8'
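
A quick way to check which case applies to a given process is to ask for its
ANSI codepage directly. The sketch below assumes GetACP() reflects the manifest
override. The helper names are made up, and codepage_suffix only mirrors the
".1252" and ".utf8" suffixes seen in the setlocale() results above.

```python
import ctypes
import sys

CP_UTF8 = 65001


def codepage_suffix(codepage):
    """Format a codepage like the suffix in the setlocale() results above."""
    return 'utf8' if codepage == CP_UTF8 else str(codepage)


def process_ansi_codepage():
    """Return the process ANSI codepage, or None when not on Windows."""
    if sys.platform != 'win32':
        return None
    return ctypes.windll.kernel32.GetACP()


acp = process_ansi_codepage()
if acp is not None:
    print('process ANSI codepage:', acp, '->', '.' + codepage_suffix(acp))
```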
----------
_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue36792>
_______________________________________