Eryk Sun <eryk...@gmail.com> added the comment:
> I don't think that this fallback is needed anymore. Which Windows
> code page can be used as ANSI code page which is not already
> implemented as a Python codec?

Python has full coverage of the ANSI and OEM code pages in the standard Windows locales, but I don't have any experience with custom (i.e. supplemental or replacement) locales.

https://docs.microsoft.com/en-us/windows/win32/intl/custom-locales

Here's a simple script to check the standard locales.

    import codecs
    import ctypes

    kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)

    LOCALE_ALL = 0
    LOCALE_WINDOWS = 1
    LOCALE_IDEFAULTANSICODEPAGE = 0x1004
    LOCALE_IDEFAULTCODEPAGE = 0x000B # OEM

    EnumSystemLocalesEx = kernel32.EnumSystemLocalesEx
    GetLocaleInfoEx = kernel32.GetLocaleInfoEx
    GetCPInfoExW = kernel32.GetCPInfoExW

    EnumLocalesProcEx = ctypes.WINFUNCTYPE(ctypes.c_int,
        ctypes.c_wchar_p, ctypes.c_ulong, ctypes.c_void_p)

    class CPINFOEXW(ctypes.Structure):
        _fields_ = (('MaxCharSize', ctypes.c_uint),
                    ('DefaultChar', ctypes.c_ubyte * 2),
                    ('LeadByte', ctypes.c_ubyte * 12),
                    ('UnicodeDefaultChar', ctypes.c_wchar),
                    ('CodePage', ctypes.c_uint),
                    ('CodePageName', ctypes.c_wchar * 260))

    def get_all_locale_code_pages():
        result = []
        seen = set()
        info = (ctypes.c_wchar * 100)()

        @EnumLocalesProcEx
        def callback(locale, flags, param):
            for lctype in (LOCALE_IDEFAULTANSICODEPAGE,
                           LOCALE_IDEFAULTCODEPAGE):
                if (GetLocaleInfoEx(locale, lctype, info, len(info)) and
                        info.value not in ('0', '1')):
                    cp = int(info.value)
                    if cp in seen:
                        continue
                    seen.add(cp)
                    cp_info = CPINFOEXW()
                    if not GetCPInfoExW(cp, 0, ctypes.byref(cp_info)):
                        cp_info.CodePage = cp
                        cp_info.CodePageName = str(cp)
                    result.append(cp_info)
            return True

        if not EnumSystemLocalesEx(callback, LOCALE_WINDOWS, None, None):
            raise ctypes.WinError(ctypes.get_last_error())

        result.sort(key=lambda x: x.CodePage)
        return result

    supported = []
    unsupported = []
    for cp_info in get_all_locale_code_pages():
        cp = cp_info.CodePage
        try:
            codecs.lookup(f'cp{cp}')
        except LookupError:
            unsupported.append(cp_info)
        else:
            supported.append(cp_info)

    if unsupported:
        print('Unsupported:\n')
        for cp_info in unsupported:
            print(cp_info.CodePageName)
        print('\nSupported:\n')
    else:
        print('All Supported:\n')

    for cp_info in supported:
        print(cp_info.CodePageName)

Output:

    All Supported:

    437   (OEM - United States)
    720   (Arabic - Transparent ASMO)
    737   (OEM - Greek 437G)
    775   (OEM - Baltic)
    850   (OEM - Multilingual Latin I)
    852   (OEM - Latin II)
    855   (OEM - Cyrillic)
    857   (OEM - Turkish)
    862   (OEM - Hebrew)
    866   (OEM - Russian)
    874   (ANSI/OEM - Thai)
    932   (ANSI/OEM - Japanese Shift-JIS)
    936   (ANSI/OEM - Simplified Chinese GBK)
    949   (ANSI/OEM - Korean)
    950   (ANSI/OEM - Traditional Chinese Big5)
    1250  (ANSI - Central Europe)
    1251  (ANSI - Cyrillic)
    1252  (ANSI - Latin I)
    1253  (ANSI - Greek)
    1254  (ANSI - Turkish)
    1255  (ANSI - Hebrew)
    1256  (ANSI - Arabic)
    1257  (ANSI - Baltic)
    1258  (ANSI/OEM - Viet Nam)

Some locales are Unicode only (e.g. Hindi-India) or have no OEM code page, which the above code skips by checking for "0" or "1" as the code page value. Windows 10+ allows setting the system locale to a Unicode-only locale, for which it uses UTF-8 (65001) for ANSI and OEM.

The OEM code page matters because the console input and output code pages default to OEM, e.g. for os.device_encoding(). The console's I/O code pages are used in Python by low-level os.read() and os.write(). Note that the console doesn't properly implement using UTF-8 (65001) as the input code page. In this case, input read from the console via ReadFile() or ReadConsoleA() has a null byte in place of each non-ASCII character.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue46668>
_______________________________________
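
For anyone who wants to try the codec-lookup half of the check without a Windows machine, here is a minimal sketch that keeps only the codecs.lookup() test from the script in this message and applies it to a hand-picked list of code page numbers. The helper names has_codec_for_code_page() and partition_code_pages() and the sample list are made up for illustration. The sketch does not enumerate locales, so it says nothing about which code pages a given system actually uses.

```python
import codecs

def has_codec_for_code_page(cp):
    """Return True if codecs.lookup() finds a codec named 'cp<N>'."""
    try:
        codecs.lookup(f'cp{cp}')
    except LookupError:
        return False
    return True

def partition_code_pages(code_pages):
    """Split code page numbers into (supported, unsupported) lists."""
    supported = []
    unsupported = []
    for cp in code_pages:
        if has_codec_for_code_page(cp):
            supported.append(cp)
        else:
            unsupported.append(cp)
    return supported, unsupported

if __name__ == '__main__':
    # Hypothetical sample: a few code pages taken from the listing in
    # this message, plus one number that is not a real code page.
    sample = [437, 850, 932, 1252, 1258, 99999]
    supported, unsupported = partition_code_pages(sample)
    print('Supported:', supported)
    print('Unsupported:', unsupported)
```

To cover custom locales, the numbers would have to come from the system itself, as the ctypes script does via EnumSystemLocalesEx.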