On 11/13/22, Jessica Smith <12jessicasmit...@gmail.com> wrote:
> Consider the following code ran in Powershell or cmd.exe:
>
> $ python -c "print('└')"
> └
>
> $ python -c "print('└')" > test_file.txt
> Traceback (most recent call last):
>   File "<string>", line 1, in <module>
>   File "C:\Program Files\Python38\lib\encodings\cp1252.py", line 19, in encode
>     return codecs.charmap_encode(input,self.errors,encoding_table)[0]
> UnicodeEncodeError: 'charmap' codec can't encode character '\u2514' in position 0: character maps to <undefined>
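
The failure in the traceback can be reproduced without any redirection, since it comes down to code page 1252 having no mapping for U+2514. The following is only a minimal sketch; the helper name try_encode is made up for illustration:

```python
def try_encode(text, encoding):
    """Return the encoded bytes, or None if the codec cannot represent text."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


if __name__ == "__main__":
    box = "\u2514"  # the box-drawing character from the quoted example
    # cp1252 is the codec named in the traceback; it has no entry for U+2514.
    print("cp1252:", try_encode(box, "cp1252"))
    # UTF-8 can encode every Unicode code point.
    print("utf-8: ", try_encode(box, "utf-8"))
```

The first call yields None and the second yields the three-byte UTF-8 sequence, which is why getting the redirected stream to use UTF-8 avoids the error.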

If your applications and existing data files are compatible with UTF-8, then in Windows 10+ you can modify the administrative regional settings in the control panel to force using UTF-8. In this case, GetACP() and GetOEMCP() will return CP_UTF8 (65001), and the reserved code page constants CP_ACP (0), CP_OEMCP (1), CP_MACCP (2), and CP_THREAD_ACP (3) will use CP_UTF8.

You can override this on a per-application basis via the ActiveCodePage setting in the manifest:

https://learn.microsoft.com/en-us/windows/win32/sbscs/application-manifests#activecodepage

In Windows 10, this setting only supports "UTF-8". In Windows 11, it also supports "legacy" to allow old applications to run on a system that's configured to use UTF-8. Setting an explicit locale is also supported in Windows 11, such as "en-US", with fallback to UTF-8 if the given locale has no legacy code page.

Note that setting the system to use UTF-8 also affects the host process for console sessions (i.e. conhost.exe or openconsole.exe), since it defaults to using the OEM code page (UTF-8 in this case). Unfortunately, a legacy read from the console host does not support reading non-ASCII text as UTF-8. For example:

    >>> os.read(0, 6)
    SPĀM
    b'SP\x00M\r\n'

The non-ASCII character comes back as a null byte. This is a trivial bug in the console host, which stems from the fact that UTF-8 is a multibyte encoding (1-4 bytes per code point), but for some reason the console team at Microsoft still hasn't fixed it. You can use chcp.com to set the console's input and output code pages to something other than UTF-8 if you have to read non-ASCII input in a legacy console app.

By default, this problem doesn't affect Python's sys.stdin, which internally uses wide-character ReadConsoleW() with the system's native text encoding, UTF-16LE.
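
To check which code pages are actually in effect on a given machine, something along these lines can query them from Python via ctypes. Treat it as a rough sketch: the helper name query_code_pages is made up for illustration, it simply calls the kernel32 functions GetACP(), GetOEMCP(), GetConsoleCP() and GetConsoleOutputCP(), and it returns None on anything other than Windows.

```python
import ctypes
import sys


def query_code_pages():
    """Return the active code pages as a dict, or None when not on Windows."""
    if sys.platform != "win32":
        return None
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    return {
        "ansi": kernel32.GetACP(),
        "oem": kernel32.GetOEMCP(),
        # The console functions return 0 if the process has no console.
        "console_input": kernel32.GetConsoleCP(),
        "console_output": kernel32.GetConsoleOutputCP(),
    }


if __name__ == "__main__":
    print(query_code_pages())
```

With the system-wide UTF-8 setting enabled, the "ansi" and "oem" values should both be 65001. The two console values are the ones that chcp.com changes.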