Eryk Sun <eryk...@gmail.com> added the comment:
In Windows, Python defaults to the system ANSI codepage (e.g. 1252 in the West) for non-console standard I/O. For the case of a `for /f` loop in CMD, stdout is a pipe, so Python defaults to writing ANSI encoded text to its end of the pipe. I recommend overriding the encoding to UTF-8 using the PYTHONIOENCODING environment variable.

CMD uses the console's output codepage to decode bytes read from its end of the pipe, so the batch script should temporarily change the console codepage to UTF-8 via `chcp.com 65001`. Note that this won't work if CMD is running without a console (i.e. a DETACHED_PROCESS), in which case it defaults to ANSI. (I don't recommend running without a console. If no window is required, use CREATE_NO_WINDOW or a hidden window instead.) First save the current console codepage, parsed from the output of running `chcp.com` without arguments. Then you can restore the original console codepage after the loop.

After decoding the text, CMD's `echo` command writes to the console using the wide-character WriteConsoleW function, so there's no problem at this stage -- up to the limits of the console's text support.

FYI, so that Python doesn't get the blame for this too: the Windows console can only render Basic Multilingual Plane (i.e. UCS-2) text, and it doesn't support automatic font fallback or complex scripts. If the console can't display a character, it displays the font's default glyph (e.g. an empty rectangle), or two default glyphs for a surrogate pair. However, we can still copy text from the console in this case.

----------
resolution:  -> not a bug
stage:  -> resolved
status: open -> closed
type:  -> behavior

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue35149>
_______________________________________
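
For illustration only, here is a minimal Python sketch of the PYTHONIOENCODING recommendation in the comment above. It is not part of the original comment. It runs a child Python whose stdout is a pipe and compares the raw bytes written under two encodings. The helper name `run_child` and the sample text are hypothetical, and `cp1252` is set explicitly as a stand-in for the ANSI default so that the sketch behaves the same on any platform.

```python
import os
import subprocess
import sys

# The child source is pure ASCII; the escapes expand to U+00E9 and U+20AC
# (e-acute and the euro sign) inside the child process.
CHILD_CODE = r"print('\xe9\u20ac')"


def run_child(stdio_encoding):
    """Run a child Python with stdout as a pipe; return the raw bytes it wrote.

    Hypothetical helper for this sketch. PYTHONIOENCODING overrides the
    child's default standard I/O encoding.
    """
    env = dict(os.environ, PYTHONIOENCODING=stdio_encoding)
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.strip()  # drop the trailing newline bytes


utf8_bytes = run_child("utf-8")    # what the comment recommends
ansi_bytes = run_child("cp1252")   # assumption: stands in for the ANSI default

# A reader that decodes the pipe as UTF-8 only recovers the text
# when the writer also used UTF-8.
round_trip = utf8_bytes.decode("utf-8")
garbled = ansi_bytes.decode("utf-8", errors="replace")

print(utf8_bytes)
print(ansi_bytes)
print(ascii(round_trip))
print(ascii(garbled))
```

The point of the comparison is that both ends of the pipe have to agree: the writer's encoding is set by PYTHONIOENCODING, while the reader's (CMD's) follows the console output codepage, which is why the comment pairs the environment variable with `chcp.com 65001`.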