Eryk Sun <eryk...@gmail.com> added the comment:

> PS > [System.Console]::InputEncoding = $OutputEncoding

If changing the console input codepage to UTF-8 fixes the mojibake problem, then probably you're running Python in UTF-8 mode. pydoc.tempfilepager() encodes the temporary file with the preferred encoding, which normally would not be UTF-8. There are possible variations in how your system and the console are configured, so I can't say for sure.

tempfilepager() could temporarily set the console's input codepage to UTF-8 via SetConsoleCP(65001). However, if python.exe is terminated or crashes before it can reset the codepage, the console will be left in a bad state. By bad state, I mean that leaving the input codepage set to UTF-8 is broken. Legacy console applications rely on the input codepage for reading input via ReadFile() and ReadConsoleA(), but the console host (conhost.exe or openconsole.exe) doesn't support reading input as UTF-8. It simply replaces each non-ASCII character (i.e. characters that require 2-4 bytes as UTF-8) with a null byte, e.g. "abĀcd" is read as "ab\x00cd".

If you think the risk of crashing is negligible, and the downside of breaking legacy applications in the console session is trivial, then paging with full Unicode support is easily possible. Implement _winapi.GetConsoleCP() and _winapi.SetConsoleCP(). Write UTF-8 text to the temporary file. Change the console input codepage to UTF-8 before spawning "more.com". Revert to the original input codepage in the finally block.

A more conservative fix would be to change tempfilepager() to encode the file using the console's current input codepage, GetConsoleCP(). At least there's no mojibake.

> PS > $OutputEncoding = [System.Text.Encoding]::GetEncoding("UTF-8")

FYI, $OutputEncoding in PowerShell has nothing to do with the python.exe and more.com processes, nor the console session to which they're attached.

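For illustration, here is a rough sketch of the building blocks for the two tempfilepager() options described above. It is not a patch. It assumes ctypes calls into kernel32 as a stand-in for the proposed _winapi.GetConsoleCP() and _winapi.SetConsoleCP(), and the names kernel32_codepage_api(), console_input_codepage() and codepage_to_encoding() are hypothetical helpers, not existing pydoc or _winapi functions.

```python
import codecs
import contextlib

CP_UTF8 = 65001  # Windows code page identifier for UTF-8


def kernel32_codepage_api():
    """Return (get_cp, set_cp) wrappers around kernel32. Windows only.

    Hypothetical stand-in for _winapi.GetConsoleCP()/SetConsoleCP().
    """
    import ctypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetConsoleCP.restype = ctypes.c_uint
    kernel32.SetConsoleCP.argtypes = (ctypes.c_uint,)

    def get_cp():
        cp = kernel32.GetConsoleCP()
        if not cp:  # 0 signals failure, e.g. no console attached
            raise ctypes.WinError(ctypes.get_last_error())
        return cp

    def set_cp(cp):
        if not kernel32.SetConsoleCP(cp):
            raise ctypes.WinError(ctypes.get_last_error())

    return get_cp, set_cp


@contextlib.contextmanager
def console_input_codepage(codepage, get_cp, set_cp):
    """Temporarily switch the console input codepage, then restore it."""
    original = get_cp()
    set_cp(codepage)
    try:
        yield original
    finally:
        # Never reached if the process is killed: that is the risk
        # of leaving the console stuck at UTF-8.
        set_cp(original)


def codepage_to_encoding(codepage):
    """Map a codepage number to a Python codec name, or None if unknown."""
    if codepage == CP_UTF8:
        return "utf-8"
    name = "cp%d" % codepage
    try:
        codecs.lookup(name)
    except LookupError:
        return None
    return name
```

On Windows, the UTF-8 option would wrap the spawn of "more.com" in `with console_input_codepage(CP_UTF8, *kernel32_codepage_api()):` after writing the temporary file as UTF-8. The conservative option would instead open the temporary file with `codepage_to_encoding(get_cp())` and leave the console untouched.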
> PS > [System.Console]::OutputEncoding = $OutputEncoding

The console output codepage is irrelevant since more.com writes wide-character text via WriteConsoleW() and decodes the file using the console input codepage, GetConsoleCP(). The console output codepage from GetConsoleOutputCP() isn't used for anything here.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue44275>
_______________________________________