Eryk Sun <[email protected]> added the comment:
> PS > [System.Console]::InputEncoding = $OutputEncoding
If changing the console input codepage to UTF-8 fixes the mojibake problem,
then you're probably running Python in UTF-8 mode. pydoc.tempfilepager()
encodes the temporary file with the preferred encoding, which normally would
not be UTF-8. How your system and the console are configured can vary, so I
can't say for sure.
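To illustrate the kind of mismatch involved, here is a minimal sketch. It assumes the file was written as UTF-8 while the reader decodes it with a legacy codepage. Codepage 437 and the sample string are arbitrary examples, not values taken from the report.

```python
# Hypothetical illustration of mojibake: bytes written as UTF-8 are
# decoded with a legacy codepage (437 is an assumed example).
text = "naïve café"
data = text.encode("utf-8")      # what a UTF-8 mode Python would write
garbled = data.decode("cp437")   # a reader using codepage 437
matched = data.decode("utf-8")   # a reader using the same encoding

# ascii() is used so that printing cannot hit a second encoding problem.
print(ascii(garbled))            # each non-ASCII character became two wrong ones
print(ascii(matched))            # round-trips to the original text
```

When the writer and the reader agree on the encoding, the text round-trips; when they disagree, every multi-byte UTF-8 sequence turns into several unrelated characters.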
tempfilepager() could temporarily set the console's input codepage to UTF-8 via
SetConsoleCP(65001). However, if python.exe is terminated or crashes before it
can reset the codepage, the console will be left in a bad state. By bad state,
I mean that reading input is broken while the input codepage is set to UTF-8.
Legacy console applications rely on the input codepage when reading input via
ReadFile() and ReadConsoleA(), but the console host (conhost.exe or
openconsole.exe) doesn't support reading input as UTF-8. It simply replaces
each non-ASCII character (i.e. each character that requires 2-4 bytes as
UTF-8) with a null byte, e.g. "abĀcd" is read as "ab\x00cd".
If you think the risk of crashing is negligible, and the downside of breaking
legacy applications in the console session is trivial, then paging with full
Unicode support is easily possible. Implement _winapi.GetConsoleCP() and
_winapi.SetConsoleCP(). Write UTF-8 text to the temporary file. Change the
console input codepage to UTF-8 before spawning "more.com". Revert to the
original input codepage in the finally block.
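The steps above could be sketched roughly as follows. This is not the pydoc implementation: _winapi.GetConsoleCP() and _winapi.SetConsoleCP() don't exist yet, so the sketch assumes ctypes calls into kernel32 instead. The names console_input_codepage() and page_utf8() are hypothetical, and the get_cp/set_cp parameters exist only so the logic can be exercised without a Windows console.

```python
import contextlib
import subprocess

CP_UTF8 = 65001

def _kernel32():
    # Windows only: ctypes.WinDLL does not exist on other platforms.
    import ctypes
    return ctypes.WinDLL("kernel32", use_last_error=True)

@contextlib.contextmanager
def console_input_codepage(codepage, get_cp=None, set_cp=None):
    """Temporarily switch the console input codepage, restoring it on exit."""
    if get_cp is None or set_cp is None:
        k32 = _kernel32()
        get_cp, set_cp = k32.GetConsoleCP, k32.SetConsoleCP
    original = get_cp()
    if not set_cp(codepage):
        raise OSError("SetConsoleCP failed")
    try:
        yield original
    finally:
        # Not reached if the process is killed; that is the risk noted above.
        set_cp(original)

def page_utf8(path):
    # Hypothetical helper: page a UTF-8 encoded file with more.com.
    with console_input_codepage(CP_UTF8):
        subprocess.call(["more.com", path])
```

The finally block covers ordinary exceptions, but not a hard termination of python.exe, so the console can still be left with a UTF-8 input codepage in that case.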
A more conservative fix would be to change tempfilepager() to encode the file
using the console's current input codepage, GetConsoleCP(). Characters outside
that codepage still can't be represented, but at least there's no mojibake.
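A minimal sketch of that conservative approach, assuming the codepage number has already been obtained from GetConsoleCP(). The function name encode_for_console() is hypothetical, and the "backslashreplace" error handler is an assumed choice for characters the codepage can't represent.

```python
def encode_for_console(text, input_codepage):
    """Encode text with the codec matching a console input codepage number."""
    # 65001 is the UTF-8 codepage; other values map to Python's "cpNNN" codecs.
    encoding = "utf-8" if input_codepage == 65001 else f"cp{input_codepage}"
    # Unencodable characters are kept visible as escapes instead of raising.
    return text.encode(encoding, errors="backslashreplace")

print(encode_for_console("café", 1252))   # b'caf\xe9'
print(encode_for_console("Ā", 437))       # b'\\u0100'
```

Python does not ship a codec for every Windows codepage, so a real implementation would need a fallback for a LookupError.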
> PS > $OutputEncoding = [System.Text.Encoding]::GetEncoding("UTF-8")
FYI, $OutputEncoding in PowerShell has nothing to do with the python.exe and
more.com processes, nor the console session to which they're attached.
> PS > [System.Console]::OutputEncoding = $OutputEncoding
The console output codepage is irrelevant since more.com writes wide-character
text via WriteConsoleW() and decodes the file using the console input
codepage, GetConsoleCP(). The console output codepage from GetConsoleOutputCP()
isn't used for anything here.
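To check which codepages the current console session actually uses, something like the following sketch may help. It assumes ctypes access to kernel32; console_codepages() is a hypothetical helper name, and it returns None on platforms other than Windows.

```python
import sys

def console_codepages():
    """Return (input_cp, output_cp) on Windows, or None elsewhere."""
    if sys.platform != "win32":
        return None
    import ctypes
    k32 = ctypes.WinDLL("kernel32")
    # Both calls return 0 if the process isn't attached to a console.
    return k32.GetConsoleCP(), k32.GetConsoleOutputCP()

print(console_codepages())
```

Only the first value matters for how more.com decodes the file.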
----------
_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue44275>
_______________________________________