Eryk Sun <eryk...@gmail.com> added the comment:

> PS > [System.Console]::InputEncoding = $OutputEncoding

If changing the console input codepage to UTF-8 fixes the mojibake problem, 
then probably you're running Python in UTF-8 mode. pydoc.tempfilepager() 
encodes the temporary file with the preferred encoding, which normally would 
not be UTF-8. There are possible variations in how your system and the console 
are configured, so I can't say for sure.
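To check which case applies on a given machine, a small diagnostic along these lines may help. It is a sketch that assumes, as described above, that tempfilepager() writes the file with the preferred encoding, i.e. open()'s default of locale.getpreferredencoding(False). The helper name describe_pager_encoding() is hypothetical.

```python
import codecs
import locale
import sys


def describe_pager_encoding():
    """Hypothetical diagnostic: which encoding would the temp file get?

    Assumes the pager writes its temporary file with open()'s default
    encoding, i.e. the preferred encoding.
    """
    preferred = locale.getpreferredencoding(False)
    return {
        # True when Python runs in UTF-8 mode (-X utf8 or PYTHONUTF8=1).
        "utf8_mode": bool(sys.flags.utf8_mode),
        "preferred_encoding": preferred,
        # Normalized codec name, e.g. "utf-8" or "cp1252".
        "codec": codecs.lookup(preferred).name,
    }


if __name__ == "__main__":
    print(describe_pager_encoding())
```

In UTF-8 mode the preferred encoding is always UTF-8, which would explain why switching the console input codepage to UTF-8 makes the output readable.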

tempfilepager() could temporarily set the console's input codepage to UTF-8 via 
SetConsoleCP(65001). However, if python.exe is terminated or crashes before it 
can reset the codepage, the console will be left in a bad state: with the input 
codepage set to UTF-8, input is broken for legacy console applications. They 
rely on the input codepage for reading input via ReadFile() and ReadConsoleA(), 
but the console host (conhost.exe or openconsole.exe) doesn't support reading 
input as UTF-8. It simply replaces each non-ASCII character (i.e. any character 
that requires 2-4 bytes in UTF-8) with a null byte, e.g. "abĀcd" is read as 
"ab\x00cd".

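The replacement behavior can be modeled in a few lines of pure Python. This is not the console host's code, only a hypothetical simulation (simulate_utf8_console_read() is an illustrative name) of what a legacy ReadFile()/ReadConsoleA() caller receives when the input codepage is 65001: ASCII passes through unchanged, and every other character becomes a single null byte instead of its 2-4 byte UTF-8 sequence.

```python
def simulate_utf8_console_read(typed: str) -> bytes:
    """Model what a legacy app reads when the input codepage is UTF-8.

    Hypothetical simulation of the behavior described above: the console
    host replaces each non-ASCII character with one null byte instead of
    its 2-4 byte UTF-8 encoding.
    """
    return bytes(ord(ch) if ord(ch) < 0x80 else 0 for ch in typed)


if __name__ == "__main__":
    print(simulate_utf8_console_read("abĀcd"))  # b'ab\x00cd'
```
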
If you think the risk of crashing is negligible, and the downside of breaking 
legacy applications in the console session is trivial, then paging with full 
Unicode support is easily possible. Implement _winapi.GetConsoleCP() and 
_winapi.SetConsoleCP(). Write UTF-8 text to the temporary file. Change the 
console input codepage to UTF-8 before spawning "more.com". Revert to the 
original input codepage in the finally block.
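Those steps might be sketched as follows. Because _winapi.GetConsoleCP() and _winapi.SetConsoleCP() are only proposed here, the sketch calls the Win32 GetConsoleCP() and SetConsoleCP() functions through ctypes instead. console_input_codepage() and utf8_tempfilepager() are hypothetical names, and outside Windows the codepage switch is a no-op. The crash risk is unchanged: the finally clause can't run if python.exe is killed.

```python
import ctypes
import os
import sys
import tempfile
from contextlib import contextmanager

CP_UTF8 = 65001


@contextmanager
def console_input_codepage(codepage):
    """Temporarily set the console input codepage (hypothetical helper).

    Uses ctypes in place of the proposed _winapi functions. Yields the
    original codepage, or None when there is nothing to change.
    """
    if sys.platform != "win32":
        yield None  # no console codepages outside Windows
        return
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    original = kernel32.GetConsoleCP()  # 0 if no console is attached
    if not original or not kernel32.SetConsoleCP(codepage):
        yield None  # leave the console untouched
        return
    try:
        yield original
    finally:
        # Skipped if python.exe is terminated, leaving the console in UTF-8.
        kernel32.SetConsoleCP(original)


def utf8_tempfilepager(text, cmd):
    """Hypothetical variant of pydoc.tempfilepager() with full Unicode."""
    fd, filename = tempfile.mkstemp()
    try:
        # Write UTF-8 instead of the preferred encoding.
        with os.fdopen(fd, "w", encoding="utf-8", errors="backslashreplace") as file:
            file.write(text)
        # more.com decodes the file with the console input codepage.
        with console_input_codepage(CP_UTF8):
            return os.system(f'{cmd} "{filename}"')
    finally:
        os.unlink(filename)


if __name__ == "__main__":
    utf8_tempfilepager("abĀcd\n", "more" if sys.platform == "win32" else "cat")
```

Using a context manager keeps the save/restore pair in one place, but it is still just a try/finally underneath, so it shares the same weakness.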

A more conservative fix would be to change tempfilepager() to encode the file 
using the console's current input codepage, GetConsoleCP(). At least there's no 
mojibake.
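The conservative option could look roughly like this. pager_file_encoding() is a hypothetical helper: it reads the console input codepage through ctypes (standing in for the proposed _winapi.GetConsoleCP()) and falls back to the preferred encoding when there is no console or no matching Python codec. The rest of tempfilepager() would stay as it is.

```python
import codecs
import ctypes
import locale
import sys


def pager_file_encoding():
    """Pick the encoding for the pager's temporary file (hypothetical helper).

    On Windows, match the console input codepage that more.com decodes
    with; otherwise keep the preferred encoding.
    """
    if sys.platform == "win32":
        codepage = ctypes.WinDLL("kernel32").GetConsoleCP()
        if codepage:  # 0 means no console is attached
            name = "utf-8" if codepage == 65001 else f"cp{codepage}"
            try:
                return codecs.lookup(name).name
            except LookupError:
                pass  # e.g. a codepage without a Python codec
    return locale.getpreferredencoding(False)


if __name__ == "__main__":
    encoding = pager_file_encoding()
    # Characters the encoding can't represent are escaped, not garbled.
    print("abĀcd".encode(encoding, errors="backslashreplace"))
```

For example, under codepage 437 "Ā" has no mapping, so with errors="backslashreplace" it is written as the literal text \u0100 rather than as mojibake.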

> PS > $OutputEncoding =  [System.Text.Encoding]::GetEncoding("UTF-8")

FYI, $OutputEncoding in PowerShell has nothing to do with the python.exe and 
more.com processes, nor the console session to which they're attached.

> PS > [System.Console]::OutputEncoding = $OutputEncoding

The console output codepage is irrelevant, since more.com writes wide-character 
text via WriteConsoleW() and decodes the file using the console input 
codepage, GetConsoleCP(). The console output codepage from GetConsoleOutputCP() 
isn't used for anything here.
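To see why only the input codepage matters, the decoding step can be modeled in pure Python. simulate_more_decode() is a hypothetical helper, not more.com's actual code: it decodes the file's bytes with the given input codepage and ignores the output codepage entirely, since the result would be written as wide characters.

```python
def simulate_more_decode(file_bytes: bytes, input_codepage: int) -> str:
    """Model more.com decoding a file with the console input codepage.

    Hypothetical simulation: the output codepage plays no part, because the
    decoded text is written as wide characters via WriteConsoleW().
    """
    encoding = "utf-8" if input_codepage == 65001 else f"cp{input_codepage}"
    return file_bytes.decode(encoding, errors="replace")
```

For example, the UTF-8 bytes of "é" (C3 A9) decode as "├⌐" under input codepage 437, but as "é" under 65001, and a file encoded as cp437 decodes correctly under 437.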

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue44275>
_______________________________________