Eryk Sun <eryk...@gmail.com> added the comment:

> By default, the output of cmd is encoded with the "active" 
> codepage. In Python 3.6, you can decode this using 
> encoding='oem'.

FYI, the actual encoding is not necessarily "oem".

The console codepage may have been changed from the initial value by a 
SetConsoleCP call in the current process or another process (e.g. chcp.com, 
mode.com). For example, a batch script can switch to codepage 65001 to allow 
CMD to read a UTF-8 encoded batch file; or read UTF-8 from an external command 
in a `for /f` loop; or write UTF-8 to a disk file or pipe. 
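
To illustrate why the assumed encoding matters, here is a minimal Python sketch of the mismatch. The helper name `decode_output` and the sample string are hypothetical, and cp437 is only a stand-in for whatever the OEM codepage happens to be on a given system.

```python
def decode_output(raw, assumed_encoding):
    """Decode bytes captured from a child process, replacing undecodable bytes."""
    return raw.decode(assumed_encoding, errors="replace")


# Bytes as a child would write them after switching the console to codepage 65001.
raw = "café".encode("utf-8")

as_utf8 = decode_output(raw, "utf-8")  # matches what the child actually wrote
as_oem = decode_output(raw, "cp437")   # assumes OEM and misreads the 2-byte sequence

print(as_utf8 == "café")
print(as_oem == "café")
```

Decoding with the assumed OEM codepage does not raise an error here; it silently produces different text, which is why the mismatch is easy to miss.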

(Only switch to codepage 65001 temporarily. Using UTF-8 for legacy console I/O 
is buggy. CMD, PowerShell, and Python 3.6+ aren't affected since they use the 
wide-character API for console I/O. But a legacy console application that uses 
the codepage implicitly with ReadFile and WriteFile for byte-based I/O may get 
invalid results such as reading a non-ASCII character as NUL, or the entire 
read failing, or writing garbage to the console after output that contains 
non-ASCII characters.)

To accommodate applications that use the current console codepage for standard 
I/O, Python could add two encodings that correspond to the current value of 
GetConsoleCP and GetConsoleOutputCP (e.g. named "conin" and "conout"). 
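
Until something like that exists, the same lookup can be approximated with ctypes. This is only a sketch: the function names `codepage_to_codec` and `console_codecs` are hypothetical, the "conin" and "conout" codecs above are a proposal rather than an existing Python feature, and not every Windows codepage necessarily has a matching "cpNNN" codec in Python.

```python
import sys


def codepage_to_codec(codepage):
    """Map a Windows codepage number to a Python codec name (best effort)."""
    if codepage == 0:
        # 0 is CP_ACP, i.e. the ANSI codepage; Python calls it "mbcs" on Windows.
        return "mbcs"
    if codepage == 65001:
        return "utf-8"
    # Assumption: Python ships a codec named after the codepage number.
    return "cp%d" % codepage


def console_codecs():
    """Return (input_codec, output_codec) for the attached console (Windows only)."""
    if sys.platform != "win32":
        raise OSError("console codepages exist only on Windows")
    import ctypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    return (
        codepage_to_codec(kernel32.GetConsoleCP()),
        codepage_to_codec(kernel32.GetConsoleOutputCP()),
    )
```

On Windows, something like `subprocess.run(..., encoding=console_codecs()[1])` would then decode a child's output with the current console output codepage instead of a fixed "oem".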

Additionally, we can't assume the console codepage is initially OEM. It depends 
on settings in the registry or the shell shortcut for the application that 
allocated the console. In particular, a process may allocate a new console 
either explicitly via AllocConsole or implicitly, as a console app that either 
hasn't inherited a console or was created with the CREATE_NEW_CONSOLE or 
CREATE_NO_WINDOW creation flag. In that case the console loads custom settings 
from either the registry key "HKCU\Console\<window title>" or the shell 
shortcut (LNK file) that started the application. 

If the console uses the window-title registry key, it looks for a "CodePage" 
DWORD value. The key name is the normalized window title, which comes from the 
WindowTitle field of the process parameters. This can be set explicitly using 
the STARTUPINFO lpTitle field that's passed to CreateProcess. Otherwise the 
system uses the executable path as the default window title. To create a valid 
key name, the console normalizes the title string by replacing each backslash 
with an underscore and substituting "%SystemRoot%" for the Windows directory. 
For example, the default configuration key for CMD is 
"HKCU\Console\%SystemRoot%_system32_cmd.exe". 

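The title normalization described above can be sketched in Python as follows. The function names, the default `system_root` value, and the case-insensitive prefix match are assumptions made for illustration; the console's exact matching rules are not spelled out here. The registry lookup uses the standard winreg module and therefore only runs on Windows.

```python
def console_key_name(title, system_root=r"C:\Windows"):
    """Normalize a window title into a subkey name under HKCU\\Console."""
    # Assumption: the Windows directory is matched as a case-insensitive prefix.
    if title.lower().startswith(system_root.lower()):
        title = "%SystemRoot%" + title[len(system_root):]
    # Backslash is the registry path separator, so it cannot appear in a key name.
    return title.replace("\\", "_")


def read_console_codepage(title, system_root=r"C:\Windows"):
    """Return the configured "CodePage" DWORD, or None if unset (Windows only)."""
    import winreg

    # Try the per-title subkey first, then the root key that holds the defaults.
    candidates = ("Console\\" + console_key_name(title, system_root), "Console")
    for subkey in candidates:
        try:
            with winreg.OpenKey(winreg.HKEY_CURRENT_USER, subkey) as key:
                value, _value_type = winreg.QueryValueEx(key, "CodePage")
                return value
        except FileNotFoundError:
            continue
    return None


print(console_key_name(r"C:\Windows\system32\cmd.exe"))
```

For the default CMD title this yields the key name shown above, "%SystemRoot%_system32_cmd.exe".
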
The codepage can also be set in a shell shortcut (LNK file) [1]. When an 
application is started from a shell shortcut, the shell sets the STARTUPINFO 
flag STARTF_TITLEISLINKNAME and the lpTitle string to the fully-qualified path 
of the LNK file. In this case, the console reads the LNK file to load its 
settings, rather than using the window-title subkey in the registry. But the 
"HKCU\Console" root key is still used for the default settings.

Finally, if CMD is run without a console (i.e. using the DETACHED_PROCESS 
creation flag), the default codepage is ANSI, not OEM. This isn't hard-coded in 
CMD; it follows from GetConsoleCP returning 0 (i.e. CP_ACP) in this case.

[1]: https://msdn.microsoft.com/en-us/library/dd891330.aspx

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue33780>
_______________________________________