STINNER Victor <[email protected]> added the comment:
I wrote a small function to call WriteConsoleOutputA() and
WriteConsoleOutputW() from Python to do some tests. It works correctly, except
when I change the code page using the chcp command. It looks like the problem
is that the chcp command changes both the console code page and the ANSI code
page, but it should only change the ANSI code page (and not the console code
page).
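The test function itself is not included in this comment. The sketch below shows what such a function could look like. It assumes ctypes and covers only the Unicode variant. The structure layouts follow wincon.h (COORD, SMALL_RECT, CHAR_INFO), while make_buffer() and write_console_output_w() are hypothetical helper names. Only the buffer-building part runs outside Windows.

```python
import ctypes
import sys

# ctypes mirrors of the wincon.h structures. WCHAR is 16-bit on Windows, so
# c_ushort is used instead of c_wchar (32-bit on Linux) to keep the layout.
class COORD(ctypes.Structure):
    _fields_ = [("X", ctypes.c_short), ("Y", ctypes.c_short)]

class SMALL_RECT(ctypes.Structure):
    _fields_ = [("Left", ctypes.c_short), ("Top", ctypes.c_short),
                ("Right", ctypes.c_short), ("Bottom", ctypes.c_short)]

class _Char(ctypes.Union):
    _fields_ = [("UnicodeChar", ctypes.c_ushort), ("AsciiChar", ctypes.c_char)]

class CHAR_INFO(ctypes.Structure):
    _fields_ = [("Char", _Char), ("Attributes", ctypes.c_ushort)]

def make_buffer(text, attributes=0x07):
    """Hypothetical helper: one-row CHAR_INFO array (BMP characters only).
    0x07 is FOREGROUND_RED | FOREGROUND_GREEN | FOREGROUND_BLUE."""
    buf = (CHAR_INFO * len(text))()
    for cell, ch in zip(buf, text):
        # Indexing a ctypes array of structures returns views, not copies.
        cell.Char.UnicodeChar = ord(ch)
        cell.Attributes = attributes
    return buf

def write_console_output_w(text, x=0, y=0):
    """Hypothetical helper: write text at (x, y) of the console; Windows only."""
    if sys.platform != "win32":
        raise OSError("WriteConsoleOutputW requires a Windows console")
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetStdHandle.restype = ctypes.c_void_p
    kernel32.WriteConsoleOutputW.argtypes = [
        ctypes.c_void_p, ctypes.POINTER(CHAR_INFO), COORD, COORD,
        ctypes.POINTER(SMALL_RECT)]
    kernel32.WriteConsoleOutputW.restype = ctypes.c_int
    handle = kernel32.GetStdHandle(-11)  # STD_OUTPUT_HANDLE
    buf = make_buffer(text)
    region = SMALL_RECT(x, y, x + len(text) - 1, y)
    ok = kernel32.WriteConsoleOutputW(
        handle, buf, COORD(len(text), 1), COORD(0, 0), ctypes.byref(region))
    if not ok:
        raise ctypes.WinError(ctypes.get_last_error())
    return region
```

The write region is passed by reference because WriteConsoleOutputW() updates it with the rectangle it actually wrote.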
chcp command
============
The chcp command changes the console code page, but in practice the console
still expects the OEM code page (e.g. cp850 on my French setup). Example:
C:\...> python.exe -c "import sys; print(sys.stdout.encoding)"
cp850
C:\...> chcp 65001
C:\...> python.exe
Fatal Python error: Py_Initialize: can't initialize sys standard streams
LookupError: unknown encoding: cp65001
C:\...> SET PYTHONIOENCODING=utf-8
C:\...> python.exe
>>> import sys
>>> sys.stdout.write("\xe9\n")
é
2
>>> sys.stdout.buffer.write("\xe9\n".encode("utf8"))
é
3
>>> sys.stdout.buffer.write("\xe9\n".encode("cp850"))
é
2
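The return values in this session (2, 3, 2) come from two different layers. sys.stdout.write() returns a number of characters, while sys.stdout.buffer.write() returns a number of bytes, which depends on the encoding. The small sketch below reproduces the counts without a console (encoded_forms() is a hypothetical helper):

```python
def encoded_forms(text, encodings=("utf8", "cp850")):
    """Map each encoding name to the bytes sys.stdout.buffer.write() would get."""
    return {enc: text.encode(enc) for enc in encodings}

text = "\xe9\n"
forms = encoded_forms(text)
print(len(text))                                  # 2, as returned by sys.stdout.write()
print({enc: len(b) for enc, b in forms.items()})  # {'utf8': 3, 'cp850': 2}
print(forms["cp850"])                             # b'\x82\n'
```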
os.device_encoding(1) uses GetConsoleOutputCP(), which returns 65001. Should it
use GetOEMCP() instead? Or should the chcp command be fixed?
Setting the console code page seems to be a bad idea: if I type "é" on my
keyboard, a random character (e.g. U+0002) is displayed instead...
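To see the mismatch described above, one can compare GetConsoleOutputCP(), which os.device_encoding(1) relies on, with GetOEMCP() and GetACP(). The sketch below assumes ctypes. console_code_pages() and codepage_to_encoding() are hypothetical helpers. Mapping 65001 to "utf-8" is also an assumption, since the Python used here has no cp65001 codec (hence the LookupError above).

```python
import ctypes
import sys

def codepage_to_encoding(cp):
    """Hypothetical helper: turn a Windows code page number into a codec name.
    65001 is mapped to "utf-8" by assumption (no cp65001 codec here)."""
    return "utf-8" if cp == 65001 else "cp%d" % cp

def console_code_pages():
    """Return (GetConsoleOutputCP(), GetOEMCP(), GetACP()); Windows only."""
    if sys.platform != "win32":
        raise OSError("console code pages are only available on Windows")
    kernel32 = ctypes.WinDLL("kernel32")
    return (kernel32.GetConsoleOutputCP(),
            kernel32.GetOEMCP(),
            kernel32.GetACP())

if sys.platform == "win32":
    output_cp, oem_cp, ansi_cp = console_code_pages()
    print(codepage_to_encoding(output_cp),
          codepage_to_encoding(oem_cp),
          codepage_to_encoding(ansi_cp))
```

After "chcp 65001" on the setup described here, the first value would presumably be 65001 and the second 850. That is exactly the disagreement between os.device_encoding(1) and the OEM code page.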
WriteConsoleOutputA() and WriteConsoleOutputW()
===============================================
Without touching the code page
------------------------------
If the character can be rendered by the current font (e.g. U+00E9):
WriteConsoleOutputA() and WriteConsoleOutputW() work correctly.
If the character cannot be rendered by the current font, but there is a
replacement character (e.g. U+0141 replaced by U+0041): WriteConsoleOutputA()
cannot be used (U+0141 cannot be encoded to the code page), and
WriteConsoleOutputW() writes U+0141 but the console contains U+0041 (I checked
using ReadConsoleOutputW()) and U+0041 is displayed. It works like the mbcs
encoding; the behaviour looks correct.
If the character cannot be rendered by the current font and there is no
replacement character (e.g. U+042D): WriteConsoleOutputA() cannot be used
(U+042D cannot be encoded to the code page), and WriteConsoleOutputW() writes
U+042D but U+003F (?) is displayed instead. The behaviour looks correct.
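The difference between the A and W variants in these cases comes down to whether the character exists in the code page at all. The sketch below uses Python's cp850 codec as a stand-in for the console code page (ansi_bytes() is a hypothetical helper). Python's strict codec does not reproduce the U+0141 to U+0041 substitution the console applied above; it simply refuses the character.

```python
def ansi_bytes(ch, encoding="cp850"):
    """Return the bytes WriteConsoleOutputA() would need for ch, or None when
    ch has no representation in that code page (only the W variant can write it)."""
    try:
        return ch.encode(encoding)
    except UnicodeEncodeError:
        return None

for ch in ("\xe9", "\u0141", "\u042d"):
    print("U+%04X" % ord(ch), ansi_bytes(ch))
# U+00E9 b'\x82'
# U+0141 None
# U+042D None

# With errors="replace", the unencodable character becomes b"?", the same
# character the console displayed for U+042D.
print("\u042d".encode("cp850", errors="replace"))  # b'?'
```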
chcp 65001
----------
Using the "chcp 65001" command (plus "set PYTHONIOENCODING=utf-8" to avoid the
fatal error) makes things worse: the result depends on the font...
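The PYTHONIOENCODING workaround works because the variable overrides the encoding Python would otherwise derive for its standard streams. The sketch below checks the override from a parent process (child_stdout_encoding() is a hypothetical helper). It uses pipes rather than a console, so it only shows the override, not how the console renders the bytes.

```python
import os
import subprocess
import sys

def child_stdout_encoding(ioencoding=None):
    """Start a child interpreter and return its sys.stdout.encoding.
    When ioencoding is given it is passed through PYTHONIOENCODING."""
    env = dict(os.environ)
    env.pop("PYTHONIOENCODING", None)
    if ioencoding is not None:
        env["PYTHONIOENCODING"] = ioencoding
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env, capture_output=True, text=True, check=True)
    return result.stdout.strip()

print(child_stdout_encoding("utf-8"))
```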
Using a raster font:
- (ANSI) write "\xe9".encode("cp850") using WriteConsoleOutputA() displays
U+00E9 (é), even though the console output code page is cp65001 (I checked
using GetConsoleOutputCP())
- (ANSI) write "\xe9".encode("utf-8") using WriteConsoleOutputA() displays é
(mojibake!)
- (UNICODE) write "\xe9" using WriteConsoleOutputW() displays... a random
character (U+0002, U+0008, U+0069, U+00b0, ...)
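A UTF-8 encoded character becomes mojibake when its bytes are read back in a single-byte code page. The sketch below shows only the mechanism (misread() is a hypothetical helper). If the console applies cp850 to these bytes, as the first bullet suggests, é would come out as ├®; with cp1252 it would be Ã©. The glyphs actually shown depend on the console and font, so this is not a reproduction of the display.

```python
def misread(text, written="utf-8", assumed="cp850"):
    """Encode text with one code page and decode it with another, mimicking
    a console that applies a different code page to WriteConsoleOutputA() bytes."""
    return text.encode(written).decode(assumed)

# "\xe9" is two bytes in UTF-8 (C3 A9), so it turns into two characters.
print(ascii(misread("\xe9")))                    # '\u251c\xae'
print(ascii(misread("\xe9", assumed="cp1252")))  # '\xc3\xa9'
```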
Using Lucida (TrueType font):
- (ANSI) write "\xe9".encode("cp850") using WriteConsoleOutputA() displays
U+0000 !?
- (UNICODE) write "\xe9" using WriteConsoleOutputW() works correctly (displays
U+00E9), and even "\u0141" works correctly (displays U+0141)
----------
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue1602>
_______________________________________