[issue1602] windows console doesn't print utf8 (Py30a2)

STINNER Victor Thu, 04 Nov 2010 08:10:09 -0700

STINNER Victor added the comment:

I wrote a small function to call WriteConsoleOutputA() and 
WriteConsoleOutputW() in Python to do some tests. It works correctly, except if 
I change the code page using the chcp command. It looks like the problem is 
that the chcp command changes both the console code page and the ANSI code 
page, but it should only change the ANSI code page (and not the console code 
page).
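
For readers who want to repeat the tests, here is a minimal ctypes sketch of 
a WriteConsoleOutputW() wrapper. It is not the function used for the results 
in this message: the names make_cells() and write_console_output_w(), the 
default attribute value and the choice to write a single row are all 
assumptions made for illustration. The Win32 call itself only runs on 
Windows, and the sketch assumes characters from the Basic Multilingual Plane.

```python
import ctypes
import sys


class CharUnion(ctypes.Union):
    # Mirrors the anonymous union inside the Win32 CHAR_INFO structure.
    _fields_ = [("UnicodeChar", ctypes.c_wchar), ("AsciiChar", ctypes.c_char)]


class CHAR_INFO(ctypes.Structure):
    _fields_ = [("Char", CharUnion), ("Attributes", ctypes.c_ushort)]


class COORD(ctypes.Structure):
    _fields_ = [("X", ctypes.c_short), ("Y", ctypes.c_short)]


class SMALL_RECT(ctypes.Structure):
    _fields_ = [("Left", ctypes.c_short), ("Top", ctypes.c_short),
                ("Right", ctypes.c_short), ("Bottom", ctypes.c_short)]


DEFAULT_ATTRIBUTES = 0x07                 # assumption: grey text on black
STD_OUTPUT_HANDLE = -11 & 0xFFFFFFFF      # -11 expressed as a DWORD


def make_cells(text, attributes=DEFAULT_ATTRIBUTES):
    """Build one CHAR_INFO cell per character (hypothetical helper)."""
    cells = (CHAR_INFO * len(text))()
    for i, ch in enumerate(text):
        cells[i].Char.UnicodeChar = ch
        cells[i].Attributes = attributes
    return cells


def write_console_output_w(text, x=0, y=0):
    """Write text as a single row at column x, row y of the screen buffer."""
    if sys.platform != "win32":
        raise OSError("WriteConsoleOutputW is only available on Windows")
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetStdHandle.argtypes = [ctypes.c_ulong]
    kernel32.GetStdHandle.restype = ctypes.c_void_p
    kernel32.WriteConsoleOutputW.argtypes = [
        ctypes.c_void_p, ctypes.POINTER(CHAR_INFO), COORD, COORD,
        ctypes.POINTER(SMALL_RECT)]
    kernel32.WriteConsoleOutputW.restype = ctypes.c_int

    handle = kernel32.GetStdHandle(STD_OUTPUT_HANDLE)
    cells = make_cells(text)
    region = SMALL_RECT(x, y, x + len(text) - 1, y)
    ok = kernel32.WriteConsoleOutputW(
        handle, cells, COORD(len(text), 1), COORD(0, 0), ctypes.byref(region))
    if not ok:
        raise ctypes.WinError(ctypes.get_last_error())


if __name__ == "__main__" and sys.platform == "win32":
    write_console_output_w("\xe9")
```

An ANSI variant would fill AsciiChar with one encoded byte per cell and call 
WriteConsoleOutputA() with the same arguments.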



chcp command
============

The chcp command changes the console code page, but in practice, the console 
still expects the OEM code page (e.g. cp850 on my French setup). Example:

C:\...> python.exe -c "import sys; print(sys.stdout.encoding)"
cp850
C:\...> chcp 65001
C:\...> python.exe
Fatal Python error: Py_Initialize: can't initialize sys standard streams
LookupError: unknown encoding: cp65001
C:\...> SET PYTHONIOENCODING=utf-8
C:\...> python.exe
>>> import sys
>>> sys.stdout.write("\xe9\n")
Ã©
2
>>> sys.stdout.buffer.write("\xe9\n".encode("utf8"))
Ã©
3
>>> sys.stdout.buffer.write("\xe9\n".encode("cp850"))
é
2
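
The return values 2, 3 and 2 in the session are the number of characters or 
bytes written. The following sketch reproduces them without a console and 
shows why UTF-8 bytes read through a single-byte code page turn into two 
characters. Decoding with cp1252 is an assumption chosen because it yields 
the "Ã©" shown above; the sketch cannot confirm which code page the console 
actually applied.

```python
# Byte-level view of the three writes in the session.
text = "\xe9\n"  # U+00E9 followed by a newline

utf8_bytes = text.encode("utf-8")    # two bytes for U+00E9, one for "\n"
cp850_bytes = text.encode("cp850")   # one byte for U+00E9, one for "\n"

# Character count, UTF-8 byte count, cp850 byte count.
print(len(text), len(utf8_bytes), len(cp850_bytes))

# Reading the UTF-8 bytes with a single-byte code page gives mojibake:
# each of the two bytes of U+00E9 becomes its own character.
as_cp1252 = utf8_bytes.decode("cp1252")
as_cp850 = utf8_bytes.decode("cp850")
print(ascii(as_cp1252), ascii(as_cp850))
```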

os.device_encoding(1) uses GetConsoleOutputCP(), which gives 65001. Maybe it 
should use GetOEMCP() instead? Or should the chcp command be fixed?
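
To compare the values involved, the code pages can be queried directly with 
ctypes. This is only a sketch: query_code_pages() and 
codec_name_for_code_page() are hypothetical helpers, and the mapping of 65001 
to "utf-8" is an assumption about how a caller might avoid the "unknown 
encoding: cp65001" error, not what Python does internally. Not every code 
page number has a matching Python codec.

```python
import sys


def codec_name_for_code_page(code_page):
    """Map a Windows code page number to a Python codec name (hypothetical)."""
    if code_page == 65001:
        return "utf-8"
    return "cp%d" % code_page


def query_code_pages():
    """Return the console output, OEM and ANSI code pages, or None off Windows."""
    if sys.platform != "win32":
        return None
    import ctypes
    kernel32 = ctypes.WinDLL("kernel32")
    return {
        "console_output": kernel32.GetConsoleOutputCP(),
        "oem": kernel32.GetOEMCP(),
        "ansi": kernel32.GetACP(),
    }


pages = query_code_pages()
if pages is not None:
    for name, code_page in pages.items():
        print(name, code_page, codec_name_for_code_page(code_page))
```

Running it before and after "chcp 65001" should show which of the three 
values the command changes.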

Setting the console code page looks like a bad idea, because if I type "é" on 
my keyboard, a random character (e.g. U+0002) is displayed instead...


WriteConsoleOutputA() and WriteConsoleOutputW()
===============================================

Without touching the code page
------------------------------

If the character can be rendered by the current font (eg. U+00E9): 
WriteConsoleOutputA() and WriteConsoleOutputW() work correctly.

If the character cannot be rendered by the current font, but there is a 
replacement character (eg. U+0141 replaced by U+0041): WriteConsoleOutputA() 
cannot be used (U+0141 cannot be encoded to the code page). 
WriteConsoleOutputW() writes U+0141, but the console contains U+0041 (I checked 
using ReadConsoleOutputW()) and U+0041 is displayed. It works like the mbcs 
encoding; the behaviour looks correct.

If the character cannot be rendered by the current font, and there is no 
replacement character (eg. U+042D): WriteConsoleOutputA() cannot be used 
(U+042D cannot be encoded to the code page). WriteConsoleOutputW() writes 
U+042D, but U+003F (?) is displayed instead. The behaviour looks correct.
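
The "cannot be encoded to the code page" cases can be checked without a 
console. A minimal sketch, assuming cp850 as the code page (as on the setup 
described above) and using a hypothetical helper encode_for_code_page(). 
Note that Python's cp850 codec has no best-fit mapping, so it only reports 
whether a character is encodable; it does not reproduce the console's 
replacement of U+0141.

```python
def encode_for_code_page(text, code_page="cp850"):
    """Return (bytes, True) if text is encodable, else (replaced bytes, False)."""
    try:
        return text.encode(code_page), True
    except UnicodeEncodeError:
        # errors="replace" substitutes "?" for each unencodable character.
        return text.encode(code_page, errors="replace"), False


for ch in ("\u00e9", "\u0141", "\u042d"):
    data, exact = encode_for_code_page(ch)
    print("U+%04X" % ord(ch), data, "encodable" if exact else "not encodable")
```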

chcp 65001
----------

Using "chcp 65001" command (+ "set PYTHONIOENCODING=utf-8" to avoid the fatal 
error), it becomes worse: the result depends on the font...

Using raster font:
 - (ANSI) write "\xe9".encode("cp850") using WriteConsoleOutputA() displays 
U+00e9 (é), even though the console output code page is cp65001 (I checked 
using GetConsoleOutputCP())
 - (ANSI) write "\xe9".encode("utf-8") using WriteConsoleOutputA() displays Ã© 
(mojibake!)
 - (UNICODE) write "\xe9" using WriteConsoleOutputW() displays... a random 
character (U+0002, U+0008, U+0069, U+00b0, ...)

Using Lucida (TrueType font): 
 - (ANSI) write "\xe9".encode("cp850") using WriteConsoleOutputA() displays 
U+0000 !?
 - (UNICODE) write "\xe9" using WriteConsoleOutputW() works correctly (displays 
U+00e9); even "\u0141" works correctly (displays U+0141)

----------

_______________________________________
Python tracker
<http://bugs.python.org/issue1602>
_______________________________________