Eryk Sun added the comment:

I'm closing this issue since Python's encodings in this case -- 852 (OEM) and 
1250 (ANSI) -- both correctly map U+0159:

    >>> u'\u0159'.encode('852')
    >>> u'\u0159'.encode('1250')

You must be using an encoding that doesn't map U+0159. If you're using the 
console's default codepage (i.e. you haven't changed it from the command line 
or called SetConsoleOutputCP), then Python started with stdout.encoding set to your 
locale's OEM codepage encoding. For example, if you're using a U.S. locale, 
it's cp437, and if you're using a Western Europe locale, it's cp850. Neither of 
these includes U+0159.
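
As a quick way to check this for any codepage, here is a minimal sketch (the helper name can_encode is only illustrative, and it assumes Python 3 with the standard codecs available). It tries to encode U+0159 with each of the four codepages mentioned above:

```python
CHAR = '\u0159'  # LATIN SMALL LETTER R WITH CARON

def can_encode(char, encoding):
    """Return True if `encoding` has a mapping for `char` (illustrative helper)."""
    try:
        char.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

# Check the two codepages that map the character and the two that don't.
support = {name: can_encode(CHAR, name)
           for name in ('cp437', 'cp850', 'cp852', 'cp1250')}

for name, ok in support.items():
    print(name, 'maps U+0159' if ok else 'does not map U+0159')
```

Comparing the result against sys.stdout.encoding should show whether the active console encoding is the culprit.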

We're presented with this codepage hell because the WriteFile and WriteConsoleA 
functions write a stream of bytes to the console, and it needs to be told how 
to decode these bytes to get Unicode text. It would be nice if the console's 
UTF-8 implementation (codepage 65001) wasn't buggy, but Microsoft has never 
cared enough to fix it (at least not completely; it's still broken for input in 
Windows 10). 
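
To illustrate why the console has to be told the codepage, here is a small sketch that only simulates the console's decoding step with Python's codecs (decode_as is a hypothetical helper, not a console API). The same byte stream yields different text depending on which codepage is assumed:

```python
# Bytes as a program would write them if its output encoding were cp852.
data = '\u0159'.encode('cp852')

def decode_as(raw, codepage):
    """Decode `raw` the way a console set to `codepage` would interpret it."""
    return raw.decode(codepage, errors='replace')

for codepage in ('cp852', 'cp437', 'cp850'):
    # ascii() keeps the output printable regardless of the active console encoding.
    print(codepage, ascii(decode_as(data, codepage)))
```

Only the cp852 interpretation gives back U+0159; the other two turn the same byte into an unrelated character.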

That leaves the wide-character UTF-16 function, WriteConsoleW, as the best 
alternative. Using this function requires bypassing Python's normal standard 
I/O implementation. This has been implemented as of 3.6. But for older versions 
you'll need to install and enable win_unicode_console.
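
For reference, a rough ctypes sketch of calling WriteConsoleW directly is shown here. This is not how Python 3.6 or win_unicode_console implement it; the function names write_console_w and utf16_units are made up for this example, and error handling is reduced to a boolean result. It assumes Python 3 and only does real work on Windows:

```python
import sys

def utf16_units(text):
    """Number of UTF-16 code units in `text`, which is what WriteConsoleW counts."""
    return len(text.encode('utf-16-le')) // 2

def write_console_w(text):
    """Try to write `text` via WriteConsoleW; return True on success (sketch)."""
    if sys.platform != 'win32':
        return False  # WriteConsoleW only exists on Windows
    import ctypes
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)
    kernel32.GetStdHandle.argtypes = [wintypes.DWORD]
    kernel32.GetStdHandle.restype = wintypes.HANDLE
    kernel32.WriteConsoleW.argtypes = [
        wintypes.HANDLE, wintypes.LPCWSTR, wintypes.DWORD,
        ctypes.POINTER(wintypes.DWORD), wintypes.LPVOID]
    kernel32.WriteConsoleW.restype = wintypes.BOOL

    STD_OUTPUT_HANDLE = -11 & 0xFFFFFFFF  # (DWORD)-11
    handle = kernel32.GetStdHandle(STD_OUTPUT_HANDLE)
    written = wintypes.DWORD(0)
    ok = kernel32.WriteConsoleW(handle, text, utf16_units(text),
                                ctypes.byref(written), None)
    return bool(ok)

if __name__ == '__main__':
    write_console_w('\u0159\n')
```

The call fails (returns False here) when stdout is redirected to a file or pipe, because the handle is then not a console handle. A real implementation has to fall back to byte-oriented output in that case.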

nosy: +eryksun
stage:  -> resolved
status: open -> closed
