Eryk Sun added the comment:

I'm closing this issue since Python's encodings in this case -- 852 (OEM) and 
1250 (ANSI) -- both correctly map U+0159:

    >>> u'\u0159'.encode('852')
    '\xfd'
    >>> u'\u0159'.encode('1250')
    '\xf8'

You must be using an encoding that doesn't map U+0159. If you're using the 
console's default codepage (i.e. you haven't run chcp.com, mode.com, or called 
SetConsoleOutputCP), then Python started with stdout.encoding set to your 
locale's OEM codepage encoding. For example, if you're using a U.S. locale, 
it's cp437, and if you're using a Western Europe locale, it's cp850. Neither of 
these includes U+0159.
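
As a quick check of the above, here is a minimal sketch in Python 3 syntax 
(unlike the Python 2 session earlier). The helper name can_encode is made up 
for illustration. It tries to encode U+0159 with each of the four codepages 
mentioned so far:

```python
# U+0159 is LATIN SMALL LETTER R WITH CARON.
CHAR = '\u0159'

def can_encode(char, encoding):
    """Return True if `encoding` has a mapping for `char` (hypothetical helper)."""
    try:
        char.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# cp437 and cp850 are the OEM codepages named above; cp852 and cp1250
# are the Central European OEM and ANSI codepages.
results = {name: can_encode(CHAR, name)
           for name in ('cp437', 'cp850', 'cp852', 'cp1250')}

for name, ok in sorted(results.items()):
    print(name, 'maps U+0159' if ok else 'does not map U+0159')
```

If sys.stdout.encoding reports one of the codepages that fails here, printing 
the character to the console would be expected to raise UnicodeEncodeError.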

We're presented with this codepage hell because the WriteFile and WriteConsoleA 
functions write a stream of bytes to the console, and the console needs to be 
told how to decode these bytes to get Unicode text. It would be nice if the 
console's UTF-8 implementation (codepage 65001) weren't buggy, but Microsoft 
has never cared enough to fix it (at least not completely; it's still broken 
for input in Windows 10).
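
To illustrate why the console has to be told which codepage to use, the 
following sketch (Python 3) simulates the decoding step with bytes.decode. It 
is only a model of what the console does internally, and the function name 
console_view is hypothetical:

```python
def console_view(raw, codepage):
    """Simulate how a console set to `codepage` would interpret `raw` bytes."""
    return raw.decode(codepage)

# The byte a program would write for U+0159 when encoding with cp852.
raw = '\u0159'.encode('cp852')

# Decoded with the matching codepage, the original character comes back.
as_cp852 = console_view(raw, 'cp852')

# Decoded with a different codepage, the same byte becomes another character.
as_cp437 = console_view(raw, 'cp437')

print(ascii(as_cp852), ascii(as_cp437))
```

The byte stream itself carries no information about its encoding, so the 
program's output encoding and the console's output codepage have to agree.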

That leaves the wide-character UTF-16 function, WriteConsoleW, as the best 
alternative. Using this function requires bypassing Python's normal standard 
I/O implementation. Python 3.6 does this itself, but for older versions you'll 
need to install and enable win_unicode_console.
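
For illustration only, here is a rough sketch of calling WriteConsoleW 
directly through ctypes (Python 3). It is not how Python 3.6 or 
win_unicode_console implement this. The names write_console and utf16_length 
are made up, and the function simply returns False when not on Windows or when 
the call fails (for example, when stdout is redirected to a file or pipe):

```python
import sys

def utf16_length(text):
    """Number of UTF-16 code units in `text`; WriteConsoleW counts these."""
    return len(text.encode('utf-16-le')) // 2

def write_console(text):
    """Write `text` to the console with WriteConsoleW (hypothetical helper).

    Returns True on success, False off Windows or if the call fails.
    """
    if sys.platform != 'win32':
        return False

    import ctypes
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)
    kernel32.GetStdHandle.argtypes = (wintypes.DWORD,)
    kernel32.GetStdHandle.restype = wintypes.HANDLE
    kernel32.WriteConsoleW.argtypes = (
        wintypes.HANDLE,                 # hConsoleOutput
        wintypes.LPCWSTR,                # lpBuffer (UTF-16 text)
        wintypes.DWORD,                  # nNumberOfCharsToWrite
        ctypes.POINTER(wintypes.DWORD),  # lpNumberOfCharsWritten
        wintypes.LPVOID,                 # lpReserved, must be NULL
    )
    kernel32.WriteConsoleW.restype = wintypes.BOOL

    STD_OUTPUT_HANDLE = 0xFFFFFFF5  # (DWORD)-11
    handle = kernel32.GetStdHandle(STD_OUTPUT_HANDLE)
    written = wintypes.DWORD()
    ok = kernel32.WriteConsoleW(handle, text, utf16_length(text),
                                ctypes.byref(written), None)
    return bool(ok)

if __name__ == '__main__':
    if not write_console('\u0159\n'):
        print('WriteConsoleW not available here (not a Windows console)')
```

Because the text is passed as UTF-16, the console's output codepage plays no 
part in this path.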

----------
nosy: +eryksun
stage:  -> resolved
status: open -> closed

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue29907>
_______________________________________