Eryk Sun added the comment:
I'm closing this issue since Python's encodings in this case -- 852 (OEM) and
1250 (ANSI) -- both correctly map U+0159:
>>> u'\u0159'.encode('852')
'\xfd'
>>> u'\u0159'.encode('1250')
'\xf8'
You must be using an encoding that doesn't map U+0159. If you're using the
console's default codepage (i.e. you haven't run chcp.com, mode.com, or called
SetConsoleOutputCP), then Python started with stdout.encoding set to your
locale's OEM codepage encoding. For example, if you're using a U.S. locale,
it's cp437, and if you're using a Western Europe locale, it's cp850. Neither of
these includes U+0159.
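To check this yourself, here is a minimal sketch (Python 3 syntax, unlike the Python 2 session above) that tests which of the codepages mentioned here have a mapping for U+0159. The helper name can_encode is just illustrative, not part of any library.

```python
# Sketch: which of the codepages discussed above can encode U+0159?
CHAR = '\u0159'  # LATIN SMALL LETTER R WITH CARON

def can_encode(char, codec):
    """Return True if `codec` has a mapping for `char` (hypothetical helper)."""
    try:
        char.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

results = {codec: can_encode(CHAR, codec)
           for codec in ('cp437', 'cp850', 'cp852', 'cp1250')}

for codec, ok in results.items():
    print(codec, 'maps U+0159' if ok else 'cannot encode U+0159')
```

If sys.stdout.encoding reports one of the codecs that cannot encode the character, printing it will raise UnicodeEncodeError.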
We're presented with this codepage hell because the WriteFile and WriteConsoleA
functions write a stream of bytes to the console, and the console needs to be
told how to decode these bytes to get Unicode text. It would be nice if the
console's UTF-8 implementation (codepage 65001) weren't buggy, but Microsoft has
never cared enough to fix it (at least not completely; it's still broken for
input in Windows 10).
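The decoding problem can be illustrated without a console at all. The following sketch (Python 3) takes the single byte that cp852 uses for U+0159 and decodes it with the other OEM codepages named above, standing in for what the console does when its codepage doesn't match the one the program encoded with. The variable names are only for illustration.

```python
# Sketch: the same byte means different characters under different codepages.
raw = '\u0159'.encode('cp852')  # the single byte 0xFD

# Decode that byte the way a console set to each codepage would interpret it.
decoded = {cp: raw.decode(cp) for cp in ('cp852', 'cp437', 'cp850')}

for cp, text in decoded.items():
    # ascii() keeps the output printable even on a limited console encoding.
    print(cp, ascii(text))
```

Only the cp852 entry round-trips to U+0159; the other two yield a different character for the same byte.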
That leaves the wide-character UTF-16 function, WriteConsoleW, as the best
alternative. Using this function requires bypassing Python's normal standard
I/O implementation. This has been implemented as of Python 3.6. For older
versions, you'll need to install and enable win_unicode_console.
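For older versions, enabling the package might look like the sketch below. It assumes win_unicode_console has already been installed (for example with pip); the wrapper function enable_unicode_console is a hypothetical helper, and the version and platform checks are my own guard so the snippet is harmless elsewhere.

```python
import sys

def enable_unicode_console():
    """Try to enable win_unicode_console; return True if it was enabled.

    Hypothetical helper: does nothing off Windows or on Python 3.6+,
    where the console I/O already goes through the wide-character API.
    """
    if sys.platform != 'win32' or sys.version_info >= (3, 6):
        return False
    try:
        import win_unicode_console  # third-party package, assumed installed
    except ImportError:
        return False
    win_unicode_console.enable()  # replaces the standard streams
    return True

enabled = enable_unicode_console()
```

Calling this early, before anything is printed, is the safest place for it, since it swaps out sys.stdout and friends.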
----------
nosy: +eryksun
stage: -> resolved
status: open -> closed
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue29907>
_______________________________________