[issue29907] Unicode encoding failure

2017-03-26 Thread Eryk Sun

Eryk Sun added the comment:

I'm closing this issue since Python's encodings in this case -- 852 (OEM) and 
1250 (ANSI) -- both correctly map U+0159:

>>> u'\u0159'.encode('852')
'\xfd'
>>> u'\u0159'.encode('1250')
'\xf8'

You must be using an encoding that doesn't map U+0159. If you're using the 
console's default codepage (i.e. you haven't run chcp.com, mode.com, or called 
SetConsoleOutputCP), then Python started with stdout.encoding set to your 
locale's OEM codepage encoding. For example, if you're using a U.S. locale, 
it's cp437, and if you're using a Western Europe locale, it's cp850. Neither of 
these includes U+0159.
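
To see which codepages map U+0159, here is a minimal sketch. The helper name encoded_or_none is made up for illustration; the codec names are the standard Python aliases for the codepages mentioned above.

```python
# U+0159 is LATIN SMALL LETTER R WITH CARON.
char = u"\u0159"


def encoded_or_none(text, codec):
    """Return the encoded bytes, or None if the codec cannot map the text.

    Hypothetical helper, not part of Python or of this issue.
    """
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        return None


# cp852 and cp1250 are the Central European OEM and ANSI codepages;
# cp437 and cp850 are the U.S. and Western Europe OEM codepages.
for codec in ("cp852", "cp1250", "cp437", "cp850"):
    result = encoded_or_none(char, codec)
    if result is None:
        print("%s: cannot encode U+0159" % codec)
    else:
        print("%s: %r" % (codec, result))
```

If the console codepage turns out to be one of the codecs reported as unable to encode the character, printing that character will raise UnicodeEncodeError.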

We're presented with this codepage hell because the WriteFile and WriteConsoleA 
functions write a stream of bytes to the console, and the console needs to be 
told how to decode these bytes to get Unicode text. It would be nice if the 
console's UTF-8 implementation (codepage 65001) weren't buggy, but Microsoft has 
never cared enough to fix it (at least not completely; it's still broken for 
input in Windows 10).

That leaves the wide-character UTF-16 function, WriteConsoleW, as the best 
alternative. Using this function requires bypassing Python's normal standard 
I/O implementation. This has been implemented as of 3.6. But for older versions 
you'll need to install and enable win_unicode_console.
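
For the older-version case, a rough sketch of enabling win_unicode_console at startup might look like the following. It assumes the package has been installed from PyPI; the wrapper function enable_unicode_console is a made-up name, and the version and platform checks are my own guard, not something the package requires.

```python
import sys


def enable_unicode_console():
    """Enable win_unicode_console where it is needed and available.

    Hypothetical wrapper: returns True if the package was enabled,
    False if it was skipped or is not installed.
    """
    # Only the Windows console on Python older than 3.6 needs this.
    if sys.platform != "win32" or sys.version_info >= (3, 6):
        return False
    try:
        import win_unicode_console  # third-party, assumed installed
    except ImportError:
        return False
    # Replaces the standard streams with ones that use the
    # wide-character console API.
    win_unicode_console.enable()
    return True


enabled = enable_unicode_console()
```

Call it once, early, before anything is printed.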

--
nosy: +eryksun
stage:  -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue29907] Unicode encoding failure

2017-03-26 Thread STINNER Victor

STINNER Victor added the comment:

For Python 2, there is https://pypi.python.org/pypi/win_unicode_console




[issue29907] Unicode encoding failure

2017-03-26 Thread Paul Moore

Paul Moore added the comment:

Also, you need to:

1. Ensure you are using characters that are available in the encoding that 
sys.stdout uses - in Python prior to 3.6, this would be your Windows *console* 
code page, and in 3.6+ would be UTF-8.
2. Declare the encoding of your source code if you are not using the default 
(which is ASCII in Python 2, and UTF-8 in Python 3).

Specifically, if you write your source in UTF-8, or use an encoding declaration 
or \u escapes, and you use Python 3.6, this problem will likely have gone away.
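
A small sketch of both points, assuming a source file saved as UTF-8. The helper names encodable and safe_print are invented for illustration, and the backslashreplace fallback is just one possible choice for characters the console encoding lacks.

```python
# -*- coding: utf-8 -*-
# The line above declares the source encoding (needed in Python 2 if the
# file contains non-ASCII characters; Python 3 defaults to UTF-8).
from __future__ import print_function, unicode_literals

import sys


def encodable(text, encoding):
    """Return True if every character of text exists in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


def safe_print(text):
    """Print text, escaping characters that sys.stdout cannot encode."""
    encoding = getattr(sys.stdout, "encoding", None) or "ascii"
    if not encodable(text, encoding):
        # Fallback (an assumption of this sketch): show \uXXXX escapes
        # instead of raising UnicodeEncodeError.
        text = text.encode(encoding, "backslashreplace").decode(encoding)
    print(text)


# \u escapes keep the source file itself pure ASCII.
name = "Anton\u00edn Dvo\u0159\u00e1k"
safe_print(name)
```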




[issue29907] Unicode encoding failure

2017-03-26 Thread Martin Panter

Martin Panter added the comment:

I presume you are trying to print to the normal Windows console. I understand 
the console was not well supported until Python 3.6 (see Issue 1602). Have you 
tried that version?

I’ll leave this open for someone more experienced to confirm, but I suspect 
what you want may not be possible with 2.7.

--
components: +Unicode, Windows
nosy: +ezio.melotti, haypo, martin.panter, paul.moore, steve.dower, tim.golden, 
zach.ware
resolution:  -> out of date
superseder:  -> windows console doesn't print or input Unicode




[issue29907] Unicode encoding failure

2017-03-25 Thread Robert Baker

New submission from Robert Baker:

Using Python 2.7 (not IDLE) on Windows 10.

I have tried to use a Python 2.7 program to print the name of Czech composer 
Antonín Dvořák. I remembered to add the "u" before the string, but regardless 
of whether I encode the caron-r as a literal character (pasted from Windows 
Character Map) or as \u0159, it gives the error that character 0159 is 
undefined. This is incorrect; that character has been defined as "lower case r 
with caron above" for several years now. (The interpreter has no problem with 
the ANSI characters in the string.)

--
messages: 290503
nosy: Robert Baker
priority: normal
severity: normal
status: open
title: Unicode encoding failure
type: behavior
versions: Python 2.7
