[issue22555] Tracking issue for adjustments to binary/text boundary handling

Steve Dower Mon, 16 Nov 2015 10:27:47 -0800

Steve Dower added the comment:

Right now all of the tests fail on Windows by default (cp437 for me).


If I change the default IO encoding to utf-8 (hacked into pylifecycle.c, since 
PYTHONIOENCODING is ignored by subprocesses using -E), the four "Misconfigured" 
tests crash at the os.fsencode() call (as "mbcs:strict" cannot encode the 
characters - this may be a real issue, haven't dug into it yet).
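
A minimal sketch of the failure mode described above, assuming nothing about
the actual test suite: a strict codec raises as soon as it meets a character
outside its repertoire. The helper name `can_encode` and the sample string are
hypothetical, and cp1252/cp437 stand in for "mbcs", which only exists on
Windows.

```python
def can_encode(text, encoding, errors="strict"):
    """Return True if `text` encodes cleanly under `encoding`."""
    try:
        text.encode(encoding, errors)
    except UnicodeEncodeError:
        return False
    return True


# Hypothetical sample: an accented letter, the euro sign, and a CJK character.
SAMPLE = "caf\u00e9 \u20ac \u4e2d"

if __name__ == "__main__":
    for name in ("ascii", "cp437", "cp1252", "utf-8"):
        print(name, can_encode(SAMPLE, name))
```

Of the codecs listed, only utf-8 can represent the whole sample, which is
roughly why a strict ANSI code page encoding breaks on such test data.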

Adding more hacks to get past this point brings me back into the ASCII encoding 
performed by the test, and I'm not sure whether that's just an incorrect 
assumption for Windows or not.


Separate issue: if I run "chcp 437" before the tests, the output is garbage. If 
I run "chcp 65001", the characters display correctly in the console font. The 
std streams' encoding is taken from this value, but cp65001 is not mapped back 
to UTF-8, which is probably another issue. If I add a separate check to 
_Py_device_encoding in fileutils.c, I get UTF-8-enabled streams when the 
console is set to cp65001.
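
The mapping described above can be sketched in Python. This is only an
illustration of the idea, not the C change to fileutils.c; the helper name
`codec_for_code_page` is hypothetical.

```python
import codecs


def codec_for_code_page(cp):
    """Map a Windows console code page number to a Python codec name.

    Hypothetical helper: code page 65001 is special-cased to "utf-8";
    any other number becomes "cpNNN" if Python knows such a codec.
    """
    if cp == 65001:
        return "utf-8"
    name = "cp%d" % cp
    codecs.lookup(name)  # raises LookupError for unknown code pages
    return name


if __name__ == "__main__":
    print(codec_for_code_page(437))
    print(codec_for_code_page(65001))
```

Special-casing 65001 avoids depending on whether a separate "cp65001" codec is
available in the running interpreter.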

However, there are still a number of places that use GetACP() to determine the 
locale and encoding to use, which is incorrect for Unicode-aware programs. In 
particular, this should not happen:

>>> f=open('test.txt', 'w')
>>> f.encoding
'cp1252'

There's no good reason for the default encoding not to be UTF-8 these days, but 
changing it is a much bigger change. It's probably worth doing for 3.6, but may 
need more discussion...
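
Until the default changes, code can sidestep it by passing the encoding
explicitly. A minimal sketch, assuming the documented behaviour that open()
falls back to locale.getpreferredencoding(False) when no encoding is given;
the names `write_text_utf8` and `demo` are hypothetical.

```python
import locale
import os
import tempfile


def write_text_utf8(path, text):
    """Write `text` with an explicit UTF-8 encoding; return the stream's encoding."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
        return f.encoding


def demo():
    # Write to a temporary file, then read the raw bytes back.
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        encoding = write_text_utf8(path, "caf\u00e9")
        with open(path, "rb") as f:
            data = f.read()
    finally:
        os.remove(path)
    return encoding, data


if __name__ == "__main__":
    # What open() would pick if no encoding were passed.
    print("locale default:", locale.getpreferredencoding(False))
    print(demo())
```

With the explicit argument, the bytes on disk no longer depend on the ANSI code
page of the machine that wrote the file.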

----------

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue22555>
_______________________________________