STINNER Victor added the comment:

Support for code page 65001 (CP_UTF8, "cp65001") was added in Python 
3.3. It is usually used for the OEM code page. The chcp command changes the 
Windows console encoding, which is used by sys.{stdin,stdout,stderr}.encoding. 
locale.getpreferredencoding() returns the ANSI code page.
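
To see which encoding is which on a given machine, something like the following can be run before and after chcp. It is only a minimal sketch: describe_encodings() is a hypothetical helper name, not part of Python, and the values printed depend on the platform and on whether stdio is attached to a console.

```python
import locale
import sys


def describe_encodings():
    """Return the encodings Python reports for stdio and for the locale."""
    return {
        # A stream can be None (for example under pythonw.exe), hence getattr.
        "stdin": getattr(sys.stdin, "encoding", None),
        "stdout": getattr(sys.stdout, "encoding", None),
        "stderr": getattr(sys.stderr, "encoding", None),
        # On Windows this is the ANSI code page, not the console code page.
        "preferred": locale.getpreferredencoding(False),
    }


if __name__ == "__main__":
    for name, encoding in describe_encodings().items():
        print(f"{name}: {encoding}")
```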

Read also:
http://unicodebook.readthedocs.org/operating_systems.html#code-pages
http://unicodebook.readthedocs.org/programming_languages.html#windows

> cp65001 is purported to be an alias for utf8.

No, cp65001 is not an alias of utf8: it handles surrogate characters 
differently. The behaviour of CP_UTF8 depends on the flags and the Windows 
version.
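
For comparison, this is how Python 3's own UTF-8 codec treats a lone surrogate. The sketch shows only the utf-8 side, because the result with CP_UTF8 depends on the flags and the Windows version; encode_lone_surrogate() is a hypothetical helper written for this illustration.

```python
def encode_lone_surrogate(errors):
    """Encode the lone surrogate U+DC80 with the built-in UTF-8 codec."""
    return "\udc80".encode("utf-8", errors)


# In strict mode the UTF-8 codec refuses to encode a lone surrogate.
try:
    encode_lone_surrogate("strict")
except UnicodeEncodeError as exc:
    print("strict:", exc.reason)

# The "surrogatepass" error handler must be requested explicitly.
print("surrogatepass:", encode_lone_surrogate("surrogatepass"))
```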

If you really want to use the UTF-8 codec, force the stdio encoding using the 
PYTHONIOENCODING environment variable:
https://docs.python.org/dev/using/cmdline.html#envvar-PYTHONIOENCODING
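
A rough way to check the effect of PYTHONIOENCODING is to start a child interpreter with the variable set and ask it for its stdout encoding. This is a sketch under assumptions: child_stdout_encoding() is a hypothetical helper, and the child's stdout is a pipe here, not a real Windows console.

```python
import codecs
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding):
    """Run a child Python with PYTHONIOENCODING set; return its stdout encoding."""
    # Copy the current environment and override only PYTHONIOENCODING.
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    reported = result.stdout.decode("ascii").strip()
    # Normalize the spelling (for example "UTF8" and "utf-8" give the same name).
    return codecs.lookup(reported).name


if __name__ == "__main__":
    print(child_stdout_encoding("utf-8"))
```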

Setting the Windows console encoding to cp65001 using the chcp command doesn't 
make the Windows console fully Unicode compliant. It works a little better 
with TTF fonts, but that is not enough. See the old issue #1602, opened 7 years 
ago and still not fixed.

Backporting the cp65001 codec requires too many changes in the codec code. I 
made these changes between Python 3.1 and 3.3, and I don't want to redo them in 
Python 2.7 because that may break backward compatibility. For example, in Python 
3.3, the "strict" mode really means "strict", whereas in Python 2.7, code page 
codecs use the default flags, which are not strict. See:
http://unicodebook.readthedocs.org/operating_systems.html#encode-and-decode-functions

So I'm in favor of closing the issue as "wont fix". The fix is to upgrade to 
Python 3!

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue21808>
_______________________________________