STINNER Victor added the comment:
Support for code page 65001 (CP_UTF8, "cp65001") was added in Python
3.3. It is usually used as the OEM code page. The chcp command changes the
Windows console encoding, which is what sys.{stdin,stdout,stderr}.encoding
report. locale.getpreferredencoding() returns the ANSI code page.
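To see which encodings are actually in effect, something like the following
can be run before and after chcp. It is only a sketch: describe_encodings is a
hypothetical helper name, and the values it returns depend on the platform,
the Python version and whether the streams are attached to a console.

```python
import locale
import sys


def describe_encodings():
    """Return the stdio encodings and the locale's preferred encoding.

    describe_encodings is a hypothetical helper, not part of Python.
    A stream can be None (e.g. under pythonw.exe), hence getattr().
    """
    return {
        "stdin": getattr(sys.stdin, "encoding", None),
        "stdout": getattr(sys.stdout, "encoding", None),
        "stderr": getattr(sys.stderr, "encoding", None),
        # On Windows this is the ANSI code page, not the console code page.
        "preferred": locale.getpreferredencoding(False),
    }


if __name__ == "__main__":
    for name, encoding in describe_encodings().items():
        print(name, encoding)
```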
Read also:
http://unicodebook.readthedocs.org/operating_systems.html#code-pages
http://unicodebook.readthedocs.org/programming_languages.html#windows
> cp65001 is purported to be an alias for utf8.
No, cp65001 is not an alias of utf8: it handles surrogate characters
differently. The behaviour of CP_UTF8 depends on the flags and the Windows
version.
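To make "handles surrogate characters" concrete, here is a minimal sketch of
what the plain utf-8 codec does with a lone surrogate in Python 3. The cp65001
codec is deliberately not exercised, because its result depends on the
platform and the Python version; encode_strict is a hypothetical helper name.

```python
# A lone low surrogate: valid in a Python str, but not a Unicode scalar value.
lone_surrogate = "\udc80"


def encode_strict(text):
    """Encode with utf-8 in strict mode; return None if the codec refuses.

    encode_strict is a hypothetical helper used only for this illustration.
    """
    try:
        return text.encode("utf-8")
    except UnicodeEncodeError:
        return None


# Strict utf-8 rejects lone surrogates.
strict_result = encode_strict(lone_surrogate)

# The "surrogatepass" error handler encodes them anyway.
passed_through = lone_surrogate.encode("utf-8", "surrogatepass")
```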
If you really want to use the UTF-8 codec, force the stdio encoding using the
PYTHONIOENCODING environment variable:
https://docs.python.org/dev/using/cmdline.html#envvar-PYTHONIOENCODING
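The effect of the variable can be checked with a child interpreter. This is a
sketch, not a recommended pattern: child_stdout_encoding is a hypothetical
helper name, and the child's stdout is a pipe here, not a Windows console.

```python
import codecs
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding):
    """Start a child Python with PYTHONIOENCODING set and report the
    normalized name of its sys.stdout.encoding.

    child_stdout_encoding is a hypothetical helper for this illustration.
    """
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    output = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
    )
    # Normalize spellings such as "UTF-8" or "utf8" to the codec's own name.
    return codecs.lookup(output.decode("ascii").strip()).name


if __name__ == "__main__":
    print(child_stdout_encoding("utf-8"))
```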
Setting the Windows console encoding to cp65001 using the chcp command doesn't
make the Windows console fully Unicode compliant. It is a little bit better
using TTF fonts, but it's not enough. See the old issue #1602 opened 7 years
ago and not fixed yet.
Backporting the cp65001 codec requires too many changes in the codec code. I
made these changes between Python 3.1 and 3.3, and I don't want to redo them in
Python 2.7 because it may break backward compatibility. For example, in Python
3.3, the "strict" mode really means "strict", whereas in Python 2.7, code page
codecs use the default flags, which are not strict. See:
http://unicodebook.readthedocs.org/operating_systems.html#encode-and-decode-functions
So I'm in favor of closing the issue as "wont fix". The fix is to upgrade to
Python 3!
----------
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue21808>
_______________________________________