2017-12-08 6:11 GMT+01:00 INADA Naoki <songofaca...@gmail.com>:
> Or should we change locale.getpreferredencoding() to return UTF-8
> instead of ASCII always, regardless of PEP 538 and 540?
On the POSIX locale, if locale coercion (PEP 538) works, locale.getpreferredencoding() returns UTF-8, so we are good. The open question is platforms like CentOS 7, where locale coercion (PEP 538) doesn't work, so Python uses UTF-8 Mode (PEP 540) while the locale probably uses ASCII (or maybe Latin-1).

My current implementation of PEP 540 cheats for open(): if sys.flags.utf8_mode is non-zero, it uses the UTF-8 encoding rather than calling locale.getpreferredencoding().

I checked the stdlib and found many places where locale.getpreferredencoding() is used to get the user's preferred encoding:

* built-in open(): default encoding
* cgi.FieldStorage: encode the query string
* encodings._alias_mbcs(): check whether the requested encoding is the ANSI code page
* gettext.GNUTranslations: lgettext() and lngettext() methods
* xml.etree.ElementTree: ElementTree.write(encoding='unicode')

In UTF-8 Mode, I would expect cgi, gettext and xml.etree to all use the UTF-8 encoding by default, so locale.getpreferredencoding() should return UTF-8 when UTF-8 Mode is enabled. The private _alias_mbcs() function can be modified to call _locale._getdefaultlocale()[1] directly to get the ANSI code page.

Question: do we need to add an option to getpreferredencoding() that returns the locale encoding even when UTF-8 Mode is enabled? If so, what should the API be? locale.getpreferredencoding(utf8_mode=False)?

Victor
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
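A minimal sketch of the selection rule discussed in this thread: use UTF-8 whenever sys.flags.utf8_mode is non-zero, otherwise fall back to the locale's preferred encoding. `choose_default_encoding()` is a hypothetical helper, not a stdlib function. Its `utf8_mode` parameter loosely mirrors the `locale.getpreferredencoding(utf8_mode=False)` signature floated in the question, and the `locale_encoding` parameter exists only so a locale such as ASCII can be faked for testing. `sys.flags.utf8_mode` only exists from Python 3.7, so the sketch reads it through `getattr()`.

```python
import locale
import sys


def choose_default_encoding(utf8_mode=None, locale_encoding=None):
    """Hypothetical helper: pick a default text encoding.

    utf8_mode: None means "read sys.flags.utf8_mode"; any truthy value
        forces UTF-8, any falsy value defers to the locale.
    locale_encoding: None means "ask the locale module"; a string lets
        callers (or tests) supply the locale encoding explicitly.
    """
    if utf8_mode is None:
        # sys.flags.utf8_mode was added in Python 3.7; default to 0 before that.
        utf8_mode = getattr(sys.flags, "utf8_mode", 0)
    if utf8_mode:
        # UTF-8 Mode (PEP 540): ignore the locale and always use UTF-8.
        return "UTF-8"
    if locale_encoding is None:
        # Assumption: this reports the locale's own encoding (e.g. ASCII on
        # a POSIX locale where PEP 538 coercion did not happen).
        locale_encoding = locale.getpreferredencoding(False)
    return locale_encoding


# Simulate a CentOS 7-style POSIX locale that reports ASCII.
print(choose_default_encoding(utf8_mode=1, locale_encoding="ANSI_X3.4-1968"))
print(choose_default_encoding(utf8_mode=0, locale_encoding="ANSI_X3.4-1968"))
```

Passing `utf8_mode=False` explicitly is how a caller such as `_alias_mbcs()` would ask for the locale encoding even when UTF-8 Mode is on, which is the kind of option the question is about; in this sketch that only holds under the stated assumption that `locale.getpreferredencoding(False)` itself reports the locale's encoding.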