I updated my PEP: in the 4th version, locale.getpreferredencoding() now returns 'UTF-8' in the UTF-8 Mode.
https://www.python.org/dev/peps/pep-0540/

I also clarified the direct effects of the UTF-8 Mode, and listed
the most user-visible changes as "Side effects".

"""
Effects of the UTF-8 Mode:

* ``sys.getfilesystemencoding()`` returns ``'UTF-8'``.
* ``locale.getpreferredencoding()`` returns ``UTF-8``, its
  *do_setlocale* argument and the locale encoding are ignored.
* ``sys.stdin`` and ``sys.stdout`` error handler is set to
  ``surrogateescape``

Side effects:

* ``open()`` uses the UTF-8 encoding by default.
* ``os.fsdecode()`` and ``os.fsencode()`` use the UTF-8 encoding.
* Command line arguments, environment variables and filenames use the
  UTF-8 encoding.
"""

Thank you Naoki INADA for your quick feedback, it was very helpful and
I really like how the PEP is evolving!

IMHO PEP 540 version 4 is just perfect and ready for pronouncement!

(... until someone finds another flaw, obviously!)

Victor

2017-12-08 13:58 GMT+01:00 Victor Stinner <victor.stin...@gmail.com>:
> 2017-12-08 6:11 GMT+01:00 INADA Naoki <songofaca...@gmail.com>:
>> Or should we change locale.getpreferredencoding() to return UTF-8
>> instead of ASCII always, regardless of PEP 538 and 540?
>
> On the POSIX locale, if the locale coercion works (PEP 538),
> locale.getpreferredencoding() returns UTF-8. We are good.
>
> The question is for platforms like CentOS 7 where the locale coercion
> (PEP 538) doesn't work and so Python uses UTF-8 (PEP 540), whereas the
> locale probably uses ASCII (or maybe Latin1).
>
> My current implementation of PEP 540 is cheating for open(): if
> sys.flags.utf8_mode is non-zero, use the UTF-8 encoding rather than
> calling locale.getpreferredencoding().
>
> I checked the stdlib, and I found many places where
> locale.getpreferredencoding() is used to get the user preferred
> encoding:
>
> * builtin open(): default encoding
> * cgi.FieldStorage: encode the query string
> * encodings._alias_mbcs(): check if the requested encoding is the ANSI
>   code page
> * gettext.GNUTranslations: lgettext() and lngettext() methods
> * xml.etree.ElementTree: ElementTree.write(encoding='unicode')
>
> In the UTF-8 mode, I would expect that cgi, gettext and xml.etree all
> use the UTF-8 encoding by default. So locale.getpreferredencoding()
> should return UTF-8 if the UTF-8 mode is enabled.
>
> The private _alias_mbcs() function can be modified to call
> _locale._getdefaultlocale()[1] directly to get the ANSI code page.
>
> Question: do we need to add an option to getpreferredencoding() to
> return the locale encoding even if the UTF-8 mode is enabled? If yes,
> what should be the API? locale.getpreferredencoding(utf8_mode=False)?
>
> Victor
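
For readers who want to see the open() shortcut from the quoted message in code, here is a minimal sketch. It is not the actual patch: the helper name default_text_encoding() is hypothetical, and the sketch assumes that sys.flags.utf8_mode is the flag being tested, as the message describes. It reads the flag with getattr() so it also runs on interpreters that do not have it.

```python
import locale
import sys


def default_text_encoding():
    """Hypothetical helper: pick the default text encoding.

    Mirrors the shortcut described for open(): prefer UTF-8 when the
    UTF-8 Mode is enabled, otherwise ask the locale.
    """
    # Assumption: sys.flags.utf8_mode is non-zero when the UTF-8 Mode is on.
    # getattr() keeps the sketch usable where the flag does not exist.
    if getattr(sys.flags, "utf8_mode", 0):
        return "UTF-8"
    # do_setlocale=False: do not modify the process locale as a side effect.
    return locale.getpreferredencoding(False)


if __name__ == "__main__":
    print(default_text_encoding())
```

With the change in version 4 of the PEP, locale.getpreferredencoding() itself returns UTF-8 in the UTF-8 Mode, so callers such as cgi, gettext and xml.etree would get the same answer without a special case like this one.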