Hi Unicode and locales lovers,

tl;dr Nick, Ned, INADA-san: I modified 3.7.1 to add a new "-X coerce_c_locale=value" option and to make sure that the C locale coercion cannot be enabled when Python is embedded: are you ok with these changes?

Before the 3.7.0 release, during the implementation of the UTF-8 Mode (PEP 540), I changed two things in Nick Coghlan's implementation of the C locale coercion (PEP 538):

(1) The PYTHONCOERCECLOCALE environment variable is now ignored when the -E or -I command line option is used.

(2) When Python is embedded, the C locale coercion is now enabled if the LC_CTYPE locale is "C".

Nick asked me to change the behavior:
https://bugs.python.org/issue34589

I just pushed this change in the 3.7 branch which adds a new "-X coerce_c_locale=value" option:
https://github.com/python/cpython/commit/144f1e2c6f4a24bd288c045986842c65cc289684

Examples using Python 3.7 (future 3.7.1) with UTF-8 Mode disabled, to only test the C locale coercion:
---
$ cat test.py
import codecs, locale
enc = locale.getpreferredencoding()
enc = codecs.lookup(enc).name
print(enc)

$ export LC_ALL= LC_CTYPE=C LANG=

# Disable C locale coercion: get ASCII as expected
$ PYTHONCOERCECLOCALE=0 ./python -X utf8=0 test.py
ascii

# -E ignores PYTHONCOERCECLOCALE=0:
# C locale is coerced, we get UTF-8
$ PYTHONCOERCECLOCALE=0 ./python -E -X utf8=0 test.py
utf-8

# -X coerce_c_locale=0 is not affected by -E:
# C locale coercion disabled as expected, get ASCII as expected
$ ./python -E -X utf8=0 -X coerce_c_locale=0 test.py
ascii
---

For (1), Nick's use case is to get the Python 3.6 behavior (C locale not coerced) on Python 3.7 using PYTHONCOERCECLOCALE. Nick proposed to honor PYTHONCOERCECLOCALE even with -E or -I, but I dislike introducing a special case for the -E option. I chose to add a new "-X coerce_c_locale=0" option to Python 3.7.1 to provide a solution for this use case. (Python 3.7.0 and older ignore this option.)

Note: Python 3.7.0 is fine with PYTHONCOERCECLOCALE=0, we are only talking about the special case of the -E and -I options.

For (2), I modified Python 3.7.1 to make sure the C locale is never coerced when the C API is used to embed Python inside an application: Py_Initialize() and Py_Main().
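
To make the combined rules for (1) and (2) easier to check, here is a minimal sketch of the decision as a pure Python function. It is only a model of the behavior described in this email, not the CPython implementation: the function name c_locale_is_coerced() and its parameters are hypothetical, and only the value "0" is modeled because it is the only value shown in the examples above.

```python
from typing import Optional


def c_locale_is_coerced(
    lc_ctype: str,
    *,
    embedded: bool = False,
    ignore_environment: bool = False,
    env_value: Optional[str] = None,
    x_option: Optional[str] = None,
) -> bool:
    """Hypothetical model of the rules in this email (not CPython code).

    lc_ctype           -- current LC_CTYPE locale name, e.g. "C"
    embedded           -- True when Python is started through the C API
    ignore_environment -- True when -E or -I is used
    env_value          -- value of PYTHONCOERCECLOCALE, or None if unset
    x_option           -- value of -X coerce_c_locale=..., or None if absent
    """
    # (2) Embedded Python (Py_Initialize(), Py_Main()) never coerces.
    if embedded:
        return False
    # Only the "C" locale is a candidate for coercion.
    if lc_ctype != "C":
        return False
    # -X coerce_c_locale=0 applies even with -E or -I.
    if x_option == "0":
        return False
    # (1) PYTHONCOERCECLOCALE=0 is honored only without -E / -I.
    if env_value == "0" and not ignore_environment:
        return False
    return True


# The three shell examples above, all with LC_CTYPE=C:
print(c_locale_is_coerced("C", env_value="0"))                           # False
print(c_locale_is_coerced("C", env_value="0", ignore_environment=True))  # True
print(c_locale_is_coerced("C", x_option="0", ignore_environment=True))   # False
```

The three calls mirror the "ascii", "utf-8" and "ascii" results of the shell session: coercion disabled, coercion applied, coercion disabled.
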
The C locale can only be coerced by the official Python program ("python3.7"). I don't know if it should be possible to enable C locale coercion when Python is embedded. So I just made the change requested by Nick :-)

I dislike doing such late changes in 3.7.1, especially since PEP 538 has been designed by Nick Coghlan, and we disagree on the fix. But Ned Deily, our Python 3.7 release manager, wants to see the last 3.7 fixes merged before Tuesday, so here we are.

Nick, Ned, INADA-san: are you ok with these changes? The other choices for 3.7.1 are:

* Revert my change: C locale coercion can still be enabled when Python is embedded, and the -E option ignores the PYTHONCOERCECLOCALE env var.

* Revert my change and apply Nick's PR 9257: C locale coercion cannot be enabled when Python is embedded, and the -E option doesn't ignore the PYTHONCOERCECLOCALE env var.

I spent months fixing the master branch to support all possible locales and encodings, and to get a consistent CLI:
https://vstinner.github.io/python3-locales-encodings.html

So I'm not excited by Nick's PR which IMHO moves Python backward, especially since it breaks the -E option contract: it doesn't ignore the PYTHONCOERCECLOCALE env var.

Victor
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev