Re: [Python-Dev] PEP 538: Coercing the legacy C locale to a UTF-8 based locale
Hi, Nick.

After thinking about the relationship between PEP 538 and PEP 540 for two days, I came up with an idea that removes default locale coercion from PEP 538: Python just enables UTF-8 mode and shows a warning about the C locale.

Of course, this idea depends on PEP 540. There is no "if PEP 540 is rejected" case.

What do you think? If it makes sense, I want to postpone PEP 538 until PEP 540 is accepted or rejected, or merge PEP 538 into PEP 540.

## Background

Locale coercion in the current PEP 538 has some downsides:

* If the user sets `LANG=C LC_DATE=ja_JP.UTF-8`, locale coercion may override LC_DATE.
* It makes the behavior of standalone and embedded Python diverge.
* The parent Python process may use utf-8:surrogateescape, while a child Python process may use utf-8:strict. (Python 3.6 uses ascii:surrogateescape in both parent and children.)

On the other hand, the benefit of locale coercion is limited:

* When locale coercion succeeds, a warning is always shown. To hide the warning, the user must disable coercion in some way (e.g. use a UTF-8 locale explicitly, or set PYTHONCOERCECLOCALE=0).

So I feel the benefit / complexity ratio of locale coercion is lower than that of UTF-8 mode.

But locale coercion works nicely on Android. And there are some Android-like Unix systems (containers or small devices) where C.UTF-8 is always the proper locale.

## Rough spec

* Make Android-style locale coercion (forced, no warning) a build option. Some users who build Python for containers or small devices may like it.
* A normal Python build doesn't change the locale. When the python executable is run in the C locale, it shows a locale warning. The locale warning can be disabled as in the current PEP 538.
* Users can disable automatic UTF-8 mode by setting the PYTHONUTF8=0 environment variable. Users can also hide the warning by setting PYTHONUTF8=1.
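The rough spec above can be read as a small decision table. The sketch below is only one possible reading of it, not code from either PEP: the function name `plan_startup`, its parameters, and the returned tuple are all hypothetical. The message does not say whether PYTHONUTF8=0 also hides the warning, so the sketch assumes it does not, and it leaves out the PEP 538 way of silencing the warning.

```python
def plan_startup(ctype_locale, environ, android_style_build=False):
    """Return (coerce_locale, utf8_mode, warn) for one reading of the rough spec.

    All names here are hypothetical; this models the proposal, not CPython.
    """
    flag = environ.get("PYTHONUTF8")      # "0", "1", or None when unset
    legacy_c = ctype_locale == "C"

    # Build option: Android-style coercion is forced and never warns.
    coerce_locale = android_style_build and legacy_c

    # Normal build: UTF-8 mode is automatic in the C locale unless
    # PYTHONUTF8=0; PYTHONUTF8=1 always enables it.
    utf8_mode = flag == "1" or (legacy_c and not coerce_locale and flag != "0")

    # Assumption: only PYTHONUTF8=1 hides the C locale warning here.
    warn = legacy_c and not coerce_locale and flag != "1"

    return coerce_locale, utf8_mode, warn


if __name__ == "__main__":
    for env in ({}, {"PYTHONUTF8": "0"}, {"PYTHONUTF8": "1"}):
        print(env, plan_startup("C", env))
```

Under this reading, a normal build never touches the locale, which is what avoids overriding per-category settings such as the one in the `LANG=C` example from the Background section.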

On Fri, May 5, 2017 at 10:21 PM, INADA Naoki wrote:
> On Fri, May 5, 2017 at 1:25 AM, Antoine Pitrou wrote:
>> On Thu, 4 May 2017 11:24:27 +0900
>> INADA Naoki wrote:
>>> Hi, Nick and all core devs who are interested in this PEP.
>>>
>>> I'm reviewing PEP 538 and I want to accept it this month.
>>> It will reduce much of the UnicodeError pain that server-side ops face.
>>> Thank you Nick for working on this PEP.
>>>
>>> If anything about this PEP worries you, please post a comment
>>> soon. If you don't have enough time to read the entire PEP, feel free to
>>> ask a question about what worries you.
>>
>> From my POV, it is problematic that the behaviour outlined in PEP 538
>> (see Abstract section) varies depending on the adoption of another PEP
>> (PEP 540).
>>
>> If we want to adopt PEP 538 before pronouncing on PEP 540, then PEP 538
>> should remove all points conditional on PEP 540 adoption, and PEP 540
>> should later be changed to adopt those removed points as PEP
>> 540-specific changes.
>>
>> Regards
>>
>> Antoine.
>>
>
> Fair enough. I'll stop hurrying PEP 538 and start reviewing PEP 540.
>
> Thanks,
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] PEP 538: Coercing the legacy C locale to a UTF-8 based locale
On 6 May 2017 at 18:33, Nick Coghlan wrote:
> On 6 May 2017 at 18:00, Nick Coghlan wrote:
>> On 5 March 2017 at 17:50, Nick Coghlan wrote:
>>> Hi folks,
>>>
>>> Late last year I started working on a change to the CPython CLI (*not* the
>>> shared library) to get it to coerce the legacy C locale to something based
>>> on UTF-8 when a suitable locale is available.
>>>
>>> After a couple of rounds of iteration on linux-sig and python-ideas, I'm now
>>> bringing it to python-dev as a concrete proposal for Python 3.7.
>>>
>>> For most folks, reading the Abstract plus the draft docs updates in the
>>> reference implementation will tell you everything you need to know (if the
>>> C.UTF-8, C.utf8 or UTF-8 locales are available, the CLI will automatically
>>> attempt to coerce the legacy C locale to one of those rather than persisting
>>> with the latter's default assumption of ASCII as the preferred text
>>> encoding).
>>
>> I've just pushed a significant update to the PEP based on the
>> discussions in this thread:
>> https://github.com/python/peps/commit/2fb53e7c1bbb04e1321bca11cc0112aec69f6398
>>
>> The main change at the technical level is to modify the handling of
>> the coercion target locales such that they *always* lead to
>> "surrogateescape" being used by default on the standard streams. That
>> means we don't need to call "Py_SetStandardStreamEncoding" during
>> startup, that subprocesses will behave the same way as their parent
>> processes, and that Python in Linux containers will behave
>> consistently regardless of whether the container locale is set to
>> "C.UTF-8" explicitly, or is set to "C" and then coerced to "C.UTF-8"
>> by CPython.
>
> Working on the revised implementation for this, I've ended up
> refactoring it so that all the heavy lifting is done by a single
> function exported from the shared library: "_Py_CoerceLegacyLocale()".
>
> The CLI code then just contains the check that says "Are we running in
> the legacy C locale? If so, call _Py_CoerceLegacyLocale()", with all
> the details of how the coercion actually works being hidden away
> inside pylifecycle.c.
>
> That seems like a potential opportunity to make the 3.7 version of
> this a public API, using the following pattern:
>
>     if (Py_LegacyLocaleDetected()) {
>         Py_CoerceLegacyLocale();
>     }
>
> That way applications embedding CPython that wanted to implement the
> same locale coercion logic would have an easy way to do so.

OK, the reference implementation has been updated to match the latest version of the PEP:
https://github.com/ncoghlan/cpython/commit/188e7807b6d9e49377aacbb287c074e5cabf70c5

For now, the implementation in the standalone CLI looks like this:

    /* [snip] */
    extern int _Py_LegacyLocaleDetected(void);
    extern void _Py_CoerceLegacyLocale(void);
    /* [snip] */
        if (_Py_LegacyLocaleDetected()) {
            _Py_CoerceLegacyLocale();
        }

If we decide to make this a public API for 3.7, the necessary changes would be:

- remove the leading underscore from the function names
- add the function prototypes to the pylifecycle.h header
- add the APIs to the C API documentation in the configuration & initialization section
- define the APIs in the PEP
- adjust the backport note in the PEP to say that backports should NOT expose the public C API, but keep it private

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia

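For readers more at home in Python than in the C API, the detect-then-coerce pattern discussed above can be approximated as follows. This is a rough analogue under stated assumptions, not the reference implementation: the helper names are hypothetical, the target list is taken from the original announcement (C.UTF-8, C.utf8, UTF-8), and exporting `LC_CTYPE` to the environment is an assumption of this sketch rather than something the message states.

```python
import locale
import os

# Coercion targets named in the original announcement, in the order listed.
COERCION_TARGETS = ("C.UTF-8", "C.utf8", "UTF-8")


def legacy_locale_detected(query=locale.setlocale):
    """Return True when the current LC_CTYPE locale is the legacy C locale."""
    # Passing None queries the current setting without changing it.
    return query(locale.LC_CTYPE, None) == "C"


def coerce_legacy_locale(set_locale=locale.setlocale, environ=os.environ):
    """Try each target in turn; return the one that worked, or None."""
    for target in COERCION_TARGETS:
        try:
            set_locale(locale.LC_CTYPE, target)
        except locale.Error:
            continue  # this target is not available on the platform
        # Assumption of this sketch: export the choice for child processes.
        environ["LC_CTYPE"] = target
        return target
    return None


if __name__ == "__main__":
    if legacy_locale_detected():
        print("coerced to:", coerce_legacy_locale())
```

The `query`, `set_locale`, and `environ` parameters exist only so the logic can be exercised without depending on which locales the host system provides.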
Re: [Python-Dev] PEP 538: Coercing the legacy C locale to a UTF-8 based locale
On 6 May 2017 at 18:00, Nick Coghlan wrote:
> On 5 March 2017 at 17:50, Nick Coghlan wrote:
>> Hi folks,
>>
>> Late last year I started working on a change to the CPython CLI (*not* the
>> shared library) to get it to coerce the legacy C locale to something based
>> on UTF-8 when a suitable locale is available.
>>
>> After a couple of rounds of iteration on linux-sig and python-ideas, I'm now
>> bringing it to python-dev as a concrete proposal for Python 3.7.
>>
>> For most folks, reading the Abstract plus the draft docs updates in the
>> reference implementation will tell you everything you need to know (if the
>> C.UTF-8, C.utf8 or UTF-8 locales are available, the CLI will automatically
>> attempt to coerce the legacy C locale to one of those rather than persisting
>> with the latter's default assumption of ASCII as the preferred text
>> encoding).
>
> I've just pushed a significant update to the PEP based on the
> discussions in this thread:
> https://github.com/python/peps/commit/2fb53e7c1bbb04e1321bca11cc0112aec69f6398
>
> The main change at the technical level is to modify the handling of
> the coercion target locales such that they *always* lead to
> "surrogateescape" being used by default on the standard streams. That
> means we don't need to call "Py_SetStandardStreamEncoding" during
> startup, that subprocesses will behave the same way as their parent
> processes, and that Python in Linux containers will behave
> consistently regardless of whether the container locale is set to
> "C.UTF-8" explicitly, or is set to "C" and then coerced to "C.UTF-8"
> by CPython.

Working on the revised implementation for this, I've ended up refactoring it so that all the heavy lifting is done by a single function exported from the shared library: "_Py_CoerceLegacyLocale()".

The CLI code then just contains the check that says "Are we running in the legacy C locale? If so, call _Py_CoerceLegacyLocale()", with all the details of how the coercion actually works being hidden away inside pylifecycle.c.

That seems like a potential opportunity to make the 3.7 version of this a public API, using the following pattern:

    if (Py_LegacyLocaleDetected()) {
        Py_CoerceLegacyLocale();
    }

That way applications embedding CPython that wanted to implement the same locale coercion logic would have an easy way to do so.

Thoughts?

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia

Re: [Python-Dev] PEP 538: Coercing the legacy C locale to a UTF-8 based locale
On 5 March 2017 at 17:50, Nick Coghlan wrote:
> Hi folks,
>
> Late last year I started working on a change to the CPython CLI (*not* the
> shared library) to get it to coerce the legacy C locale to something based
> on UTF-8 when a suitable locale is available.
>
> After a couple of rounds of iteration on linux-sig and python-ideas, I'm now
> bringing it to python-dev as a concrete proposal for Python 3.7.
>
> For most folks, reading the Abstract plus the draft docs updates in the
> reference implementation will tell you everything you need to know (if the
> C.UTF-8, C.utf8 or UTF-8 locales are available, the CLI will automatically
> attempt to coerce the legacy C locale to one of those rather than persisting
> with the latter's default assumption of ASCII as the preferred text
> encoding).

I've just pushed a significant update to the PEP based on the discussions in this thread:
https://github.com/python/peps/commit/2fb53e7c1bbb04e1321bca11cc0112aec69f6398

The main change at the technical level is to modify the handling of the coercion target locales such that they *always* lead to "surrogateescape" being used by default on the standard streams. That means we don't need to call "Py_SetStandardStreamEncoding" during startup, that subprocesses will behave the same way as their parent processes, and that Python in Linux containers will behave consistently regardless of whether the container locale is set to "C.UTF-8" explicitly, or is set to "C" and then coerced to "C.UTF-8" by CPython.

That change also eliminated the behaviour that was contingent on whether or not PEP 540 was accepted - PEP 540 may still want to have the coercion target locales imply full UTF-8 mode rather than just setting the stream error handler differently, but that will be a question to be considered when reviewing PEP 540 rather than needing to worry about it now.

The second technical change is that the locale coercion and warning are now enabled on Android and Mac OS X. For Android, that's a matter of getting GNU readline to behave sensibly, while for Mac OS X, it's a matter of simplifying the implementation and improving cross-platform behavioural consistency (even though we don't expect the coercion to actually have much impact there).

Beyond that, the PEP update focuses on clarifying a few other points without actually changing the proposal.

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
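
As a small illustration of what using "surrogateescape" on the standard streams means in practice, the sketch below decodes bytes that are not valid UTF-8 and writes them back out unchanged. It uses only the built-in `bytes.decode` and `str.encode` methods; the helper names and the sample bytes are made up for the example, and real standard streams go through `io.TextIOWrapper` rather than these helpers.

```python
def decode_stream_bytes(raw: bytes) -> str:
    """Decode as UTF-8, smuggling undecodable bytes through as lone surrogates."""
    return raw.decode("utf-8", errors="surrogateescape")


def encode_stream_text(text: str) -> bytes:
    """Encode as UTF-8, turning escaped surrogates back into the original bytes."""
    return text.encode("utf-8", errors="surrogateescape")


# Hypothetical input: a Latin-1 encoded name, which is not valid UTF-8.
raw = b"caf\xe9.txt"

text = decode_stream_bytes(raw)

# The invalid byte 0xE9 is represented as the lone surrogate U+DCE9 ...
assert text == "caf\udce9.txt"
# ... and the original bytes come back unchanged on output.
assert encode_stream_text(text) == raw

# With the strict handler, the same input is rejected outright.
try:
    raw.decode("utf-8", errors="strict")
except UnicodeDecodeError as exc:
    print("strict decoding failed:", exc.reason)
```

This round trip is why data passed between a parent and a child process survives intact when both ends use the same error handler on their streams.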