date:20170506

Re: [Python-Dev] PEP 538: Coercing the legacy C locale to a UTF-8 based locale

2017-05-06 Thread INADA Naoki

Hi, Nick.

After thinking about the relationship between PEP 538 and PEP 540 for two
days, I came up with an idea that removes locale coercion by default from
PEP 538; it just enables UTF-8 mode and shows a warning about the C locale.

Of course, this idea depends on PEP 540.  There is no "if PEP 540 is
rejected" case.

What do you think?

If it makes sense, I want to postpone PEP 538 until PEP 540 is
accepted or rejected, or merge PEP 538 into PEP 540.

## Background

Locale coercion in the current PEP 538 has some downsides:

* If the user sets `LANG=C LC_TIME=ja_JP.UTF-8`, locale coercion may
  override LC_TIME.

* It makes the behavior of standalone and embedded Python diverge.

* The parent Python process may use utf-8:surrogateescape, while a child
  Python process may use utf-8:strict.  (Python 3.6 uses
  ascii:surrogateescape in both the parent and its children.)
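
To illustrate the first downside: a minimal sketch of the POSIX precedence rule for a single locale category (LC_ALL, then the category variable such as LC_TIME for dates and times, then LANG). It assumes coercion is done by exporting LC_ALL, which this message does not state; `resolve_category` is a hypothetical helper, not part of CPython.

```c
#include <stddef.h>

/* Hypothetical helper: which setting wins for one locale category.
   POSIX precedence: LC_ALL, then LC_<category>, then LANG, then "C".
   NULL or empty strings mean "unset". */
static const char *resolve_category(const char *lc_all,
                                    const char *lc_category,
                                    const char *lang)
{
    if (lc_all != NULL && *lc_all != '\0') {
        return lc_all;
    }
    if (lc_category != NULL && *lc_category != '\0') {
        return lc_category;
    }
    if (lang != NULL && *lang != '\0') {
        return lang;
    }
    return "C";
}
```

With LANG=C and the time category set to ja_JP.UTF-8, the category setting wins. If coercion then exported LC_ALL=C.UTF-8 (the assumption above), the same call would resolve to C.UTF-8 and the user's choice would be lost.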

On the other hand, the benefits of locale coercion are limited:

* When locale coercion succeeds, a warning is always shown.
  To hide the warning, the user must disable coercion in some way
  (e.g. use a UTF-8 locale explicitly, or set PYTHONCOERCECLOCALE=0).

So I feel the benefit/complexity ratio of locale coercion is lower than
that of UTF-8 mode.

But locale coercion works nicely on Android.  And there are some
Android-like Unix systems (containers or small devices) where C.UTF-8 is
always the proper locale.

## Rough spec

* Make Android-style locale coercion (forced, no warning) a build
  option.  Some users who build Python for containers or small devices
  may like it.

* A normal Python build doesn't change the locale.  When the python
  executable is run in the C locale, it shows a locale warning.  The
  locale warning can be disabled in the same way as in the current PEP 538.

* The user can disable automatic UTF-8 mode by setting the PYTHONUTF8=0
  environment variable.  The user can also hide the warning by setting
  PYTHONUTF8=1.
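
The rough spec for a normal build could be sketched as the decision function below. This is only an illustration: `startup_decision` and `decide_startup` are hypothetical names, and whether the warning is still shown with PYTHONUTF8=0 is not specified above, so the sketch assumes it is. The build-time Android-style coercion option and the PEP 538 way of disabling the warning are left out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result of the startup check for a normal (non-coercing) build. */
typedef struct {
    int utf8_mode;  /* 1 if UTF-8 mode would be enabled */
    int warn;       /* 1 if the C locale warning would be shown */
} startup_decision;

/* Treat both spellings of the legacy locale as "C". */
static int is_legacy_c_locale(const char *ctype_locale)
{
    return strcmp(ctype_locale, "C") == 0 || strcmp(ctype_locale, "POSIX") == 0;
}

/* ctype_locale: name of the effective LC_CTYPE locale.
   pythonutf8:   value of PYTHONUTF8, or NULL if unset. */
static startup_decision decide_startup(const char *ctype_locale,
                                       const char *pythonutf8)
{
    startup_decision d = {0, 0};
    int legacy = is_legacy_c_locale(ctype_locale);

    if (pythonutf8 != NULL && strcmp(pythonutf8, "1") == 0) {
        d.utf8_mode = 1;      /* explicit opt-in: UTF-8 mode, no warning */
        d.warn = 0;
    } else if (pythonutf8 != NULL && strcmp(pythonutf8, "0") == 0) {
        d.utf8_mode = 0;      /* explicit opt-out of automatic UTF-8 mode */
        d.warn = legacy;      /* assumption: warning is still shown */
    } else {
        d.utf8_mode = legacy; /* automatic UTF-8 mode in the C locale */
        d.warn = legacy;
    }
    return d;
}
```

Note that the locale itself is never changed in this sketch, which is the point of the proposal: only the UTF-8 mode flag and the warning depend on the C locale.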

On Fri, May 5, 2017 at 10:21 PM, INADA Naoki  wrote:
> On Fri, May 5, 2017 at 1:25 AM, Antoine Pitrou  wrote:
>> On Thu, 4 May 2017 11:24:27 +0900
>> INADA Naoki  wrote:
>>> Hi, Nick and all core devs who are interested in this PEP.
>>>
>>> I'm reviewing PEP 538 and I want to accept it in this month.
>>> It will reduces much UnicodeError pains which server-side OPs facing.
>>> Thank you Nick for working on this PEP.
>>>
>>> If you have something worrying about this PEP, please post a comment
>>> soon.  If you don't have enough time to read entire this PEP, feel free to
>>> ask a question about you're worrying.
>>
>> From my POV, it is problematic that the behaviour outlined in PEP 538
>> (see Abstract section) varies depending on the adoption of another PEP
>> (PEP 540).
>>
>> If we want to adopt PEP 538 before pronouncing on PEP 540, then PEP 538
>> should remove all points conditional on PEP 540 adoption, and PEP 540
>> should later be changed to adopt those removed points as PEP
>> 540-specific changes.
>>
>> Regards
>>
>> Antoine.
>>
>
> Fair enough.  I stop hurrying about PEP 538 and start reviewing PEP 540.
>
> Thanks,
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] PEP 538: Coercing the legacy C locale to a UTF-8 based locale

2017-05-06 Thread Nick Coghlan

On 6 May 2017 at 18:33, Nick Coghlan  wrote:
> On 6 May 2017 at 18:00, Nick Coghlan  wrote:
>> On 5 March 2017 at 17:50, Nick Coghlan  wrote:
>>> Hi folks,
>>>
>>> Late last year I started working on a change to the CPython CLI (*not* the
>>> shared library) to get it to coerce the legacy C locale to something based
>>> on UTF-8 when a suitable locale is available.
>>>
>>> After a couple of rounds of iteration on linux-sig and python-ideas, I'm now
>>> bringing it to python-dev as a concrete proposal for Python 3.7.
>>>
>>> For most folks, reading the Abstract plus the draft docs updates in the
>>> reference implementation will tell you everything you need to know (if the
>>> C.UTF-8, C.utf8 or UTF-8 locales are available, the CLI will automatically
>>> attempt to coerce the legacy C locale to one of those rather than persisting
>>> with the latter's default assumption of ASCII as the preferred text
>>> encoding).
>>
>> I've just pushed a significant update to the PEP based on the
>> discussions in this thread:
>> https://github.com/python/peps/commit/2fb53e7c1bbb04e1321bca11cc0112aec69f6398
>>
>> The main change at the technical level is to modify the handling of
>> the coercion target locales such that they *always* lead to
>> "surrogateescape" being used by default on the standard streams. That
>> means we don't need to call "Py_SetStandardStreamEncoding" during
>> startup, that subprocesses will behave the same way as their parent
>> processes, and that Python in Linux containers will behave
>> consistently regardless of whether the container locale is set to
>> "C.UTF-8" explicitly, or is set to "C" and then coerced to "C.UTF-8"
>> by CPython.
>
> Working on the revised implementation for this, I've ended up
> refactoring it so that all the heavy lifting is done by a single
> function exported from the shared library: "_Py_CoerceLegacyLocale()".
>
> The CLI code then just contains the check that says "Are we running in
> the legacy C locale? If so, call _Py_CoerceLegacyLocale()", with all
> the details of how the coercion actually works being hidden away
> inside pylifecycle.c.
>
> That seems like a potential opportunity to make the 3.7 version of
> this a public API, using the following pattern:
>
> if (Py_LegacyLocaleDetected()) {
> Py_CoerceLegacyLocale();
> }
>
> That way applications embedding CPython that wanted to implement the
> same locale coercion logic would have an easy way to do so.

OK, the reference implementation has been updated to match the latest
version of the PEP:
https://github.com/ncoghlan/cpython/commit/188e7807b6d9e49377aacbb287c074e5cabf70c5

For now, the implementation in the standalone CLI looks like this:

    /* [snip] */
    extern int _Py_LegacyLocaleDetected(void);
    extern void _Py_CoerceLegacyLocale(void);
    /* [snip] */
        if (_Py_LegacyLocaleDetected()) {
            _Py_CoerceLegacyLocale();
        }

If we decide to make this a public API for 3.7, the necessary changes would be:

- remove the leading underscore from the function names
- add the function prototypes to the pylifecycle.h header
- add the APIs to the C API documentation in the configuration &
initialization section
- define the APIs in the PEP
- adjust the backport note in the PEP to say that backports should NOT
expose the public C API, but keep it private
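
For readers who want a feel for what the two calls might do, here is a self-contained sketch in ISO C. It is not the reference implementation: the `sketch_` functions are hypothetical stand-ins, the target list is taken from the original announcement (C.UTF-8, C.utf8, UTF-8), and treating "POSIX" as equivalent to "C" is an assumption.

```c
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Coercion targets named in the announcement, in the order tried. */
static const char *const coercion_targets[] = {"C.UTF-8", "C.utf8", "UTF-8"};

/* Hypothetical stand-in for _Py_LegacyLocaleDetected():
   returns 1 if the current LC_CTYPE locale is the legacy C locale. */
static int sketch_legacy_locale_detected(void)
{
    const char *current = setlocale(LC_CTYPE, NULL);  /* query only */
    return current != NULL
        && (strcmp(current, "C") == 0 || strcmp(current, "POSIX") == 0);
}

/* Hypothetical stand-in for _Py_CoerceLegacyLocale():
   switches LC_CTYPE to the first available target and returns its name,
   or returns NULL if none of the targets is available. */
static const char *sketch_coerce_legacy_locale(void)
{
    size_t count = sizeof(coercion_targets) / sizeof(coercion_targets[0]);
    for (size_t i = 0; i < count; i++) {
        if (setlocale(LC_CTYPE, coercion_targets[i]) != NULL) {
            return coercion_targets[i];
        }
    }
    return NULL;
}
```

An embedding application would call the detection function after its own `setlocale(LC_ALL, "")` and coerce only when it returns 1. A real implementation would presumably also export the chosen locale to the environment so that subprocesses inherit it; that step is omitted here to stay within ISO C.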

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia

Re: [Python-Dev] PEP 538: Coercing the legacy C locale to a UTF-8 based locale

2017-05-06 Thread Nick Coghlan

On 6 May 2017 at 18:00, Nick Coghlan  wrote:
> On 5 March 2017 at 17:50, Nick Coghlan  wrote:
>> Hi folks,
>>
>> Late last year I started working on a change to the CPython CLI (*not* the
>> shared library) to get it to coerce the legacy C locale to something based
>> on UTF-8 when a suitable locale is available.
>>
>> After a couple of rounds of iteration on linux-sig and python-ideas, I'm now
>> bringing it to python-dev as a concrete proposal for Python 3.7.
>>
>> For most folks, reading the Abstract plus the draft docs updates in the
>> reference implementation will tell you everything you need to know (if the
>> C.UTF-8, C.utf8 or UTF-8 locales are available, the CLI will automatically
>> attempt to coerce the legacy C locale to one of those rather than persisting
>> with the latter's default assumption of ASCII as the preferred text
>> encoding).
>
> I've just pushed a significant update to the PEP based on the
> discussions in this thread:
> https://github.com/python/peps/commit/2fb53e7c1bbb04e1321bca11cc0112aec69f6398
>
> The main change at the technical level is to modify the handling of
> the coercion target locales such that they *always* lead to
> "surrogateescape" being used by default on the standard streams. That
> means we don't need to call "Py_SetStandardStreamEncoding" during
> startup, that subprocesses will behave the same way as their parent
> processes, and that Python in Linux containers will behave
> consistently regardless of whether the container locale is set to
> "C.UTF-8" explicitly, or is set to "C" and then coerced to "C.UTF-8"
> by CPython.

Working on the revised implementation for this, I've ended up
refactoring it so that all the heavy lifting is done by a single
function exported from the shared library: "_Py_CoerceLegacyLocale()".

The CLI code then just contains the check that says "Are we running in
the legacy C locale? If so, call _Py_CoerceLegacyLocale()", with all
the details of how the coercion actually works being hidden away
inside pylifecycle.c.

That seems like a potential opportunity to make the 3.7 version of
this a public API, using the following pattern:

    if (Py_LegacyLocaleDetected()) {
        Py_CoerceLegacyLocale();
    }

That way applications embedding CPython that wanted to implement the
same locale coercion logic would have an easy way to do so.

Thoughts?

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia

Re: [Python-Dev] PEP 538: Coercing the legacy C locale to a UTF-8 based locale

2017-05-06 Thread Nick Coghlan

On 5 March 2017 at 17:50, Nick Coghlan  wrote:
> Hi folks,
>
> Late last year I started working on a change to the CPython CLI (*not* the
> shared library) to get it to coerce the legacy C locale to something based
> on UTF-8 when a suitable locale is available.
>
> After a couple of rounds of iteration on linux-sig and python-ideas, I'm now
> bringing it to python-dev as a concrete proposal for Python 3.7.
>
> For most folks, reading the Abstract plus the draft docs updates in the
> reference implementation will tell you everything you need to know (if the
> C.UTF-8, C.utf8 or UTF-8 locales are available, the CLI will automatically
> attempt to coerce the legacy C locale to one of those rather than persisting
> with the latter's default assumption of ASCII as the preferred text
> encoding).

I've just pushed a significant update to the PEP based on the
discussions in this thread:
https://github.com/python/peps/commit/2fb53e7c1bbb04e1321bca11cc0112aec69f6398

The main change at the technical level is to modify the handling of
the coercion target locales such that they *always* lead to
"surrogateescape" being used by default on the standard streams. That
means we don't need to call "Py_SetStandardStreamEncoding" during
startup, that subprocesses will behave the same way as their parent
processes, and that Python in Linux containers will behave
consistently regardless of whether the container locale is set to
"C.UTF-8" explicitly, or is set to "C" and then coerced to "C.UTF-8"
by CPython.
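
As background on the error handler itself: "surrogateescape" (PEP 383) maps each undecodable byte 0x80..0xFF to the lone surrogate U+DC80..U+DCFF, so arbitrary bytes survive a decode/encode round trip. The sketch below shows this for the simpler ascii:surrogateescape case only; the function names are hypothetical and this is not CPython's codec code.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode n bytes as ASCII with surrogateescape: ASCII bytes map to
   themselves, every other byte b maps to the lone surrogate 0xDC00 + b.
   out must have room for n code points. */
static void ascii_surrogateescape_decode(const unsigned char *in, size_t n,
                                         uint32_t *out)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = (in[i] < 0x80) ? (uint32_t)in[i] : 0xDC00u + in[i];
    }
}

/* Encode n code points back to bytes. Returns 0 on success, or -1 if a
   code point is neither ASCII nor an escaped byte (U+DC80..U+DCFF). */
static int ascii_surrogateescape_encode(const uint32_t *in, size_t n,
                                        unsigned char *out)
{
    for (size_t i = 0; i < n; i++) {
        if (in[i] < 0x80) {
            out[i] = (unsigned char)in[i];
        } else if (in[i] >= 0xDC80u && in[i] <= 0xDCFFu) {
            out[i] = (unsigned char)(in[i] - 0xDC00u);
        } else {
            return -1;
        }
    }
    return 0;
}
```

With the "strict" handler the non-ASCII bytes would instead be an error, which is why a mismatch between a parent and a child process matters.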

That change also eliminated the behaviour that was contingent on
whether or not PEP 540 was accepted - PEP 540 may still want to have
the coercion target locales imply full UTF-8 mode rather than just
setting the stream error handler differently, but that will be a
question to be considered when reviewing PEP 540 rather than needing
to worry about it now.

The second technical change is that the locale coercion and warning
are now enabled on Android and Mac OS X. For Android, that's a matter
of getting GNU readline to behave sensibly, while for Mac OS X, it's a
matter of simplifying the implementation and improving cross-platform
behavioural consistency (even though we don't expect the coercion to
actually have much impact there).

Beyond that, the PEP update focuses on clarifying a few other points
without actually changing the proposal.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
