Re: [Python-Dev] PEP 538: Coercing the legacy C locale to a UTF-8 based locale

2017-05-09 Thread Nick Coghlan
On 9 May 2017 at 13:44, Nick Coghlan  wrote:
> On 8 May 2017 at 15:34, Nick Coghlan  wrote:
>> On 7 May 2017 at 15:22, INADA Naoki  wrote:
>>> ## Background
>>>
>>> Locale coercion in current PEP 538 has some downsides:
>>>
>>> * If the user sets `LANG=C LC_DATE=ja_JP.UTF-8`, locale coercion may
>>>   override LC_DATE.
>>
>> The fact it sets "LC_ALL" has previously been raised as a concern with
>> PEP 538, so it probably makes sense to drop that aspect and just
>> override "LANG". The scenarios where it makes a difference are
>> incredibly obscure (involving non-default SSH locale forwarding
>> settings for folks using SSH on Mac OS X to connect to remote Linux
>> systems), while just setting "LANG" will be sufficient to address the
>> "LANG=C" case that is the main driver for the PEP.
>
> It occurs to me we can even still handle the forwarded
> "LC_CTYPE=UTF-8" case by changing the locale coercion to set LC_CTYPE
> & LANG, rather than just setting LANG as I suggested above.
>
> That way `LANG=C LC_DATE=ja_JP.UTF-8` would still respect the explicit
> LC_DATE setting, `LC_CTYPE=C` would be handled the same way as
> `LANG=C`, and LC_ALL=C would continue to provide a way to force the C
> locale even for LC_CTYPE without needing to be aware of the Python
> specific PYTHONCOERCECLOCALE setting.

I've posted an updated reference implementation that works this way,
and it turned out to have some rather nice benefits: not only did it
make the handling of full locales (C.UTF-8, C.utf8) and partial
locales (UTF-8) more consistent (allowing for a net deletion of code),
it also meant I no longer needed a custom test case in _testembed to
check the locale warning. Instead, the affected test cases now just
set "LC_ALL" as a locale override that switches off CPython's locale
coercion without also switching off the locale warning.

Code changes: 
https://github.com/ncoghlan/cpython/commit/476a78133c94d82e19b89f50036cecd9b4214e7a

Rather than posting the PEP updates here though, I'll start a new
thread that explains what has changed since my initial posting to
python-dev back in March.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 538: Coercing the legacy C locale to a UTF-8 based locale

2017-05-08 Thread Nick Coghlan
On 8 May 2017 at 15:34, Nick Coghlan  wrote:
> On 7 May 2017 at 15:22, INADA Naoki  wrote:
>> ## Background
>>
>> Locale coercion in current PEP 538 has some downsides:
>>
>> * If the user sets `LANG=C LC_DATE=ja_JP.UTF-8`, locale coercion may
>>   override LC_DATE.
>
> The fact it sets "LC_ALL" has previously been raised as a concern with
> PEP 538, so it probably makes sense to drop that aspect and just
> override "LANG". The scenarios where it makes a difference are
> incredibly obscure (involving non-default SSH locale forwarding
> settings for folks using SSH on Mac OS X to connect to remote Linux
> systems), while just setting "LANG" will be sufficient to address the
> "LANG=C" case that is the main driver for the PEP.

It occurs to me we can even still handle the forwarded
"LC_CTYPE=UTF-8" case by changing the locale coercion to set LC_CTYPE
& LANG, rather than just setting LANG as I suggested above.

That way `LANG=C LC_DATE=ja_JP.UTF-8` would still respect the explicit
LC_DATE setting, `LC_CTYPE=C` would be handled the same way as
`LANG=C`, and LC_ALL=C would continue to provide a way to force the C
locale even for LC_CTYPE without needing to be aware of the Python
specific PYTHONCOERCECLOCALE setting.
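
A minimal Python sketch of that rule, assuming a simplified model in which
the environment is a plain dict, only the literal name "C" counts as the
legacy locale, and "C.UTF-8" is available as the coercion target. The names
effective_ctype and coerce_legacy_locale are hypothetical and exist only for
illustration; they are not part of the reference implementation:

```python
LEGACY_LOCALE = "C"          # assumption: only the literal "C" name is legacy
COERCION_TARGET = "C.UTF-8"  # assumption: this target locale is available


def effective_ctype(env):
    """Return the locale name that POSIX lookup would pick for LC_CTYPE."""
    # LC_ALL beats the per-category variable, which in turn beats LANG.
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:
            return value
    return LEGACY_LOCALE  # nothing set at all means the C locale


def coerce_legacy_locale(env):
    """Return a copy of env with LC_CTYPE and LANG coerced when appropriate."""
    coerced = dict(env)
    if env.get("LC_ALL"):
        # An explicit LC_ALL (including LC_ALL=C) is left alone.
        return coerced
    if effective_ctype(env) == LEGACY_LOCALE:
        # Only LC_CTYPE and LANG are touched; other LC_* settings survive.
        coerced["LC_CTYPE"] = COERCION_TARGET
        coerced["LANG"] = COERCION_TARGET
    return coerced
```

Under this model, `LANG=C LC_DATE=ja_JP.UTF-8` keeps its LC_DATE entry,
`LC_CTYPE=C` is coerced just like `LANG=C`, and `LC_ALL=C` passes through
unchanged.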

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] PEP 538: Coercing the legacy C locale to a UTF-8 based locale

2017-05-08 Thread INADA Naoki
>>> On platforms where they would have no effect (e.g. Mac OS X, iOS, Android,
>>> Windows) these preprocessor variables would always be undefined.
>>
>> Why does ``--with[out]-c-locale-coercion`` have no effect on macOS, iOS and
>> Android?
>
> On these three, we know the system encoding is UTF-8, so we never
> interpreted the C locale as meaning "ascii" in the first place.
>
>> On Android, locale coercion fixes readline.  Do you mean locale
>> coercion always happens, regardless of this configuration option?
>
> Right, the change for Android is that we switch to calling
> 'setlocale(LC_ALL, "C.UTF-8")' during interpreter startup instead of
> 'setlocale(LC_ALL, "")'. That change is guarded by "#ifdef
> __ANDROID__", rather than either of the new conditionals.
>
>> On macOS, doesn't ``LC_ALL=C python`` set Python's stdio to
>> ``ascii:surrogateescape``?
>
> Similar to Android, CPython itself is hardcoded to assume UTF-8 on Mac
> OS X, since that's a platform API guarantee that users can't change.
>
>> Even so, locale coercion may fix libraries like readline and curses.
>> While the C locale is less common on macOS, I don't see any
>> reason to disable coercion on macOS.
>
> My understanding is that other libraries and applications also
> automatically use UTF-8 for system interfaces on Mac OS X and iOS. It
> could be that that understanding is wrong, and locale coercion would
> provide a benefit there as well.
>
> (Checking the draft implementation, it turns out I haven't actually
> implemented the configure logic to make those config settings platform
> dependent yet - they're currently only undefined on Windows by
> default, since that doesn't use the autotools based build system)
>

I tried Python 3.6 on macOS 10.11 El Capitan.

  $ LANG=C python3 -c 'import locale; print(locale.getpreferredencoding())'
  US-ASCII

And the interactive shell (which uses readline by default) no longer accepts
non-ASCII input.
https://www.dropbox.com/s/otshuzhnw7a71n5/macos-c-locale-readline.gif?dl=0

I think many of the problems with the C locale are the same on macOS too.
So I don't think any special casing is required on macOS.
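
The same check can be scripted so it is easy to repeat on other platforms.
This is only a sketch: preferred_encoding_with is a hypothetical helper name,
and the result depends on the platform and Python version (an interpreter
that already implements locale coercion may report UTF-8 instead of an ASCII
alias):

```python
import os
import subprocess
import sys

# Variables that would otherwise mask the locale settings under test.
_MASKING_VARS = ("LANG", "PYTHONCOERCECLOCALE", "PYTHONUTF8", "PYTHONIOENCODING")


def preferred_encoding_with(overrides):
    """Run a child interpreter with the given locale variables and return
    the name reported by locale.getpreferredencoding()."""
    # Start from the current environment minus every existing locale setting.
    env = {
        key: value
        for key, value in os.environ.items()
        if not key.startswith("LC_") and key not in _MASKING_VARS
    }
    env.update(overrides)
    result = subprocess.run(
        [sys.executable, "-c",
         "import locale; print(locale.getpreferredencoding())"],
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(preferred_encoding_with({"LANG": "C"}))
```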

Regards,


Re: [Python-Dev] PEP 538: Coercing the legacy C locale to a UTF-8 based locale

2017-05-07 Thread Nick Coghlan
On 7 May 2017 at 15:22, INADA Naoki  wrote:
> Hi, Nick.
>
> After thinking about the relationship between PEP 538 and PEP 540 for two
> days, I came up with an idea that removes locale coercion by default from
> PEP 538: it just enables UTF-8 mode and shows a warning about the C locale.
>
> Of course, this idea is based on PEP 540.  There is no "if PEP 540 is
> rejected" case.
>
> What do you think?

The main problems I see with this approach are:

1. There's no way to configure earlier Python versions to emulate PEP
540. It's a completely new mode of operation.
2. PEP 540 isn't actually defined yet (Victor is still working on it)
3. Due to 1&2, PEP 540 isn't something 3.6 redistributors can
experiment with backporting to a narrower target audience

By contrast, you can emulate PEP 538 all the way back to Python 3.1 by
setting the following environment variables:

LC_ALL=C.UTF-8
LANG=C.UTF-8
PYTHONIOENCODING=utf-8:surrogateescape

(assuming your platform provides a C.UTF-8 locale and you don't need
to run any Python 2.x components in that same environment)
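
As a rough illustration of that emulation, the sketch below launches a child
interpreter with those three variables set and reports the resulting stdout
configuration. child_stdout_settings is a hypothetical helper name, and the
locale variables only take effect if the platform actually provides C.UTF-8;
the PYTHONIOENCODING setting applies either way:

```python
import os
import subprocess
import sys

# The three settings listed above.
PEP538_EMULATION = {
    "LC_ALL": "C.UTF-8",
    "LANG": "C.UTF-8",
    "PYTHONIOENCODING": "utf-8:surrogateescape",
}


def child_stdout_settings(extra_env):
    """Return (encoding, errors) of sys.stdout in a child interpreter."""
    env = dict(os.environ)
    env.update(extra_env)
    result = subprocess.run(
        [sys.executable, "-c",
         "import sys; print(sys.stdout.encoding, sys.stdout.errors)"],
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    encoding, errors = result.stdout.split()
    return encoding, errors


if __name__ == "__main__":
    print(child_stdout_settings(PEP538_EMULATION))
```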

I think the specific concerns you raise below are valid though, and
I'd be happy to amend PEP 538 to address them all.

> If that makes sense, I want to postpone PEP 538 until PEP 540 is
> accepted or rejected, or merge PEP 538 into PEP 540.
>
>
> ## Background
>
> Locale coercion in current PEP 538 has some downsides:
>
> * If the user sets `LANG=C LC_DATE=ja_JP.UTF-8`, locale coercion may
>   override LC_DATE.

The fact it sets "LC_ALL" has previously been raised as a concern with
PEP 538, so it probably makes sense to drop that aspect and just
override "LANG". The scenarios where it makes a difference are
incredibly obscure (involving non-default SSH locale forwarding
settings for folks using SSH on Mac OS X to connect to remote Linux
systems), while just setting "LANG" will be sufficient to address the
"LANG=C" case that is the main driver for the PEP.

That means in the case above, the specific LC_DATE setting would still
take precedence.

> * It makes behavior diverge between standalone and embedded
>   Python.

Such divergence already exists, only in the other direction: embedding
applications may override the runtime's default settings, either by
setting a particular locale, or by using Py_SetStandardStreamEncoding
(which was added specifically to make it easy for Blender to force the
use of UTF-8 on the embedded Python's standard streams, regardless of
the current locale)

That said, this is also the rationale for my suggestion that we expose
locale coercion as a public API:

    if (Py_LegacyLocaleDetected()) {
        Py_CoerceLegacyLocale();
    }

That would make it straightforward for any embedding application that
wanted to do so to replicate the behaviour of the standard CLI.

The level of divergence is also mitigated by the point in the next section.

> * A parent Python process may use utf-8:surrogateescape while a child
>   Python process uses utf-8:strict.  (Python 3.6 uses ascii:surrogateescape
>   in both parent and children).

This discrepancy is gone now thanks to your suggestion of making
"surrogateescape" the default standard stream handler when one of the
coercion target locales is explicitly configured - both parent
processes and child processes end up with "utf-8:surrogateescape"
configured on the standard streams.

> On the other hand, the benefits of locale coercion are limited:
>
> * When locale coercion succeeds, a warning is always shown.
>   To hide the warning, the user must disable coercion in some way
>   (e.g. use a UTF-8 locale explicitly, or set PYTHONCOERCECLOCALE=0).

The current warning is based on what we think is appropriate for
Fedora downstream, but that doesn't necessarily mean it's the right
approach for Python upstream, especially if the LC_ALL override is
dropped. We could also opt for a model where Python 3.7 emits the
coercion warning, but Python 3.8 just does the coercion silently (that
rationale would then also apply to PEP 540 - we'd warn on stderr about
the change in default behaviour in 3.7, but take the new behaviour for
granted in 3.8).

The change to make the standard stream error handler setting depend
solely on the currently configured locale also helps here, since it
means it doesn't matter how a process reached the state of having the
locale set to "C.UTF-8". CPython will behave the same way regardless,
so it makes it less important to provide an explicit notice that coercion
took place.

> So I feel the benefit/complexity ratio of locale coercion is lower than
> that of UTF-8 mode.

It isn't an either/or though - we're entirely free to do both, one
based solely on the existing configuration options that have been
around since 3.1, and the other going beyond those to also adjust the
default behaviour of other interfaces (like "open()").

> But locale coercion works nice on Android.  And there are some Android-like
> Unix systems (container or small device) that C.UTF-8 is always proper 

Re: [Python-Dev] PEP 538: Coercing the legacy C locale to a UTF-8 based locale

2017-05-06 Thread INADA Naoki
Hi, Nick.

After thinking about the relationship between PEP 538 and PEP 540 for two
days, I came up with an idea that removes locale coercion by default from
PEP 538: it just enables UTF-8 mode and shows a warning about the C locale.

Of course, this idea is based on PEP 540.  There is no "if PEP 540 is
rejected" case.

What do you think?

If that makes sense, I want to postpone PEP 538 until PEP 540 is
accepted or rejected, or merge PEP 538 into PEP 540.


## Background

Locale coercion in the current PEP 538 has some downsides:

* If the user sets `LANG=C LC_DATE=ja_JP.UTF-8`, locale coercion may
  override LC_DATE.

* It makes behavior diverge between standalone and embedded
  Python.

* A parent Python process may use utf-8:surrogateescape while a child
  Python process uses utf-8:strict.  (Python 3.6 uses ascii:surrogateescape
  in both parent and children).

On the other hand, the benefits of locale coercion are limited:

* When locale coercion succeeds, a warning is always shown.
  To hide the warning, the user must disable coercion in some way
  (e.g. use a UTF-8 locale explicitly, or set PYTHONCOERCECLOCALE=0).

So I feel the benefit/complexity ratio of locale coercion is lower than
that of UTF-8 mode.

But locale coercion works nicely on Android.  And there are some Android-like
Unix systems (containers or small devices) where C.UTF-8 is always the proper
locale.


## Rough spec

* Android-style locale coercion (forced, no warning) becomes a
  build option.  Some users who build Python for containers or small devices
  may like it.

* A normal Python build doesn't change the locale.  When the python
  executable is run in the C locale, it shows the locale warning.  The locale
  warning can be disabled as in the current PEP 538.

* Users can disable automatic UTF-8 mode by setting the PYTHONUTF8=0
  environment variable.  Users can also hide the warning by setting
  PYTHONUTF8=1.



On Fri, May 5, 2017 at 10:21 PM, INADA Naoki  wrote:
> On Fri, May 5, 2017 at 1:25 AM, Antoine Pitrou  wrote:
>> On Thu, 4 May 2017 11:24:27 +0900
>> INADA Naoki  wrote:
>>> Hi, Nick and all core devs who are interested in this PEP.
>>>
>>> I'm reviewing PEP 538 and I want to accept it this month.
>>> It will reduce much of the UnicodeError pain that server-side ops face.
>>> Thank you Nick for working on this PEP.
>>>
>>> If anything about this PEP worries you, please post a comment
>>> soon.  If you don't have enough time to read this entire PEP, feel free to
>>> ask a question about what worries you.
>>
>> From my POV, it is problematic that the behaviour outlined in PEP 538
>> (see Abstract section) varies depending on the adoption of another PEP
>> (PEP 540).
>>
>> If we want to adopt PEP 538 before pronouncing on PEP 540, then PEP 538
>> should remove all points conditional on PEP 540 adoption, and PEP 540
>> should later be changed to adopt those removed points as PEP
>> 540-specific changes.
>>
>> Regards
>>
>> Antoine.
>>
>
> Fair enough.  I'll stop hurrying PEP 538 and start reviewing PEP 540.
>
> Thanks,


Re: [Python-Dev] PEP 538: Coercing the legacy C locale to a UTF-8 based locale

2017-05-06 Thread Nick Coghlan
On 6 May 2017 at 18:33, Nick Coghlan  wrote:
> On 6 May 2017 at 18:00, Nick Coghlan  wrote:
>> On 5 March 2017 at 17:50, Nick Coghlan  wrote:
>>> Hi folks,
>>>
>>> Late last year I started working on a change to the CPython CLI (*not* the
>>> shared library) to get it to coerce the legacy C locale to something based
>>> on UTF-8 when a suitable locale is available.
>>>
>>> After a couple of rounds of iteration on linux-sig and python-ideas, I'm now
>>> bringing it to python-dev as a concrete proposal for Python 3.7.
>>>
>>> For most folks, reading the Abstract plus the draft docs updates in the
>>> reference implementation will tell you everything you need to know (if the
>>> C.UTF-8, C.utf8 or UTF-8 locales are available, the CLI will automatically
>>> attempt to coerce the legacy C locale to one of those rather than persisting
>>> with the latter's default assumption of ASCII as the preferred text
>>> encoding).
>>
>> I've just pushed a significant update to the PEP based on the
>> discussions in this thread:
>> https://github.com/python/peps/commit/2fb53e7c1bbb04e1321bca11cc0112aec69f6398
>>
>> The main change at the technical level is to modify the handling of
>> the coercion target locales such that they *always* lead to
>> "surrogateescape" being used by default on the standard streams. That
>> means we don't need to call "Py_SetStandardStreamEncoding" during
>> startup, that subprocesses will behave the same way as their parent
>> processes, and that Python in Linux containers will behave
>> consistently regardless of whether the container locale is set to
>> "C.UTF-8" explicitly, or is set to "C" and then coerced to "C.UTF-8"
>> by CPython.
>
> Working on the revised implementation for this, I've ended up
> refactoring it so that all the heavy lifting is done by a single
> function exported from the shared library: "_Py_CoerceLegacyLocale()".
>
> The CLI code then just contains the check that says "Are we running in
> the legacy C locale? If so, call _Py_CoerceLegacyLocale()", with all
> the details of how the coercion actually works being hidden away
> inside pylifecycle.c.
>
> That seems like a potential opportunity to make the 3.7 version of
> this a public API, using the following pattern:
>
>     if (Py_LegacyLocaleDetected()) {
>         Py_CoerceLegacyLocale();
>     }
>
> That way applications embedding CPython that wanted to implement the
> same locale coercion logic would have an easy way to do so.

OK, the reference implementation has been updated to match the latest
version of the PEP:
https://github.com/ncoghlan/cpython/commit/188e7807b6d9e49377aacbb287c074e5cabf70c5

For now, the implementation in the standalone CLI looks like this:

/* [snip] */
extern int _Py_LegacyLocaleDetected(void);
extern void _Py_CoerceLegacyLocale(void);
/* [snip] */
    if (_Py_LegacyLocaleDetected()) {
        _Py_CoerceLegacyLocale();
    }

If we decide to make this a public API for 3.7, the necessary changes would be:

- remove the leading underscore from the function names
- add the function prototypes to the pylifecycle.h header
- add the APIs to the C API documentation in the configuration &
initialization section
- define the APIs in the PEP
- adjust the backport note in the PEP to say that backports should NOT
expose the public C API, but keep it private

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] PEP 538: Coercing the legacy C locale to a UTF-8 based locale

2017-05-06 Thread Nick Coghlan
On 6 May 2017 at 18:00, Nick Coghlan  wrote:
> On 5 March 2017 at 17:50, Nick Coghlan  wrote:
>> Hi folks,
>>
>> Late last year I started working on a change to the CPython CLI (*not* the
>> shared library) to get it to coerce the legacy C locale to something based
>> on UTF-8 when a suitable locale is available.
>>
>> After a couple of rounds of iteration on linux-sig and python-ideas, I'm now
>> bringing it to python-dev as a concrete proposal for Python 3.7.
>>
>> For most folks, reading the Abstract plus the draft docs updates in the
>> reference implementation will tell you everything you need to know (if the
>> C.UTF-8, C.utf8 or UTF-8 locales are available, the CLI will automatically
>> attempt to coerce the legacy C locale to one of those rather than persisting
>> with the latter's default assumption of ASCII as the preferred text
>> encoding).
>
> I've just pushed a significant update to the PEP based on the
> discussions in this thread:
> https://github.com/python/peps/commit/2fb53e7c1bbb04e1321bca11cc0112aec69f6398
>
> The main change at the technical level is to modify the handling of
> the coercion target locales such that they *always* lead to
> "surrogateescape" being used by default on the standard streams. That
> means we don't need to call "Py_SetStandardStreamEncoding" during
> startup, that subprocesses will behave the same way as their parent
> processes, and that Python in Linux containers will behave
> consistently regardless of whether the container locale is set to
> "C.UTF-8" explicitly, or is set to "C" and then coerced to "C.UTF-8"
> by CPython.

Working on the revised implementation for this, I've ended up
refactoring it so that all the heavy lifting is done by a single
function exported from the shared library: "_Py_CoerceLegacyLocale()".

The CLI code then just contains the check that says "Are we running in
the legacy C locale? If so, call _Py_CoerceLegacyLocale()", with all
the details of how the coercion actually works being hidden away
inside pylifecycle.c.

That seems like a potential opportunity to make the 3.7 version of
this a public API, using the following pattern:

    if (Py_LegacyLocaleDetected()) {
        Py_CoerceLegacyLocale();
    }

That way applications embedding CPython that wanted to implement the
same locale coercion logic would have an easy way to do so.

Thoughts?

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] PEP 538: Coercing the legacy C locale to a UTF-8 based locale

2017-05-06 Thread Nick Coghlan
On 5 March 2017 at 17:50, Nick Coghlan  wrote:
> Hi folks,
>
> Late last year I started working on a change to the CPython CLI (*not* the
> shared library) to get it to coerce the legacy C locale to something based
> on UTF-8 when a suitable locale is available.
>
> After a couple of rounds of iteration on linux-sig and python-ideas, I'm now
> bringing it to python-dev as a concrete proposal for Python 3.7.
>
> For most folks, reading the Abstract plus the draft docs updates in the
> reference implementation will tell you everything you need to know (if the
> C.UTF-8, C.utf8 or UTF-8 locales are available, the CLI will automatically
> attempt to coerce the legacy C locale to one of those rather than persisting
> with the latter's default assumption of ASCII as the preferred text
> encoding).

I've just pushed a significant update to the PEP based on the
discussions in this thread:
https://github.com/python/peps/commit/2fb53e7c1bbb04e1321bca11cc0112aec69f6398

The main change at the technical level is to modify the handling of
the coercion target locales such that they *always* lead to
"surrogateescape" being used by default on the standard streams. That
means we don't need to call "Py_SetStandardStreamEncoding" during
startup, that subprocesses will behave the same way as their parent
processes, and that Python in Linux containers will behave
consistently regardless of whether the container locale is set to
"C.UTF-8" explicitly, or is set to "C" and then coerced to "C.UTF-8"
by CPython.
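
For readers less familiar with the error handler itself, here is a small
sketch of what "surrogateescape" does with bytes that are not valid UTF-8.
The helper name roundtrip and the sample bytes are illustrative only:

```python
def roundtrip(raw):
    """Decode bytes as UTF-8 with surrogateescape, then encode them back."""
    # Undecodable bytes become lone surrogates in the range U+DC80..U+DCFF.
    text = raw.decode("utf-8", errors="surrogateescape")
    # Encoding with the same handler turns those surrogates back into the
    # original bytes, so the data survives the trip unchanged.
    return text, text.encode("utf-8", errors="surrogateescape")


if __name__ == "__main__":
    latin1_bytes = b"caf\xe9"  # "cafe" with e-acute in Latin-1, invalid UTF-8
    text, restored = roundtrip(latin1_bytes)
    print(ascii(text), restored == latin1_bytes)
```

With the strict handler the same input raises UnicodeDecodeError instead,
which is the kind of failure the default is meant to avoid on the standard
streams.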

That change also eliminated the behaviour that was contingent on
whether or not PEP 540 was accepted - PEP 540 may still want to have
the coercion target locales imply full UTF-8 mode rather than just
setting the stream error handler differently, but that will be a
question to be considered when reviewing PEP 540 rather than needing
to worry about it now.

The second technical change is that the locale coercion and warning
are now enabled on Android and Mac OS X. For Android, that's a matter
of getting GNU readline to behave sensibly, while for Mac OS X, it's a
matter of simplifying the implementation and improving cross-platform
behavioural consistency (even though we don't expect the coercion to
actually have much impact there).

Beyond that, the PEP update focuses on clarifying a few other points
without actually changing the proposal.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] PEP 538: Coercing the legacy C locale to a UTF-8 based locale

2017-05-05 Thread Nick Coghlan
On 5 May 2017 at 23:21, INADA Naoki  wrote:
> On Fri, May 5, 2017 at 1:25 AM, Antoine Pitrou  wrote:
>> If we want to adopt PEP 538 before pronouncing on PEP 540, then PEP 538
>> should remove all points conditional on PEP 540 adoption, and PEP 540
>> should later be changed to adopt those removed points as PEP
>> 540-specific changes.
>
> Fair enough.  I'll stop hurrying PEP 538 and start reviewing PEP 540.

Don't forget that Victor's still working on the design of PEP 540, so
it isn't ready for pronouncement yet.

Antoine's request was for me to update PEP *538* to eliminate the
"this will need to change if PEP 540 is accepted" aspects, and I think
your suggestion to make the "C.UTF-8 -> surrogateescape on standard
streams by default" behaviour independent of the locale coercion will
achieve that.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] PEP 538: Coercing the legacy C locale to a UTF-8 based locale

2017-05-05 Thread INADA Naoki
On Fri, May 5, 2017 at 1:25 AM, Antoine Pitrou  wrote:
> On Thu, 4 May 2017 11:24:27 +0900
> INADA Naoki  wrote:
>> Hi, Nick and all core devs who are interested in this PEP.
>>
>> I'm reviewing PEP 538 and I want to accept it this month.
>> It will reduce much of the UnicodeError pain that server-side ops face.
>> Thank you Nick for working on this PEP.
>>
>> If anything about this PEP worries you, please post a comment
>> soon.  If you don't have enough time to read this entire PEP, feel free to
>> ask a question about what worries you.
>
> From my POV, it is problematic that the behaviour outlined in PEP 538
> (see Abstract section) varies depending on the adoption of another PEP
> (PEP 540).
>
> If we want to adopt PEP 538 before pronouncing on PEP 540, then PEP 538
> should remove all points conditional on PEP 540 adoption, and PEP 540
> should later be changed to adopt those removed points as PEP
> 540-specific changes.
>
> Regards
>
> Antoine.
>

Fair enough.  I'll stop hurrying PEP 538 and start reviewing PEP 540.

Thanks,


Re: [Python-Dev] PEP 538: Coercing the legacy C locale to a UTF-8 based locale

2017-05-05 Thread Nick Coghlan
On 5 May 2017 at 19:45, Erik Bray  wrote:
> On Thu, May 4, 2017 at 6:25 PM, Antoine Pitrou  wrote:
>> If we want to adopt PEP 538 before pronouncing on PEP 540, then PEP 538
>> should remove all points conditional on PEP 540 adoption, and PEP 540
>> should later be changed to adopt those removed points as PEP
>> 540-specific changes.
>
> This is kind of an aside, but regardless of the dependency
> relationship between PEP 538 and 540, given that they kind of go
> hand-in-hand would it make sense to rename them--e.g. have PEP 539 and
> PEP 540 trade places, since PEP 539 has nothing to do with this and is
> awkwardly nestled between them.  Or would that only confuse matters at
> this point?

While we have renumbered PEPs in the past, it was only in cases where
the PEPs were relatively new, so there weren't many discussions
referencing them under their existing numbers.

In this case, both PEP 539 and 540 have already been discussed
extensively, so renumbering them would cause problems without
providing any corresponding benefit (Python's development is
sufficiently high volume that it isn't unusual for related PEPs to
have non-sequential PEP numbers)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] PEP 538: Coercing the legacy C locale to a UTF-8 based locale

2017-05-05 Thread Erik Bray
On Thu, May 4, 2017 at 6:25 PM, Antoine Pitrou  wrote:
> On Thu, 4 May 2017 11:24:27 +0900
> INADA Naoki  wrote:
>> Hi, Nick and all core devs who are interested in this PEP.
>>
>> I'm reviewing PEP 538 and I want to accept it this month.
>> It will reduce much of the UnicodeError pain that server-side ops face.
>> Thank you Nick for working on this PEP.
>>
>> If anything about this PEP worries you, please post a comment
>> soon.  If you don't have enough time to read this entire PEP, feel free to
>> ask a question about what worries you.
>
> From my POV, it is problematic that the behaviour outlined in PEP 538
> (see Abstract section) varies depending on the adoption of another PEP
> (PEP 540).
>
> If we want to adopt PEP 538 before pronouncing on PEP 540, then PEP 538
> should remove all points conditional on PEP 540 adoption, and PEP 540
> should later be changed to adopt those removed points as PEP
> 540-specific changes.

This is kind of an aside, but regardless of the dependency
relationship between PEP 538 and 540, given that they kind of go
hand-in-hand would it make sense to rename them--e.g. have PEP 539 and
PEP 540 trade places, since PEP 539 has nothing to do with this and is
awkwardly nestled between them.  Or would that only confuse matters at
this point?

Thanks,
Erik


Re: [Python-Dev] PEP 538: Coercing the legacy C locale to a UTF-8 based locale

2017-05-04 Thread Nick Coghlan
On 5 May 2017 at 02:25, Antoine Pitrou  wrote:
> On Thu, 4 May 2017 11:24:27 +0900
> INADA Naoki  wrote:
>> Hi, Nick and all core devs who are interested in this PEP.
>>
>> I'm reviewing PEP 538 and I want to accept it this month.
>> It will reduce much of the UnicodeError pain that server-side ops face.
>> Thank you Nick for working on this PEP.
>>
>> If anything about this PEP worries you, please post a comment
>> soon.  If you don't have enough time to read this entire PEP, feel free to
>> ask a question about what worries you.
>
> From my POV, it is problematic that the behaviour outlined in PEP 538
> (see Abstract section) varies depending on the adoption of another PEP
> (PEP 540).
>
> If we want to adopt PEP 538 before pronouncing on PEP 540, then PEP 538
> should remove all points conditional on PEP 540 adoption, and PEP 540
> should later be changed to adopt those removed points as PEP
> 540-specific changes.

While I won't be certain until I update the PEP and reference
implementation, I'm pretty sure Inada-san's suggestion to replace the
call to Py_SetStandardStreamEncoding with defaulting to
surrogateescape on the standard streams in the C.UTF-8 locale will
remove this current dependency between the PEPs as well as making the
"C.UTF-8 locale" and "C locale coerced to C.UTF-8" behaviour
indistinguishable at runtime (aside from the stderr warning in the
latter case).

It will then be up to Victor to state in PEP 540 how locale coercion
will interact with Python UTF-8 mode (with my recommendation being the
one currently in PEP 538: it should implicitly set the environment
variable, so the mode activation is inherited by subprocesses)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] PEP 538: Coercing the legacy C locale to a UTF-8 based locale

2017-05-04 Thread Toshio Kuratomi
On Sat, Mar 4, 2017 at 11:50 PM, Nick Coghlan  wrote:
>
> Providing implicit locale coercion only when running standalone
> ---------------------------------------------------------------
>
> Over the course of Python 3.x development, multiple attempts have been made
> to improve the handling of incorrect locale settings at the point where the
> Python interpreter is initialised. The problem that emerged is that this is
> ultimately *too late* in the interpreter startup process - data such as
> command
> line arguments and the contents of environment variables may have already
> been
> retrieved from the operating system and processed under the incorrect ASCII
> text encoding assumption well before ``Py_Initialize`` is called.
>
> The problems created by those inconsistencies were then even harder to
> diagnose
> and debug than those created by believing the operating system's claim that
> ASCII was a suitable encoding to use for operating system interfaces. This
> was
> the case even for the default CPython binary, let alone larger C/C++
> applications that embed CPython as a scripting engine.
>
> The approach proposed in this PEP handles that problem by moving the locale
> coercion as early as possible in the interpreter startup sequence when
> running
> standalone: it takes place directly in the C-level ``main()`` function, even
> before calling in to the ``Py_Main()`` library function that implements the
> features of the CPython interpreter CLI.
>
> The ``Py_Initialize`` API then only gains an explicit warning (emitted on
> ``stderr``) when it detects use of the ``C`` locale, and relies on the
> embedding application to specify something more reasonable.
>

It feels like having a short section on the caveats of this approach
would help to introduce this section.  Something that says that this
PEP can cause a split in how Python behaves in non-standalone
applications (mod_wsgi, IDEs where libpython is compiled in, etc) vs
standalone (unless the embedders take similar steps as standalone
python is doing).  Then go on to state that this approach was still
chosen as coercing in Py_Initialize is too late, causing the
inconsistencies and problems listed here.

-Toshio


Re: [Python-Dev] PEP 538: Coercing the legacy C locale to a UTF-8 based locale

2017-05-04 Thread Antoine Pitrou
On Thu, 4 May 2017 11:24:27 +0900
INADA Naoki  wrote:
> Hi, Nick and all core devs who are interested in this PEP.
> 
> I'm reviewing PEP 538 and I want to accept it this month.
> It will relieve much of the UnicodeError pain that server-side ops face.
> Thank you Nick for working on this PEP.
> 
> If anything about this PEP worries you, please post a comment
> soon.  If you don't have enough time to read the entire PEP, feel free to
> ask a question about whatever worries you.

From my POV, it is problematic that the behaviour outlined in PEP 538
(see Abstract section) varies depending on the adoption of another PEP
(PEP 540).

If we want to adopt PEP 538 before pronouncing on PEP 540, then PEP 538
should remove all points conditional on PEP 540 adoption, and PEP 540
should later be changed to adopt those removed points as PEP
540-specific changes.

Regards

Antoine.




Re: [Python-Dev] PEP 538: Coercing the legacy C locale to a UTF-8 based locale

2017-05-04 Thread Nick Coghlan
On 4 May 2017 at 12:24, INADA Naoki  wrote:
> [PEP 538]
>> * PEP 540 proposes to entirely decouple CPython's default text encoding from
>>   the C locale system in that case, allowing text handling inconsistencies to
>>   arise between CPython and other locale-aware components running in the same
>>   process and in subprocesses. This approach aims to make CPython behave less
>>   like a locale-aware application, and more like locale-independent language
>>   runtimes like the JVM, .NET CLR, Go, Node.js, and Rust
>
> https://docs.oracle.com/javase/7/docs/api/java/nio/charset/Charset.html says:
>
>> Every instance of the Java virtual machine has a default charset, which may 
>> or may not be one of the standard charsets. The default charset is 
>> determined during virtual-machine startup and typically depends upon the 
>> locale and charset being used by the underlying operating system.
>
> I don't know about .NET runtime on Unix much.  (mono and .NET Core).
> "Go, Node.js and Rust" seems enough examples.

I'll push an update to drop the JVM and .NET from the list of examples.

>> New build-time configuration options
>> 
[snip]
> In case of (b), while warning about C locale is not shown, warning
> about coercion
> is still shown.  So when people don't want to see warning under C
> locale and there is no
> (C.UTF-8, C.utf8, UTF-8) locales, there are three ways:
>
> * Set PYTHONUTF8=1 (if PEP 540 is accepted)
> * Set PYTHONCOERCECLOCALE=0.
> * Use both of ``--without-c-locale-coercion`` and 
> ``--without-c-locale-warning``
>   configure options.
>
> Is my understanding right?

Yes, that sounds right.

> BTW, I prefer PEP 540 provides ``--with-utf8mode`` option which
> enables UTF-8 mode
> by default.  And if it is added, there are too few use cases for
> ``--without-c-locale-warning``.
>
> There are some use cases people want to use UTF-8 by default in system
> wide. (e.g.
> container, webserver in Cent OS, etc...)
>
> On the other hand, most of C locale usage are "per application" basis,
> rather than "system wide."
> A configure option is not suitable for such per-application settings, of course.

Yeah, in addition to Barry requesting such an option in one of the
earlier linux-sig reviews, my main rationale for including it is that
providing both config options offers a quick compatibility fix for any
distro where emitting the coercion and/or C locale warning on stderr
causes problems.

The only one of those that Fedora encountered in the F26 alpha was
deemed a bug in the affected application (something in autotools was
checking for "no output on stderr" instead of "subprocess exit code is
0", and the fix was to switch it to check the subprocess exit code),
but there are enough Linux distros and BSD variants out there that I'm
a lot more comfortable shipping the change with straightforward "off"
switches for the stderr output.
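
To make the difference between the two checks concrete, here is a small
sketch. The child command is a made-up stand-in for any tool that succeeds
but also prints a notice on stderr; `CHILD` and `check_child` are
illustrative names, not anything from autotools or CPython.

```python
import subprocess
import sys

# Hypothetical child: does nothing useful, exits 0, but writes a notice to
# stderr, standing in for a tool that emits a locale warning.
CHILD = "import sys; sys.stderr.write('warning: simulated locale notice\\n')"


def check_child():
    """Return (fragile_ok, robust_ok) for the two styles of success check."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    fragile_ok = result.stderr == b""    # "no output on stderr" check
    robust_ok = result.returncode == 0   # "exit code is 0" check
    return fragile_ok, robust_ok


if __name__ == "__main__":
    fragile_ok, robust_ok = check_child()
    print("stderr-is-empty check passes:", fragile_ok)
    print("exit-code check passes:", robust_ok)
```

The stderr-based check reports a failure for a child that actually
succeeded, which is the kind of breakage a new warning can trigger.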

> But I don't propose removing the option from PEP 538.
> We can discuss about reducing configure options later.

+1.

>> On platforms where they would have no effect (e.g. Mac OS X, iOS, Android,
>> Windows) these preprocessor variables would always be undefined.
>
> Why ``--with[out]-c-locale-coercion`` have no effect on macOS, iOS and 
> Android?

On these three, we know the system encoding is UTF-8, so we never
interpreted the C locale as meaning "ascii" in the first place.

> On Android, locale coercion fixes readline.  Do you mean locale
> coercion happen always
> regardless this configuration option?

Right, the change for Android is that we switch to calling
'setlocale(LC_ALL, "C.UTF-8")' during interpreter startup instead of
'setlocale(LC_ALL, "")'. That change is guarded by "#ifdef
__ANDROID__", rather than either of the new conditionals.
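
As a rough way to see which of the coercion target locales mentioned
earlier (C.UTF-8, C.utf8, UTF-8) a given system accepts for LC_CTYPE, the
check can be approximated from Python with the `locale` module. This is a
sketch, not the C-level logic in the reference implementation;
`first_available_utf8_locale` and `CANDIDATES` are names invented for the
example.

```python
import locale

# Candidate names taken from the thread; the order here is an assumption.
CANDIDATES = ("C.UTF-8", "C.utf8", "UTF-8")


def first_available_utf8_locale(candidates=CANDIDATES):
    """Return the first candidate accepted for LC_CTYPE, or None.

    The process-wide LC_CTYPE setting is restored before returning.
    """
    saved = locale.setlocale(locale.LC_CTYPE)  # query only, no change
    try:
        for name in candidates:
            try:
                locale.setlocale(locale.LC_CTYPE, name)
            except locale.Error:
                continue  # this platform does not provide that locale
            return name
        return None
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)


if __name__ == "__main__":
    print("usable coercion target:", first_available_utf8_locale())
```

A result of `None` corresponds to the situation where coercion has
nothing to coerce to.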

> On macOS, ``LC_ALL=C python`` doesn't make Python's stdio to
> ``ascii:surrogateescape``?

Similar to Android, CPython itself is hardcoded to assume UTF-8 on Mac
OS X, since that's a platform API guarantee that users can't change.

> Even so, locale coercion may fix libraries like readline, curses.
> While C locale is less common on macOS, I don't understand any
> reason to disable it on macOS.

My understanding is that other libraries and applications also
automatically use UTF-8 for system interfaces on Mac OS X and iOS. It
could be that that understanding is wrong, and locale coercion would
provide a benefit there as well.

(Checking the draft implementation, it turns out I haven't actually
implemented the configure logic to make those config settings platform
dependent yet - they're currently only undefined on Windows by
default, since that doesn't use the autotools based build system)

>
> I know almost nothing about iOS, but it's similar to Android or macOS
> in my expectation.
>
>
>> Improving the handling of the C locale
>> --------------------------------------
>>
> ...
>> locale settings for locale-aware operations. Both the JVM and the .NET CLR
>> use UTF-16-LE as 

Re: [Python-Dev] PEP 538: Coercing the legacy C locale to a UTF-8 based locale

2017-05-03 Thread INADA Naoki
Hi, Nick and all core devs who are interested in this PEP.

I'm reviewing PEP 538 and I want to accept it this month.
It will relieve much of the UnicodeError pain that server-side ops face.
Thank you Nick for working on this PEP.

If anything about this PEP worries you, please post a comment
soon.  If you don't have enough time to read the entire PEP, feel free to
ask a question about whatever worries you.


Here are my comments:

>
> Relationship with other PEPs
> 
>
> This PEP shares a common problem statement with PEP 540 (improving Python
> 3's
> behaviour in the default C locale), but diverges markedly in the proposed
> solution:
>
> * PEP 540 proposes to entirely decouple CPython's default text encoding from
>   the C locale system in that case, allowing text handling inconsistencies to
>   arise between CPython and other locale-aware components running in the same
>   process and in subprocesses. This approach aims to make CPython behave less
>   like a locale-aware application, and more like locale-independent language
>   runtimes like the JVM, .NET CLR, Go, Node.js, and Rust

https://docs.oracle.com/javase/7/docs/api/java/nio/charset/Charset.html says:

> Every instance of the Java virtual machine has a default charset, which may 
> or may not be one of the standard charsets. The default charset is determined 
> during virtual-machine startup and typically depends upon the locale and 
> charset being used by the underlying operating system.

I don't know much about the .NET runtimes on Unix (Mono and .NET Core).
"Go, Node.js and Rust" seem like enough examples.



> New build-time configuration options
> 
>
> While both of the above behaviours would be enabled by default, they would
> also have new associated configuration options and preprocessor definitions
> for the benefit of redistributors that want to override those default
> settings.
>
> The locale coercion behaviour would be controlled by the flag
> ``--with[out]-c-locale-coercion``, which would set the
> ``PY_COERCE_C_LOCALE``
> preprocessor definition.
>
> The locale warning behaviour would be controlled by the flag
> ``--with[out]-c-locale-warning``, which would set the
> ``PY_WARN_ON_C_LOCALE``
> preprocessor definition.

"Locale warning" means the warning printed when the C locale is used, am I right?

As I understand it, the "locale warning" is shown in these cases (all of them
assume the C locale is in use and PYTHONUTF8 is not enabled):

a. The C locale is used and locale coercion is disabled by the
   ``--without-c-locale-coercion`` configure option.
b. Locale coercion failed because none of the C.UTF-8, C.utf8,
   or UTF-8 locales is available.
c. Python is embedded. Locale coercion can't be used in this case.

In case (b), while the warning about the C locale is not shown, the warning
about coercion is still shown.  So when people don't want to see a warning
under the C locale and none of the (C.UTF-8, C.utf8, UTF-8) locales is
available, there are three ways:

* Set PYTHONUTF8=1 (if PEP 540 is accepted)
* Set PYTHONCOERCECLOCALE=0.
* Use both of the ``--without-c-locale-coercion`` and
  ``--without-c-locale-warning`` configure options.

Is my understanding right?

BTW, I would prefer that PEP 540 provide a ``--with-utf8mode`` option which
enables UTF-8 mode by default.  If it is added, there will be very few use
cases left for ``--without-c-locale-warning``.

There are some use cases where people want to use UTF-8 by default
system-wide (e.g. containers, web servers on CentOS, etc.).

On the other hand, most C locale usage is on a "per application" basis
rather than "system wide".  A configure option is not suitable for such
per-application settings, of course.

But I'm not proposing to remove the option from PEP 538.
We can discuss reducing the configure options later.


>
> On platforms where they would have no effect (e.g. Mac OS X, iOS, Android,
> Windows) these preprocessor variables would always be undefined.
>

Why does ``--with[out]-c-locale-coercion`` have no effect on macOS, iOS and
Android?

On Android, locale coercion fixes readline.  Do you mean locale coercion
always happens regardless of this configuration option?

On macOS, doesn't ``LC_ALL=C python`` set Python's stdio to
``ascii:surrogateescape``?
Even so, locale coercion may fix libraries like readline and curses.
While the C locale is less common on macOS, I don't see any
reason to disable coercion on macOS.

I know almost nothing about iOS, but I expect it to be similar to Android
or macOS.


> Improving the handling of the C locale
> --------------------------------------
>
...
> locale settings for locale-aware operations. Both the JVM and the .NET CLR
> use UTF-16-LE as their primary encoding for passing text between applications
> and the underlying platform.

The JVM and .NET examples are misleading again.
They just use UTF-16-LE for system calls on Windows, like Python does.

I don't know much about them, but I believe they don't use UTF-16 as the
system encoding on Linux.


> Defaulting to "surrogateescape" 

Re: [Python-Dev] PEP 538: Coercing the legacy C locale to a UTF-8 based locale

2017-03-15 Thread Nick Coghlan
On 16 March 2017 at 00:30, Barry Warsaw  wrote:

> On Mar 15, 2017, at 12:29 PM, Nick Coghlan wrote:
>
> > From a mainstream Linux point of view, it's not common - on systemd-managed
> > systems, for example, the only way to get the C locale these days is to
> > either specify it in /etc/locale.conf, or to set it specifically in the
> > environment.
>
> I think it's still the case that some isolation environments (e.g. Debian
> chroots) default to bare C locales.  Often it doesn't matter, but sometimes
> tests or other applications run inside those environments will fail in ways
> they don't in a normal execution environment.


Yeah, I think mock (the Fedora/RHEL/CentOS build environment for RPMs)
still defaults to a bare C locale, and Docker environments usually aren't
systemd-managed in the first place (since PID 1 inside a container
typically isn't an init system at all). The general trend for all of those
seems to be "they don't use C.UTF-8... yet", though (even though some of
them may not shift until the default changes at the level of the given
distro's libc implementation).

> The answer is almost always to
> explicitly coerce those environments to C.UTF-8 for Linuxes that support
> that.

I also double checked that "LANG=C ./python -m test" still worked with the
reference implementation.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] PEP 538: Coercing the legacy C locale to a UTF-8 based locale

2017-03-15 Thread Barry Warsaw
On Mar 15, 2017, at 12:29 PM, Nick Coghlan wrote:

>From a mainstream Linux point of view, it's not common - on systemd-managed
>systems, for example, the only way to get the C locale these days is to
>either specify it in /etc/locale.conf, or to set it specifically in the
>environment.

I think it's still the case that some isolation environments (e.g. Debian
chroots) default to bare C locales.  Often it doesn't matter, but sometimes
tests or other applications run inside those environments will fail in ways
they don't in a normal execution environment.  The answer is almost always to
explicitly coerce those environments to C.UTF-8 for Linuxes that support that.

-Barry


Re: [Python-Dev] PEP 538: Coercing the legacy C locale to a UTF-8 based locale

2017-03-14 Thread Nick Coghlan
On 15 March 2017 at 06:22, Chris Barker  wrote:

> So the question is -- is anyone counting on errors in this case? i.e., is
> a sysadmin thinking:
>
> "I want an ASCII-only system, so I'll set the locale, and now I can expect
> any program running on this system that is not ascii compatible to fail."
>
> I honestly don't know if this is common -- but I would argue that trying
> to run a unicode-aware program on an ASCII-only system could be considered
> a mis-configuration as well.
>

From a mainstream Linux point of view, it's not common - on systemd-managed
systems, for example, the only way to get the C locale these days is to
either specify it in /etc/locale.conf, or to set it specifically in the
environment. Upstart was a little less reliable about that, and sysvinit
was less reliable still, but the trend is definitely towards making C.UTF-8
the assumed default, rather than "C". Even glibc itself would quite like to
get to a point where you only get the C locale if you explicitly ask for
it: https://sourceware.org/glibc/wiki/Proposals/C.UTF-8

The main practical objection that comes up in relation to "UTF-8
everywhere" isn't to do with UTF-8 per se, but rather with the size of the
collation tables needed to do "proper" sorting of Unicode code points.
However, there's a neat hack in the design of UTF-8 where sorting the
encoded bytes by byte value is equivalent to sorting the decoded text by
the Unicode code point values, which means that "LC_COLLATE=C" sorting by
byte value, and "LC_COLLATE=C.UTF-8" sorting by "Unicode code point value"
give the same results.
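
That ordering property is easy to check from Python, since `str`
comparison is by code point. The sketch below uses a handful of sample
strings chosen for illustration (the helper names are made up too), and
also shows that the same trick does not work for UTF-16, where characters
outside the Basic Multilingual Plane are encoded with surrogate pairs.

```python
def sort_by_code_point(strings):
    # Python compares str objects code point by code point.
    return sorted(strings)


def sort_by_utf8_bytes(strings):
    # Sort by the raw UTF-8 byte values, as a byte-wise "C" collation would.
    return sorted(strings, key=lambda s: s.encode("utf-8"))


def sort_by_utf16_bytes(strings):
    # Same idea with big-endian UTF-16, for contrast.
    return sorted(strings, key=lambda s: s.encode("utf-16-be"))


# Illustrative samples: ASCII, Latin-1 range, a high BMP character (U+FF5E)
# and a non-BMP character (U+1F600).
SAMPLES = ["zebra", "apple", "\u00e9clair", "\uff5e", "\U0001f600", "Zulu"]

if __name__ == "__main__":
    by_code_point = sort_by_code_point(SAMPLES)
    print("UTF-8 byte order matches: ", sort_by_utf8_bytes(SAMPLES) == by_code_point)
    print("UTF-16 byte order matches:", sort_by_utf16_bytes(SAMPLES) == by_code_point)
```

With these samples the UTF-16 ordering differs because U+1F600 is encoded
starting with the surrogate 0xD83D, which sorts before 0xFF5E.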

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] PEP 538: Coercing the legacy C locale to a UTF-8 based locale

2017-03-14 Thread Nick Coghlan
On 15 March 2017 at 00:17, Nick Coghlan  wrote:

> On 13 March 2017 at 23:31, Random832  wrote:
>
>> On Mon, Mar 13, 2017, at 04:37, INADA Naoki wrote:
>> > But locale coercing works nice on platforms like android.
>> > So how about simplified version of PEP 538?  Just adding configure
>> > option for locale coercing
>> > which is disabled by default.  No envvar options and no warnings.
>>
>> A configure option just kicks the decision to packagers - either no-one
>> uses it (and thus it solves nothing) or people do use it (and any
>> problems it causes won't be mitigated at all)
>>
>
> Distro packagers have narrower user bases and a better known set of
> compatibility constraints than upstream, so kicking platform integration
> related config decisions downstream to us(/them) is actually a pretty
> reasonable thing for upstream to do :)
>
> For example, while I've been iterating on the reference implementation for
> 3.7, Charalampos Stratakis has been iterating on the backport patch for
> Fedora 26, and he's found that we really need the PEP's "disable the C
> locale warning" config option to turn off the CLI's coercion warning in
> addition to the warning in the shared library, as leaving it visible breaks
> build processes for other packages that check that there aren't any
> messages being emitted to stderr (or otherwise care about the exact output
> from build tools that rely on the system Python 3 runtime).
>

The build processes that broke due to the warning were judged to be a bug
in autoconf rather than a problem with the warning itself:
http://git.savannah.gnu.org/gitweb/?p=autoconf-archive.git;a=commit;h=883a2abd5af5c96be894d5ef7ee6e9a2b8e64307

So we're going to leave this as it is in the PEP for now (i.e. the locale
coercion warning always happens unless you preconfigure a locale other than
C), but keep an eye on it to see if it causes any other problems.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] PEP 538: Coercing the legacy C locale to a UTF-8 based locale

2017-03-14 Thread Chris Barker
There was a bunch of discussion about all this a while back, in which I
think these points were addressed:

> However, in some cases the C locale is a normal environment for system
> services, cron scripts, distro package builds and whatnot.

Indeed it is. But:

If you run a Python (or any) program that is expecting an ASCII-only
locale, then it will work just fine with any ASCII-compatible locale -- so
no problem there.

On the other hand, if you run a program that is expecting a Unicode-aware
locale, then it might barf unexpectedly if run on an ASCII-only locale. A
lot of people do in fact have these issues (which are due to
mis-configuration of the host system, which is indeed not properly Python's
problem).

So if we do all this, then:

A) mis-configured systems will magically work (sometimes)

 This is a Good Thing.

and

B) If someone runs a Python program that is expecting Unicode support on a
properly configured ASCII-only system, then it will mostly "just work" --
after all a lot of C APIs are simply char*, who cares what the encoding is?
It would not, however, fail when a non-ASCII value is used somewhere it
shouldn't be.

So the question is -- is anyone counting on errors in this case? i.e., is
a sysadmin thinking:

"I want an ASCII-only system, so I'll set the locale, and now I can expect
any program running on this system that is not ascii compatible to fail."

I honestly don't know if this is common -- but I would argue that trying to
run a unicode-aware program on an ASCII-only system could be considered a
mis-configuration as well.

Also -- many programs will just be writing bytes to the system without
checking encoding anyway. So this would simply let Python3 programs behave
like most others...

-CHB


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Python-Dev] PEP 538: Coercing the legacy C locale to a UTF-8 based locale

2017-03-14 Thread Random832
On Tue, Mar 14, 2017, at 10:17, Nick Coghlan wrote:
> It's not that you *can't* run Python 3 in that kind of environment, and
> it's not that there are never any valid reasons to do so. It's that lots
> of
> things that you'd typically expect to work are going to misbehave (one I
> discovered myself yesterday is that the GNU readline problems reported in
> interactive mode on Android also show up when you do either "LANG=C
> python2" or "LANG=C python3" on traditional Linux and attempt to *edit*
> lines containing multi-byte characters)

It occurs to me that (at least for readline... and maybe also as a
general proxy for whether the rest should be done) detecting the IUTF8
terminal flag (which, properly, controls basic non-readline-based line
editing such as backspace) may be worthwhile.

(And maybe Readline itself should be doing this, more or less
independent of Python. But that's a discussion for elsewhere)


Re: [Python-Dev] PEP 538: Coercing the legacy C locale to a UTF-8 based locale

2017-03-14 Thread Nick Coghlan
On 13 March 2017 at 23:31, Random832  wrote:

> On Mon, Mar 13, 2017, at 04:37, INADA Naoki wrote:
> > But locale coercing works nice on platforms like android.
> > So how about simplified version of PEP 538?  Just adding configure
> > option for locale coercing
> > which is disabled by default.  No envvar options and no warnings.
>
> A configure option just kicks the decision to packagers - either no-one
> uses it (and thus it solves nothing) or people do use it (and any
> problems it causes won't be mitigated at all)
>

Distro packagers have narrower user bases and a better known set of
compatibility constraints than upstream, so kicking platform integration
related config decisions downstream to us(/them) is actually a pretty
reasonable thing for upstream to do :)

For example, while I've been iterating on the reference implementation for
3.7, Charalampos Stratakis has been iterating on the backport patch for
Fedora 26, and he's found that we really need the PEP's "disable the C
locale warning" config option to turn off the CLI's coercion warning in
addition to the warning in the shared library, as leaving it visible breaks
build processes for other packages that check that there aren't any
messages being emitted to stderr (or otherwise care about the exact output
from build tools that rely on the system Python 3 runtime).

However, when it comes to choosing the upstream config defaults, it's
important to keep in mind that one of the explicit goals of the PEP is to
modify PEP 11 to *formally drop upstream support* for running Python 3 in
the legacy C locale without using PEP 538, PEP 540 or a combination of the
two to assume UTF-8 instead of ASCII for system interfaces.

It's not that you *can't* run Python 3 in that kind of environment, and
it's not that there are never any valid reasons to do so. It's that lots of
things that you'd typically expect to work are going to misbehave (one I
discovered myself yesterday is that the GNU readline problems reported in
interactive mode on Android also show up when you do either "LANG=C
python2" or "LANG=C python3" on traditional Linux and attempt to *edit*
lines containing multi-byte characters), so you really need to know what
you're doing in order to operate under those constraints.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] PEP 538: Coercing the legacy C locale to a UTF-8 based locale

2017-03-13 Thread INADA Naoki
On Mon, Mar 13, 2017 at 10:31 PM, Random832  wrote:
> On Mon, Mar 13, 2017, at 04:37, INADA Naoki wrote:
>> But locale coercing works nice on platforms like android.
>> So how about simplified version of PEP 538?  Just adding configure
>> option for locale coercing
>> which is disabled by default.  No envvar options and no warnings.
>
> A configure option just kicks the decision to packagers - either no-one
> uses it (and thus it solves nothing) or people do use it (and any
> problems it causes won't be mitigated at all)

Yes.  People who build Python understand the platform better than
users do in most cases.

For Android builds, they know coercion works well on Android.

For Linux distros, they know whether the system supports locales like
C.UTF-8, and whether there are any python- packages which may hit the
problem that coercion solves.

People who build Python themselves (in Docker, pyenv, etc.) know how
they use Python.


Re: [Python-Dev] PEP 538: Coercing the legacy C locale to a UTF-8 based locale

2017-03-13 Thread INADA Naoki
On Mon, Mar 13, 2017 at 8:01 PM, Nick Coghlan  wrote:
> On 13 March 2017 at 18:37, INADA Naoki  wrote:
>>
>> But locale coercing works nice on platforms like android.
>> So how about simplified version of PEP 538?  Just adding configure
>> option for locale coercing
>> which is disabled by default.  No envvar options and no warnings.
>
>
> That doesn't solve my original Linux distro problem, where locale
> misconfiguration problems show up as "Python 2 works, Python 3 doesn't work"
> behaviour and bug reports.

Sorry, I meant "PEP 540 + Simplified PEP 538 (coercing by configure option)".
Distros can enable the configure option, of course.


>
> The problem is that where Python 2 was largely locale-independent by default
> (just passing raw bytes through) such that you'd only get immediate encoding
> or decoding errors if you had a Unicode literal or a decode() call somewhere
> in your code and would otherwise pass data corruption problems further down
> the chain, Python 3 is locale-*aware* by default, and eagerly decodes:
>
> - command line parameters
> - environment variables
> - responses from operating system API calls
> - standard stream input
> - file contents
>
> You *can* still write locale-independent Python 3 applications, but they
> involve sprinkling liberal doses of "b" prefixes and suffixes and mode
> settings and "surrogateescape" error handler declarations in various places
> - you can't just run python-modernize over a pre-existing Python 2
> application and expect it to behave the same way in the C locale as it did
> before.
>
> Once implemented, PEP 540 will partially solve the problem by introducing a
> locale independent UTF-8 mode, but that still leaves the inconsistency with
> other locale-aware components that are needing to deal with Python 3 API
> calls that accept or return Unicode objects where Python 2 allowed the use
> of 8-bit strings.

I feel the problems that PEP 538 solves but PEP 540 doesn't are relatively
small compared with the complexity PEP 538 introduces.  As I understand it,
PEP 538 solves problems only when:

* The python executable is used.  (GUI applications linking Python for
  plugins are not affected.)
* One of C.UTF-8, C.utf8 or UTF-8 is accepted for LC_CTYPE.
* The "locale aware components" use something other than ASCII or
  UTF-8 in the C locale, but use UTF-8 in a UTF-8 locale.

Can't we reduce the number of options from 3 (2 configure, 1 envvar) if
PEP 540 is accepted too?


>
> Folks that really want the old behaviour back will be able to set
> PYTHONCOERCECLOCALE=0 (as that no longer emits any warnings), or else build
> their own CPython from source using ``--without-c-locale-coercion`` and
> ``--without-c-locale-warning``. However, they'll also get the explicit
> support notification from PEP 11 that any Unicode handling bugs they run
> into in those configurations are entirely their own problem - we won't fix
> them, because we consider those configurations unsupportable in the general
> case.
>
> That puts the additional self-support burden on folks doing something
> unusual (i.e. insisting on running an ASCII-only environment in 2017),
> rather than on those with a more conventional use case (i.e. running an up
> to date \*nix OS using UTF-8 or another universal encoding for both local
> and remote interfaces).
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] PEP 538: Coercing the legacy C locale to a UTF-8 based locale

2017-03-13 Thread Nick Coghlan
On 13 March 2017 at 18:37, INADA Naoki  wrote:

> But locale coercing works nice on platforms like android.
> So how about simplified version of PEP 538?  Just adding configure
> option for locale coercing
> which is disabled by default.  No envvar options and no warnings.
>

That doesn't solve my original Linux distro problem, where locale
misconfiguration problems show up as "Python 2 works, Python 3 doesn't
work" behaviour and bug reports.

The problem is that where Python 2 was largely locale-independent by
default (just passing raw bytes through) such that you'd only get immediate
encoding or decoding errors if you had a Unicode literal or a decode() call
somewhere in your code and would otherwise pass data corruption problems
further down the chain, Python 3 is locale-*aware* by default, and eagerly
decodes:

- command line parameters
- environment variables
- responses from operating system API calls
- standard stream input
- file contents

You *can* still write locale-independent Python 3 applications, but they
involve sprinkling liberal doses of "b" prefixes and suffixes and mode
settings and "surrogateescape" error handler declarations in various places
- you can't just run python-modernize over a pre-existing Python 2
application and expect it to behave the same way in the C locale as it did
before.
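
To make the "surrogateescape" point concrete, here is a minimal sketch of
the round trip involved. The file name is made up for illustration; the
point is only that an undecodable byte survives a decode/encode cycle when
the error handler is declared explicitly, whereas a strict ASCII decode
fails immediately.

```python
# Bytes as they might arrive from the OS: a name that is not valid ASCII.
# (Hypothetical example value, not taken from any real system.)
raw = b"caf\xe9.txt"

# With "surrogateescape", decoding never fails: each undecodable byte is
# mapped to a lone surrogate code point in the range U+DC80..U+DCFF.
text = raw.decode("ascii", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
restored = text.encode("ascii", errors="surrogateescape")

# A strict decode is the eager failure mode described above.
try:
    raw.decode("ascii")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
```

A locale-independent application has to make this kind of choice at every
boundary where bytes come in or go out, which is why it can't be retrofitted
mechanically.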

Once implemented, PEP 540 will partially solve the problem by introducing a
locale independent UTF-8 mode, but that still leaves the inconsistency with
other locale-aware components that are needing to deal with Python 3 API
calls that accept or return Unicode objects where Python 2 allowed the use
of 8-bit strings.

Folks that really want the old behaviour back will be able to set
PYTHONCOERCECLOCALE=0 (as that no longer emits any warnings), or else build
their own CPython from source using `--without-c-locale-coercion` and
`--without-c-locale-warning`. However, they'll also get the explicit
support notification from PEP 11 that any Unicode handling bugs they run
into in those configurations are entirely their own problem - we won't fix
them, because we consider those configurations unsupportable in the general
case.

That puts the additional self-support burden on folks doing something
unusual (i.e. insisting on running an ASCII-only environment in 2017),
rather than on those with a more conventional use case (i.e. running an up
to date \*nix OS using UTF-8 or another universal encoding for both local
and remote interfaces).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] PEP 538: Coercing the legacy C locale to a UTF-8 based locale

2017-03-13 Thread INADA Naoki
> It seems to be based on an assumption that the C locale is always some kind of
> pathology. Admittedly, it sometimes is a result of misconfiguration or a
> mistake. (But I don't see why it's the interpreter's job to correct such
> mistakes.) However, in some cases the C locale is a normal environment for
> system services, cron scripts, distro package builds and whatnot.

I think "C locale + use UTF-8 for stdio + fs" is a common setup,
especially for servers.
It's not a mistake or a misconfiguration.  Perl, Ruby, Rust, Node.js and
Go can use UTF-8 without any pain in the C locale.  Current Python is
painful in such cases, so I'm strongly +1 for PEP 540 (UTF-8 mode).

On the other hand, PEP 538 is for locale-dependent libraries (like
curses) and subprocesses.
I agree the C locale is a misconfiguration if the user wants to use UTF-8
in locale-dependent libraries.  And I agree the current PEP 538 seems to
carry it a bit too far.

But locale coercion works nicely on platforms like Android.
So how about a simplified version of PEP 538?  Just add a configure
option for locale coercion, disabled by default.  No envvar options
and no warnings.

Regards,


Re: [Python-Dev] PEP 538: Coercing the legacy C locale to a UTF-8 based locale

2017-03-12 Thread Nick Coghlan
On 12 March 2017 at 22:57, Nick Coghlan  wrote:

> However, I'm also open to having [PYTHONCOERCECLOCALE=0] also disable the
> runtime warning from the shared library.
>

Considering this a little further, I think this is going to be necessary in
order to sensibly handle the build time "--with[out]-c-locale-warning" flag
in the test suite.

Currently, there are a number of tests beyond the new ones in
Lib/test/test_locale_coercion.py that would need to know whether or not to
expect to see a warning in subprocesses in order to correctly handle the
"--without-c-locale-warning" case:
https://github.com/ncoghlan/cpython/commit/78c17a7cea04aed7cd1fce8ae5afb085a544a89c

If PYTHONCOERCECLOCALE=0 turned off the runtime warning as well, then the
behaviour of those tests would remain independent of the build flag as long
as they set the new environment variable in the child process - the warning
would be disabled either at build time via "--without-c-locale-warning" or
at runtime with "PYTHONCOERCECLOCALE=0".
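
As a rough sketch of what "set the new environment variable in the child
process" could look like in a test helper (the helper name `run_child` is
made up here, it is not an existing function in the test suite):

```python
import os
import subprocess
import sys


def run_child(code, **env_overrides):
    """Run `code` in a child interpreter with selected environment overrides.

    Hypothetical helper for illustration only.
    """
    env = dict(os.environ)
    # Drop locale settings inherited from the parent so that the
    # overrides passed in by the caller are the ones that take effect.
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        env.pop(name, None)
    env.update(env_overrides)
    return subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )


# Ask for the C locale, and opt out of coercion in the child.
result = run_child(
    "import sys; print(sys.stdout.encoding)",
    LANG="C",
    PYTHONCOERCECLOCALE="0",
)
print("stdout:", result.stdout.strip())
print("stderr:", result.stderr.strip())
```

What ends up on stdout and stderr depends on the interpreter version and the
build flags, which is exactly why the tests would want the child's behaviour
pinned down through the environment rather than inferred from the build.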

The check for the runtime C locale warning would then be added to
_testembed rather than going through a normal Python subprocess, and that
test would be the only one that needed to know whether or not the locale
warning had been disabled at build time (which we could indicate simply by
compiling the embedding part of the test differently in that case).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] PEP 538: Coercing the legacy C locale to a UTF-8 based locale

2017-03-12 Thread Nick Coghlan
On 12 March 2017 at 08:36, Jakub Wilk  wrote:

> This is a very bad idea.
>
> It seems to be based on an assumption that the C locale is always some kind
> of pathology. Admittedly, it sometimes is a result of misconfiguration or a
> mistake. (But I don't see why it's the interpreter's job to correct such
> mistakes.) However, in some cases the C locale is a normal environment for
> system services, cron scripts, distro package builds and whatnot.


An environment in which Python 3's eager decoding of operating system
provided values to Unicode fails.


> It's possible to write Python programs that are locale-agnostic.
>

If a program is genuinely locale-agnostic, it will be unaffected by this
PEP.


> It's also possible to write programs that are locale-dependent, but handle
> ASCII as locale encoding gracefully.
>

No, it is not generally feasible to write such programs in Python 3. That's
the essence of the problem, and why the PEP deprecates support for the
legacy C locale in Python 3.


> Or you might want to write a program that intentionally aborts with an
> explanatory error message when the locale encoding doesn't have sufficient
> Unicode coverage. ("Errors should never pass silently" anyone?)
>

This is what click does, but it only does it because it isn't possible
for click to do the right thing given Python 3's eager decoding of various
values as ASCII.


> With this proposal, none of the above seems possible to correctly
> implement in Python.
>

The first case remains unchanged, the other two will need to use Python 2.7
or Tauthon. I'm fine with that.


> * Nick Coghlan , 2017-03-05, 17:50:
>
> While this PEP ensures that developers that need to do so can still opt-in
>> to running their Python code in the legacy C locale,
>>
>
> Yeah, no, it doesn't.
>
> It's impossible to disable coercion from Python code, because it happens
> too early. The best you can do is to write a wrapper script in a different
> language that sets PYTHONCOERCECLOCALE=0; but then you still get a spurious
> warning.
>

It's not a spurious warning, as Python 3's Unicode handling for
environmental interactions genuinely doesn't work properly in the legacy C
locale (unless you're genuinely promising to only ever feed it ASCII
values, but that isn't a realistic guarantee to make).

However, I'm also open to having that particular setting also disable the
runtime warning from the shared library.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] PEP 538: Coercing the legacy C locale to a UTF-8 based locale

2017-03-11 Thread Jakub Wilk

This is a very bad idea.

It seems to be based on an assumption that the C locale is always some kind of
pathology. Admittedly, it sometimes is a result of misconfiguration or a 
mistake. (But I don't see why it's the interpreter's job to correct such 
mistakes.) However, in some cases the C locale is a normal environment for 
system services, cron scripts, distro package builds and whatnot.


It's possible to write Python programs that are locale-agnostic.
It's also possible to write programs that are locale-dependent, but handle 
ASCII as locale encoding gracefully.
Or you might want to write a program that intentionally aborts with an 
explanatory error message when the locale encoding doesn't have sufficient 
Unicode coverage. ("Errors should never pass silently" anyone?)
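
For illustration, the "abort with an explanatory error message" case could
look roughly like the sketch below. The function names are made up, and
treating "ASCII" as the only insufficient encoding is a simplifying
assumption; a real program might want a broader check.

```python
import codecs
import locale


def locale_encoding_is_ascii(encoding_name):
    """Return True if the given codec name is an alias for plain ASCII."""
    return codecs.lookup(encoding_name).name == "ascii"


def require_unicode_locale(encoding=None):
    """Exit with an explanation if the locale encoding is ASCII-only.

    `encoding` can be passed explicitly for testing; by default it is
    taken from the current locale settings.
    """
    if encoding is None:
        # Passing False avoids calling setlocale() as a side effect.
        encoding = locale.getpreferredencoding(False)
    if locale_encoding_is_ascii(encoding):
        raise SystemExit(
            "The locale encoding is %s, which cannot represent non-ASCII "
            "text. Please configure a UTF-8 based locale." % encoding
        )
    return encoding
```

Calling `require_unicode_locale()` at program start would then either return
the encoding name or terminate with the message.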


With this proposal, none of the above seems possible to correctly implement in 
Python.


* Nick Coghlan , 2017-03-05, 17:50:
> Another common failure case is developers specifying ``LANG=C`` in order to
> see otherwise translated user interface messages in English, rather than the
> more narrowly scoped ``LC_MESSAGES=C``.

Setting LANGUAGE=en might be better, because it doesn't affect locale encoding
either, and it works even when LC_ALL is set.

> Three such locales will be tried:
>
> * ``C.UTF-8`` (available at least in Debian, Ubuntu, and Fedora 25+, and
>   expected to be available by default in a future version of glibc)
> * ``C.utf8`` (available at least in HP-UX)
> * ``UTF-8`` (available in at least some \*BSD variants)

Calling the C locale "legacy" is a bit unfair, when there isn't even agreement
on what the name of the successor is supposed to be...

NB, both "C.UTF-8" and "C.utf8" work on Fedora, thanks to glibc normalizing the
encoding part. Only "C.UTF-8" works on Debian, though, for whatever reason.

> For ``C.UTF-8`` and ``C.utf8``, the coercion will be implemented by actually
> setting the ``LANG`` and ``LC_ALL`` environment variables to the candidate
> locale name,

Sounds wrong. This will override all LC_*, even if they were originally set to
something different than C.

> Python detected LC_CTYPE=C, LC_ALL & LANG set to C.UTF-8 (set another locale
> or PYTHONCOERCECLOCALE=0 to disable this locale coercion behaviour).

Comma splice.

s/set/was set/ would probably make it clearer.

> Python detected LC_CTYPE=C, LC_CTYPE set to UTF-8 (set another locale or
> PYTHONCOERCECLOCALE=0 to disable this locale coercion behaviour).

Ditto.

> The second sentence providing recommendations would be conditionally compiled
> based on the operating system (e.g. recommending ``LC_CTYPE=UTF-8`` on \*BSD
> systems).

Note that at least OpenBSD supports both "C.UTF-8" and "UTF-8" locales.

> While this PEP ensures that developers that need to do so can still opt-in to
> running their Python code in the legacy C locale,

Yeah, no, it doesn't.

It's impossible to disable coercion from Python code, because it happens too
early. The best you can do is to write a wrapper script in a different language
that sets PYTHONCOERCECLOCALE=0; but then you still get a spurious warning.


--
Jakub Wilk


Re: [Python-Dev] PEP 538: Coercing the legacy C locale to a UTF-8 based locale

2017-03-09 Thread Nick Coghlan
On 9 March 2017 at 07:58, Guido van Rossum  wrote:

> On Wed, Mar 8, 2017 at 4:35 AM, Nick Coghlan  wrote:
>
>>
>> On 5 March 2017 at 17:50, Nick Coghlan  wrote:
>>
>>> Late last year I started working on a change to the CPython CLI (*not*
>>> the shared library) to get it to coerce the legacy C locale to something
>>> based on UTF-8 when a suitable locale is available.
>>>
>>> After a couple of rounds of iteration on linux-sig and python-ideas, I'm
>>> now bringing it to python-dev as a concrete proposal for Python 3.7.
>>>
>>
>> In terms of resolving this PEP, if Guido doesn't feel inclined to wade
>> into the intricacies of legacy C locale handling, Barry has indicated he'd
>> be happy to act as BDFL-Delegate :)
>>
>
> Hi Nick and Barry, I'd very much appreciate if you two could resolve this
> without involving me.
>

OK, I've added Barry to the PEP as BDFL-Delegate:
https://github.com/python/peps/commit/4c46c5710031cac03a8d1ab7639272957998a1cc

Thanks for the quick response!

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] PEP 538: Coercing the legacy C locale to a UTF-8 based locale

2017-03-08 Thread Guido van Rossum
On Wed, Mar 8, 2017 at 4:35 AM, Nick Coghlan  wrote:

>
> On 5 March 2017 at 17:50, Nick Coghlan  wrote:
>
>> Late last year I started working on a change to the CPython CLI (*not*
>> the shared library) to get it to coerce the legacy C locale to something
>> based on UTF-8 when a suitable locale is available.
>>
>> After a couple of rounds of iteration on linux-sig and python-ideas, I'm
>> now bringing it to python-dev as a concrete proposal for Python 3.7.
>>
>
> In terms of resolving this PEP, if Guido doesn't feel inclined to wade
> into the intricacies of legacy C locale handling, Barry has indicated he'd
> be happy to act as BDFL-Delegate :)
>

Hi Nick and Barry, I'd very much appreciate if you two could resolve this
without involving me. Godspeed!

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] PEP 538: Coercing the legacy C locale to a UTF-8 based locale

2017-03-08 Thread Nick Coghlan
On 5 March 2017 at 17:50, Nick Coghlan  wrote:

> Hi folks,
>
> Late last year I started working on a change to the CPython CLI (*not* the
> shared library) to get it to coerce the legacy C locale to something based
> on UTF-8 when a suitable locale is available.
>
> After a couple of rounds of iteration on linux-sig and python-ideas, I'm
> now bringing it to python-dev as a concrete proposal for Python 3.7.
>

In terms of resolving this PEP, if Guido doesn't feel inclined to wade into
the intricacies of legacy C locale handling, Barry has indicated he'd be
happy to act as BDFL-Delegate :)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] PEP 538: Coercing the legacy C locale to a UTF-8 based locale

2017-03-06 Thread Nick Coghlan
On 6 March 2017 at 00:39, INADA Naoki  wrote:
> I prefer just "locale-aware" / "locale-independent" (application |
> library | function)
> to "locale-aware C/C++ application" / "C/C++ independent" here.

Good point, I'll fix that in the next update.

> > Backporting to Python 3.6.0
> > ---
> >
> > If this PEP is accepted for Python 3.7, redistributors backporting the
> > change specifically to their initial Python 3.6.0 release will be both
> > allowed and encouraged. However, such backports should only be undertaken
> > either in conjunction with the changes needed to also provide a suitable
> > locale by default, or else specifically for platforms where such a locale
> > is already consistently available.
> >
>
> If it's really encouraged, how about providing a patch officially, or
> backporting it in 3.6.2 but disabled by default?
> Some Python users (including my company) use pyenv or pythonz to
> build Python from source. This PEP and PEP 540 are important for them too.
>

For PEP 540, the changes are too intrusive to consider it a reasonable
candidate for backporting to an earlier feature release, so for that
aspect, we'll *all* be waiting for 3.7.

For this PEP, while it's deliberately unobtrusive to make it more
backporting friendly, 3.7 isn't *that* far away, and I didn't think to
seriously pursue this approach until well after the 3.6 beta deadline for
new features had passed. With it being clearly outside the normal bounds of
what's appropriate for a cross-platform maintenance release, that means the
only folks that can consider it for earlier releases are those building
their own binaries for more constrained target environments.

I can definitely make sure the patch is readily available for anyone that
wants to apply it to their own builds, though (I'll upload it to both the
Python tracker issue and the downstream Fedora Bugzilla entry).

I also wouldn't completely close the door on the idea of classifying the
change as a bug fix in CPython's handling of the C locale (and hence adding
to a later 3.6.x feature release), but I think the time to pursue that
would be *after* we've had a chance to see how folks react to the
redistributor customizations. I *think* it will be universally positive
(because the status quo really is broken), but it also wouldn't be the
first time I've learned something new and confusing about the locale
subsystem only after releasing software that relied on an incorrect
assumption about it :)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] PEP 538: Coercing the legacy C locale to a UTF-8 based locale

2017-03-05 Thread INADA Naoki
LGTM and I love this PEP and PEP 540.

Some comments:

...

> * PEP 540 proposes to entirely decouple CPython's default text encoding from
>   the C locale system in that case, allowing text handling inconsistencies
>   to arise between CPython and other C/C++ components running in the same
>   process and in subprocesses. This approach aims to make CPython behave
>   less like a locale-aware C/C++ application, and more like C/C++
>   independent language runtimes like the JVM, .NET CLR, Go, Node.js, and Rust

I prefer just "locale-aware" / "locale-independent" (application |
library | function)
to "locale-aware C/C++ application" / "C/C++ independent" here.

Both Rust and Node.js are linked with libc, and Node.js (V8) is
written in C++.
They just demonstrate that many people prefer "always UTF-8" to an
"LC_CTYPE-aware encoding" in real-world applications.

And C/C++ can be used for both locale-aware and locale-independent
applications.
I can print "こんにちは、世界" in the C locale, because stdio is byte transparent.
There are many locale-independent libraries written in C (zlib, libjpeg, etc.),
and some functions in libc are locale-independent or LC_CTYPE-independent
(printf is locale-aware, but it uses LC_NUMERIC, not LC_CTYPE).
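
The same "byte transparent" idea can be sketched in Python by writing
already-encoded bytes to a binary stream. This is only an illustration of the
principle; `write_bytes` is a made-up helper, and an in-memory buffer stands
in for a real output stream.

```python
import io

message = "こんにちは、世界\n"
# Encode explicitly, so the locale's preferred encoding is never consulted.
payload = message.encode("utf-8")


def write_bytes(stream, data):
    """Write already-encoded bytes to a binary stream and flush it."""
    stream.write(data)
    stream.flush()


# For real output, sys.stdout.buffer could be passed as the stream.
buffer = io.BytesIO()
write_bytes(buffer, payload)
```

Because the text layer is bypassed entirely, the result is the same in the C
locale as in any UTF-8 locale.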

...


> Backporting to Python 3.6.0
> ---
>
> If this PEP is accepted for Python 3.7, redistributors backporting the
> change specifically to their initial Python 3.6.0 release will be both
> allowed and encouraged. However, such backports should only be undertaken
> either in conjunction with the changes needed to also provide a suitable
> locale by default, or else specifically for platforms where such a locale
> is already consistently available.
>

If it's really encouraged, how about providing a patch officially, or
backporting it in 3.6.2 but disabled by default?
Some Python users (including my company) use pyenv or pythonz to
build Python from source. This PEP and PEP 540 are important for them too.


[Python-Dev] PEP 538: Coercing the legacy C locale to a UTF-8 based locale

2017-03-04 Thread Nick Coghlan
Hi folks,

Late last year I started working on a change to the CPython CLI (*not* the
shared library) to get it to coerce the legacy C locale to something based
on UTF-8 when a suitable locale is available.

After a couple of rounds of iteration on linux-sig and python-ideas, I'm
now bringing it to python-dev as a concrete proposal for Python 3.7.

For most folks, reading the Abstract plus the draft docs updates in the
reference implementation will tell you everything you need to know (if the
C.UTF-8, C.utf8 or UTF-8 locales are available, the CLI will automatically
attempt to coerce the legacy C locale to one of those rather than
persisting with the latter's default assumption of ASCII as the preferred
text encoding).

However, the full PEP goes into a lot more detail on:

* exactly what's broken about CPython's behaviour in the legacy C locale
* why I'm in favour of this particular approach to fixing it (i.e. it
integrates better with other C/C++ components, as well as being amenable to
redistributor backports for 3.6, and environment based configuration for
3.5 and earlier)
* why I think implementing both this change *and* Victor's more
comprehensive "PYTHONUTF8 mode" proposal in PEP 540 will be better than
implementing just one or the other (in some situations, ignoring the
platform locale subsystem entirely really is the right approach, and that's
the aspect PEP 540 tackles, while this PEP tackles the situations where the
C locale behaviour is broken, but you still need to be consistent with the
platform settings).

Cheers,
Nick.

==
PEP: 538
Title: Coercing the legacy C locale to a UTF-8 based locale
Version: $Revision$
Last-Modified: $Date$
Author: Nick Coghlan 
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 28-Dec-2016
Python-Version: 3.7
Post-History: 03-Jan-2017 (linux-sig),
  07-Jan-2017 (python-ideas),
  05-Mar-2017 (python-dev)


Abstract
========

An ongoing challenge with Python 3 on \*nix systems is the conflict between
needing to use the configured locale encoding by default for consistency with
other C/C++ components in the same process and those invoked in subprocesses,
and the fact that the standard C locale (as defined in POSIX:2001) typically
implies a default text encoding of ASCII, which is entirely inadequate for the
development of networked services and client applications in a multilingual
world.

PEP 540 proposes a change to CPython's handling of the legacy C locale such
that CPython will assume the use of UTF-8 in such environments, rather than
persisting with the demonstrably problematic assumption of ASCII as an
appropriate encoding for communicating with operating system interfaces.
This is a good approach for cases where network encoding interoperability
is a more important concern than local encoding interoperability.

However, it comes at the cost of making CPython's encoding assumptions diverge
from those of other C and C++ components in the same process, as well as those
of components running in subprocesses that share the same environment.

It also requires changes to the internals of how CPython itself works, rather
than using existing configuration settings that are supported by Python
versions prior to Python 3.7.

Accordingly, this PEP proposes that independently of the UTF-8 mode proposed
in PEP 540, the way the CPython implementation handles the default C locale be
changed such that:

* unless the new ``PYTHONCOERCECLOCALE`` environment variable is set to ``0``,
  the standalone CPython binary will automatically attempt to coerce the ``C``
  locale to the first available locale out of ``C.UTF-8``, ``C.utf8``, or
  ``UTF-8``
* if the locale is successfully coerced, and PEP 540 is not accepted, then
  ``PYTHONIOENCODING`` (if not otherwise set) will be set to
  ``utf-8:surrogateescape``.
* if the locale is successfully coerced, and PEP 540 *is* accepted, then
  ``PYTHONUTF8`` (if not otherwise set) will be set to ``1``
* if the subsequent runtime initialization process detects that the legacy
  ``C`` locale remains active (e.g. none of ``C.UTF-8``, ``C.utf8`` or
  ``UTF-8`` are available, locale coercion is disabled, or the runtime is
  embedded in an application other than the main CPython binary), and the
  ``PYTHONUTF8`` feature defined in PEP 540 is also disabled (or not
  implemented), it will emit a warning on stderr that use of the legacy ``C``
  locale's default ASCII text encoding may cause various Unicode compatibility
  issues
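
The decision logic in the list above can be sketched in Python as a pure
function over an environment mapping. This is an illustrative model only, not
the reference implementation (which lives in C): the names
``coerce_c_locale``, ``effective_ctype`` and the ``is_available`` callback
are made up, the sketch covers only the "PEP 540 is not accepted" branch, and
the choice of which variables get set follows the draft text quoted elsewhere
in this thread (``LANG`` and ``LC_ALL`` for ``C.UTF-8``/``C.utf8``,
``LC_CTYPE`` for ``UTF-8``).

```python
CANDIDATE_LOCALES = ("C.UTF-8", "C.utf8", "UTF-8")


def effective_ctype(env):
    """Return the LC_CTYPE setting implied by the usual POSIX precedence."""
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:
            return value
    # With nothing set, the process runs in the default "C" locale.
    return "C"


def coerce_c_locale(env, is_available):
    """Return a copy of `env` with the legacy C locale coerced, if applicable.

    `is_available` is a hypothetical callable that reports whether a given
    locale name can be set on the current platform.
    """
    new_env = dict(env)
    if env.get("PYTHONCOERCECLOCALE") == "0":
        return new_env  # coercion explicitly disabled
    if effective_ctype(env) != "C":
        return new_env  # some other locale is already configured
    for candidate in CANDIDATE_LOCALES:
        if not is_available(candidate):
            continue
        if candidate == "UTF-8":
            # Partial locale: only the character type category is changed.
            new_env["LC_CTYPE"] = candidate
        else:
            new_env["LANG"] = candidate
            new_env["LC_ALL"] = candidate
        # Only set the I/O encoding if the user has not chosen one already.
        new_env.setdefault("PYTHONIOENCODING", "utf-8:surrogateescape")
        break
    return new_env
```

For example, ``coerce_c_locale({"LANG": "C"}, lambda name: name == "C.utf8")``
would select ``C.utf8``, since it is the first candidate reported as
available.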

With this change, any \*nix platform that does *not* offer at least one of the
``C.UTF-8``, ``C.utf8`` or ``UTF-8`` locales as part of its standard
configuration would only be considered a fully supported platform for CPython
3.7+ deployments when either the new ``PYTHONUTF8`` mode defined in PEP 540 is
used, or else a suitable locale other than the default ``C`` locale is
configured explicitly (e.g. `en_AU.UTF-8`,