This is a very bad idea.
It seems to be based on an assumption that the C locale is always some kind of
pathology. Admittedly, it sometimes is a result of misconfiguration or a
mistake. (But I don't see why it's the interpreter's job to correct such
mistakes.) However, in some cases the C locale is a normal environment for
system services, cron scripts, distro package builds and whatnot.
It's possible to write Python programs that are locale-agnostic.
It's also possible to write programs that are locale-dependent, but handle
ASCII as locale encoding gracefully.
Or you might want to write a program that intentionally aborts with an
explanatory error message when the locale encoding doesn't have sufficient
Unicode coverage. ("Errors should never pass silently" anyone?)
With this proposal, none of the above seems possible to correctly implement in
Python.
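
To make the third case concrete, here is a minimal sketch of a program that
refuses to run when the locale encoding can't represent the text it needs. The
function names and the sample string are made up for illustration; the only
library call relied on is locale.getpreferredencoding(). As far as I can tell,
with the proposed coercion a check like this would never fire for LC_CTYPE=C,
because the locale would already have been switched before this code runs.

```python
import locale
import sys


def encoding_covers(encoding, sample):
    """Return True if every character in `sample` can be encoded."""
    try:
        sample.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


def require_unicode_locale(sample="za\u017c\u00f3\u0142\u0107 \u20ac"):
    """Exit with an explanatory message if the locale encoding is too narrow.

    `sample` is a hypothetical stand-in for whatever non-ASCII text the
    program actually needs to emit.
    """
    # Passing False asks for the encoding of the current locale without
    # calling setlocale() as a side effect.
    encoding = locale.getpreferredencoding(False)
    if not encoding_covers(encoding, sample):
        sys.exit(
            f"error: locale encoding {encoding!r} cannot represent "
            f"{sample!r}; please select a UTF-8 locale"
        )
    return encoding
```

A program would call require_unicode_locale() once at startup, before
producing any output.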
* Nick Coghlan <ncogh...@gmail.com>, 2017-03-05, 17:50:
Another common failure case is developers specifying ``LANG=C`` in order to
see otherwise translated user interface messages in English, rather than the
more narrowly scoped ``LC_MESSAGES=C``.
Setting LANGUAGE=en might be better, because it doesn't affect locale encoding
either, and it works even when LC_ALL is set.
Three such locales will be tried:
* ``C.UTF-8`` (available at least in Debian, Ubuntu, and Fedora 25+, and
expected to be available by default in a future version of glibc)
* ``C.utf8`` (available at least in HP-UX)
* ``UTF-8`` (available in at least some \*BSD variants)
Calling the C locale "legacy" is a bit unfair, when there's not even agreement
on what the name of the successor is supposed to be...
NB, both "C.UTF-8" and "C.utf8" work on Fedora, thanks to glibc normalizing the
encoding part. Only "C.UTF-8" works on Debian, though, for whatever reason.
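
A quick way to check which of the three spellings a given system accepts is to
try each one with setlocale(). This is only a sketch: probe_locales is a
made-up helper name, and the results depend entirely on the C library and the
locales installed on the machine.

```python
import locale

# The three candidate names listed in the PEP.
CANDIDATES = ("C.UTF-8", "C.utf8", "UTF-8")


def probe_locales(names=CANDIDATES):
    """Map each name to True if setlocale(LC_CTYPE, name) succeeds."""
    # Calling setlocale() without a locale argument only queries the
    # current setting, so it can be restored afterwards.
    saved = locale.setlocale(locale.LC_CTYPE)
    results = {}
    try:
        for name in names:
            try:
                locale.setlocale(locale.LC_CTYPE, name)
            except locale.Error:
                results[name] = False
            else:
                results[name] = True
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)
    return results


if __name__ == "__main__":
    for name, ok in probe_locales().items():
        print(f"{name}: {'available' if ok else 'not available'}")
```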
For ``C.UTF-8`` and ``C.utf8``, the coercion will be implemented by actually
setting the ``LANG`` and ``LC_ALL`` environment variables to the candidate
locale name,
Sounds wrong. This will override all LC_*, even if they were originally set to
something other than C.
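
To illustrate, here is a small model of the usual POSIX lookup order (LC_ALL
first, then the category's own variable, then LANG). effective_locale is a
made-up helper, the example environment is hypothetical, and falling back to
"C" when nothing is set is an assumption (POSIX leaves that default to the
implementation).

```python
def effective_locale(env, category):
    """Resolve a locale category: LC_ALL, then LC_<category>, then LANG."""
    for var in ("LC_ALL", category, "LANG"):
        value = env.get(var)
        if value:  # unset and empty are treated the same way
            return value
    return "C"  # assumed default; implementation-defined in POSIX


# Hypothetical environment: C locale overall, but a specific collation.
before = {"LANG": "C", "LC_COLLATE": "pl_PL.ISO-8859-2"}

# The same environment after LANG and LC_ALL are set as described.
after = dict(before, LANG="C.UTF-8", LC_ALL="C.UTF-8")

for name, env in (("before", before), ("after", after)):
    print(name, effective_locale(env, "LC_CTYPE"),
          effective_locale(env, "LC_COLLATE"))
```

In this model LC_COLLATE is still present in the environment afterwards, but
LC_ALL takes precedence over it, so the user's explicit choice is lost.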
Python detected LC_CTYPE=C, LC_ALL & LANG set to C.UTF-8 (set another locale
or PYTHONCOERCECLOCALE=0 to disable this locale coercion behaviour).
Comma splice.
s/set/was set/ would probably make it clearer.
Python detected LC_CTYPE=C, LC_CTYPE set to UTF-8 (set another locale or
PYTHONCOERCECLOCALE=0 to disable this locale coercion behaviour).
Ditto.
The second sentence providing recommendations would be conditionally compiled
based on the operating system (e.g. recommending ``LC_CTYPE=UTF-8`` on \*BSD
systems).
Note that at least OpenBSD supports both "C.UTF-8" and "UTF-8" locales.
While this PEP ensures that developers that need to do so can still opt-in to
running their Python code in the legacy C locale,
Yeah, no, it doesn't.
It's impossible to disable coercion from Python code, because it happens too
early. The best you can do is to write a wrapper script in a different language
that sets PYTHONCOERCECLOCALE=0; but then you still get a spurious warning.
--
Jakub Wilk