Re: [Python-Dev] PEP 538: Coercing the legacy C locale to a UTF-8 based locale

Jakub Wilk Sat, 11 Mar 2017 17:15:00 -0800

This is a very bad idea.

It seems to be based on an assumption that the C locale is always some kind of pathology. Admittedly, it sometimes is a result of misconfiguration or a mistake. (But I don't see why it's the interpreter's job to correct such mistakes.) However, in some cases the C locale is a normal environment for system services, cron scripts, distro package builds and whatnot.


It's possible to write Python programs that are locale-agnostic.

It's also possible to write programs that are locale-dependent, but handle ASCII as locale encoding gracefully. Or you might want to write a program that intentionally aborts with an explanatory error message when the locale encoding doesn't have sufficient Unicode coverage. ("Errors should never pass silently" anyone?)

With this proposal, none of the above seems possible to correctly implement in Python.
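To make the third case concrete, here is a minimal sketch of such a check. The function names and the sample string are hypothetical, chosen only for illustration. Under the proposed coercion the interpreter would already have changed LC_CTYPE before this code runs, so the check would no longer see the locale the user actually configured.

```python
import locale
import sys


def encoding_covers(encoding, sample):
    """Return True if every character of `sample` can be encoded in `encoding`."""
    try:
        sample.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


def require_unicode_locale(sample="\u017c\u00f3\u0142w"):
    """Abort with an explanatory message if the locale encoding is too narrow.

    `sample` is an arbitrary non-ASCII string (an assumption for this sketch);
    a real program would test the characters it actually needs to emit.
    """
    # Passing False asks for the encoding without calling setlocale() first.
    encoding = locale.getpreferredencoding(False)
    if not encoding_covers(encoding, sample):
        sys.exit(
            "error: locale encoding %r cannot represent %r; "
            "please select a UTF-8 locale" % (encoding, sample)
        )
    return encoding
```

A program would call require_unicode_locale() once at startup, before producing any output.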


* Nick Coghlan <ncogh...@gmail.com>, 2017-03-05, 17:50:

Another common failure case is developers specifying ``LANG=C`` in order to see otherwise translated user interface messages in English, rather than the more narrowly scoped ``LC_MESSAGES=C``.

Setting LANGUAGE=en might be better, because it doesn't affect locale encoding either, and it works even when LC_ALL is set.

Three such locales will be tried:
* ``C.UTF-8`` (available at least in Debian, Ubuntu, and Fedora 25+, and expected to be available by default in a future version of glibc)
* ``C.utf8`` (available at least in HP-UX)
* ``UTF-8`` (available in at least some \*BSD variants)

Calling the C locale "legacy" is a bit unfair when there isn't even agreement on what the name of the successor is supposed to be...

NB, both "C.UTF-8" and "C.utf8" work on Fedora, thanks to glibc normalizing the encoding part. Only "C.UTF-8" works on Debian, though, for whatever reason.
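For anyone who wants to check their own system, a small sketch along these lines should report which of the three candidate names are accepted. The helper name is made up for this example; it only relies on locale.setlocale() raising locale.Error for names the C library rejects, and results will differ between platforms.

```python
import locale

# The three candidate names listed in the PEP.
CANDIDATES = ("C.UTF-8", "C.utf8", "UTF-8")


def available_locales(names=CANDIDATES):
    """Return the subset of `names` that setlocale() accepts for LC_CTYPE."""
    # Calling setlocale() without a locale argument only queries the setting.
    saved = locale.setlocale(locale.LC_CTYPE)
    found = []
    try:
        for name in names:
            try:
                locale.setlocale(locale.LC_CTYPE, name)
            except locale.Error:
                continue  # this name is not supported here
            found.append(name)
    finally:
        # Restore whatever was active before probing.
        locale.setlocale(locale.LC_CTYPE, saved)
    return found


if __name__ == "__main__":
    print(available_locales())
```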

For ``C.UTF-8`` and ``C.utf8``, the coercion will be implemented by actually setting the ``LANG`` and ``LC_ALL`` environment variables to the candidate locale name,

Sounds wrong. This will override all LC_*, even if they were originally set to something different than C.
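The reason is the POSIX lookup order: for each category, a non-empty LC_ALL wins over the category's own variable, which in turn wins over LANG. A rough model of that order, with a hypothetical helper name and example environment, shows how a per-category setting gets lost once LC_ALL is set:

```python
def effective_locale(env, category):
    """Model the POSIX precedence for one category: LC_ALL, then LC_<x>, then LANG.

    `env` is a plain dict standing in for the process environment.
    Empty values are treated as unset; "C" is assumed as the final default.
    """
    for var in ("LC_ALL", category, "LANG"):
        value = env.get(var)
        if value:
            return value
    return "C"


# Example environment (made up): C locale overall, Polish collation.
before = {"LANG": "C", "LC_COLLATE": "pl_PL.UTF-8"}
# The same environment after coercion as described in the quoted text.
after = dict(before, LANG="C.UTF-8", LC_ALL="C.UTF-8")

print(effective_locale(before, "LC_COLLATE"))
print(effective_locale(after, "LC_COLLATE"))
```

In this model the first call yields the Polish collation locale and the second yields C.UTF-8, i.e. the explicit LC_COLLATE setting no longer has any effect.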

Python detected LC_CTYPE=C, LC_ALL & LANG set to C.UTF-8 (set another locale or PYTHONCOERCECLOCALE=0 to disable this locale coercion behaviour).


Comma splice.

s/set/was set/ would probably make it clearer.

Python detected LC_CTYPE=C, LC_CTYPE set to UTF-8 (set another locale or PYTHONCOERCECLOCALE=0 to disable this locale coercion behaviour).


Ditto.

The second sentence providing recommendations would be conditionally compiled based on the operating system (e.g. recommending ``LC_CTYPE=UTF-8`` on \*BSD systems).


Note that at least OpenBSD supports both "C.UTF-8" and "UTF-8" locales.

While this PEP ensures that developers that need to do so can still opt-in to running their Python code in the legacy C locale,


Yeah, no, it doesn't.

It's impossible to disable coercion from Python code, because it happens too early. The best you can do is to write a wrapper script in a different language that sets PYTHONCOERCECLOCALE=0; but then you still get a spurious warning.


--
Jakub Wilk