Re: [Python-Dev] PEP 538: Coercing the legacy C locale to a UTF-8 based locale

Nick Coghlan Tue, 14 Mar 2017 19:32:16 -0700

On 15 March 2017 at 06:22, Chris Barker <[email protected]> wrote:


> So the question is -- is anyone counting on errors in this case? i.e., is
> a sysadmin thinking:
>
> "I want an ASCII-only system, so I'll set the locale, and now I can expect
> any program running on this system that is not ascii compatible to fail."
>
> I honestly don't know if this is common -- but I would argue that trying
> to run a unicode-aware program on an ASCII-only system could be considered
> a mis-configuration as well.
>

From a mainstream Linux point of view, it's not common - on systemd-managed
systems, for example, the only way to get the C locale these days is to
either specify it in /etc/locale.conf, or to set it specifically in the
environment. Upstart was a little less reliable about that, and sysvinit
was less reliable still, but the trend is definitely towards making C.UTF-8
the assumed default, rather than "C". Even glibc itself would quite like to
get to a point where you only get the C locale if you explicitly ask for
it: https://sourceware.org/glibc/wiki/Proposals/C.UTF-8

The main practical objection that comes up in relation to "UTF-8
everywhere" isn't to do with UTF-8 per se, but rather with the size of the
collation tables needed to do "proper" sorting of Unicode code points.
However, there's a neat hack in the design of UTF-8 where sorting the
encoded bytes by byte value is equivalent to sorting the decoded text by
the Unicode code point values, which means that "LC_COLLATE=C" sorting by
byte value, and "LC_COLLATE=C.UTF-8" sorting by "Unicode code point value"
give the same results.
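
As a quick illustration of that equivalence, here is a minimal Python
sketch (not taken from the PEP; the helper names and sample strings are
made up for the example). It relies on Python comparing str objects by
code point, and compares that ordering with sorting on the UTF-8 encoded
bytes:

```python
def sort_by_code_point(words):
    # Python orders str objects code point by code point.
    return sorted(words)


def sort_by_utf8_bytes(words):
    # Order by the raw UTF-8 bytes, as a byte-wise collation would.
    return sorted(words, key=lambda w: w.encode("utf-8"))


# Hypothetical sample data: ASCII, Latin-1 range, CJK, the top of the BMP,
# and a code point outside the BMP.
samples = ["zebra", "Zebra", "eclair", "éclair", "日本語", "\uffff", "\U0001F40D", "~", ""]

# Both orderings should agree for any list of well-formed strings.
assert sort_by_code_point(samples) == sort_by_utf8_bytes(samples)
```

This only covers text that can actually be encoded as UTF-8, so lone
surrogates are out of scope for the sketch.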

Cheers,
Nick.

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia

