[issue20087] Mismatch between glibc and X11 locale.alias

2021-02-17 Thread Marc-Andre Lemburg


Marc-Andre Lemburg  added the comment:

I believe we can close this old issue.

The discussion was certainly a useful one. I guess we should stop updating the 
alias table automatically and instead add new aliases or change existing ones 
based on more research and using the X11 files as well as glibc and other 
resources to help.

--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20087] Mismatch between glibc and X11 locale.alias

2018-05-06 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Thanks, Serhiy.

--


[issue20087] Mismatch between glibc and X11 locale.alias

2018-05-06 Thread Serhiy Storchaka

Serhiy Storchaka  added the comment:


New changeset a55ac801f749a731250f3c7c1db7d546d22ae032 by Serhiy Storchaka in 
branch '2.7':
[2.7] bpo-20087: Update locale alias mapping with glibc 2.27 supported locales. 
(GH-6708). (GH-6717)
https://github.com/python/cpython/commit/a55ac801f749a731250f3c7c1db7d546d22ae032


--


[issue20087] Mismatch between glibc and X11 locale.alias

2018-05-06 Thread Serhiy Storchaka

Serhiy Storchaka  added the comment:


New changeset 6049bda21b607acc90bbabcc604997e794e8aee1 by Serhiy Storchaka 
(Miss Islington (bot)) in branch '3.7':
[3.7] bpo-20087: Update locale alias mapping with glibc 2.27 supported locales. 
(GH-6708) (GH-6713)
https://github.com/python/cpython/commit/6049bda21b607acc90bbabcc604997e794e8aee1


--


[issue20087] Mismatch between glibc and X11 locale.alias

2018-05-06 Thread Serhiy Storchaka

Serhiy Storchaka  added the comment:


New changeset b1c70d0ffbb235def1deab62a744ffd9b5253924 by Serhiy Storchaka 
(Miss Islington (bot)) in branch '3.6':
[3.6] bpo-20087: Update locale alias mapping with glibc 2.27 supported locales. 
(GH-6708) (GH-6714)
https://github.com/python/cpython/commit/b1c70d0ffbb235def1deab62a744ffd9b5253924


--


[issue20087] Mismatch between glibc and X11 locale.alias

2018-05-06 Thread Serhiy Storchaka

Change by Serhiy Storchaka :


--
pull_requests: +6409


[issue20087] Mismatch between glibc and X11 locale.alias

2018-05-05 Thread miss-islington

Change by miss-islington :


--
pull_requests: +6407


[issue20087] Mismatch between glibc and X11 locale.alias

2018-05-05 Thread miss-islington

Change by miss-islington :


--
pull_requests: +6406


[issue20087] Mismatch between glibc and X11 locale.alias

2018-05-05 Thread Serhiy Storchaka

Serhiy Storchaka  added the comment:


New changeset cedc9b74202d8c1ae39bca261cbb45d42ed54d45 by Serhiy Storchaka in 
branch 'master':
bpo-20087: Update locale alias mapping with glibc 2.27 supported locales. 
(GH-6708)
https://github.com/python/cpython/commit/cedc9b74202d8c1ae39bca261cbb45d42ed54d45


--


[issue20087] Mismatch between glibc and X11 locale.alias

2018-05-05 Thread Serhiy Storchaka

Serhiy Storchaka  added the comment:

Benjamin's patch did two things: 1) it made the glibc alias table take 
precedence over the X11 one; 2) it updated the alias mapping with the new glibc. The 
first part is controversial, but updating the alias mapping with new glibc releases is 
done regularly. PR 6708 updates it with glibc 2.27. This adds 39 new aliases 
and fixes issue32781 and issue33432.

--


[issue20087] Mismatch between glibc and X11 locale.alias

2018-05-05 Thread Serhiy Storchaka

Change by Serhiy Storchaka :


--
keywords: +patch
pull_requests: +6401
stage: test needed -> patch review


[issue20087] Mismatch between glibc and X11 locale.alias

2018-05-05 Thread Licht Takeuchi

Licht Takeuchi  added the comment:

Hi all,

The locales in the latest Ubuntu 18.04 include en_IL as a valid locale, but 
Python cannot resolve it.
This causes a test failure in pandas.
https://github.com/pandas-dev/pandas/issues/20957

en_IL has a significant impact because it is an English locale and is now supported 
in the latest Ubuntu. Is there any plan to add just en_IL?

(Note that I've already created the PR. ( 
https://github.com/python/cpython/pull/6707 ))

```
(pandas-dev) [pandas] locale -a
C
C.UTF-8
en_AG
en_AG.utf8
en_AU.utf8
en_BW.utf8
en_CA.utf8
en_DK.utf8
en_GB.utf8
en_HK.utf8
en_IE.utf8
en_IL
en_IL.utf8
en_IN
en_IN.utf8
en_NG
en_NG.utf8
en_NZ.utf8
en_PH.utf8
en_SG.utf8
en_US.utf8
en_ZA.utf8
en_ZM
en_ZM.utf8
en_ZW.utf8
ja_JP.utf8
POSIX
```
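A minimal sketch (not from the thread) of how to find which names in a `locale -a` listing like the one above Python's alias table cannot resolve. It assumes that "cannot resolve" means `locale.normalize()` returns the name without an encoding, which is the case in which `locale.getlocale()` raises `ValueError` ("unknown locale"). The helper name `unresolvable()` is hypothetical.

```python
import locale

def unresolvable(names):
    """Return the locale names that locale.normalize() leaves without an encoding.

    Once such a name is the active locale, locale.getlocale() raises
    ValueError ("unknown locale") because Python cannot infer a codeset.
    """
    result = []
    for name in names:
        if name.upper() in ("C", "POSIX"):
            continue  # the portable locales never carry an encoding
        normalized = locale.normalize(name)
        if "." not in normalized:
            result.append(name)
    return result
```

To check the live system, feed it the output of `subprocess.run(["locale", "-a"], capture_output=True, text=True).stdout.split()`. Names that already carry a codeset (such as `en_IL.utf8`) are never reported, since Python can split the encoding off directly.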

--
nosy: +licht-t


[issue20087] Mismatch between glibc and X11 locale.alias

2017-03-24 Thread Benjamin Peterson

Benjamin Peterson added the comment:


New changeset 02371e0ed1ee82ec73e7d363bcf2ed40cde1397a by Benjamin Peterson in 
branch 'master':
make the glibc alias table take precedence over the X11 one (#422)
https://github.com/python/cpython/commit/02371e0ed1ee82ec73e7d363bcf2ed40cde1397a


--


[issue20087] Mismatch between glibc and X11 locale.alias

2017-03-24 Thread Benjamin Peterson

Benjamin Peterson added the comment:


New changeset df8280838f52d6ec45ba03ef734b0dec8a9c43fb by Benjamin Peterson in 
branch 'master':
bpo-20087: Revert "make the glibc alias table take precedence over the X11 one 
(#422)" (#713)
https://github.com/python/cpython/commit/df8280838f52d6ec45ba03ef734b0dec8a9c43fb


--


[issue20087] Mismatch between glibc and X11 locale.alias

2017-03-19 Thread Benjamin Peterson

Changes by Benjamin Peterson :


--
pull_requests: +633


[issue20087] Mismatch between glibc and X11 locale.alias

2017-03-18 Thread Serhiy Storchaka

Changes by Serhiy Storchaka :


--
pull_requests:  -602


[issue20087] Mismatch between glibc and X11 locale.alias

2017-03-17 Thread Marc-Andre Lemburg

Marc-Andre Lemburg added the comment:

The main purpose of the alias table is to support normalization, and this is 
used by getdefaultlocale(), which was created to be able to determine the 
default encoding based on what X.org uses as default without doing temporary 
setlocale() tricks.

Now, normalization also happens when passing a locale value to the underlying 
setlocale(), mainly to avoid many common bugs due to setlocale() being 
extremely picky about the locale value. A side effect of this is that 
normalization will also kick in to add the encoding in case no encoding is 
given in the parameter.
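For illustration (not part of the original comment), this is what the public `locale.normalize()` does with a bare locale name, with a libc-style `utf8` spelling, and with a name the table does not know. The results come from the alias table shipped with the running Python, so less common locales may differ between versions. The three shown here rely on long-standing entries.

```python
import locale

# A bare language/territory name gets its default encoding from the alias table.
print(locale.normalize("en_US"))       # en_US.ISO8859-1

# An explicit but loosely spelled encoding is kept and rewritten into the
# spelling that libc's setlocale() expects.
print(locale.normalize("de_DE.utf8"))  # de_DE.UTF-8

# Names the table does not know are returned unchanged.
print(locale.normalize("xx_YY"))       # xx_YY
```

In current CPython, `locale.setlocale()` applies `normalize()` only when it is given a `(language, encoding)` tuple; a plain string is passed to libc unchanged.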

Note that no normalization is necessary to simply set the default locale 
configured on the system. In such a case, you'd run setlocale(LC_ALL, '') 
and get what's configured.

If you run the lib C setlocale() with a locale without an encoding, the encoding 
used depends entirely on what's configured on the system. The SUPPORTED 
file only gives a hint at what glibc thinks it should install by default, but 
any admin or distributor could change these settings simply by running 
localedef with some other encoding (charmap in locale speak).

I suppose that we could resolve some of the confusion by adding a parameter to 
disable this normalization in setlocale().

--


[issue20087] Mismatch between glibc and X11 locale.alias

2017-03-17 Thread Larry Hastings

Changes by Larry Hastings :


--
pull_requests: +602


[issue20087] Mismatch between glibc and X11 locale.alias

2017-03-10 Thread Benjamin Peterson

Benjamin Peterson added the comment:

I'm still confused about what getlocale() is supposed to do. Why do we attempt 
to return an encoding anyway if the underlying setlocale call doesn't return 
one? Is getlocale() not supposed to be a simple wrapper over the C locale? If not, 
how is one supposed to get the encoding associated with the C locale?

The old alias table code meant that the encoding returned from getlocale() 
could be related to or completely unrelated to the actual C locale. 
Misunderstanding this results in issues like #29571.

--


[issue20087] Mismatch between glibc and X11 locale.alias

2017-03-10 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

I feel there is something wrong with the current locale design. See 
issue504219, issue10466, issue20088, issue25191, issue29571.

--


[issue20087] Mismatch between glibc and X11 locale.alias

2017-03-10 Thread Marc-Andre Lemburg

Marc-Andre Lemburg added the comment:

On 10.03.2017 08:37, Benjamin Peterson wrote:
> 
> Do you believe this program should work?
> 
> import locale, os
> for l in open("/usr/share/i18n/SUPPORTED"):
>     alias, encoding = l.strip().split()
>     locale.setlocale(locale.LC_ALL, alias)
>     try:
>         enc = locale.getlocale()[1]
>     except ValueError:
>         continue  # not in table
>     normalized = enc.replace("ISO", "ISO-"). \
>                  replace("_", "-"). \
>                  replace("euc", "EUC-"). \
>                  replace("big5", "big5-").upper()
>     assert normalized == locale.nl_langinfo(locale.CODESET)
> 
> After my change it does—the encoding returned from getlocale() is the one 
> actually being used by glibc. It fails dramatically on earlier versions of 
> Python (for example on the en_IN example from #29571.) I don't understand why 
> Python needs to editorialize whatever choices libc or the system 
> administrator has made.

Your program essentially tests what alias is configured
on your particular system. It will fail on older systems
(with a different or no version of SUPPORTED), it will fail on
systems that do not have all locales installed, it will
fail on systems that use the X.org aliases table as basis
rather than some list of supported locales of glibc, or
custom alias tables.

What we want in Python is a consistent mapping of aliases to locales
across all (Unix based) Python installations, just like what we
have for encoding aliases and those mappings should be taken
from a supported alias database, not a list of default installations
on some glibc version.

Also note that a lot of these discussions are really academic,
since locales should always be specified with encoding.

While Unix gravitates to UTF-8 for all system related things,
users still use other encodings a lot for their daily operations,
as you can see in the X.org aliases file.

This is why defaulting to UTF-8 for locales (as is done, e.g.,
for many locales in the glibc default installs) is not
a good idea. Locales affect user work products. What's fine for
command line interfacing or piping is not necessarily
fine for, e.g., documents created by users.

So to answer your question: No, I don't believe that SUPPORTED
has any authority for our purposes and thus don't think that
the program can be considered a valid test case.

The SUPPORTED file can serve as an extra resource for fixing bugs
in the table, but nothing more.

> Is getlocale() expected to return something different from the underlying C 
> locale?

getlocale() will return whatever is currently configured via
setlocale().

Of course, it can return something different from what glibc's
SUPPORTED file lists as the default installation encoding if you don't
provide the encoding when using setlocale(), but it will always default
to the same locale and encoding on all platforms where you
run Python.

> In fact, why have this table at all instead of using nl_langinfo to return 
> the encoding for the current locale?

The table is meant to normalize locale names and enrich
them with default encodings from a well known database of
such aliases, where necessary. As mentioned above the locale setting
should ideally include the encoding as well, so that any such
guesses are not necessary.

Regarding nl_langinfo():

nl_langinfo() will only work if you have called
setlocale() already, since a process always starts up in
the C locale without this call.

If you don't have a problem with calling setlocale() for
testing the default locale settings (e.g. Python is not
embedded, you don't have other threads running, no
APIs which use locale information called yet, setlocale()
was already called to setup the locale, etc.),
you can use the approach taken by getpreferredencoding(),
which is to temporarily set the locale to the default.
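The save/set/restore approach mentioned here can be sketched as follows. This is an illustration under assumptions, not the actual code of getpreferredencoding(). The function name `default_codeset()` is hypothetical, and it needs a Unix platform because of `nl_langinfo()`. It is only safe under the conditions listed above, i.e. when no other thread depends on the locale at the same time.

```python
import locale

def default_codeset():
    """Return the codeset of the user's default LC_CTYPE locale.

    Temporarily switches LC_CTYPE to the environment's setting (LANG,
    LC_ALL, LC_CTYPE) and restores the previous value afterwards.
    Not thread-safe: setlocale() changes state for the whole process.
    """
    saved = locale.setlocale(locale.LC_CTYPE)      # query only, no change
    try:
        locale.setlocale(locale.LC_CTYPE, "")      # switch to the configured default
        return locale.nl_langinfo(locale.CODESET)  # libc's own name for the codeset
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)   # put the old setting back
```

If the environment names a locale that is not installed, the inner `setlocale()` raises `locale.Error`; the `finally` clause still restores the previous setting before the error propagates.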

Going forward, I think that the following changes make
sense:

* from ISO8859-1 to ISO8859-15 (the -15 version adds
  the Euro sign)

* casing changes e.g. 'zh_CN.gb2312' to 'zh_CN.GB2312'

* fixes which undo removal of modifiers such as
  'uz_uz@cyrillic' -> 'uz_UZ.UTF-8' to 'uz_UZ.UTF-8@cyrillic'

As for the other changes: please undo them and also
revert the unconditional use of glibc mappings overriding
the X.org ones, as mentioned earlier in the thread.

We can re-add some of the modifications later on if there's
evidence that they actually do make sense.

Thanks,
-- 
Marc-Andre Lemburg
eGenix.com

--


[issue20087] Mismatch between glibc and X11 locale.alias

2017-03-09 Thread Benjamin Peterson

Benjamin Peterson added the comment:

Do you believe this program should work?

import locale, os
for l in open("/usr/share/i18n/SUPPORTED"):
    alias, encoding = l.strip().split()
    locale.setlocale(locale.LC_ALL, alias)
    try:
        enc = locale.getlocale()[1]
    except ValueError:
        continue  # not in table
    normalized = enc.replace("ISO", "ISO-"). \
                 replace("_", "-"). \
                 replace("euc", "EUC-"). \
                 replace("big5", "big5-").upper()
    assert normalized == locale.nl_langinfo(locale.CODESET)

After my change it does—the encoding returned from getlocale() is the one 
actually being used by glibc. It fails dramatically on earlier versions of 
Python (for example on the en_IN example from #29571.) I don't understand why 
Python needs to editorialize whatever choices libc or the system administrator 
has made.

Is getlocale() expected to return something different from the underlying C 
locale?

In fact, why have this table at all instead of using nl_langinfo to return the 
encoding for the current locale?
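The string replacements in the program above only approximate codec-name equivalence. The sketch below (not from the thread) compares names through Python's codec registry instead. `same_codec()` is a made-up helper. Its loose fallback for names Python has no codec for (such as KOI8-C) is an assumption about what "same" should mean in that case.

```python
import codecs

def _squash(name):
    # Loose comparison key: case-insensitive, ignoring '-' and '_'.
    return name.lower().replace("-", "").replace("_", "")

def same_codec(a, b):
    """Return True if two encoding names refer to the same encoding.

    Uses Python's codec registry, so aliases such as 'ISO8859-1' and
    'ISO-8859-1', or 'eucJP' and 'EUC-JP', compare equal. If Python has
    no codec for one of the names, fall back to a loose string comparison.
    """
    try:
        return codecs.lookup(a).name == codecs.lookup(b).name
    except LookupError:
        return _squash(a) == _squash(b)
```

With this helper, the last statement of the loop would become `assert same_codec(enc, locale.nl_langinfo(locale.CODESET))`.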

--


[issue20087] Mismatch between glibc and X11 locale.alias

2017-03-09 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

The original issue is issue29571. The locale module returned encoding ISO8859-1 
for locale en_IN (as in the X11 locale alias map), but glibc uses UTF-8 (as in the 
glibc SUPPORTED file).

--


[issue20087] Mismatch between glibc and X11 locale.alias

2017-03-09 Thread Marc-Andre Lemburg

Marc-Andre Lemburg added the comment:

On 09.03.2017 11:47, Serhiy Storchaka wrote:
> 
> The SUPPORTED file from glibc is used for determining the default encoding
> for locales that don't include it explicitly. For example, en_IN uses UTF-8
> rather than ISO8859-1.

No, the glibc locales don't say anything about default encodings
used in a locale:

http://manpages.ubuntu.com/manpages/wily/en/man5/locale.5.html

These encodings are just used for determining the default
set of locale.encoding variants to install on the system,
nothing more:

https://github.com/bminor/glibc/blob/73dfd088936b9237599e4ab737c7ae2ea7d710e1/localedata/Makefile#L204

glibc does have a locale.alias file:

https://github.com/bminor/glibc/blob/73dfd088936b9237599e4ab737c7ae2ea7d710e1/intl/locale.alias

which uses the X.org format, but this is completely out of
date and declared obsolete.

Serhiy: If you believe that there's anything authoritative about
the glibc SUPPORTED file in terms of defining the commonly
used encoding in a locale, please provide references. These
should also clarify why the glibc encoding is the correct one
compared to the X.org mapping.

It doesn't help, trying to interpret things into such build
files. We need a database that is being actively maintained
and has a track record of representing what people actually
use in their locales. The only one I know is the X.org one.

--


[issue20087] Mismatch between glibc and X11 locale.alias

2017-03-09 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

The SUPPORTED file from glibc is used for determining the default encoding for 
locales that don't include it explicitly. For example, en_IN uses UTF-8 rather 
than ISO8859-1.

--


[issue20087] Mismatch between glibc and X11 locale.alias

2017-03-09 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

> Why is the X11 locale alias map used at all? It seems like it can only create 
> confusion with libc.

Originally only the X11 locale alias map was used. The support of the glibc 
locale alias map was added 2.5 years ago (issue20079).

--


[issue20087] Mismatch between glibc and X11 locale.alias

2017-03-09 Thread Marc-Andre Lemburg

Marc-Andre Lemburg added the comment:

On 09.03.2017 08:15, Benjamin Peterson wrote:
> 
> "eo_XX" is just something that appears in the X11 locale.alias file. My 
> change doesn't add that; it was already there. (for Esperanto, which I 
> suppose explains the "XX")

Yes, I know. That was an example of a bug in the X.org list.

> Most of the changes you identify are the glibc aliases taking precedence over the 
> X11 ones, e.g., glibc has "fi_FI ISO-8859-1" while the X11 locale list has 
> "fi_FI.ISO8859-15". That seems correct to me as far as the intent of this 
> change is concerned.

No, it's not correct. ISO-8859-1 is the older version of Latin-1
without the Euro sign. ISO8859-15 adds it.
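A quick Python check of the Euro-sign point (added for illustration). ISO-8859-15 places € at byte 0xA4, where ISO-8859-1 has the generic currency sign ¤, so only the former can encode it.

```python
euro = "\u20ac"  # EURO SIGN

# Latin-9 (ISO-8859-15) has the euro at 0xA4.
print(euro.encode("iso8859-15"))       # b'\xa4'

# In Latin-1 (ISO-8859-1) the same byte is the generic currency sign.
print(b"\xa4".decode("iso8859-1"))     # ¤

# Latin-1 has no euro sign at all, so encoding it fails.
try:
    euro.encode("iso8859-1")
except UnicodeEncodeError:
    print("ISO-8859-1 cannot encode the euro sign")
```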

> How do you propose to pick and choose what we use from the X11 locale alias 
> list?

We have to go through the list one by one to check whether
the mapping update makes sense and is correct.

This will be difficult in a few cases where the glibc mapping
switches to UTF-8 from an ISO encoding. We'll have to find
evidence that this change does indeed make sense.

My take on this is that the X.org folks know better than the
glibc folks, since the former have to deal with end users that
rely on the locale settings a lot more than applications
using glibc for getting an initial locale setting right.

Also note that you are parsing the SUPPORTED file from
glibc (in slightly processed form):

https://github.com/bminor/glibc/blob/master/localedata/SUPPORTED

This file does not provide a locale alias mapping as
the routine in makelocalealias.py suggests. Instead it's
a list of locales to install by default:

https://github.com/bminor/glibc/blob/73dfd088936b9237599e4ab737c7ae2ea7d710e1/localedata/Makefile

In glibc you can define both the locale and the encoding separately
when creating a locale using localedef and the file simply provides
the default parameters to pass to this tool.
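To make the point about the file's format concrete, here is a hedged sketch that reads SUPPORTED as a list of (locale name, charmap) pairs to hand to localedef. It assumes the layout of recent glibc source trees: comment lines starting with `#`, a `SUPPORTED-LOCALES=\` header, and one `name/charmap \` entry per line. `parse_supported()` is a hypothetical helper, not code from makelocalealias.py.

```python
def parse_supported(text):
    """Parse glibc's localedata/SUPPORTED into (locale, charmap) pairs.

    Each pair is an install parameter for localedef (locale definition
    plus character map), not a statement about a locale's "default"
    encoding.
    """
    pairs = []
    for line in text.splitlines():
        # Drop surrounding whitespace and the trailing line-continuation backslash.
        line = line.strip().rstrip("\\").strip()
        if not line or line.startswith("#") or line.startswith("SUPPORTED-LOCALES="):
            continue
        name, _, charmap = line.partition("/")
        pairs.append((name, charmap))
    return pairs
```

A name such as `fi_FI` may appear with several charmaps (e.g. `fi_FI/ISO-8859-1` and `fi_FI@euro/ISO-8859-15`). This is one reason that reading the file as a one-to-one alias table is questionable.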

As such, I don't see how you can derive a default alias
meaning from the file.

It's simply an indication of what glibc would have installed
in case it were installed from source, but that's hardly ever
the case. On today's systems only a bare subset of locales
is installed and more added as necessary, so you rarely have
all the locales defined in SUPPORTED installed on a system.

So the file doesn't even provide a hint at what could
be installed on the system ("locale -a" gives you that list).

Here's the history:

https://github.com/bminor/glibc/commits/master/localedata/SUPPORTED

It's merely a list of additions and removals from the
default set. Nothing more. It does provide a list of
known and supported locales, but no usable or authoritative
encoding information (locales are defined using Unicode, so
the encoding is a parameter and not predefined).

Overall, I believe the file is pretty useless to use as
basis for an alias table providing encoding information.
It may provide some ideas for corrections, but should not
override the X.org one by default.

On the other hand, you have the locale.alias master file:

https://cgit.freedesktop.org/xorg/lib/libX11/tree/nls/locale.alias.pre

together with the history of why changes were made and when.
This is an authoritative resource and people are making changes
against it from the user perspective.

I'd suggest to make the override optional in makelocalealias.py
via a command line switch and to use this for manually adding
or fixing X.org entries.

If you absolutely want to parse the glibc file per default as
well, please only let it add new entries, not override existing
ones. As we've seen in the patch, those overrides need to be
carefully reviewed.
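The policy proposed here, in which glibc may add new entries but may not override X.org ones unless explicitly asked, could look roughly like this. `merge_aliases()` and its `allow_override` flag are hypothetical stand-ins for the suggested command line switch, not existing code in makelocalealias.py.

```python
def merge_aliases(x11, glibc, allow_override=False):
    """Merge two alias tables; X.org entries win unless allow_override is set.

    Returns the merged table and a dict mapping each conflicting key to an
    (x11_value, glibc_value) pair, so conflicts can be reviewed by hand.
    """
    merged = dict(x11)
    conflicts = {}
    for key, value in glibc.items():
        if key not in merged:
            merged[key] = value                   # new locale: safe to add
        elif merged[key] != value:
            conflicts[key] = (merged[key], value)
            if allow_override:
                merged[key] = value               # opt-in: let glibc win
    return merged, conflicts
```

Reporting conflicts instead of resolving them silently matches the request that overrides be reviewed manually. With the tables discussed in this thread, `fi_fi` (X.org ISO8859-15 vs. glibc ISO8859-1) would end up in `conflicts`.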

--


[issue20087] Mismatch between glibc and X11 locale.alias

2017-03-08 Thread Benjamin Peterson

Benjamin Peterson added the comment:

"eo_XX" is just something that appears in the X11 locale.alias file. My change 
doesn't add that; it was already there. (for Esperanto, which I suppose 
explains the "XX")

Most of the changes you identify are the glibc aliases taking precedence over the 
X11 ones, e.g., glibc has "fi_FI ISO-8859-1" while the X11 locale list has 
"fi_FI.ISO8859-15". That seems correct to me as far as the intent of this 
change is concerned.

How do you propose to pick and choose what we use from the X11 locale alias 
list?

--


[issue20087] Mismatch between glibc and X11 locale.alias

2017-03-08 Thread Marc-Andre Lemburg

Marc-Andre Lemburg added the comment:

Why was the PR merged while we were still discussing it ?

--


[issue20087] Mismatch between glibc and X11 locale.alias

2017-03-08 Thread Marc-Andre Lemburg

Marc-Andre Lemburg added the comment:

On 08.03.2017 10:37, Serhiy Storchaka wrote:
> 
> The problem is that that table can give incorrect results for non-Linux 
> platforms (or for Linux with old glibc).

Sure, it's a best effort approach.

Also note that on today's systems you often don't have the full set of
locales available anymore - instead these have to either be installed
separately or generated on the target system.

Our locale database works on all these systems, regardless of
what's installed or not.

--


[issue20087] Mismatch between glibc and X11 locale.alias

2017-03-08 Thread Marc-Andre Lemburg

Marc-Andre Lemburg added the comment:

On 08.03.2017 07:27, Benjamin Peterson wrote:
> 
> Why is the X11 locale alias map used at all? It seems like it can only create 
> confusion with libc.

Because it was the only such maintained mapping available at the
time. It's also used for the X.org system, which has a rather strong
focus on user interfaces where locales matter a lot, unlike
the lib C :-)

--


[issue20087] Mismatch between glibc and X11 locale.alias

2017-03-08 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

The problem is that that table can give incorrect results for non-Linux platforms 
(or for Linux with old glibc).

--


[issue20087] Mismatch between glibc and X11 locale.alias

2017-03-08 Thread Marc-Andre Lemburg

Marc-Andre Lemburg added the comment:

On 08.03.2017 08:20, Serhiy Storchaka wrote:
> 
> Serhiy Storchaka added the comment:
> 
> Not all platforms use glibc 2.24 as libc.

True. Many don't even use glibc.

> Ideally, most of the entries should not even exist. We should ask libc for the 
> default encoding if it is not included in the locale name. The aliases table 
> should be used only for mapping commonly used locales that libc does not support 
> to ones that it does support.

I think you have a wrong understanding of what this alias table
is used for: we need it to determine the lib C compatible locale
name without using lib C APIs such as setlocale(), since these are
not thread-safe and have side effects for the whole process.

The alias table is there to avoid having to go to the lib C
to ask it indirectly for more details. Unfortunately, there are
no cross-platform lib C APIs which would allow querying these
details without also changing the locale settings of the process.

I know that Python still plays the usual "save current locale,
run setlocale(), revert to previous locale" trick in a couple
of places and this works if Python is the only thread running,
but it doesn't when embedded into other applications.

Regarding the patch: we cannot simply use the output from the
script to set new values. The changes have to be manually
reviewed as well.

E.g. this entry in the table is clearly a typo:

'en_zw.utf8':   'en_ZS.UTF-8',

(it should read en_ZW.UTF-8)

This entry appears wrong as well:

'eo':   'eo_XX.ISO8859-3',

(XX is not a valid country ISO code)
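Slips like the en_ZS typo can be caught mechanically by comparing the territory in each alias key with the one in its target. The following is a rough sketch, not from the thread. `country_mismatches()` is hypothetical and only checks keys shaped like `ll_cc` or `lll_cc`. Its output is a review list, not a verdict, because legitimate aliases for territories whose codes have changed may be flagged too. The `eo_XX` case needs a different check (against a list of valid ISO 3166 codes), which this sketch does not attempt.

```python
def country_mismatches(table):
    """Return alias keys whose territory differs from the one in their target.

    Only keys of the form 'll_cc' or 'lll_cc' (optionally followed by
    '.encoding' or '@modifier') are checked; long-form aliases such as
    'english_uk' are skipped because they legitimately map elsewhere.
    """
    bad = []
    for key, value in table.items():
        key_base = key.split("@")[0].split(".")[0]
        value_base = value.split("@")[0].split(".")[0]
        key_lang, _, key_cc = key_base.partition("_")
        _, _, value_cc = value_base.partition("_")
        if len(key_lang) not in (2, 3) or len(key_cc) != 2:
            continue  # not a plain language_territory key
        if key_cc.lower() != value_cc.lower():
            bad.append(key)
    return bad
```

Running it over `locale.locale_alias` gives a list to check by hand. The exact result depends on the Python version.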

How should we go about this ? Mark all the problems in the PR ?

--


[issue20087] Mismatch between glibc and X11 locale.alias

2017-03-07 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Not all platforms use glibc 2.24 as libc.

Ideally, most of the entries should not even exist. We should ask libc for the 
default encoding if it is not included in the locale name. The aliases table 
should be used only for mapping commonly used locales that libc does not support 
to ones that it does support.
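Serhiy's proposal (ask libc first, and fall back to the table only for names libc does not know) might be sketched like this. `resolve_locale()` is a hypothetical name. The sketch deliberately uses the temporary setlocale() switch that Marc-Andre objects to in his reply, so it carries the same thread-safety caveat. It also requires Unix for `nl_langinfo()`.

```python
import locale

def resolve_locale(name):
    """Hypothetical resolver: ask libc first, use the alias table as a fallback.

    If libc accepts `name`, the codeset it actually uses is read back with
    nl_langinfo(); otherwise locale.normalize() supplies the table's guess.
    Not thread-safe: setlocale() changes process-wide state.
    """
    saved = locale.setlocale(locale.LC_CTYPE)         # query only
    try:
        try:
            locale.setlocale(locale.LC_CTYPE, name)
        except locale.Error:
            return locale.normalize(name)             # libc does not know it
        if "." in name:
            return name                               # encoding given explicitly
        return name + "." + locale.nl_langinfo(locale.CODESET)
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)      # restore the previous setting
```

Note that the codeset comes back spelled libc's way (e.g. `ISO-8859-1` rather than the table's `ISO8859-1`), which is part of the naming mismatch discussed in this thread.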

--


[issue20087] Mismatch between glibc and X11 locale.alias

2017-03-07 Thread Benjamin Peterson

Benjamin Peterson added the comment:

Why is the X11 locale alias map used at all? It seems like it can only create 
confusion with libc.

--
nosy: +benjamin.peterson


[issue20087] Mismatch between glibc and X11 locale.alias

2017-03-07 Thread Marc-Andre Lemburg

Marc-Andre Lemburg added the comment:

On 07.03.2017 18:23, Serhiy Storchaka wrote:
> 
> Serhiy Storchaka added the comment:
> 
>> 'cy_GB.ISO8859-1' to 'cy_GB.ISO8859-14'
> 
> This looks like it just fixes an error. The default West-European ISO8859-1 is 
> changed to the Celtic ISO8859-14 (cy_GB.ISO8859-14). This looks like a better option for Welsh.
> 
>> 'tg_TJ.KOI8-C' to 'tg_TJ.KOI8-T'
> 
> KOI8-C is not supported by Python, but KOI8-T is supported. I don't know what 
> KOI8-C means, there are several rarely used incompatible encodings with this 
> name.

While all this may make sense, I'm missing some more reasoning
behind the differences between X.org and glibc.

This change also looks strange:

-'ka_ge':'ka_GE.GEORGIAN-ACADEMY',
+'ka_ge':'ka_GE.GEORGIAN_PS',
 'ka_ge.georgianacademy':'ka_GE.GEORGIAN-ACADEMY',
 'ka_ge.georgianps': 'ka_GE.GEORGIAN-PS',
 'ka_ge.georgianrs': 'ka_GE.GEORGIAN-ACADEMY',

Why is GEORGIAN_PS written with an underscore whereas the other
mappings use dashes ?

Or this one:

-'fi_fi':'fi_FI.ISO8859-15',
+'fi_fi':'fi_FI.ISO8859-1',

Why would a locale switch away from an encoding having
the Euro sign to one without it ?

Or why is this latin variant removed:

-'nan_tw@latin': 'nan_TW.UTF-8@latin',

Why should Russians switch back to ISO ?

-'ru_ru':'ru_RU.UTF-8',
+'ru_ru':'ru_RU.ISO8859-5',

or from ISO to KOI ?

-'russian':  'ru_RU.ISO8859-5',
+'russian':  'ru_RU.KOI8-R',

The more I look at these changes, the more I believe we
should not simply take everything we find in the files
for granted. They obviously both have bugs.

>> I also don't understand why some "xx.utf-8" locale mappings were removed - I 
>> don't think we should remove those, unless they are no longer needed due to 
>> some other logic implying these mappings.
> 
> The aliases table is a table of exceptions. Removed entries are no longer 
> exceptional.

It's not a table of exceptions, it's a table mapping commonly
used locale settings to ones which the lib C understands :-)

But regardless, I checked the code and it is already
smart enough to convert libc-incompatible spellings such
as "utf8" to "UTF-8", so these entries can indeed be
removed, but only if the locale is otherwise listed.
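The conversion described here is exposed as `locale.normalize()`. A quick check, assuming a Python 3 interpreter: the exact alias-table contents differ between versions, but when only the bare language entry matches, the encoding part is rewritten to the libc spelling `UTF-8`:

```python
import locale

for name in ('de_DE.utf8', 'en_IN.utf8', 'ru_RU.utf8'):
    # normalize() first looks for the full name in locale.locale_alias; if that
    # fails it falls back to the bare language entry and swaps in the encoding.
    print(name, '->', locale.normalize(name))
```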

In some cases, it's probably better to drop the ".utf8"
to have more generic mappings, e.g.

+'bhb_in.utf8':  'bhb_IN.UTF-8',

or

 'de_li.utf8':   'de_LI.UTF-8',

though I'd expect that mapping to be:

 'de_li':   'de_LI.ISO8859-1',

as for all other "de" entries.
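The rule "removable only if the locale is otherwise listed" can be checked mechanically. A minimal sketch, assuming the table is the undocumented module-level dict `locale.locale_alias`; the helper name is hypothetical. It lists `xx_yy.utf8` keys whose bare `xx_yy` key is missing, i.e. entries that could not be dropped without losing the mapping:

```python
import locale

def utf8_entries_without_bare_name(table=None):
    """List keys like 'xx_yy.utf8' whose bare 'xx_yy' key is absent.

    Without a bare entry, normalize() has nothing to fall back on, so
    such '.utf8' entries are still needed.
    """
    if table is None:
        table = locale.locale_alias  # undocumented module-level dict
    result = []
    for key in table:
        if key.endswith('.utf8'):
            bare = key[:-len('.utf8')]
            if bare not in table:
                result.append(key)
    return sorted(result)

print(utf8_entries_without_bare_name())
```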

--




[issue20087] Mismatch between glibc and X11 locale.alias

2017-03-07 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

> 'cy_GB.ISO8859-1' to 'cy_GB.ISO8859-14'

This looks like just fixing an error. The default Western European ISO8859-1 is
changed to the Celtic cy_GB.ISO8859-14, which looks like a better option for Welsh.

> 'tg_TJ.KOI8-C' to 'tg_TJ.KOI8-T'

KOI8-C is not supported by Python, but KOI8-T is. I don't know what
KOI8-C means; there are several rarely used, incompatible encodings with this
name.
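Both points can be verified against Python's codec registry. A minimal sketch with hypothetical helper names: `koi8_t` is a built-in codec (added in Python 3.5) while no `koi8_c` codec exists, and U+0175 (w with circumflex, used in Welsh) is present in ISO8859-14 (Latin-8) but not in ISO8859-1:

```python
import codecs

def codec_available(name):
    """Return True if Python has a codec registered under this name."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True

def encodable(text, encoding):
    """Return True if text can be encoded with the given codec."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

print(codec_available('koi8_t'), codec_available('koi8_c'))
w_circumflex = '\u0175'  # LATIN SMALL LETTER W WITH CIRCUMFLEX
print(encodable(w_circumflex, 'iso8859-14'), encodable(w_circumflex, 'iso8859-1'))
```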

> I also don't understand why some "xx.utf-8" locale mappings were removed - I 
> don't think we should remove those, unless they are no longer needed due to some
> other logic implying these mappings.

The aliases table is a table of exceptions. Removed entries are no longer
exceptional.
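One way to read "no longer exceptional": an entry is redundant if `locale.normalize()` produces the same result without it. A minimal sketch of that test; `is_redundant` is hypothetical, it temporarily mutates the undocumented dict `locale.locale_alias` (so it is not thread-safe), and it always restores the entry:

```python
import locale

def is_redundant(key):
    """Return True if normalize(key) gives the same result without the entry."""
    table = locale.locale_alias  # undocumented module-level dict
    expected = table[key]
    del table[key]               # pretend the entry was removed
    try:
        return locale.normalize(key) == expected
    finally:
        table[key] = expected    # always put the entry back
```

For example, a `de_de.utf8` entry pointing at `de_DE.UTF-8` is redundant because the bare `de_de` entry plus the encoding rewrite yields the same name, whereas `de_de` itself is not.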

--




[issue20087] Mismatch between glibc and X11 locale.alias

2017-03-07 Thread Marc-Andre Lemburg

Marc-Andre Lemburg added the comment:

I agree that it's reasonable to have glibc's aliases override
the X.org ones, but this patch makes some pretty significant changes to 
Python's default assumptions with respect to default encodings for several 
locales.

While some changes obviously make sense (e.g. 'ca_AD.ISO8859-1' to 
'ca_AD.ISO8859-15'), others are less clear (e.g. 'cy_GB.ISO8859-1' to 
'cy_GB.ISO8859-14' or 'tg_TJ.KOI8-C' to 'tg_TJ.KOI8-T' or several of the moves 
from ISO encodings to UTF-8). Is there some reference for why glibc chose 
different values than X.org for these?

I also don't understand why some "xx.utf-8" locale mappings were removed. I
don't think we should remove those unless they are no longer needed due to some
other logic implying these mappings.

Since these are major changes, we need an appropriate warning in the NEWS file 
(and the "What's New" document), an update of the top comment (under "### 
Database") to mention that the glibc database takes precedence and where to 
find it,

--




[issue20087] Mismatch between glibc and X11 locale.alias

2017-03-06 Thread Serhiy Storchaka

Changes by Serhiy Storchaka:


--
stage:  -> test needed
versions: +Python 3.6, Python 3.7 -Python 3.4




[issue20087] Mismatch between glibc and X11 locale.alias

2017-03-03 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

A test is needed for a few common locales (en_IN, ru_RU) and maybe for some
unusual ones (uz_uz, uz_uz@cyrillic).

I would prefer to have a separate issue that updates the aliases table to glibc 
2.24.
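A sketch of such a test, under stated assumptions: the class name is hypothetical, the locale names are written the way glibc spells them, and any locale the C library does not provide is skipped over. It checks the round trip that fails for en_IN in the original report, `setlocale(LC_CTYPE, getlocale())`:

```python
import locale
import unittest

# Locales mentioned in the comment; which ones exist depends on the system.
CANDIDATES = ['en_IN', 'ru_RU', 'uz_UZ', 'uz_UZ@cyrillic']

class BareLocaleRoundTrip(unittest.TestCase):
    """Hypothetical test: feeding getlocale() back to setlocale() must work."""

    def setUp(self):
        self.saved = locale.setlocale(locale.LC_CTYPE)   # query current value

    def tearDown(self):
        locale.setlocale(locale.LC_CTYPE, self.saved)   # restore it

    def test_roundtrip(self):
        for name in CANDIDATES:
            with self.subTest(locale=name):
                try:
                    locale.setlocale(locale.LC_CTYPE, name)
                except locale.Error:
                    continue  # not installed here; nothing to check
                # getlocale() expands the name via the alias table; the C
                # library must accept the expanded name as well.
                locale.setlocale(locale.LC_CTYPE,
                                 locale.getlocale(locale.LC_CTYPE))
```

Run it with `python -m unittest`; on machines without these locales it passes trivially, so it is only meaningful where they are installed.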

--




[issue20087] Mismatch between glibc and X11 locale.alias

2017-03-02 Thread Benjamin Peterson

Changes by Benjamin Peterson:


--
pull_requests: +351




[issue20087] Mismatch between glibc and X11 locale.alias

2014-09-30 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


--
versions: +Python 3.5 -Python 3.3

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20087



[issue20087] Mismatch between glibc and X11 locale.alias

2013-12-28 Thread Serhiy Storchaka

New submission from Serhiy Storchaka:

The locale module uses a locale alias table derived from the X11 locale.alias
file to map bare locale names (without encodings) to locale names with
encodings. However, glibc's default encoding for a locale sometimes differs from
the one used in X11's locale.alias.
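The lookup described here can be observed directly. A minimal sketch, assuming the undocumented module-level dict `locale.locale_alias` (lowercase keys) holds the table; `alias_for` is a hypothetical helper. For a bare name present in the table, `locale.normalize()` returns the table value unchanged. The printed values depend on the Python version:

```python
import locale

def alias_for(name):
    """Return the alias-table expansion of a locale name, or None if absent."""
    # Keys in locale.locale_alias are lowercase.
    return locale.locale_alias.get(name.lower())

for name in ('en_IN', 'ru_RU', 'fi_FI'):
    # For a bare name in the table, both columns should agree.
    print(name, alias_for(name), locale.normalize(name))
```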

Here is the full table of differences:

Alias            glibc                  X11 locale.alias

az_az            az_AZ.UTF-8            az_AZ.ISO8859-9E
ca_ad            ca_AD.ISO8859-15       ca_AD.ISO8859-1
ca_fr            ca_FR.ISO8859-15       ca_FR.ISO8859-1
ca_it            ca_IT.ISO8859-15       ca_IT.ISO8859-1
cy_gb            cy_GB.ISO8859-14       cy_GB.ISO8859-1
en_in            en_IN.UTF-8            en_IN.ISO8859-1
et_ee            et_EE.ISO8859-1        et_EE.ISO8859-15
fi_fi            fi_FI.ISO8859-1        fi_FI.ISO8859-15
gd_gb            gd_GB.ISO8859-15       gd_GB.ISO8859-1
hi_in            hi_IN.UTF-8            hi_IN.ISCII-DEV
iu_ca            iu_CA.UTF-8            iu_CA.NUNACOM-8
iw_il            iw_IL.ISO8859-8        he_IL.ISO8859-8
ka_ge            ka_GE.GEORGIAN_PS      ka_GE.GEORGIAN-ACADEMY
lo_la            lo_LA.UTF-8            lo_LA.MULELAO-1
mi_nz            mi_NZ.ISO8859-13       mi_NZ.ISO8859-1
nr_za            nr_ZA.UTF-8            nr_ZA.ISO8859-1
nso_za           nso_ZA.UTF-8           nso_ZA.ISO8859-15
ru_ru            ru_RU.ISO8859-5        ru_RU.UTF-8
rw_rw            rw_RW.UTF-8            rw_RW.ISO8859-1
sq_al            sq_AL.ISO8859-1        sq_AL.ISO8859-2
ss_za            ss_ZA.UTF-8            ss_ZA.ISO8859-1
ta_in            ta_IN.UTF-8            ta_IN.TSCII-0
tg_tj            tg_TJ.KOI8_T           tg_TJ.KOI8-C
th_th            th_TH.TIS_620          th_TH.ISO8859-11
tn_za            tn_ZA.UTF-8            tn_ZA.ISO8859-15
ts_za            ts_ZA.UTF-8            ts_ZA.ISO8859-1
tt_ru            tt_RU.UTF-8            tt_RU.TATAR-CYR
ur_pk            ur_PK.UTF-8            ur_PK.CP1256
uz_uz            uz_UZ.ISO8859-1        uz_UZ.UTF-8
uz_uz@cyrillic   uz_UZ.UTF-8@cyrillic   uz_UZ.UTF-8
vi_vn            vi_VN.UTF-8            vi_VN.TCVN
zh_cn            zh_CN.GB2312           zh_CN.gb2312
zh_tw            zh_TW.BIG5             zh_TW.big5
zh_tw.euctw      zh_TW.EUC_TW           zh_TW.eucTW
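The report does not say how this table was generated. A minimal sketch of the comparison step, assuming both sources have already been parsed into dicts keyed by lowercase alias; the function name and the `ignore_case` option are hypothetical. The option shows that some rows (zh_cn, zh_tw) differ only in the capitalization of the encoding:

```python
def diff_aliases(glibc, x11, ignore_case=False):
    """Return sorted (alias, glibc_value, x11_value) rows for aliases that
    both mappings define but expand differently."""
    rows = []
    for alias in sorted(glibc.keys() & x11.keys()):
        g, x = glibc[alias], x11[alias]
        same = g.lower() == x.lower() if ignore_case else g == x
        if not same:
            rows.append((alias, g, x))
    return rows

# Three rows copied from the table above, already parsed into dicts.
glibc = {'en_in': 'en_IN.UTF-8', 'fi_fi': 'fi_FI.ISO8859-1', 'zh_cn': 'zh_CN.GB2312'}
x11 = {'en_in': 'en_IN.ISO8859-1', 'fi_fi': 'fi_FI.ISO8859-15', 'zh_cn': 'zh_CN.gb2312'}

for alias, g, x in diff_aliases(glibc, x11):
    print(f'{alias:<16} {g:<22} {x}')
```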

For example, with the en_IN locale:

>>> import locale, _locale
>>> _locale.setlocale(locale.LC_CTYPE)
'en_IN'
>>> locale.getlocale()
('en_IN', 'ISO8859-1')
>>> locale.nl_langinfo(locale.CODESET)
'UTF-8'
>>> locale.setlocale(locale.LC_CTYPE, locale.getlocale())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/serhiy/py/cpython/Lib/locale.py", line 592, in setlocale
    return _setlocale(category, locale)
locale.Error: unsupported locale setting
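The session above compares the encoding Python derives from its alias table with the codeset the C library reports. A minimal sketch that packages that comparison, assuming a POSIX system (`locale.nl_langinfo` is unavailable on Windows); the function name is hypothetical. It restores the previous LC_CTYPE setting afterwards:

```python
import locale

def compare_alias_with_libc(name):
    """Select name for LC_CTYPE and return (alias_encoding, libc_codeset),
    or None if the C library rejects the name. POSIX only."""
    saved = locale.setlocale(locale.LC_CTYPE)          # query current setting
    try:
        try:
            locale.setlocale(locale.LC_CTYPE, name)
        except locale.Error:
            return None                                # not installed here
        try:
            # Encoding derived from Python's alias table via normalize().
            alias_encoding = locale.getlocale(locale.LC_CTYPE)[1]
        except ValueError:
            alias_encoding = None                      # name unknown to the table
        # Encoding the C library actually uses for the active locale.
        libc_codeset = locale.nl_langinfo(locale.CODESET)
        return alias_encoding, libc_codeset
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)       # restore previous setting
```

On the system from the report, `compare_alias_with_libc('en_IN')` would expose the mismatch shown above (ISO8859-1 from the alias table versus UTF-8 from glibc).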

--
components: Library (Lib)
messages: 207025
nosy: lemburg, loewis, serhiy.storchaka
priority: normal
severity: normal
status: open
title: Mismatch between glibc and X11 locale.alias
type: behavior
versions: Python 2.7, Python 3.3, Python 3.4




[issue20087] Mismatch between glibc and X11 locale.alias

2013-12-28 Thread Arfrever Frehtes Taifersar Arahesis

Changes by Arfrever Frehtes Taifersar Arahesis arfrever@gmail.com:


--
nosy: +Arfrever
