[issue31900] localeconv() should decode numeric fields from LC_NUMERIC encoding, not from LC_CTYPE encoding

2018-11-28 Thread STINNER Victor


STINNER Victor  added the comment:

The initial bug has been fixed, so I am closing the issue.

--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31900] localeconv() should decode numeric fields from LC_NUMERIC encoding, not from LC_CTYPE encoding

2018-11-28 Thread STINNER Victor


STINNER Victor  added the comment:

See also bpo-28604: localeconv() doesn't support LC_MONETARY encoding different 
than LC_CTYPE encoding.

--




[issue31900] localeconv() should decode numeric fields from LC_NUMERIC encoding, not from LC_CTYPE encoding

2018-10-17 Thread STINNER Victor


STINNER Victor  added the comment:

Victor:
> The technical issue here is that the libc has no "stateless" function to 
> process bytes and text with one specific locale.

Andreas Schwab:
> That's not true.  There is a rich set of *_l functions that take a locale_t 
> object and operate on that locale.

Oh. Do you want to work on a patch to use these functions? If yes, please open 
a new issue to enhance the code.

--




[issue31900] localeconv() should decode numeric fields from LC_NUMERIC encoding, not from LC_CTYPE encoding

2018-01-28 Thread Andreas Schwab

Andreas Schwab  added the comment:

> The technical issue here is that the libc has no "stateless" function to 
> process bytes and text with one specific locale.

That's not true.  There is a rich set of *_l functions that take a locale_t 
object and operate on that locale.

--
nosy: +schwab




[issue31900] localeconv() should decode numeric fields from LC_NUMERIC encoding, not from LC_CTYPE encoding

2018-01-15 Thread STINNER Victor

STINNER Victor  added the comment:


New changeset 5f959c4f9eca404b8bc4bc6348fed27c4b907b89 by Victor Stinner in 
branch '3.6':
[3.6] bpo-31900: Fix localeconv() encoding for LC_NUMERIC (#4174) (#5192)
https://github.com/python/cpython/commit/5f959c4f9eca404b8bc4bc6348fed27c4b907b89


--




[issue31900] localeconv() should decode numeric fields from LC_NUMERIC encoding, not from LC_CTYPE encoding

2018-01-15 Thread STINNER Victor

Change by STINNER Victor :


--
pull_requests: +5046




[issue31900] localeconv() should decode numeric fields from LC_NUMERIC encoding, not from LC_CTYPE encoding

2018-01-15 Thread STINNER Victor

STINNER Victor  added the comment:

lc_numeric.py contains a typo; use the fixed lc_numeric2.py instead to test my 
PR 5191, which fixes decimal.Decimal.

--
Added file: https://bugs.python.org/file47386/lc_numeric2.py




[issue31900] localeconv() should decode numeric fields from LC_NUMERIC encoding, not from LC_CTYPE encoding

2018-01-15 Thread STINNER Victor

Change by STINNER Victor :


--
pull_requests: +5045




[issue31900] localeconv() should decode numeric fields from LC_NUMERIC encoding, not from LC_CTYPE encoding

2018-01-15 Thread STINNER Victor

STINNER Victor  added the comment:


New changeset cb064fc2321ce8673fe365e9ef60445a27657f54 by Victor Stinner in 
branch 'master':
bpo-31900: Fix localeconv() encoding for LC_NUMERIC (#4174)
https://github.com/python/cpython/commit/cb064fc2321ce8673fe365e9ef60445a27657f54


--




[issue31900] localeconv() should decode numeric fields from LC_NUMERIC encoding, not from LC_CTYPE encoding

2018-01-15 Thread STINNER Victor

STINNER Victor  added the comment:

On macOS 10.13.2, I failed to find any non-ASCII decimal_point or thousands_sep 
in localeconv(). I wrote a script to find all non-ASCII data in all locales:
https://github.com/vstinner/misc/blob/master/python/all_locales.py
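
For readers who cannot open the linked script, here is a minimal sketch of the 
same idea. It is not the author's all_locales.py: the function name 
find_non_ascii_numeric and the choice to pass locale names in as a list are 
assumptions made for illustration. Unsupported locale names are skipped, and 
str.isascii() needs Python 3.7 or newer.

```python
import locale


def find_non_ascii_numeric(names):
    """Return {locale_name: {field: value}} for non-ASCII numeric fields.

    Hypothetical helper: `names` is any iterable of locale names to try.
    """
    # Remember the current settings so they can be restored afterwards.
    saved_ctype = locale.setlocale(locale.LC_CTYPE)
    saved_numeric = locale.setlocale(locale.LC_NUMERIC)
    found = {}
    try:
        for name in names:
            try:
                # Set both categories to the same locale so the encodings
                # agree and decoding cannot go wrong.
                locale.setlocale(locale.LC_CTYPE, name)
                locale.setlocale(locale.LC_NUMERIC, name)
            except locale.Error:
                continue  # locale not installed on this system
            conv = locale.localeconv()
            fields = {
                key: conv[key]
                for key in ("decimal_point", "thousands_sep")
                if not conv[key].isascii()
            }
            if fields:
                found[name] = fields
    finally:
        locale.setlocale(locale.LC_CTYPE, saved_ctype)
        locale.setlocale(locale.LC_NUMERIC, saved_numeric)
    return found
```

The list of names could come from the output of "locale -a" on systems that 
provide that command.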

--




[issue31900] localeconv() should decode numeric fields from LC_NUMERIC encoding, not from LC_CTYPE encoding

2018-01-15 Thread STINNER Victor

STINNER Victor  added the comment:

Test on Linux (Fedora 27, glibc 2.26):

locale.setlocale(locale.LC_ALL, "fr_FR")
locale.setlocale(locale.LC_NUMERIC, "es_MX.utf8")

It works as expected, result:

decimal_point: '.'
thousands_sep: '\u2009'

Python 3.6 returns mojibake:

decimal_point: '.'
thousands_sep: '\xe2\x80\x89'

Python 2.7 returns raw byte strings: thousands_sep = b'\xE2\x80\x89'.
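
A small reproducer along these lines can be sketched as follows. The helper 
name numeric_fields is hypothetical, and whether "fr_FR" and "es_MX.utf8" are 
installed depends on the system, so the helper returns None when a locale is 
missing instead of failing.

```python
import locale


def numeric_fields(all_locale, numeric_locale):
    """Set LC_ALL, then LC_NUMERIC, and return the two numeric fields.

    Returns None if either locale is not available on this system.
    """
    saved = locale.setlocale(locale.LC_ALL)  # query the current setting
    try:
        locale.setlocale(locale.LC_ALL, all_locale)
        locale.setlocale(locale.LC_NUMERIC, numeric_locale)
        conv = locale.localeconv()
        return {
            "decimal_point": conv["decimal_point"],
            "thousands_sep": conv["thousands_sep"],
        }
    except locale.Error:
        return None
    finally:
        # Restore whatever was in effect before the call.
        locale.setlocale(locale.LC_ALL, saved)


if __name__ == "__main__":
    fields = numeric_fields("fr_FR", "es_MX.utf8")
    if fields is None:
        print("locales not available")
    else:
        for key, value in fields.items():
            print("%s: %a" % (key, value))
```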

--




[issue31900] localeconv() should decode numeric fields from LC_NUMERIC encoding, not from LC_CTYPE encoding

2018-01-15 Thread STINNER Victor

STINNER Victor  added the comment:

I tested localeconv() with PR 4174 on FreeBSD:
--
locale.setlocale(locale.LC_ALL, "C")
locale.setlocale(locale.LC_NUMERIC, "ar_SA.UTF-8")
--

It works as expected, result:
--
decimal_point: '\u066b'
thousands_sep: '\u066c'
--

Compare it to Python 3.6, which returns mojibake; it seems that the bytes are 
decoded from Latin1:
--
decimal_point: '\xd9\xab'
thousands_sep: '\xd9\xac'
--

Raw byte strings, Python 2.7:

* decimal_point: b'\xd9\xab'
* thousands_sep: b'\xd9\xac'
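
The difference between the two results can be reproduced without touching any 
locale, using only the raw bytes listed above. This is an illustration with 
Python's own codecs, not the code path used by localeconv(); the helper name 
decode_fields is hypothetical.

```python
# Raw bytes as reported for ar_SA.UTF-8 in the message above.
RAW_FIELDS = {
    "decimal_point": b"\xd9\xab",
    "thousands_sep": b"\xd9\xac",
}


def decode_fields(raw_fields, encoding):
    """Decode every byte string in raw_fields with the given encoding."""
    return {name: value.decode(encoding) for name, value in raw_fields.items()}


# Decoding with the LC_NUMERIC encoding (UTF-8) gives the Arabic separators.
correct = decode_fields(RAW_FIELDS, "utf-8")

# Decoding with Latin-1 maps each byte to one code point: mojibake.
mojibake = decode_fields(RAW_FIELDS, "latin-1")

# Latin-1 mojibake is reversible because no byte is lost.
repaired = {
    name: text.encode("latin-1").decode("utf-8")
    for name, text in mojibake.items()
}
```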

--




[issue31900] localeconv() should decode numeric fields from LC_NUMERIC encoding, not from LC_CTYPE encoding

2018-01-15 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Sounds like a good compromise :-)

--




[issue31900] localeconv() should decode numeric fields from LC_NUMERIC encoding, not from LC_CTYPE encoding

2018-01-15 Thread STINNER Victor

STINNER Victor  added the comment:

> I would not consider this a bug in Python, but rather in the locale settings 
> passed to setlocale().

For the past 10 years, I have repeated to every single user I met that "Python 
3 is right, your system setup is wrong". But that's a waste of time. People 
continue to associate Python 3 and Unicode with annoying bugs, because they 
don't understand how locales work.

Instead of having to repeat to each user that "hum, maybe your config is 
wrong", I prefer to support this unconventional setup and make it work as 
expected ("it just works"). With my latest implementation, setlocale() is only 
called when LC_CTYPE and LC_NUMERIC are different, which is the corner case 
that "shouldn't occur in practice".
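
The approach can be sketched at the Python level as follows. This is only an 
illustration of the idea (switch LC_CTYPE temporarily, and only when the two 
categories differ); the actual change is written in C, and the function name 
localeconv_numeric is hypothetical. Like any setlocale() call, the temporary 
switch affects the whole process and is not thread safe.

```python
import locale


def localeconv_numeric():
    """Call localeconv() with LC_CTYPE temporarily set to LC_NUMERIC."""
    # setlocale() with no second argument only queries the current value.
    ctype = locale.setlocale(locale.LC_CTYPE)
    numeric = locale.setlocale(locale.LC_NUMERIC)
    if ctype == numeric:
        # Common case: same locale, so same encoding; nothing to switch.
        return locale.localeconv()
    try:
        # Corner case: make the decoder use the LC_NUMERIC encoding.
        locale.setlocale(locale.LC_CTYPE, numeric)
        return locale.localeconv()
    finally:
        # Always restore the original LC_CTYPE locale.
        locale.setlocale(locale.LC_CTYPE, ctype)
```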

--




[issue31900] localeconv() should decode numeric fields from LC_NUMERIC encoding, not from LC_CTYPE encoding

2018-01-15 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Indeed. The major problem with all libc locale functions is that they are not 
thread safe. The GIL does help a bit in protecting against corrupted data, 
though.

--




[issue31900] localeconv() should decode numeric fields from LC_NUMERIC encoding, not from LC_CTYPE encoding

2018-01-15 Thread STINNER Victor

STINNER Victor  added the comment:

The technical issue here is that the libc has no "stateless" function to 
process bytes and text with one specific locale. All functions rely on the 
*current* locales. To decode byte strings, we use mbstowcs(), and this function 
relies on the current LC_CTYPE locale, whereas decimal_point and thousands_sep 
should be decoded from the current LC_NUMERIC locale.

--




[issue31900] localeconv() should decode numeric fields from LC_NUMERIC encoding, not from LC_CTYPE encoding

2018-01-15 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Ok, it seems that the C setlocale() itself does not follow the conventions set 
forth for environment variables:

http://pubs.opengroup.org/onlinepubs/7908799/xsh/setlocale.html

(see the example at the bottom)

So the behavior shown by Python's setlocale() is fine.

However, that still doesn't magically make this work:

locale.setlocale(locale.LC_ALL, 'C.UTF-8')
locale.setlocale(locale.LC_NUMERIC, 'fr_FR.ISO8859-1')

If LC_NUMERIC uses a different encoding than LC_ALL, there's really no surprise 
in having numeric formatting fail. localeconv() will output the set encoding 
for the numeric string conversion and Python will decode this using the locale 
encoding set by LC_ALL. If those two are different, you run into problems.

I would not consider this a bug in Python, but rather in the locale settings 
passed to setlocale().

--




[issue31900] localeconv() should decode numeric fields from LC_NUMERIC encoding, not from LC_CTYPE encoding

2018-01-15 Thread STINNER Victor

STINNER Victor  added the comment:

Example on Fedora 27 with Python 3.6:

vstinner@apu$ env -i LC_NUMERIC=uk_UA.koi8u python3 -c 'import locale; 
print(locale.setlocale(locale.LC_ALL, "")); 
print(locale.getpreferredencoding(), 
ascii(locale.localeconv()["thousands_sep"]))'
LC_CTYPE=C.UTF-8;LC_NUMERIC=uk_UA.koi8u;LC_TIME=C;LC_COLLATE=C;LC_MONETARY=C;LC_MESSAGES=C;LC_PAPER=C;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=C;LC_IDENTIFICATION=C
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib64/python3.6/locale.py", line 110, in localeconv
    d = _localeconv()
UnicodeDecodeError: 'locale' codec can't decode byte 0x9a in position 0: 
Invalid or incomplete multibyte or wide character

"env -i" starts Python in an empty environment. It seems like LC_CTYPE defaults 
to C.UTF-8 in this case.

* LC_CTYPE = C.UTF-8, encoding = UTF-8
* LC_NUMERIC = uk_UA.koi8u, encoding = KOI8-U


With my PR, it works:

vstinner@apu$ env -i LC_NUMERIC=uk_UA.koi8u ./python -c 'import locale; 
print(locale.setlocale(locale.LC_ALL, "")); 
print(locale.getpreferredencoding(), 
ascii(locale.localeconv()["thousands_sep"]))'
LC_CTYPE=C.UTF-8;LC_NUMERIC=uk_UA.koi8u;LC_TIME=C;LC_COLLATE=C;LC_MONETARY=C;LC_MESSAGES=C;LC_PAPER=C;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=C;LC_IDENTIFICATION=C
UTF-8 '\xa0'

=> thousands_sep byte string b'\x9A' is decoded as the Unicode string '\xa0'.


vstinner@apu$ env -i LC_NUMERIC=uk_UA.koi8u ./python -c 'import locale; 
locale.setlocale(locale.LC_ALL, ""); print(ascii(f"{1234:n}"))'
'1\xa0234'

=> the number is properly formatted


vstinner@apu$ env -i LC_NUMERIC=uk_UA.koi8u ./python -c 'import locale; 
locale.setlocale(locale.LC_ALL, ""); print(f"{1234:n}")'
1 234

It's possible to display the result using print().
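
Both outcomes for the byte 0x9A can be checked with Python's own codecs, 
independently of the system locale. This sketch uses the "koi8_u" and "utf-8" 
codecs from the standard library rather than the libc decoder, and the helper 
name decode_or_none is hypothetical.

```python
RAW_THOUSANDS_SEP = b"\x9a"  # byte returned by the C localeconv() above


def decode_or_none(data, encoding):
    """Decode data, returning None instead of raising on invalid input."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# With the LC_CTYPE encoding (UTF-8), 0x9A is not a valid start byte.
as_utf8 = decode_or_none(RAW_THOUSANDS_SEP, "utf-8")

# With the LC_NUMERIC encoding (KOI8-U), 0x9A is a no-break space.
as_koi8u = decode_or_none(RAW_THOUSANDS_SEP, "koi8_u")
```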

--




[issue31900] localeconv() should decode numeric fields from LC_NUMERIC encoding, not from LC_CTYPE encoding

2018-01-15 Thread STINNER Victor

STINNER Victor  added the comment:

Marc-Andre Lemburg: "If you first set LC_ALL and then one of the other 
categories such as LC_NUMERIC, locale C functions will still use the LC_ALL 
setting for everything. LC_NUMERIC does not override the LC_ALL setting."

The root of this issue is 
https://bugzilla.redhat.com/show_bug.cgi?id=1484497#c0:

Petr Viktorin's reproducer script uses Python locale.setlocale(), not 
environment variables:
https://gist.github.com/encukou/70b3d3f9ef3e29ac1dbc23a5f7bd6431
---
locale.setlocale(locale.LC_ALL, 'C.UTF-8')
locale.setlocale(locale.LC_NUMERIC, 'fr_FR.ISO8859-1')
---

--




[issue31900] localeconv() should decode numeric fields from LC_NUMERIC encoding, not from LC_CTYPE encoding

2018-01-15 Thread Stefan Krah

Stefan Krah  added the comment:

On Mon, Jan 15, 2018 at 12:37:28PM +, Marc-Andre Lemburg wrote:
> If you first set LC_ALL and then one of the other categories such as 
> LC_NUMERIC, locale C functions will still use the LC_ALL setting for 
> everything. LC_NUMERIC does not override the LC_ALL setting.

I have the exact same questions as Marc-Andre.  This is one of the reasons why I
blocked the _decimal change.  I don't fully understand the role of the new glibc,
since #7442 has existed for ages -- and it is an open question whether it is a bug
or not.

Both views are reasonable IMO.

--




[issue31900] localeconv() should decode numeric fields from LC_NUMERIC encoding, not from LC_CTYPE encoding

2018-01-15 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

I just wanted to note that the description and title may cause a wrong 
interpretation of what should happen:

If you first set LC_ALL and then one of the other categories such as 
LC_NUMERIC, locale C functions will still use the LC_ALL setting for 
everything. LC_NUMERIC does not override the LC_ALL setting.

I tested this on OpenSUSE and get the same wrong results. Apparently, 
locale.localeconv() does not respect the above order. That's a bug.

I'm not sure whether the OP's quoted behavior is a bug, though, since if the 
locale encoding is not UTF-8, you cannot really expect using UTF-8 numeric 
separators to output correctly.

--




[issue31900] localeconv() should decode numeric fields from LC_NUMERIC encoding, not from LC_CTYPE encoding

2018-01-15 Thread STINNER Victor

STINNER Victor  added the comment:

> Please confirm the bug without having LC_ALL or LANG set.

lc_numeric.py uses:

  locale.setlocale(locale.LC_ALL, "fr_FR")

Are you talking about that? What is the problem with this configuration?

I'm sure that there is a bug :-) You aren't able to reproduce it? What is your 
operating system?

--




[issue31900] localeconv() should decode numeric fields from LC_NUMERIC encoding, not from LC_CTYPE encoding

2018-01-15 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Just FYI: LC_ALL has precedence over all other more specific LC_* settings:

http://pubs.opengroup.org/onlinepubs/7908799/xbd/envvar.html
http://man7.org/linux/man-pages/man7/locale.7.html

Please confirm the bug without having LC_ALL or LANG set. Thanks.
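
The precedence rule for environment variables can be modelled in a few lines. 
This is a sketch of the lookup order only (LC_ALL, then the category variable, 
then LANG, with empty values treated as unset); the function name 
effective_locale is hypothetical, and the final fallback to "C" is an 
assumption, since the default is implementation-defined. It says nothing about 
explicit setlocale() calls made by a program.

```python
def effective_locale(category, environ):
    """Return the locale name an environment selects for one category.

    category is a variable name such as "LC_NUMERIC"; environ is a mapping
    such as os.environ.
    """
    for name in ("LC_ALL", category, "LANG"):
        value = environ.get(name)
        if value:  # unset or empty variables are ignored
            return value
    return "C"  # assumed fallback


if __name__ == "__main__":
    env = {"LC_ALL": "fr_FR", "LC_NUMERIC": "es_MX.utf8"}
    print(effective_locale("LC_NUMERIC", env))
```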

--
nosy: +lemburg




[issue31900] localeconv() should decode numeric fields from LC_NUMERIC encoding, not from LC_CTYPE encoding

2018-01-15 Thread STINNER Victor

STINNER Victor  added the comment:

Oops lc_numeric.py contains a typo:

d = decimal.Decimal(1234)
print("Decimal.__format__: %a" % f"{i:n}")

=> it should be f"{d:n}"

--




[issue31900] localeconv() should decode numeric fields from LC_NUMERIC encoding, not from LC_CTYPE encoding

2018-01-15 Thread STINNER Victor

STINNER Victor  added the comment:

Update: I pushed a large change to fix locale encodings in bpo-29240: commit 
7ed7aead9503102d2ed316175f198104e0cd674c.

--




[issue31900] localeconv() should decode numeric fields from LC_NUMERIC encoding, not from LC_CTYPE encoding

2018-01-10 Thread STINNER Victor

STINNER Victor  added the comment:

I completed my change. It now fixes locale.localeconv(), str.format() for int, 
float, complex and decimal.Decimal:

vstinner@apu$ ./python lc_numeric.py 
LC_CTYPE: ('fr_FR', 'ISO8859-1')
LC_NUMERIC: ('es_MX', 'UTF-8')
decimal_point: '.'
thousands_sep: '\u2009'
grouping: [3, 3, 0]
int.__format__: '1\u2009234'
float.__format__: '1\u2009234'
complex.__format__: '1\u2009234+0j'
Decimal.__format__: '1\u2009234'
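
To see how thousands_sep and grouping combine into '1\u2009234', here is a 
pure-Python sketch of the grouping rule. It is not the code CPython uses for 
the "n" format type; the function name group_digits is hypothetical. In the 
grouping list, each entry is a group size counted from the right, 0 means 
"repeat the previous size", and locale.CHAR_MAX means "no further grouping".

```python
import locale


def group_digits(digits, thousands_sep, grouping):
    """Insert thousands_sep into a string of digits according to grouping."""
    if not thousands_sep or not grouping:
        return digits
    groups = []
    rest = digits
    size = None
    for interval in grouping:
        if interval == locale.CHAR_MAX:
            break  # no further grouping
        repeat = interval == 0
        if not repeat:
            size = interval
        elif size is None:
            break  # malformed list: 0 with no previous size
        # Cut groups off the right-hand side of the remaining digits.
        while len(rest) > size:
            groups.append(rest[-size:])
            rest = rest[:-size]
            if not repeat:
                break  # this entry applies to a single group only
        if repeat:
            break
    groups.append(rest)
    return thousands_sep.join(reversed(groups))


if __name__ == "__main__":
    print(ascii(group_digits("1234", "\u2009", [3, 3, 0])))
```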

--
Added file: https://bugs.python.org/file47377/lc_numeric.py




[issue31900] localeconv() should decode numeric fields from LC_NUMERIC encoding, not from LC_CTYPE encoding

2017-12-18 Thread STINNER Victor

STINNER Victor  added the comment:

Oh. Another Python function is impacted by the bug, str.format:

$ env -i python3 -c 'import locale; locale.setlocale(locale.LC_ALL, "fr_FR"); 
locale.setlocale(locale.LC_NUMERIC, "es_MX.utf8"); print(ascii(f"{1000:n}"))'
'1\xe2\x80\x89000'

It should be '1\u2009000' ('1', '\u2009', '000').

--




[issue31900] localeconv() should decode numeric fields from LC_NUMERIC encoding, not from LC_CTYPE encoding

2017-11-29 Thread Charalampos Stratakis

Charalampos Stratakis  added the comment:

Pinging here. Is there some way I can help to move the issue forward?

--




[issue31900] localeconv() should decode numeric fields from LC_NUMERIC encoding, not from LC_CTYPE encoding

2017-10-30 Thread STINNER Victor

STINNER Victor  added the comment:

inconsistent_locale_encodings.py of closed issue #7442 is interesting: I copy 
it here.

--
Added file: https://bugs.python.org/file47246/inconsistent_locale_encodings.py




[issue31900] localeconv() should decode numeric fields from LC_NUMERIC encoding, not from LC_CTYPE encoding

2017-10-30 Thread STINNER Victor

STINNER Victor  added the comment:

Oh wow, this bug is older than I expected :-) I added support for 
non-ASCII thousands separators in 2012:

https://bugs.python.org/issue13706#msg151733

--




[issue31900] localeconv() should decode numeric fields from LC_NUMERIC encoding, not from LC_CTYPE encoding

2017-10-30 Thread Stefan Krah

Stefan Krah  added the comment:

Same as #7442, I think.

--
nosy: +skrah




[issue31900] localeconv() should decode numeric fields from LC_NUMERIC encoding, not from LC_CTYPE encoding

2017-10-30 Thread Serhiy Storchaka

Serhiy Storchaka  added the comment:

This is a duplicate of issue28604. See also issue25812.

--
nosy: +serhiy.storchaka




[issue31900] localeconv() should decode numeric fields from LC_NUMERIC encoding, not from LC_CTYPE encoding

2017-10-30 Thread STINNER Victor

Change by STINNER Victor :


--
title: localeconv() should decide numeric fields from LC_NUMERIC encoding, not 
from LC_CTYPE encoding -> localeconv() should decode numeric fields from 
LC_NUMERIC encoding, not from LC_CTYPE encoding
