STINNER Victor <vstin...@redhat.com> added the comment:

Ah, I can reproduce the bug on Fedora 29 using "LANG=en_IN ./python -m test -v 
test_re".

The problem is that locale.getlocale() is not reliable: it pretends that the 
locale encoding is ISO8859-1, whereas the real encoding is UTF-8:

$ LANG=en_IN ./python 
Python 3.8.0a2+ (heads/master:4cbea518a0, Feb 28 2019, 18:19:44) 
>>> chr(224).encode('ISO8859-1')
b'\xe0'
>>> import _testcapi
>>> _testcapi.DecodeLocaleEx(b'\xe0', 0, 'strict')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: decode error: pos=0, reason=decoding error

>>> import locale

# Wrong encoding
>>> locale.getlocale(locale.LC_CTYPE)
('en_IN', 'ISO8859-1')
>>> locale.setlocale(locale.LC_CTYPE, None)
'en_IN'
>>> locale._parse_localename('en_IN')
('en_IN', 'ISO8859-1')

# Real encoding
>>> locale.getpreferredencoding()
'UTF-8'
>>> locale.nl_langinfo(locale.CODESET)
'UTF-8'


The attached PR 12099 fixes the issue.
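
For anyone who wants to check their own system for the same mismatch, here is a minimal sketch (not the code from the PR). The helper names locale_encoding_report() and same_codec() are made up for illustration. It compares the encoding that locale.getlocale() derives from the locale name with the one the C library reports through nl_langinfo(CODESET). Whether the two differ depends on the platform, the locale and the Python version.

```python
import codecs
import locale


def same_codec(a, b):
    """Return True if both names resolve to the same Python codec."""
    if a is None or b is None:
        return False
    try:
        # codecs.lookup() normalizes aliases, e.g. "utf8" and "UTF-8"
        return codecs.lookup(a).name == codecs.lookup(b).name
    except LookupError:
        # At least one name is unknown to Python's codec registry
        return False


def locale_encoding_report(category=locale.LC_CTYPE):
    """Return (guessed, actual) encodings for the current locale.

    guessed: what locale.getlocale() derives from the locale *name*
    actual:  what the C library reports (POSIX), else the preferred encoding
    """
    try:
        guessed = locale.getlocale(category)[1]
    except ValueError:
        # getlocale() raises ValueError for locale names it cannot parse
        guessed = None
    if hasattr(locale, "nl_langinfo"):
        actual = locale.nl_langinfo(locale.CODESET)
    else:
        # nl_langinfo() is not available on every platform (e.g. Windows)
        actual = locale.getpreferredencoding(False)
    return guessed, actual


if __name__ == "__main__":
    guessed, actual = locale_encoding_report()
    print("getlocale() encoding:", guessed)
    print("C library encoding:  ", actual)
    if not same_codec(guessed, actual):
        print("mismatch: do not rely on getlocale() for the encoding")
```

Running it as "LANG=en_IN ./python script.py" on an affected system should flag a mismatch. The names are compared through codecs.lookup() rather than as plain strings so that different spellings of the same encoding are not reported as a difference.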

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue29571>
_______________________________________