[issue30647] CODESET error on AMD64 FreeBSD 10.x Shared 3.x caused by the PEP 538

2017-06-29 Thread Nick Coghlan

Nick Coghlan added the comment:

I was able to fix the test_readline failure by restoring the locale based on 
the environment settings with `setlocale(LC_CTYPE, "")` rather than the return 
value from a preceding call to `setlocale(LC_CTYPE, NULL)`.
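
As a rough illustration in Python terms (the actual change lives in CPython's C start-up code, so this is only a sketch, and the helper names `restore_from_saved` and `restore_from_environment` are hypothetical), the two restore strategies differ like this:

```python
import locale


def restore_from_saved(saved: str) -> str:
    """Restore LC_CTYPE from a name returned by an earlier query call."""
    # Counterpart of setlocale(LC_CTYPE, saved) in C.
    return locale.setlocale(locale.LC_CTYPE, saved)


def restore_from_environment() -> str:
    """Restore LC_CTYPE from the LC_ALL / LC_CTYPE / LANG environment settings."""
    # Counterpart of setlocale(LC_CTYPE, "") in C.
    # Raises locale.Error if the environment names an unknown locale.
    return locale.setlocale(locale.LC_CTYPE, "")


# Query only: passing None (NULL in C) reports the current setting
# without changing it.
saved = locale.setlocale(locale.LC_CTYPE, None)
```

The first helper round-trips through whatever name libc reported, while the second asks libc to work the setting out again from the environment.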

That means we can leave the runtime coercion checks enabled on *BSD systems, 
and if/when any given BSD variant adds working Linux-style C.UTF-8 or 
OS-X-style UTF-8 locales, we'll automatically start using them.

--
resolution:  -> fixed
stage:  -> resolved
status: open -> closed
type:  -> behavior

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com




2017-06-29 Thread Nick Coghlan

Nick Coghlan added the comment:


New changeset 18974c35ad9d25ffea041dc0363dc01889f4a595 by Nick Coghlan in 
branch 'master':
bpo-30647: Check nl_langinfo(CODESET) in locale coercion (GH-2374)
https://github.com/python/cpython/commit/18974c35ad9d25ffea041dc0363dc01889f4a595


--


2017-06-25 Thread Nick Coghlan

Nick Coghlan added the comment:

Current status of the PR:

- testing suggests that "nl_langinfo(CODESET)" fails with LC_CTYPE=UTF-8 on Mac 
OS X as well, but that doesn't matter for Python start-up, since we hardcode 
UTF-8 as the locale encoding and never call nl_langinfo
- on Linux however, "nl_langinfo(CODESET)" succeeds as expected

Accordingly, I've revised the tests as follows:

- on Linux and Mac OS X, having setlocale() succeed gets a locale added to the 
"available target locales" set for the tests. This reflects the fact that we 
skip the nl_langinfo(CODESET) check on Mac OS X, and expect it to always 
succeed on Linux if setlocale() succeeds
- on other platforms where "locale.nl_langinfo(locale.CODESET)" is supported, 
we only consider a locale an available target locale if that call returns a 
non-empty answer
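
A minimal sketch of that availability rule, for illustration only: it is not the code from the PR, `TARGET_LOCALES` and `is_available_target` are hypothetical names, and the behaviour on platforms without `locale.nl_langinfo` is an assumption, since it is not covered above.

```python
import locale
import sys

# Candidate targets, as named in the start-up warning quoted in this thread.
TARGET_LOCALES = ("C.UTF-8", "C.utf8", "UTF-8")


def is_available_target(name: str) -> bool:
    """Return True if `name` counts as an available target locale."""
    original = locale.setlocale(locale.LC_CTYPE, None)
    try:
        try:
            locale.setlocale(locale.LC_CTYPE, name)
        except locale.Error:
            return False  # setlocale() failed: never a usable target
        if sys.platform.startswith("linux") or sys.platform == "darwin":
            return True  # setlocale() succeeding is treated as sufficient
        if hasattr(locale, "nl_langinfo"):
            # Elsewhere, also require a non-empty codeset answer.
            return bool(locale.nl_langinfo(locale.CODESET))
        return True  # assumption: without nl_langinfo there is nothing to check
    finally:
        # Put the process locale back so the probe has no lasting effect.
        locale.setlocale(locale.LC_CTYPE, original)


available = [name for name in TARGET_LOCALES if is_available_target(name)]
```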

At the locale coercion level, I've added an extra check where we save the 
initial locale (i.e. before we change anything), and if setlocale() succeeds, 
but nl_langinfo(CODESET) fails, we do setlocale(LC_CTYPE, initial_locale) to 
try to get things back to their original state.
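
The save / try / restore sequence can be sketched in Python as follows. This is an illustration of the idea only, not the C code in the PR: `coerce_ctype` is a hypothetical name, and the sketch leaves out everything else the real start-up code does.

```python
import locale
from typing import Iterable, Optional


def coerce_ctype(targets: Iterable[str]) -> Optional[str]:
    """Try each target; return the one that was kept, or None if none worked."""
    # Save the initial locale before changing anything.
    initial = locale.setlocale(locale.LC_CTYPE, None)
    # Assumption: if nl_langinfo is missing, there is nothing more to check.
    query = getattr(locale, "nl_langinfo", None)
    for name in targets:
        try:
            locale.setlocale(locale.LC_CTYPE, name)
        except locale.Error:
            continue  # this locale does not exist on this system
        if query is None or query(locale.CODESET):
            return name  # setlocale() and the codeset query both succeeded
        # setlocale() succeeded but CODESET is empty: try to undo the change.
        locale.setlocale(locale.LC_CTYPE, initial)
    return None
```

Calling `coerce_ctype(["C.UTF-8", "C.utf8", "UTF-8"])` changes the process-wide LC_CTYPE setting when a target is accepted, so callers that only want to probe should restore the locale afterwards.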

This seems to *mostly* work on FreeBSD, but doesn't quite get readline back to 
where it is by default, so test_nonascii in test_readline fails with the error:

```
======================================================================
FAIL: test_nonascii (test.test_readline.TestReadline)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/home/buildbot/python/custom.koobs-freebsd10/build/Lib/test/test_readline.py", line 203, in test_nonascii
    self.assertIn(b"text 't\\xeb'\r\n", output)
AssertionError: b"text 't\\xeb'\r\n" not found in bytearray(b"^A^B^B^B^B^B^B^B\t\tx\t\r\n[\\357nserted]|t\x07\x08\x08\x08\x08\x08\x08\x08\x07\x07xrted]|t\x08\x08\x08\x08\x08\x08\x08\x07\r\nresult \'[\\udcefnsexrted]|t\'\r\nhistory \'[\\xefnsexrted]|t\'\r\n")

```

My two current guesses as to what may be going wrong there are:

* doing the equivalent of "setlocale(LC_CTYPE, setlocale(LC_CTYPE, NULL))" may 
be taking libc out of the weird initial state where it claims to be using 
ASCII, but is really using latin-1; or
* setting "surrogateescape" on "stdin" is causing some unexpected behaviour in 
the affected test case

I'm leaning towards the former, as if it was the latter, I'd expect to have 
already seen the same error *without* locale coercion.

--


2017-06-24 Thread Nick Coghlan

Changes by Nick Coghlan :


--
pull_requests: +2421


2017-06-17 Thread Nick Coghlan

Changes by Nick Coghlan :


--
assignee:  -> ncoghlan


2017-06-15 Thread Nick Coghlan

Nick Coghlan added the comment:

Note that the coercion logic includes a runtime check to see if 
'setlocale(LC_CTYPE, "")' succeeds. That's how we skip over the 
non-existent C.UTF-8 and C.utf8 to get to "LC_CTYPE=UTF-8" on Mac OS X and 
FreeBSD.

That *appears* to work (and really does work on Mac OS X as far as CPython's 
test suite is concerned), but on FreeBSD we subsequently get the CODESET 
failure when we try to call `nl_langinfo` later in the interpreter startup 
process.

Victor's suggestion, which seems reasonable to me, is that we could also add 
the `nl_langinfo` call in the coercion logic, so that we never implicitly 
configure a locale setting that breaks nl_langinfo.

That way, instead of the interpreter failing to start, we'd just skip the 
locale coercion logic in that case (and update the test suite's expectations 
accordingly).

--
nosy: +ncoghlan


2017-06-13 Thread bapt

bapt added the comment:

More details here:
C.UTF-8 is a glibc-only thing, and not even mainstream: 
https://sourceware.org/glibc/wiki/Proposals/C.UTF-8

The closest thing to a C locale with Unicode would be to set everything to 
the C locale except LC_CTYPE, which would be set to en_US.UTF-8.

The problem is that if your ctype data comes from CLDR, it differs per 
locale. On FreeBSD, Dragonfly and Illumos, we have extracted it so that 
LC_CTYPE is the same in all locales.
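
The "everything in C except LC_CTYPE" setup mentioned above can be tried out with a child interpreter, as in the sketch below. It is illustrative only: `run_with_mixed_locale` is a hypothetical helper, it assumes a POSIX-like system, and en_US.UTF-8 may not be installed, in which case the child reports "unavailable".

```python
import os
import subprocess
import sys

# Code run in the child: apply the environment's locale settings, then
# report the LC_CTYPE and LC_COLLATE categories that resulted.
CHILD = r"""
import locale
try:
    locale.setlocale(locale.LC_ALL, "")
except locale.Error:
    print("unavailable")
else:
    print(locale.setlocale(locale.LC_CTYPE, None),
          locale.setlocale(locale.LC_COLLATE, None))
"""


def run_with_mixed_locale() -> str:
    """Run a child with LANG=C but LC_CTYPE=en_US.UTF-8 and return its report."""
    # Drop any inherited LC_* settings (including LC_ALL) so they cannot override.
    env = {k: v for k, v in os.environ.items() if not k.startswith("LC_")}
    env["LANG"] = "C"
    env["LC_CTYPE"] = "en_US.UTF-8"
    env["PYTHONCOERCECLOCALE"] = "0"  # keep PEP 538 coercion out of the picture
    result = subprocess.run([sys.executable, "-c", CHILD], env=env,
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()
```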

--


2017-06-13 Thread Ned Deily

Ned Deily added the comment:

macOS is also BSD-like with regard to locales: it also does not have any C.* 
locales other than plain C.  See, for example, the discussion at bpo-18378.

--
nosy: +ned.deily


2017-06-13 Thread bapt

bapt added the comment:

Per POSIX, the C locale is only expected to be ASCII. C.UTF-8 is a Linux-only 
thing (actually I thought it was a Debian-only thing, but maybe not).

I was thinking about creating a C.utf8 locale on FreeBSD, but it is not that 
simple to do (still doable, and an interesting idea).

Note that if it fails here, it is probably also failing on other OSes. At 
minimum: Dragonfly and Illumos for sure, maybe NetBSD and OpenBSD as well.

haypo, do not hesitate to ping me on IRC as usual if you want to discuss the 
issue.

--
nosy: +bapt


2017-06-13 Thread STINNER Victor

Changes by STINNER Victor :


--
nosy: +koobs


2017-06-13 Thread STINNER Victor

STINNER Victor added the comment:

On my FreeBSD 11 VM, I only have the "C" locale, not a UTF-8 variant of it:

[haypo@freebsd ~/prog/python/master]$ locale -a|grep ^C
C


But CPython still asks me to use a non-existent locale (newlines added for 
readability):

[haypo@freebsd ~/prog/python/master]$ ./python

Python runtime initialized with LC_CTYPE=C (a locale with default ASCII 
encoding), which may cause Unicode compatibility problems. Using C.UTF-8, 
C.utf8, or UTF-8 (if available) as alternative Unicode-compatible locales is 
recommended.

Python 3.7.0a0 (heads/master:d79c1d4a94, Jun 13 2017, 10:59:23) 
[GCC 4.2.1 Compatible FreeBSD Clang 3.8.0 (tags/RELEASE_380/final 262564)] on 
freebsd11
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.setlocale(locale.LC_CTYPE, None)
'C'

--


2017-06-13 Thread STINNER Victor

New submission from STINNER Victor:

Regression caused by the commit 6ea4186de32d65b1f1dc1533b6312b798d300466, 
bpo-28180: Implementation for PEP 538.

http://buildbot.python.org/all/builders/AMD64%20FreeBSD%2010.x%20Shared%203.x/builds/412/steps/compile/logs/stdio

Python detected LC_CTYPE=C: LC_CTYPE coerced to UTF-8 (set another locale or 
PYTHONCOERCECLOCALE=0 to disable this locale coercion behavior).
Fatal Python error: Py_Initialize: Unable to get the locale encoding
ValueError: CODESET is not set or empty

Current thread 0x000802006400 (most recent call first):
Abort trap (core dumped)
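
For anyone wanting to compare behaviour with and without coercion, the PYTHONCOERCECLOCALE switch named in the message above can be exercised from a parent process. This is a sketch under assumptions: `child_lc_ctype` is a hypothetical helper, it assumes a POSIX-like system, and on an affected build the child would abort as shown above, which `check=True` turns into an exception.

```python
import os
import subprocess
import sys


def child_lc_ctype(coerce: str) -> str:
    """Start a child in the C locale and report its LC_CTYPE environment variable."""
    # Drop inherited LC_* settings (including LC_ALL) so the child sees plain C.
    env = {k: v for k, v in os.environ.items() if not k.startswith("LC_")}
    env["LANG"] = "C"
    env["PYTHONCOERCECLOCALE"] = coerce
    code = "import os; print(os.environ.get('LC_CTYPE', '<unset>'))"
    result = subprocess.run([sys.executable, "-c", code], env=env,
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

With `child_lc_ctype("0")` the variable stays unset in the child, because coercion is disabled; with `child_lc_ctype("1")` it is either unset or one of the coercion targets, depending on what the platform provides.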

--
components: Unicode
messages: 295870
nosy: ezio.melotti, haypo
priority: normal
severity: normal
status: open
title: CODESET error on AMD64 FreeBSD 10.x Shared 3.x caused by the PEP 538
versions: Python 3.7
