[issue24968] Python 3 raises Unicode errors with the xxx.UTF-8 locale

2015-09-01 Thread Roberto Sánchez

Roberto Sánchez added the comment:

OK, that makes sense. Besides, David pointed me to another open issue that 
could help solve cases like this: http://bugs.python.org/issue15216 If the 
encoding is wrong because of the environment, but we can easily change the 
initial stream encodings (of stdin/stdout), we have a powerful tool to adapt 
our scripts and patch broken locales like those produced by SSH sessions.
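
As a rough illustration of that idea (not taken from either issue), a text 
stream can be re-wrapped around the same binary buffer with a different 
encoding. The helper name rewrap_utf8 is hypothetical, and an in-memory 
buffer stands in for a stdout that was opened as ASCII:

```python
import io


def rewrap_utf8(stream, errors="strict"):
    """Return a new UTF-8 text stream over the same binary buffer.

    Hypothetical helper for illustration. The old wrapper must not be
    used after this call, because detach() leaves it unusable.
    """
    stream.flush()
    binary = stream.detach()
    return io.TextIOWrapper(binary, encoding="utf-8", errors=errors)


# Simulate a stream that was opened as ASCII because of a broken locale.
raw = io.BytesIO()
broken = io.TextIOWrapper(raw, encoding="ascii")

fixed = rewrap_utf8(broken)
fixed.write("España")
fixed.flush()
print(raw.getvalue())  # → b'Espa\xc3\xb1a'
```

In a script the same call would be applied as sys.stdout = 
rewrap_utf8(sys.stdout). On Python 3.7 and later, 
sys.stdout.reconfigure(encoding="utf-8") changes the encoding in place.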

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue24968] Python 3 raises Unicode errors with the xxx.UTF-8 locale

2015-08-31 Thread R. David Murray

R. David Murray added the comment:

I believe there is at least one open issue about Python adopting UTF-8 as the 
default instead of ASCII, and in any case, several conversations about how to 
deal with all this better.  This is just one example of a class of issues 
caused by the ASCII-based C/POSIX default locale, in different contexts.

--
nosy: +r.david.murray




[issue24968] Python 3 raises Unicode errors with the xxx.UTF-8 locale

2015-08-31 Thread Nick Coghlan

Nick Coghlan added the comment:

Looking again at the *specific* bug report here, I'm moving the resolution to 
"out of date", as it's actually the one we addressed in 3.5 by enabling 
surrogateescape by default on all of the standard streams when the OS claims 
the locale encoding is ASCII, not just stderr: http://bugs.python.org/issue19977

That allows us to at least correctly roundtrip data, even if the OS has given 
us bad encoding settings.
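
A minimal sketch of what that roundtrip means, using the surrogateescape 
error handler directly on bytes rather than on the standard streams: 
undecodable bytes are mapped to lone surrogates on the way in and restored 
unchanged on the way out.

```python
# UTF-8 bytes that an ASCII-configured stream cannot decode strictly.
raw = "España".encode("utf-8")

# surrogateescape maps each undecodable byte 0xNN to the code point U+DCNN.
text = raw.decode("ascii", errors="surrogateescape")
print(ascii(text))  # → 'Espa\udcc3\udcb1a'

# Encoding with the same handler restores the original bytes exactly.
restored = text.encode("ascii", errors="surrogateescape")
print(restored == raw)  # → True
```

The text is not displayed correctly in between, but no data is lost.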

The problem with forcing UTF-8 more generally when the OS claims ASCII is that 
it may be the wrong thing to do and result in data corruption, especially on 
systems using East Asian codecs. Querying /etc/locale.conf [1] instead of 
relying on the nominal glibc locale settings should reliably give us correct 
encoding/locale information on modern Linux systems in cases like this one, 
where SSH has forwarded mismatched locale settings from a client system to a 
server shell session.
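
A rough sketch of what such a query could look like, assuming the simple 
KEY=value format described in [1]. The helper names are hypothetical, the 
sample text stands in for the real file, and this is not how CPython 
determines the locale:

```python
import codecs


def parse_locale_conf(text):
    """Parse KEY=value lines, skipping blank lines and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip("\"'")
    return settings


def codeset_of(locale_name):
    """Return the normalized codec name of 'xx_YY.codeset', or None."""
    codeset = locale_name.partition(".")[2].partition("@")[0]
    if not codeset:
        return None
    try:
        return codecs.lookup(codeset).name
    except LookupError:
        return None


# Stand-in for the contents of /etc/locale.conf on the server.
sample = 'LANG="en_US.utf8"\n# a comment\nLC_MESSAGES=C\n'
settings = parse_locale_conf(sample)
print(codeset_of(settings["LANG"]))  # → utf-8
```

On a real system the text would come from reading /etc/locale.conf, which 
may not exist on distributions that do not use systemd.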

Another issue with relevant background discussion is issue #23993, which 
speculated on extending the "default to surrogateescape" idea to all open() 
calls when glibc claims the locale encoding is ASCII.

[1] http://www.freedesktop.org/software/systemd/man/locale.conf.html

--
resolution: not a bug -> out of date




[issue24968] Python 3 raises Unicode errors with the xxx.UTF-8 locale

2015-08-31 Thread Roberto Sánchez

New submission from Roberto Sánchez:

System: Python 3.4.2 on Linux Fedora 22

This issue is closely related to http://bugs.python.org/issue19846, but it 
isn't exactly the same case.

When I connect from my Mac (OS X, using Terminal.app) to a Fedora Linux host 
over SSH, the terminal session is forced to use the OS X locale (the default 
behavior in Terminal.app):

[rob@fedora22 ~]$ locale
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
LANG=es_ES.UTF-8
LC_CTYPE="es_ES.UTF-8"
LC_NUMERIC="es_ES.UTF-8"
LC_TIME="es_ES.UTF-8"
LC_COLLATE="es_ES.UTF-8"
LC_MONETARY="es_ES.UTF-8"
LC_MESSAGES="es_ES.UTF-8"
LC_PAPER="es_ES.UTF-8"
LC_NAME="es_ES.UTF-8"
LC_ADDRESS="es_ES.UTF-8"
LC_TELEPHONE="es_ES.UTF-8"
LC_MEASUREMENT="es_ES.UTF-8"
LC_IDENTIFICATION="es_ES.UTF-8"
LC_ALL=

However, the locales installed on the Fedora host are:

[rob@fedora22 ~]$ localectl list-locales
en_US
en_US.iso88591
en_US.iso885915
en_US.utf8   <-- This is the default one

And if I launch python3 I get:

[rob@fedora22 ~]$ python3
Python 3.4.2 (default, Jul  9 2015, 17:24:30) 
[GCC 5.1.1 20150618 (Red Hat 5.1.1-4)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os, codecs, sys, locale
>>> locale.getpreferredencoding()
'ANSI_X3.4-1968'
>>> codecs.lookup(locale.getpreferredencoding()).name
'ascii'
>>> locale.getdefaultlocale()
('es_ES', 'UTF-8')
>>> sys.stdout.encoding
'ANSI_X3.4-1968'
>>> sys.getfilesystemencoding()
'ascii'
>>> print('España')
  File "<stdin>", line 0
    
    ^
SyntaxError: 'ascii' codec can't decode byte 0xc3 in position 11: ordinal not in range(128)


So, if I understand correctly, when the current locale is not supported by 
the system, Python falls back to ASCII.

I can understand this behavior when the supported locales and the current one 
have different encodings, but if both of them use UTF-8 it seems reasonable 
for locale.getpreferredencoding() to return 'utf-8'.
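
To make the proposal concrete, here is a minimal sketch of how a script 
could detect this situation itself. The function names are hypothetical, 
and the lookup order LC_ALL, LC_CTYPE, LANG follows the usual POSIX 
precedence for the character-type category:

```python
import codecs


def requested_encoding(environ):
    """Return the codec named by the locale variables, or None."""
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = environ.get(var)
        if value:  # the first non-empty variable wins
            break
    else:
        return None
    codeset = value.partition(".")[2].partition("@")[0]
    if not codeset:
        return None
    try:
        return codecs.lookup(codeset).name
    except LookupError:
        return None


def looks_like_broken_utf8_locale(environ, reported):
    """True if the environment asks for UTF-8 but ASCII was reported."""
    return (requested_encoding(environ) == "utf-8"
            and codecs.lookup(reported).name == "ascii")


# Values taken from the session above.
env = {"LANG": "es_ES.UTF-8", "LC_ALL": ""}
print(looks_like_broken_utf8_locale(env, "ANSI_X3.4-1968"))  # → True
```

In a real script the arguments would be os.environ and 
locale.getpreferredencoding().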

This case causes programs with a CLI (command-line interface) to fail. If you 
are using a third-party library such as click, a RuntimeError is raised by 
the library itself. I learned the hard way that Python 3 CLI programs need a 
valid encoding to deal with stdin/stdout. In this case both systems seem to 
be correctly configured as far as the encoding is concerned: this is a real 
case, with no manual modification of the locale configuration. IMHO the 
current behavior seems a bit strict.

--
components: Unicode
messages: 249390
nosy: ezio.melotti, haypo, rsc1975
priority: normal
severity: normal
status: open
title: Python 3 raises Unicode errors with the xxx.UTF-8 locale
type: behavior
versions: Python 3.4




[issue24968] Python 3 raises Unicode errors with the xxx.UTF-8 locale

2015-08-31 Thread STINNER Victor

STINNER Victor added the comment:

It's not a bug in Python, but a bug in your system.

> New submission from Roberto Sánchez:
>     [rob@fedora22 ~]$ locale
>     locale: Cannot set LC_CTYPE to default locale: No such file or directory

This message means that the chosen locale doesn't exist.

>     LANG=es_ES.UTF-8
...
>     [rob@fedora22 ~]$ localectl list-locales
> 
>     en_US.utf8       <-- This is the default one

LANG must be en_US.utf8.
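
One way to apply that from a script, sketched under the assumption that 
en_US.utf8 is installed on the host (as in the listing above): start the 
child interpreter with a corrected environment. The helper name 
env_with_locale is hypothetical.

```python
import os
import subprocess
import sys


def env_with_locale(locale_name, base=None):
    """Copy an environment, drop all LC_* overrides and set LANG."""
    env = dict(os.environ if base is None else base)
    for key in [k for k in env if k.startswith("LC_")]:
        del env[key]  # LC_ALL and LC_CTYPE would otherwise override LANG
    env["LANG"] = locale_name
    return env


child_env = env_with_locale("en_US.utf8")
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
    env=child_env,
    stdout=subprocess.PIPE,
    universal_newlines=True,
)
# The reported encoding depends on the locales installed on the host.
print(result.stdout.strip())
```

The same effect is available interactively by exporting LANG in the shell 
before starting python3.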

--
resolution:  -> not a bug
status: open -> closed




[issue24968] Python 3 raises Unicode errors with the xxx.UTF-8 locale

2015-08-31 Thread Serhiy Storchaka

Changes by Serhiy Storchaka:


--
nosy: +lemburg, loewis, ncoghlan, serhiy.storchaka




[issue24968] Python 3 raises Unicode errors with the xxx.UTF-8 locale

2015-08-31 Thread Roberto Sánchez

Roberto Sánchez added the comment:

OK, I already knew that "it is not a bug", but the scenario seems quite 
common: connecting to a Linux host from a Mac with Terminal.app, where the 
two machines have different locales (the default behavior). A bit of "magic" 
when the encoding part of the locale is correct would help to deal with some 
Unicode issues in Python 3 scripts.

I'm only saying that it would be a desirable enhancement. I have no idea how 
complex it would be to change the current behavior; maybe it isn't worth the 
effort.

--




[issue24968] Python 3 raises Unicode errors with the xxx.UTF-8 locale

2015-08-31 Thread Nick Coghlan

Nick Coghlan added the comment:

CPython inherits this behaviour from glibc's locale handling, so it's 
potentially worth raising the question further upstream. If anyone wanted to 
pursue that, looking at http://www.gnu.org/software/libc/development.html 
suggests to me that the appropriate starting point would be to email 
libc-h...@sourceware.org and ask for advice.

--
