[issue19846] Setting LANG=C breaks Python 3 on Linux

2013-12-09 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: And yet, in Python 2, people could do that, and Python didn't care. *That's* the regression I'm worried about. If it hadn't round-tripped cleanly in Python 2, I wouldn't care here either.
$ python2.7 -c "print u'\u20ac'"
€
$ LANG=C python2.7 -c print

[issue19846] Setting LANG=C breaks Python 3 on Linux

2013-12-09 Thread Serhiy Storchaka
Serhiy Storchaka added the comment:
sworddragon@ubuntu:~$ LANG=C
sworddragon@ubuntu:~$ ä
bash: $'\303\244': command not found
- The terminal doesn't pseudo-crash with an exception because it doesn't matter about encodings.
- It allows to change the encoding at runtime.
This is not a

[issue19846] Setting LANG=C breaks Python 3 on Linux

2013-12-09 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: The C locale is part of the ANSI C standard. The POSIX locale is an alias for the C locale and a POSIX standard, so we cannot just replace the ASCII encoding with UTF-8 as we wish, so Antoine's patch won't work. See e.g.

[issue19846] Setting LANG=C breaks Python 3 on Linux

2013-12-09 Thread STINNER Victor
STINNER Victor added the comment: I didn't understand Serhiy's ls example. I tried:
$ mkdir unicode
$ cd unicode
$ python3 -c 'open("ab\xe9.txt", "w").close()'
$ python3 -c 'open("euro\u20ac.txt", "w").close()'
$ ls
abé.txt euro€.txt
$ LANG=C ls
ab??.txt euro???.txt
Ah yes, I didn't remember that
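
A minimal sketch of why `LANG=C ls` shows two and three question marks in the session above. It assumes the names were written to disk as UTF-8 bytes, which the comment does not state, and `c_locale_view` is a hypothetical helper that only approximates how ls renders non-ASCII bytes in the C locale:

```python
# Assumption: the file names were stored as UTF-8 bytes on disk.
names = ["ab\xe9.txt", "euro\u20ac.txt"]


def c_locale_view(name: str) -> str:
    """Approximate the C-locale display: one '?' per non-ASCII byte."""
    raw = name.encode("utf-8")
    return "".join(chr(b) if b < 0x80 else "?" for b in raw)


for name in names:
    # The bytes on disk are unchanged; only their rendering differs.
    print(name.encode("utf-8"), "->", c_locale_view(name))
```

Under that assumption, "é" occupies two bytes and "€" occupies three, which matches the number of question marks in the listing.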

[issue19846] Setting LANG=C breaks Python 3 on Linux

2013-12-09 Thread STINNER Victor
STINNER Victor added the comment: Nick: "testing applications for POSIX compliance" Sorry, but what do you mean by POSIX compliance? The POSIX standard only specifies the ASCII encoding. http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap07.html The tables in Locale Definition describe

[issue19846] Setting LANG=C breaks Python 3 on Linux

2013-12-09 Thread STINNER Victor
STINNER Victor added the comment: Marc-Andre: "AFAIK, Python 3 does work with ASCII data in the C locale, so I'm not sure whether this is a bug at all." What do you mean? Python has used the surrogateescape error handler since Python 3.1: undecodable bytes are stored as surrogate characters. Many bugs
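
A short sketch of the surrogateescape behaviour mentioned above, using an illustrative byte string (the example name is an assumption, not taken from the thread). Undecodable bytes survive a decode/encode round trip, but the resulting string cannot be encoded strictly:

```python
# Illustrative input: a Latin-1 encoded name that is not valid ASCII.
raw = b"ab\xe9.txt"

# Byte 0xE9 is mapped to the lone surrogate U+DCE9 instead of raising.
name = raw.decode("ascii", errors="surrogateescape")

# Encoding with the same error handler restores the original bytes.
round_tripped = name.encode("ascii", errors="surrogateescape")

# A strict encoder rejects the lone surrogate.
try:
    name.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

print(repr(name), round_tripped == raw, strict_ok)
```

The strict failure in the last step is the kind of error users hit when such a string reaches `print()` or a file write.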

[issue19846] Setting LANG=C breaks Python 3 on Linux

2013-12-09 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: On 09.12.2013 11:19, STINNER Victor wrote: STINNER Victor added the comment: Marc-Andre AFAIK, Python 3 does work with ASCII data in the C locale, so I'm not sure whether this is a bug at all. What do you mean? Python uses the surrogateescape

[issue19846] Setting LANG=C breaks Python 3

2013-12-08 Thread Nick Coghlan
Changes by Nick Coghlan ncogh...@gmail.com: title: print() and write() are relying on sys.getfilesystemencoding() instead of sys.getdefaultencoding() -> Setting LANG=C breaks Python 3

[issue19846] Setting LANG=C breaks Python 3

2013-12-08 Thread STINNER Victor
Changes by STINNER Victor victor.stin...@gmail.com: title: print() and write() are relying on sys.getfilesystemencoding() instead of sys.getdefaultencoding() -> Setting LANG=C breaks Python 3

[issue19846] Setting LANG=C breaks Python 3

2013-12-08 Thread STINNER Victor
STINNER Victor added the comment: Or said differently, the filesystem encoding is different than the locale encoding. Indeed, but the FS encoding and the IO encoding are the same. locale encoding doesn't really matter here, as we are assuming that it's wrong. Oh, I realized that FS
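
To see which encodings are in play on a given system, a small sketch like the following can help. `report_encodings` is a hypothetical helper; it only reads values the standard library already exposes, and the results depend on the environment (for example, whether `LANG=C` is set):

```python
import locale
import sys


def report_encodings() -> dict:
    """Collect the encodings Python selected for this process."""
    return {
        # Used to decode and encode file names and other OS data.
        "filesystem": sys.getfilesystemencoding(),
        # Derived from the current locale settings (LANG, LC_ALL, ...).
        "locale": locale.getpreferredencoding(False),
        # Used by print(); may be None if stdout has been replaced.
        "stdout": getattr(sys.stdout, "encoding", None),
        # Default for str.encode() and bytes.decode(); 'utf-8' in Python 3.
        "default": sys.getdefaultencoding(),
    }


for key, value in report_encodings().items():
    print(f"{key}: {value}")
```

Running it once normally and once with `LANG=C` in the environment shows which of these values follow the locale.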

[issue19846] Setting LANG=C breaks Python 3

2013-12-08 Thread Antoine Pitrou
Antoine Pitrou added the comment: On Sun., 2013-12-08 at 22:22 +, STINNER Victor wrote: (b) for technical reasons, Python reuses the C codec during Python initialization to decode and encode OS data, and so currently Python *must* use the locale encoding for its filesystem encoding Ahhh!

[issue19846] Setting LANG=C breaks Python 3

2013-12-08 Thread STINNER Victor
STINNER Victor added the comment: It seems there is more work to do to get this right, but I'm not terribly interested either. Feel free to take over. If you are talking to me: I'm currently opposed to changing anything, so I'm not interested in working on a patch. IMO Python works fine and you

[issue19846] Setting LANG=C breaks Python 3 on Linux

2013-12-08 Thread Nick Coghlan
Nick Coghlan added the comment: End users tripping over this by setting LANG=C is one of the pain points of Python 3 relative to Python 2 for Fedora, so I've added a couple of Fedora folks to the nosy list. My current understanding of the situation: - we should leave Windows and Mac OS X

[issue19846] Setting LANG=C breaks Python 3 on Linux

2013-12-08 Thread STINNER Victor
STINNER Victor added the comment: End users tripping over this by setting LANG=C is one of the pain points of Python 3 relative to Python 2 for Fedora, so I've added a couple of Fedora folks to the nosy list. Sorry, I'm not aware of such an issue. Do you have examples? - the main problem is

[issue19846] Setting LANG=C breaks Python 3 on Linux

2013-12-08 Thread Nick Coghlan
Nick Coghlan added the comment: On 9 December 2013 12:08, STINNER Victor rep...@bugs.python.org wrote: STINNER Victor added the comment: End users tripping over this by setting LANG=C is one of the pain points of Python 3 relative to Python 2 for Fedora, so I've added a couple of Fedora

[issue19846] Setting LANG=C breaks Python 3 on Linux

2013-12-08 Thread Sworddragon
Sworddragon added the comment: You should keep things simpler: - Python and the operating system/filesystem are in a client-server relationship and Python should validate all. - It doesn't matter what you will finally decide to be the default encoding in various places - all will provide