[issue13643] 'ascii' is a bad filesystem default encoding

2017-12-18 Thread STINNER Victor
STINNER Victor added the comment: Follow-up: PEP 538 (bpo-28180) and PEP 540 (bpo-29240) have been accepted and implemented in Python 3.7!
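As a rough illustration of what that follow-up means in practice, the sketch below reports the interpreter settings that decide how filenames are decoded. The helper name `filesystem_encoding_report` is made up for this example; it only reads values that the standard `sys` module exposes, and `utf8_mode` is looked up defensively because the flag exists only on Python 3.7 and later.

```python
import sys


def filesystem_encoding_report():
    """Collect the settings that govern filename decoding (hypothetical helper)."""
    return {
        # Encoding used to convert between str filenames and OS bytes.
        "filesystem_encoding": sys.getfilesystemencoding(),
        # Error handler paired with it (typically surrogateescape on POSIX).
        "filesystem_errors": sys.getfilesystemencodeerrors(),
        # Flag introduced by PEP 540; assumed 0 when the attribute is absent.
        "utf8_mode": getattr(sys.flags, "utf8_mode", 0),
    }


if __name__ == "__main__":
    for key, value in filesystem_encoding_report().items():
        print(f"{key}: {value}")
```

Running it once normally and once with `LANG=C` in the environment is a quick way to see whether a given interpreter still falls back to ASCII.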

[issue13643] 'ascii' is a bad filesystem default encoding

2016-12-20 Thread Nick Coghlan
Nick Coghlan added the comment: Also see http://bugs.python.org/issue28180 for a more recent proposal to tackle this by coercing the C locale to the C.UTF-8 locale -- nosy: +ncoghlan

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-23 Thread Terry J. Reedy
Terry J. Reedy tjre...@udel.edu added the comment: Martin, after reading most all of the unusually large sequence of messages, I am closing this because three of the core developers with the most experience in this area are dead-set against your proposal. That does not make it 'wrong', but

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-23 Thread Martin Pool
Martin Pool m...@sourcefrog.net added the comment: Terry, that's fine. Thanks to everyone who contributed to the discussion.

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-22 Thread akira
Changes by akira 4kir4...@gmail.com: -- nosy: +akira

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-21 Thread vila
Changes by vila v.ladeuil+bugs-pyt...@free.fr: -- nosy: +vila

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-21 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: Having more than one encoding on unix is already a reality, there's nothing to stop someone setting LANG=de_DE.UTF-8 and LC_MESSAGES=C say. Nope. The locale encoding is chosen using LC_ALL, LC_CTYPE or LANG variable: use the

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-21 Thread Martin
Martin gzl...@googlemail.com added the comment: Nope. The locale encoding is chosen using LC_ALL, LC_CTYPE or LANG variable: use the first non-empty variable. LC_MESSAGES doesn't affect the encoding. Example: That's good to know, thanks. Only leaves the case where setlocale is called
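The precedence rule quoted above (first non-empty of LC_ALL, LC_CTYPE, LANG) can be sketched as a small function. The name `effective_ctype_locale` and the dictionary-based interface are illustrative assumptions; this mirrors the rule as stated in the thread and does not call the C library's `setlocale`.

```python
def effective_ctype_locale(environ):
    """Return (variable, value) for the locale that selects the encoding.

    Hypothetical helper: walks the variables in precedence order and
    takes the first one that is set to a non-empty string.
    """
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = environ.get(name, "")
        if value:
            return name, value
    # Nothing set: the POSIX default "C" locale applies.
    return None, "C"


if __name__ == "__main__":
    import os

    print(effective_ctype_locale(os.environ))
```

With `{"LANG": "de_DE.UTF-8", "LC_MESSAGES": "C"}` the function picks `LANG`, which matches the point that LC_MESSAGES does not affect the encoding.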

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-21 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: it will still be passing values that can't be interpreted by other processes as you highlighted earlier. On UNIX, data going outside Python has to be encoded: you pass byte strings, not directly Unicode. Surrogates are encoded
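A minimal sketch of the round trip being described, assuming an ASCII filesystem encoding with the `surrogateescape` error handler (the behaviour under the C locale at the time of this thread). The helper names `from_os_bytes` and `to_os_bytes` are invented for illustration; the standard library equivalents are `os.fsdecode` and `os.fsencode`.

```python
def from_os_bytes(raw, encoding="ascii"):
    """Decode OS bytes; undecodable bytes become lone surrogates U+DC80..U+DCFF."""
    return raw.decode(encoding, "surrogateescape")


def to_os_bytes(name, encoding="ascii"):
    """Encode back; each escaped surrogate turns into its original byte."""
    return name.encode(encoding, "surrogateescape")


if __name__ == "__main__":
    raw = b"h\xc3\xa9.txt"           # UTF-8 bytes for "hé.txt"
    text = from_os_bytes(raw)        # str containing two lone surrogates
    print(ascii(text))
    print(to_os_bytes(text) == raw)  # the original bytes are recovered
```

The design point is that the str value is not meaningful text, but the bytes handed to other processes are exactly the bytes that came from the filesystem.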

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-21 Thread R. David Murray
R. David Murray rdmur...@bitdance.com added the comment: But currently everything handling filenames as unicode on *nix needs to worry about surrogates (that can't be encoded as ascii) already, or it will still be passing values that can't be interpreted by other processes as you highlighted

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-21 Thread Martin Pool
Martin Pool m...@sourcefrog.net added the comment: On 21 December 2011 12:41, Antoine Pitrou rep...@bugs.python.org wrote: Antoine Pitrou pit...@free.fr added the comment: The standard encoding is UTF-8. How so? I don't know of any Linux or Unix spec which says so. If you get the Linux

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-21 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment: It is a de facto, not de jure standard: UTF-8 is how things are typically stored. Other software (eg gnome file handling utilities) makes this assumption. See eg http://www.cl.cam.ac.uk/~mgk25/unicode.html#linux. So should we specifically

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-21 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: This discussion is becoming very long, I didn't remember the original purpose. You want to use UTF-8 instead of ASCII, so what? What do you want to do with your nicely well decoded filenames? You cannot print it to your terminal

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-21 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: Nope. The locale encoding is chosen using LC_ALL, LC_CTYPE or LANG variable: use the first non-empty variable. LC_MESSAGES doesn't affect the encoding. Example: That's good to know, thanks. Only leaves the case where setlocale

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-21 Thread Martin Pool
Martin Pool m...@sourcefrog.net added the comment: On 22 December 2011 11:21, STINNER Victor rep...@bugs.python.org wrote: This discussion is becoming very long, I didn't remember the original purpose. The proposal is that in some cases where Python currently assumes filenames are ascii on

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-21 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: On 22/12/2011 02:16, Martin Pool wrote: The proposal is that in some cases where Python currently assumes filenames are ascii on Linux, it ought to instead assume they are utf-8. Oh, I expected a use case describing the problem,

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-21 Thread Martin Pool
Martin Pool m...@sourcefrog.net added the comment: On 22 December 2011 12:32, STINNER Victor rep...@bugs.python.org wrote: STINNER Victor victor.stin...@haypocalc.com added the comment: On 22/12/2011 02:16, Martin Pool wrote: The proposal is that in some cases where Python currently assumes

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-21 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: The problem as I see it is this: On Linux, filenames are generally (but not always) in UTF-8; people fairly commonly end up with no locale configured, which causes Python to decode filenames as ascii. It is easy for this to end

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-21 Thread Martin Pool
Martin Pool m...@sourcefrog.net added the comment: On 22 December 2011 13:15, STINNER Victor rep...@bugs.python.org wrote: You cannot pass directly "h\xe9.txt", but if you know the correct file system encoding, you can encode it explicitly using str.encode("utf-8"). My recollection was that
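The workaround mentioned here, encoding the name explicitly and passing bytes, might look like the following sketch. The function name `write_with_utf8_name` is hypothetical, and the choice of UTF-8 is the caller's assumption about what the filesystem expects; nothing in Python verifies it.

```python
import os
import tempfile


def write_with_utf8_name(directory, name, data):
    """Create a file whose on-disk name is the UTF-8 encoding of `name`.

    Hypothetical helper: the path is built as bytes, so the locale's
    filesystem encoding is never applied to the non-ASCII part.
    """
    path = os.path.join(os.fsencode(directory), name.encode("utf-8"))
    with open(path, "wb") as handle:
        handle.write(data)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        created = write_with_utf8_name(tmp, "h\xe9.txt", b"hello\n")
        print(created)
```

This works even when the interpreter's filesystem encoding is ASCII, at the cost of carrying bytes paths through the rest of the program.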

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-21 Thread R. David Murray
R. David Murray rdmur...@bitdance.com added the comment: _My_ locale is set properly. The problem is all the other people in the world who do not have their locale set to match their files on disk; telling them each to fix it is tedious. But perhaps the OS is the best place to address that,

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-20 Thread Martin
New submission from Martin gzl...@googlemail.com: Currently when running Python on a non-OSX posix environment under either the C locale, or with an invalid or missing locale, it's not possible to operate using unicode filenames outside the ascii range. Using bytes works, as does reading

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-20 Thread R. David Murray
R. David Murray rdmur...@bitdance.com added the comment: I'm not sure why having a locale set to C or something invalid should be considered a Python bug. You have to handle un-decodable filenames no matter what you do, since things aren't always encoded in utf-8 on non-OSX unix even when

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-20 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: Currently when running Python on a non-OSX posix environment under either the C locale, or with an invalid or missing locale, it's not possible to operate using unicode filenames outside the ascii range. It was already

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-20 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: under either the C locale, or with an invalid or missing locale The right fix is to fix your locale, not Python.

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-20 Thread Martin
Martin gzl...@googlemail.com added the comment: I'm not sure why having a locale set to C or something invalid should be considered a Python bug. You have to handle un-decodable filenames no matter what you do, since things aren't always encoded in utf-8 on non-OSX unix even when that is

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-20 Thread Martin
Martin gzl...@googlemail.com added the comment: It was already discussed: using a different encoding for filenames and for other things is really not a good idea. The main problem is the interaction with other programs. Yes, for many programs, a change like this will mean they create the

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-20 Thread Martin Pool
Martin Pool m...@sourcefrog.net added the comment: I'm not sure why having a locale set to C or something invalid should be considered a Python bug. Programs like bzr that hit these problems can tell their users, either in the docs or an error message, change your locale to a UTF-8 one.

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-20 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: If there was a separate LC_FILENAMES then Python could respect that and insist people set it, but there isn't. For one month, we had a PYTHONFSENCODING environment variable. It was not a good idea. Again: please read the

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-20 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: There are two problems with this: one is just the practical one that it scales poorly to have to tell every user to do this and to take them through working out how to set this in a way that covers cron jobs, daemons, things run

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-20 Thread Martin Pool
Martin Pool m...@sourcefrog.net added the comment: On 21 December 2011 11:01, STINNER Victor rep...@bugs.python.org wrote: Again: please read the discussion (in closed issues) explaining why we removed it (and which problems it introduced). There's a lot of history, so I'm not sure exactly

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-20 Thread Martin Pool
Martin Pool m...@sourcefrog.net added the comment: On 21 December 2011 11:26, STINNER Victor rep...@bugs.python.org wrote: I never checked which locale is used by default for programs called by cron. So I checked: on Fedora 16, programs start with a very few environment variables, and LANG

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-20 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: The main problem I see being discussed is that changing the encoding after Python starts would be dangerous, which I agree with, but we're not proposing to do that. Not after Python start. Using two encodings at the same would

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-20 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: I should not write comments so late :-p "Not after Python start. Using two encodings at the same would just" should read "... at the same time ..."; "because I would like to inconsistency" should read "because it would lead to inconsistencies".

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-20 Thread Martin
Martin gzl...@googlemail.com added the comment: During 1 month, we had PYTHONFSENCODING environment variable. It was not a good idea. I strongly agree. There is no sense in having a separate configurable value, anyone who would think about using a PYTHONFSENCODING should just change their

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-20 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment: So, you're complaining about something which works, kind of:
$ touch héhé
$ LANG=C python3 -c 'import os; print(os.listdir())'
['h\udcc3\udca9h\udcc3\udca9']
This makes robustly working with non-ascii filenames on different platforms needlessly
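To show what "works, kind of" leaves the program to do, here is a hedged sketch that turns the surrogate-escaped name from the listing above back into readable text. The function `redecode_filename` and its parameters are invented for this example, and it assumes the caller already knows the files are really named in UTF-8, which is exactly the knowledge the C locale does not supply.

```python
def redecode_filename(name, assumed="ascii", actual="utf-8"):
    """Reinterpret a surrogate-escaped filename under a different encoding.

    Hypothetical helper: recovers the raw bytes via surrogateescape, then
    decodes them strictly; raises UnicodeDecodeError if the bytes are not
    valid in the `actual` encoding.
    """
    raw = name.encode(assumed, "surrogateescape")
    return raw.decode(actual)


if __name__ == "__main__":
    listed = "h\udcc3\udca9h\udcc3\udca9"  # value printed under LANG=C above
    print(redecode_filename(listed))
```

A strict decode is used on the second step on purpose: a name that is not valid UTF-8 fails loudly rather than being silently mangled.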

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-20 Thread Martin Pool
Martin Pool m...@sourcefrog.net added the comment: Thanks for the example. Like you say, realistically, all data exchanged with other programs and with the system needs to be in the same encoding. (User document content may be in something else.) On modern systems, this problem is solved by

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-20 Thread Martin Pool
Martin Pool m...@sourcefrog.net added the comment: On 21 December 2011 12:16, Antoine Pitrou rep...@bugs.python.org wrote: Antoine Pitrou pit...@free.fr added the comment: So, you're complaining about something which works, kind of: $ touch héhé $ LANG=C python3 -c import os;

[issue13643] 'ascii' is a bad filesystem default encoding

2011-12-20 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment: The standard encoding is UTF-8. How so? I don't know of any Linux or Unix spec which says so. If you get the Linux heads to standardize this then I'll certainly be very happy (and countless others will, too). But AFAIK this is not the case and I