STINNER Victor added the comment:
Follow-up: the PEP 538 (bpo-28180) and PEP 540 (bpo-29240) have been accepted
and implemented in Python 3.7!
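For readers who want to check whether these changes are active on their own interpreter, here is a minimal sketch. It assumes Python 3.7 or later. The helper name `encoding_state` is made up for illustration and is not part of either PEP.

```python
import locale
import sys


def encoding_state():
    """Collect the settings that PEP 538 and PEP 540 influence."""
    return {
        # 1 when UTF-8 Mode (PEP 540) is on, for example via
        # `python3 -X utf8` or PYTHONUTF8=1; 0 otherwise.
        # getattr() keeps the sketch importable on pre-3.7 interpreters.
        "utf8_mode": getattr(sys.flags, "utf8_mode", None),
        # Encoding used to decode filenames from the OS.
        "filesystem_encoding": sys.getfilesystemencoding(),
        # Encoding derived from the current locale settings.
        "locale_encoding": locale.getpreferredencoding(False),
    }


if __name__ == "__main__":
    for key, value in encoding_state().items():
        print(f"{key}: {value}")
```

Running it once normally and once with `LANG=C` should show how the interpreter reacts to the C locale. The exact values depend on the platform and the Python version.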
--
___
Python tracker
Nick Coghlan added the comment:
Also see http://bugs.python.org/issue28180 for a more recent proposal to tackle
this by coercing the C locale to the C.UTF-8 locale
--
nosy: +ncoghlan
Terry J. Reedy tjre...@udel.edu added the comment:
Martin, after reading most all of the unusually large sequence of messages, I
am closing this because three of the core developers with the most experience
in this area are dead-set against your proposal. That does not make it 'wrong',
but
Martin Pool m...@sourcefrog.net added the comment:
Terry, that's fine. Thanks to everyone who contributed to the discussion.
--
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue13643
___
Changes by akira 4kir4...@gmail.com:
--
nosy: +akira
Changes by vila v.ladeuil+bugs-pyt...@free.fr:
--
nosy: +vila
STINNER Victor victor.stin...@haypocalc.com added the comment:
Having more than one encoding on unix is already a reality, there's nothing
to stop someone setting LANG=de_DE.UTF-8 and LC_MESSAGES=C say.
Nope. The locale encoding is chosen using LC_ALL, LC_CTYPE or LANG
variable: use the
Martin gzl...@googlemail.com added the comment:
Nope. The locale encoding is chosen using LC_ALL, LC_CTYPE or LANG
variable: use the first non-empty variable. LC_MESSAGES doesn't affect
the encoding. Example:
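The example from the original message is not preserved in this listing. As a stand-in, the precedence rule described above can be sketched as follows. This only models the environment lookup; the real interpreter asks the C library (setlocale and nl_langinfo) rather than parsing variables itself. The function names and the fallback to "C" when nothing is set are assumptions for illustration.

```python
def effective_ctype_locale(env):
    """Return the first non-empty value among LC_ALL, LC_CTYPE and LANG."""
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:  # skips both missing and empty variables
            return value
    return "C"  # assumed default when no variable is set


def codeset_of(locale_name):
    """Extract the codeset part of a name like 'de_DE.UTF-8@euro'."""
    name = locale_name.split("@", 1)[0]  # drop an optional @modifier
    if "." not in name:
        return None  # e.g. 'C' carries no explicit codeset
    return name.split(".", 1)[1]


if __name__ == "__main__":
    env = {"LANG": "de_DE.UTF-8", "LC_MESSAGES": "C"}
    chosen = effective_ctype_locale(env)
    # LC_MESSAGES is never consulted, so LANG decides the encoding here.
    print(chosen, codeset_of(chosen))
```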
That's good to know, thanks. Only leaves the case where setlocale is called
STINNER Victor victor.stin...@haypocalc.com added the comment:
it will still be passing values that can't be
interpreted by other processes as you highlighted earlier.
On UNIX, data going outside Python has to be encoded: you pass byte strings,
not directly Unicode. Surrogates are encoded
R. David Murray rdmur...@bitdance.com added the comment:
But currently everything handling filenames as unicode on
*nix needs to worry about surrogates (that can't be encoded
as ascii) already, or it will still be passing values that
can't be interpreted by other processes as you highlighted
Martin Pool m...@sourcefrog.net added the comment:
On 21 December 2011 12:41, Antoine Pitrou rep...@bugs.python.org wrote:
Antoine Pitrou pit...@free.fr added the comment:
The standard encoding is UTF-8.
How so? I don't know of any Linux or Unix spec which says so. If you get
the Linux
Antoine Pitrou pit...@free.fr added the comment:
It is a de facto, not de jure standard: UTF-8 is how things are
typically stored. Other software (eg gnome file handling utilities)
makes this assumption. See eg
http://www.cl.cam.ac.uk/~mgk25/unicode.html#linux.
So should we specifically
STINNER Victor victor.stin...@haypocalc.com added the comment:
This discussion is becoming very long, I didn't remember the original
purpose. You want to use UTF-8 instead of ASCII, so what? What do you
want to do with your nicely well decoded filenames? You cannot print it
to your terminal
STINNER Victor victor.stin...@haypocalc.com added the comment:
Nope. The locale encoding is chosen using LC_ALL, LC_CTYPE or LANG
variable: use the first non-empty variable. LC_MESSAGES doesn't affect
the encoding. Example:
That's good to know, thanks. Only leaves the case where setlocale
Martin Pool m...@sourcefrog.net added the comment:
On 22 December 2011 11:21, STINNER Victor rep...@bugs.python.org wrote:
This discussion is becoming very long, I didn't remember the original
purpose.
The proposal is that in some cases where Python currently assumes
filenames are ascii on
STINNER Victor victor.stin...@haypocalc.com added the comment:
On 22/12/2011 02:16, Martin Pool wrote:
The proposal is that in some cases where Python currently assumes
filenames are ascii on Linux, it ought to instead assume they are
utf-8.
Oh, I expected a use case describing the problem,
Martin Pool m...@sourcefrog.net added the comment:
On 22 December 2011 12:32, STINNER Victor rep...@bugs.python.org wrote:
STINNER Victor victor.stin...@haypocalc.com added the comment:
On 22/12/2011 02:16, Martin Pool wrote:
The proposal is that in some cases where Python currently assumes
STINNER Victor victor.stin...@haypocalc.com added the comment:
The problem as I see it is this:
On Linux, filenames are generally (but not always) in UTF-8; people
fairly commonly end up with no locale configured, which causes Python
to decode filenames as ascii. It is easy for this to end
Martin Pool m...@sourcefrog.net added the comment:
On 22 December 2011 13:15, STINNER Victor rep...@bugs.python.org wrote:
You cannot pass 'h\xe9.txt' directly, but if you know the correct file
system encoding, you can encode it explicitly using str.encode('utf-8').
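A minimal sketch of that approach, assuming the filesystem really does store names as UTF-8: encode the name yourself and hand the resulting bytes to the OS, which bypasses the locale encoding entirely. The helper name `create_with_explicit_encoding` is hypothetical.

```python
import os
import tempfile


def create_with_explicit_encoding(directory, name="h\xe9.txt", encoding="utf-8"):
    """Create an empty file whose name is encoded explicitly, not via the locale."""
    encoded_name = name.encode(encoding)
    # Join bytes with bytes; os.fsencode() converts the directory path.
    path = os.path.join(os.fsencode(directory), encoded_name)
    with open(path, "wb"):
        pass  # bytes paths are passed to the OS unchanged on POSIX
    return encoded_name


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(create_with_explicit_encoding(tmp))
```

The cost of this approach is that every caller has to know the right encoding, which is the scaling problem raised elsewhere in this thread.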
My recollection was that
R. David Murray rdmur...@bitdance.com added the comment:
_My_ locale is set properly. The problem is all the other
people in the world who do not have their locale set to match
their files on disk; telling them each to fix it is tedious.
But perhaps the OS is the best place to address that,
New submission from Martin gzl...@googlemail.com:
Currently when running Python on a non-OSX posix environment under either the C
locale, or with an invalid or missing locale, it's not possible to operate
using unicode filenames outside the ascii range. Using bytes works, as does
reading
R. David Murray rdmur...@bitdance.com added the comment:
I'm not sure why having a locale set to C or something invalid should be
considered a Python bug. You have to handle un-decodable filenames no matter
what you do, since things aren't always encoded in utf-8 on non-OSX unix even
when
STINNER Victor victor.stin...@haypocalc.com added the comment:
Currently when running Python on a non-OSX posix environment
under either the C locale, or with an invalid or missing locale,
it's not possible to operate using unicode filenames outside
the ascii range.
It was already
STINNER Victor victor.stin...@haypocalc.com added the comment:
under either the C locale, or with an invalid or missing locale
The right fix is to fix your locale, not Python.
Martin gzl...@googlemail.com added the comment:
I'm not sure why having a locale set to C or something invalid should be
considered a Python bug. You have to handle un-decodable filenames no
matter what you do, since things aren't always encoded in utf-8 on non-OSX
unix even when that is
Martin gzl...@googlemail.com added the comment:
It was already discussed: using a different encoding for filenames and for
other things is really not a good idea. The main problem is the interaction
with other programs.
Yes, for many programs, a change like this will mean they create the
Martin Pool m...@sourcefrog.net added the comment:
I'm not sure why having a locale set to C or something invalid should be
considered a Python bug.
Programs like bzr that hit these problems can tell their users, either in the
docs or an error message, change your locale to a UTF-8 one.
STINNER Victor victor.stin...@haypocalc.com added the comment:
If there was a separate LC_FILENAMES then Python could respect
that and insist people set it, but there isn't.
During 1 month, we had PYTHONFSENCODING environment variable. It was not a good
idea. Again: please read the
STINNER Victor victor.stin...@haypocalc.com added the comment:
There are two problems with this: one is just the practical
one that it scales poorly to have to tell every user to do this
and to take them through working out how to set this in a way
that covers cron jobs, daemons, things run
Martin Pool m...@sourcefrog.net added the comment:
On 21 December 2011 11:01, STINNER Victor rep...@bugs.python.org wrote:
Again: please read the discussion (in closed issues) explaining why we removed
it (and which problems it introduced).
There's a lot of history, so I'm not sure exactly
Martin Pool m...@sourcefrog.net added the comment:
On 21 December 2011 11:26, STINNER Victor rep...@bugs.python.org wrote:
I never checked which locale is used by default for programs called by cron.
So I checked: on Fedora 16, programs start with a very few environment
variables, and LANG
STINNER Victor victor.stin...@haypocalc.com added the comment:
The main problem I see being discussed is that
changing the encoding after Python starts would
be dangerous, which I agree with, but we're not
proposing to do that.
Not after Python start. Using two encodings at the same would
STINNER Victor victor.stin...@haypocalc.com added the comment:
I should not write comments so late :-p
Not after Python start. Using two encodings at the same would just ...
at the same time
... because I would like to inconsistency.
because it would lead to inconsistencies
--
Martin gzl...@googlemail.com added the comment:
During 1 month, we had PYTHONFSENCODING environment variable. It was not a
good idea.
I strongly agree. There is no sense in having a separate configurable value,
anyone who would think about using a PYTHONFSENCODING should just change their
Antoine Pitrou pit...@free.fr added the comment:
So, you're complaining about something which works, kind of:
$ touch héhé
$ LANG=C python3 -c "import os; print(os.listdir())"
['h\udcc3\udca9h\udcc3\udca9']
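The escaped output above comes from the surrogateescape error handler (PEP 383): each byte that cannot be decoded as ASCII is mapped to a lone surrogate in the range U+DC80 to U+DCFF. A small sketch of the round trip, with illustrative function names, shows that the original bytes stay recoverable:

```python
def decode_like_c_locale(raw_name):
    """Decode filename bytes the way Python 3 does under an ASCII locale."""
    return raw_name.decode("ascii", "surrogateescape")


def recover_utf8(escaped_name):
    """Undo the escaping, then decode the original bytes as UTF-8."""
    raw_name = escaped_name.encode("ascii", "surrogateescape")
    return raw_name.decode("utf-8")


if __name__ == "__main__":
    raw = "h\xe9h\xe9".encode("utf-8")  # the bytes stored on disk
    escaped = decode_like_c_locale(raw)
    # ascii() is used because printing lone surrogates directly
    # raises UnicodeEncodeError on a UTF-8 stdout.
    print(ascii(escaped))
    print(ascii(recover_utf8(escaped)))
```

This only works if the caller knows the bytes are UTF-8. With a name in another encoding, the final decode step would fail or produce the wrong text.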
This makes robustly working with non-ascii filenames on different
platforms needlessly
Martin Pool m...@sourcefrog.net added the comment:
Thanks for the example.
Like you say, realistically, all data exchanged with other programs
and with the system needs to be in the same encoding. (User document
content may be in something else.)
On modern systems, this problem is solved by
Martin Pool m...@sourcefrog.net added the comment:
On 21 December 2011 12:16, Antoine Pitrou rep...@bugs.python.org wrote:
Antoine Pitrou pit...@free.fr added the comment:
So, you're complaining about something which works, kind of:
$ touch héhé
$ LANG=C python3 -c import os;
Antoine Pitrou pit...@free.fr added the comment:
The standard encoding is UTF-8.
How so? I don't know of any Linux or Unix spec which says so. If you get
the Linux heads to standardize this then I'll certainly be very happy
(and countless others will, too). But AFAIK this is not the case and I