STINNER Victor added the comment:
Antoine Pitrou added the comment:
Python uses the fact that the filesystem encoding is the locale
encoding in various places.
The patch doesn't change that.
Nick Coghlan added the comment:
Note that the *only* change Antoine's patch makes is that:
- *if*
Nick Coghlan added the comment:
Yes, that's the point. *Every* case I've seen where the locale encoding has
been reported as ASCII on a modern Linux system has been because the
environment has been configured to use the C locale, and that locale has a
silly, antiquated, encoding setting.
STINNER Victor added the comment:
2013/12/8 Nick Coghlan rep...@bugs.python.org:
Yes, that's the point. *Every* case I've seen where the locale encoding has
been reported as ASCII on a modern Linux system has been because the
environment has been configured to use the C locale, and that
Antoine Pitrou added the comment:
If you use a different encoding but only just for filenames, you will
get mojibake when you pass a filename on the command line or in an
environment variable.
That's not what the patch does.
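A minimal sketch of the mojibake being described, assuming a name whose bytes were produced as UTF-8 and then decoded as Latin-1. Both codecs and the helper name are illustrative choices, not taken from the thread:

```python
def decode_with_wrong_codec(name, written_as="utf-8", read_as="latin-1"):
    """Encode a name with one codec and decode the same bytes with another."""
    raw = name.encode(written_as)
    return raw.decode(read_as)


# "caf\u00e9" is "cafe" with an acute accent on the final letter.
garbled = decode_with_wrong_codec("caf\u00e9")
print(repr(garbled))
```

When both sides agree on the codec, the name round-trips unchanged. The garbling only appears when the two encodings differ.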
STINNER Victor added the comment:
2013/12/8 Antoine Pitrou rep...@bugs.python.org:
Python uses the fact that the filesystem encoding is the locale
encoding in various places.
The patch doesn't change that.
You wrote: - With the patch: utf-8 utf-8 utf-8 ANSI_X3.4-1968, so
os.get
Serhiy Storchaka added the comment:
Setting sys.stderr encoding to UTF-8 on an ASCII locale is wrong. sys.stderr has
the backslashreplace error handler by default, so it never fails and should
never produce non-ASCII data on an ASCII locale.
--
nosy: +serhiy.storchaka
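To illustrate the point about backslashreplace, here is a small sketch that writes a non-ASCII string to an in-memory ASCII stream. The stream is a stand-in built with io.TextIOWrapper, not the real sys.stderr, and the helper name is hypothetical:

```python
import io


def write_through(text, errors):
    """Write text to an ASCII-encoded stream using the given error handler."""
    buffer = io.BytesIO()
    stream = io.TextIOWrapper(buffer, encoding="ascii", errors=errors)
    stream.write(text)
    stream.flush()
    return buffer.getvalue()


# backslashreplace escapes the character, so only ASCII bytes are produced.
escaped = write_through("caf\u00e9", "backslashreplace")

# strict raises instead of writing anything.
try:
    write_through("caf\u00e9", "strict")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

print(escaped, strict_failed)
```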
Larry Hastings added the comment:
Antoine: are you characterizing this as a bug rather than a new feature?
I'd like to see more of a consensus before something like this gets checked in.
Right now I see a variety of opinions.
When I think conservative approach and knows about system encoding
Antoine Pitrou added the comment:
Or said differently, the filesystem encoding is different than the
locale encoding.
Indeed, but the FS encoding and the IO encoding are the same.
locale encoding doesn't really matter here, as we are assuming that it's
wrong.
--
Nick Coghlan added the comment:
Victor, people set LANG=C for all sorts of reasons, and we have no
control over how operating systems define that locale. The user
perception is Python 3 doesn't work properly when you ssh into
systems, not Gee, I wish operating systems defined the C locale more
STINNER Victor added the comment:
haypo: title: Setting LANG=C breaks Python 3 -> print() and write() are
relying on sys.getfilesystemencoding() instead of sys.getdefaultencoding()
Oh, I didn't want to change the title of the issue, it's a bug in Roundup when
I reply by email :-/
--
STINNER Victor added the comment:
If you want to avoid the encoding errors, you can also use
PYTHONIOENCODING=:replace or PYTHONIOENCODING=:backslashreplace in Python 3.4
to use the locale encoding, but use an error handler different than strict.
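A hedged sketch of how the error-handler part of PYTHONIOENCODING behaves, run in a child interpreter. The comment above uses the form with an empty encoding (":replace"), which keeps the locale encoding. The sketch below pins the encoding to "ascii" instead, which is an assumption made here only so that the result does not depend on the machine's locale. The helper name is hypothetical:

```python
import os
import subprocess
import sys


def run_child(io_encoding):
    """Run a child Python that prints a non-ASCII string and return its raw stdout."""
    env = dict(os.environ)
    env["PYTHONIOENCODING"] = io_encoding
    result = subprocess.run(
        # The child sees the source text print('caf\xe9'), so argv stays pure ASCII.
        [sys.executable, "-c", "print('caf\\xe9')"],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.strip()


print(run_child("ascii:backslashreplace"))
print(run_child("ascii:replace"))
```

With either handler the child exits normally rather than raising UnicodeEncodeError. The difference is only in what replaces the unencodable character.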
--
Sworddragon added the comment:
Using an environment variable is not the holy grail for this. On writing a
non-single-user application you can't expect the user to set extra environment
variables.
If compatibility is the only reason, in my opinion it would be much better to
include something
Antoine Pitrou added the comment:
Using an environment variable is not the holy grail for this. On
writing a non-single-user application you can't expect the user to set
extra environment variables.
I am not understanding why the user would have to set anything at all.
What is the use case
Changes by Antoine Pitrou pit...@free.fr:
--
nosy: +ncoghlan
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19846
___
___
Python-bugs-list mailing
Nick Coghlan added the comment:
Antoine's suggestion of being a little more aggressive in choosing utf-8 over
ascii as the OS API encoding sounds reasonable to me.
I think we're getting to a point where a system claiming ASCII as the encoding
to use is almost certainly a misconfiguration
Antoine Pitrou added the comment:
Here is a patch.
$ LANG=C ./python -c "import os, sys, locale;
print(sys.getfilesystemencoding(), sys.stdin.encoding, os.device_encoding(0),
locale.getpreferredencoding())"
- Without the patch:
ascii ANSI_X3.4-1968 ANSI_X3.4-1968 ANSI_X3.4-1968
- With the
Changes by Serhiy Storchaka storch...@gmail.com:
--
nosy: +lemburg, loewis
STINNER Victor added the comment:
There was a previous attempt to use a file encoding different than the locale
encoding and it introduced too many issues:
https://mail.python.org/pipermail/python-dev/2010-October/104509.html
Inconsistencies if locale and filesystem encodings are different
Python
Antoine Pitrou added the comment:
Python uses the fact that the filesystem encoding is the locale
encoding in various places.
The patch doesn't change that.
Nick Coghlan added the comment:
Note that the *only* change Antoine's patch makes is that:
- *if* the locale encoding is ASCII (or an alias for ASCII)
- *then* Python sets the filesystem encoding to UTF-8 instead
If the locale encoding is anything *other* than ASCII, then that will still be
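The rule described in the two bullets above can be sketched in Python as follows. This is only an illustration of the decision, not the patch itself. The function name is hypothetical, and using codecs.lookup() to recognise ASCII aliases is an assumption made for this sketch:

```python
import codecs


def pick_filesystem_encoding(locale_encoding):
    """Return UTF-8 when the locale encoding is ASCII or an alias for it."""
    try:
        canonical = codecs.lookup(locale_encoding).name
    except LookupError:
        # Unknown codec name: leave the caller's value untouched.
        return locale_encoding
    return "utf-8" if canonical == "ascii" else locale_encoding


for name in ("ANSI_X3.4-1968", "ascii", "ISO-8859-1", "UTF-8"):
    print(name, "->", pick_filesystem_encoding(name))
```

ANSI_X3.4-1968 is the name the C locale reports in the output quoted elsewhere in this thread, and Python's codec registry treats it as an alias for ASCII.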
Terry J. Reedy added the comment:
Unless there is an actual possibility of changing this, which I doubt since
it is a choice and not a bug, and changing might break things, this issue
should be closed.
--
nosy: +terry.reedy
Antoine Pitrou added the comment:
I think the ship has sailed on this. We can't change our heuristic every time
someone finds a flaw in the current one.
In the long term, all sensible UNIX systems should be configured for utf-8
filenames and contents, so it won't make a difference anymore.
New submission from Sworddragon:
It seems that print() and write() (and maybe other such I/O functions) are
relying on sys.getfilesystemencoding(). But these functions do not operate
on filenames but on file contents. In the attachments is an example script
which demonstrates this
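The attached script is not reproduced in this listing. As a stand-in, here is a minimal sketch (not the reporter's script) that prints the encodings named in the report side by side, so they can be compared under different LANG settings. The function name is hypothetical:

```python
import locale
import sys


def encoding_report():
    """Collect the encodings discussed in the report into one dictionary."""
    return {
        "filesystem": sys.getfilesystemencoding(),
        "default": sys.getdefaultencoding(),
        # getattr guards against a replaced stdout object without an encoding.
        "stdout": getattr(sys.stdout, "encoding", None),
        "locale": locale.getpreferredencoding(False),
    }


for name, value in encoding_report().items():
    print(f"{name}: {value}")
```

In Python 3, sys.getdefaultencoding() always returns "utf-8". The other three values depend on the environment the interpreter is started in.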
R. David Murray added the comment:
Victor can correct me if I'm wrong, but I believe that stdin/stdout/stderr all
use the filesystem encoding because filenames are the most likely source of
non-ascii characters on those streams. (Not a perfect solution, but the best
we can do.)
--
STINNER Victor added the comment:
"Filesystem encoding" is not a good name. You should read "OS encoding" or
maybe "locale encoding".
This encoding is the best choice for interoperability with other (python2 or
non python) programs. If you don't care about interoperability, force the
encoding using