Toshio Kuratomi added the comment:
Ahh... added to the nosy list and bug closed all before I got up for the day ;-)
A few words:
I do think that python is broken here.
I do not think that translating everything to utf-8 if ascii is the locale's
encoding is the solution.
As I would state it, the problem is that python's boundary with the OS is not
yet uniform. If you set LC_ALL=C (note that LC_ALL=C is just one of several
ways to break things; for instance, LC_ALL=en_US.utf8 will also break when
dealing with latin-1 data), python will still *read* non-ascii data from the
OS through some interfaces, but it won't write that data back out to the OS.
For example:
$ mkdir unicode && cd unicode
$ python3 -c 'open("ñ.txt".encode("latin-1"), "w").close()'
$ LC_ALL=en_US.utf8 python3
>>> import os
>>> dir_listing = os.listdir('.')
>>> for entry in dir_listing: print(entry)
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcf1' in position 0: surrogates not allowed
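A minimal sketch of the mechanism behind that traceback, using only the codecs and no filesystem access. On POSIX, os.listdir() decodes names with the surrogateescape error handler. The latin-1 byte 0xF1 therefore comes back as the lone surrogate U+DCF1, and a strict UTF-8 encoder (like the one behind sys.stdout in the session above) refuses to encode it. The variable names are illustrative only.

```python
# Sketch: reproduce the traceback above with codecs alone (no filesystem needed).
raw_name = "ñ.txt".encode("latin-1")  # b'\xf1.txt', the bytes the open() call created

# os.listdir() on POSIX decodes with surrogateescape, which never fails:
# 0xF1 is not valid UTF-8 here, so it becomes the lone surrogate U+DCF1.
name = raw_name.decode("utf-8", "surrogateescape")
print(repr(name))  # '\udcf1.txt'

# A strict UTF-8 encoder (what print() used above) rejects the surrogate.
try:
    name.encode("utf-8")
except UnicodeEncodeError as exc:
    print(exc.reason)  # surrogates not allowed

# Encoding with the same error handler restores the original bytes exactly.
assert name.encode("utf-8", "surrogateescape") == raw_name
```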
Note that currently, input() and sys.stdin.read() won't read undecodable data
either, so this is somewhat symmetrical. Still, it seems to me that saying
"everything that interfaces with the OS except the standard streams will use
surrogateescape on undecodable bytes" draws the line in an unintuitive place.
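Here is one way a program can work around that asymmetry today. This sketch assumes a POSIX system. Instead of printing the str, it converts the name back with os.fsencode(), which uses the same filesystem encoding and error handler that os.listdir() used to decode it, and then writes the resulting bytes to the binary layer of stdout. The helpers write_os_name() and render_os_name() are hypothetical and are not part of any library.

```python
import io
import os
import sys
from typing import BinaryIO, Optional


def write_os_name(name: str, out: Optional[BinaryIO] = None) -> None:
    """Hypothetical helper: emit an OS-derived str as the bytes it came from.

    On POSIX, os.fsencode() re-applies the filesystem encoding with the
    surrogateescape handler, so U+DCF1 turns back into the byte 0xF1 instead
    of raising UnicodeEncodeError.
    """
    if out is None:
        out = sys.stdout.buffer  # bypass the text layer and its strict encoder
    out.write(os.fsencode(name) + b"\n")


def render_os_name(name: str) -> bytes:
    """Hypothetical helper: capture what write_os_name() would emit."""
    buf = io.BytesIO()
    write_os_name(name, buf)
    return buf.getvalue()
```

To use it in the session above, replace print(entry) with write_os_name(entry). Then call sys.stdout.buffer.flush() if you also write through the text layer.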
(A further note to serhiy.storchaka: your examples don't show anything
broken in other programs. xterm refuses both non-ascii input and non-ascii
output, which is symmetric behaviour. ls is doing its best to display a
*human-readable* representation of bytes that it cannot convert in the current
encoding, and it provides the -b switch to show octal escapes if you actually
care. Think of it like opening a binary file in less or another pager.)
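To make the ls comparison concrete, here is a sketch of an "ls -b"-style display in Python. It assumes a POSIX system. Every non-ASCII byte is rendered as a Python hex escape (for example \xf1) rather than the octal escapes ls -b prints. displayable_name() is a hypothetical helper.

```python
import os


def displayable_name(name: str) -> str:
    """Hypothetical helper: ASCII-only, human-readable form of an OS name.

    The name is turned back into its raw bytes with os.fsencode(), and every
    byte outside ASCII is then shown as a backslash hex escape. The result is
    plain ASCII, so it can be printed in any locale, including LC_ALL=C.
    """
    return os.fsencode(name).decode("ascii", "backslashreplace")
```

For example, `for entry in os.listdir('.'): print(displayable_name(entry))` prints the latin-1 file from the session above as \xf1.txt instead of raising.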
(Further note for haypo -- on Fedora, the default encoding for en_US is UTF-8,
not ISO-8859-1.)
----------
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue19846>
_______________________________________