Nick Coghlan added the comment:

Stephen Turnbull suggested on python-dev that this was a bad idea, and after 
reconsidering the current behaviour in Python 2, I realised that setting 
surrogateescape and letting the terminal deal with the consequences is exactly 
what we want.

What confused me is that ls replaces the unknown characters with question marks 
in the C locale:

$ ls
ニコラス.txt
$ LANG=C ls
????????????.txt


Python 2 passes the bytes through, regardless of locale:

$ python -c "import os; print(os.listdir('.')[0])"
ニコラス.txt
$ LANG=C python -c "import os; print(os.listdir('.')[0])"
ニコラス.txt


Current Python 3 fails if the C locale is set, because the encoding on 
sys.stdout gets set to "ascii" with strict error handling, which breaks 
round-tripping:

$ python3 -c "import os; print(os.listdir('.')[0])"
ニコラス.txt
$ LANG=C python3 -c "import os; print(os.listdir('.')[0])"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-11: ordinal not in range(128)
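
The failure can be reproduced without touching the locale. This is a minimal sketch, assuming the filename is stored on disk as UTF-8 and that os.listdir() under the C locale hands it back decoded as ASCII with surrogateescape (as in the Python 3 versions discussed here). The names raw, name and try_strict_encode are illustrative only.

```python
# Bytes of the filename as stored on disk (assumed to be UTF-8).
raw = "ニコラス.txt".encode("utf-8")

# Assumption: under the C locale the name is decoded as ASCII with
# surrogateescape, so each non-ASCII byte becomes one lone surrogate.
name = raw.decode("ascii", "surrogateescape")


def try_strict_encode(text):
    """Encode like a strict ASCII sys.stdout; return the error if it fails."""
    try:
        return text.encode("ascii", "strict")
    except UnicodeEncodeError as exc:
        return exc


result = try_strict_encode(name)
# The 12 escaped bytes occupy positions 0-11, matching the traceback.
print(type(result).__name__, result.start, result.end - 1)
```

The four katakana characters take three UTF-8 bytes each, which is where the twelve unencodable positions come from.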

However, Python 3.5 will already set "surrogateescape" on sys.stdout by 
default, reproducing the behaviour of *Python 2*, rather than the behaviour of 
ls:

$ LANG=C ~/devel/py3k/python -c "import os; print(os.listdir('.')[0])"
ニコラス.txt
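
The difference between the two behaviours can be sketched with the codec error handlers directly. This assumes the same UTF-8 filename as in the examples above. The "replace" handler is used here only as a stand-in for what ls displays, not as a description of how ls is implemented.

```python
# Bytes of the filename as stored on disk (assumed to be UTF-8).
raw = "ニコラス.txt".encode("utf-8")

# Decoding as ASCII with surrogateescape keeps every undecodable byte
# as a lone surrogate instead of raising an error.
name = raw.decode("ascii", "surrogateescape")

# Python 2-like result: encoding with surrogateescape restores the
# original bytes, and the terminal decides how to display them.
passed_through = name.encode("ascii", "surrogateescape")
print(passed_through == raw)

# ls-like result: each unencodable character becomes a question mark.
replaced = name.encode("ascii", "replace")
print(replaced.decode("ascii"))
```

With surrogateescape the bytes survive the round trip unchanged, so a UTF-8 terminal still shows the original name even though Python believes the output encoding is ASCII.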

----------
resolution:  -> rejected
stage:  -> resolved
status: open -> closed
type:  -> enhancement

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue22016>
_______________________________________