[issue28180] sys.getfilesystemencoding() should default to utf-8

Sworddragon Sat, 07 Jan 2017 14:20:50 -0800

Sworddragon added the comment:

> $ cat badfilename.py 
> badfn = "こんにちは".encode('euc-jp').decode('utf-8', 'surrogateescape')
> print("bad filename:", badfn)
>
> $ PYTHONIOENCODING=utf-8:backslashreplace python3 badfilename.py 
> bad filename: \udca4\udcb3\udca4\udcf3\udca4ˤ\udcc1\udca4\udccf
>
> $ PYTHONIOENCODING=utf-8:surrogateescape python3 badfilename.py 
> bad filename: �����ˤ���


The first example is still readable (though in practice not very readable for 
a user), while the second example appears not to be readable at all. But the 
second example is technically still readable and there is no data loss, is 
there? In that case, this would probably not speak against surrogateescape 
for sys.stderr in UTF-8 non-strict mode. Otherwise backslashreplace might 
indeed be the better choice.
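
To make the "no data loss" question concrete, here is a minimal sketch that 
reuses the string from the quoted example and checks what each error handler 
produces when encoding to bytes. The variable names are only illustrative.

```python
# Original EUC-JP bytes of the filename from the quoted example.
raw = "こんにちは".encode("euc-jp")

# Decoding as UTF-8 with surrogateescape maps each undecodable byte
# to a lone surrogate in the range U+DC80..U+DCFF.
badfn = raw.decode("utf-8", "surrogateescape")

# Encoding with surrogateescape turns those surrogates back into the
# original bytes, so the byte stream itself is preserved.
roundtrip = badfn.encode("utf-8", "surrogateescape")

# Encoding with backslashreplace writes ASCII escapes such as \udca4
# instead, which is readable but is not the original byte sequence.
escaped = badfn.encode("utf-8", "backslashreplace")

print(roundtrip == raw)
print(escaped)
```

With surrogateescape the bytes written are the original EUC-JP bytes, which a 
UTF-8 terminal cannot display but another program could still decode. With 
backslashreplace the output is displayable, but the original bytes have to be 
reconstructed from the escapes.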


I have thought about this a bit more. In case we go with PEP 538 and keep 
strict errors more or less the old way, there might be another solution that 
could improve the overall issue: print() could get an option to change the 
error handler on demand (with 'strict' still being the default).

Most things that I output with print() are deterministic or optional and not 
important application data. Being able to print this information without caring 
about decoding/encoding errors would mitigate this issue. Where application 
data is being printed and data loss is not desired, exceptions can still be 
thrown.
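
A rough sketch of what such an option could look like, written as a wrapper 
function rather than a change to print() itself. The name print_with_errors 
and its errors parameter are hypothetical; the built-in print() has no such 
argument. The sketch assumes the target stream exposes a binary .buffer, as 
sys.stdout normally does.

```python
import io
import sys


def print_with_errors(*args, errors="strict", file=None, sep=" ", end="\n"):
    """Hypothetical print() variant with a per-call error handler."""
    stream = sys.stdout if file is None else file
    text = sep.join(str(a) for a in args) + end
    encoding = getattr(stream, "encoding", None) or "utf-8"
    # Encode with the handler chosen for this call, not the stream's own.
    data = text.encode(encoding, errors)
    buffer = getattr(stream, "buffer", None)
    if buffer is None:
        # No binary layer (e.g. io.StringIO): write a lenient text form.
        stream.write(data.decode(encoding, "replace"))
    else:
        stream.flush()  # keep ordering with earlier text writes
        buffer.write(data)
        buffer.flush()


# Demonstration on an in-memory stream that uses strict errors by default.
sink = io.BytesIO()
stream = io.TextIOWrapper(sink, encoding="utf-8", errors="strict")
print_with_errors("bad filename:", "\udca4", errors="backslashreplace",
                  file=stream)
```

Calling it with the default errors="strict" on the same string still raises 
UnicodeEncodeError, so output where data loss matters keeps the current 
behaviour.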

----------

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue28180>
_______________________________________
