New submission from Nick Coghlan: One problem with Unicode in 3.x is that surrogateescape isn't normally enabled on stdin and stdout. This means the following code will fail with UnicodeEncodeError in the presence of invalid filesystem metadata:
print(os.listdir()) We don't really want to enable surrogateescape on sys.stdin or sys.stdout unilaterally, as it increases the chance of data corruption errors when the filesystem encoding and the IO encodings don't match. Last night, Toshio and I thought of a possible solution: enable surrogateescape by default for sys.stdin and sys.stdout on non-Windows systems if (and only if) they're using the same codec as that returned by sys.getfilesystemencoding() (allowing for codec aliases rather than doing a simple string comparison) This means that for full UTF-8 systems (which includes most modern Linux installations), roundtripping will be enabled by default between the standard streams and OS facing APIs, while systems where the encodings don't match will still fail noisily. A more general alternative is also possible: default to errors='surrogatescape' for *any* text stream that uses the filesystem encoding. It's primarily the standard streams we're interested in fixing, though. ---------- messages: 194968 nosy: abadger1999, benjamin.peterson, ezio.melotti, haypo, lemburg, ncoghlan, pitrou priority: normal severity: normal stage: needs patch status: open title: Enable surrogateescape on stdin and stdout when appropriate type: enhancement versions: Python 3.4 _______________________________________ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue18713> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com