New submission from Nick Coghlan:

One problem with Unicode in Python 3.x is that surrogateescape isn't normally
enabled on stdin and stdout. This means the following code will fail with
UnicodeEncodeError in the presence of invalid filesystem metadata (filenames
that can't be decoded with the filesystem encoding):

    print(os.listdir())
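
To illustrate the failure at the codec level, here is a minimal sketch. It
assumes a UTF-8 filesystem encoding and uses a hypothetical undecodable
filename, b"caf\xe9". os.listdir() decodes such a name with surrogateescape,
so the result contains a lone surrogate that a strict encoder rejects:

```python
# Hypothetical undecodable filename: Latin-1 bytes that are not valid UTF-8.
raw_name = b"caf\xe9"

# On a UTF-8 filesystem, os.listdir() decodes names with surrogateescape,
# so the invalid byte 0xE9 becomes the lone surrogate U+DCE9.
name = raw_name.decode("utf-8", "surrogateescape")
print(ascii(name))  # 'caf\udce9'

# A stdout using errors='strict' cannot encode the surrogate.
try:
    name.encode("utf-8", "strict")
except UnicodeEncodeError as exc:
    print("strict:", exc.reason)

# With surrogateescape on the way out, the original bytes come back unchanged.
assert name.encode("utf-8", "surrogateescape") == raw_name
```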

We don't really want to enable surrogateescape on sys.stdin or sys.stdout 
unilaterally, as it increases the chance of data corruption errors when the 
filesystem encoding and the IO encodings don't match.

Last night, Toshio and I thought of a possible solution: enable surrogateescape 
by default for sys.stdin and sys.stdout on non-Windows systems if (and only if) 
they're using the same codec as that returned by sys.getfilesystemencoding() 
(allowing for codec aliases rather than doing a simple string comparison).
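
A rough sketch of the proposed rule follows. choose_stdio_errors() is a
hypothetical helper, not an existing API. It assumes that comparing
codecs.lookup(...).name is an acceptable way to treat codec aliases as equal:

```python
import codecs
import sys


def choose_stdio_errors(stream_encoding, fs_encoding, platform=sys.platform):
    """Return the error handler a standard stream would get under the proposal.

    Hypothetical helper: surrogateescape only on non-Windows platforms, and
    only when both encoding names resolve to the same codec.
    """
    if platform.startswith("win"):
        return "strict"
    # codecs.lookup() normalises aliases such as "UTF8" and "utf_8" to one name.
    same_codec = codecs.lookup(stream_encoding).name == codecs.lookup(fs_encoding).name
    return "surrogateescape" if same_codec else "strict"


print(choose_stdio_errors("UTF8", "utf-8", platform="linux"))     # surrogateescape
print(choose_stdio_errors("latin-1", "utf-8", platform="linux"))  # strict
print(choose_stdio_errors("utf-8", "utf-8", platform="win32"))    # strict
```

For the real streams this would be called as
choose_stdio_errors(sys.stdout.encoding, sys.getfilesystemencoding()). On
Python 3.7 and later the result could be applied by hand with
sys.stdout.reconfigure(errors=...); that method doesn't exist in 3.4.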

This means that for full UTF-8 systems (which includes most modern Linux 
installations), roundtripping will be enabled by default between the standard 
streams and OS facing APIs, while systems where the encodings don't match will 
still fail noisily.
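
The two outcomes can be sketched with in-memory streams standing in for
stdout. This assumes a UTF-8 filesystem encoding and the same hypothetical
filename b"caf\xe9". A matching codec with surrogateescape reproduces the
original bytes, while a strict stream raises instead of writing anything:

```python
import io

raw_name = b"caf\xe9"  # hypothetical undecodable filename
name = raw_name.decode("utf-8", "surrogateescape")  # what os.listdir() would return

# Matching codec plus surrogateescape: the stream reproduces the original bytes.
buf = io.BytesIO()
stream = io.TextIOWrapper(buf, encoding="utf-8", errors="surrogateescape")
stream.write(name)
stream.flush()
print(buf.getvalue())  # b'caf\xe9'

# Mismatched encoding, so errors stays 'strict' under the proposal: fails noisily.
strict_stream = io.TextIOWrapper(io.BytesIO(), encoding="latin-1", errors="strict")
try:
    strict_stream.write(name)
    strict_stream.flush()
except UnicodeEncodeError as exc:
    print("strict:", exc.reason)
```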

A more general alternative is also possible: default to errors='surrogateescape'
for *any* text stream that uses the filesystem encoding. It's primarily the 
standard streams we're interested in fixing, though.

----------
messages: 194968
nosy: abadger1999, benjamin.peterson, ezio.melotti, haypo, lemburg, ncoghlan, 
pitrou
priority: normal
severity: normal
stage: needs patch
status: open
title: Enable surrogateescape on stdin and stdout when appropriate
type: enhancement
versions: Python 3.4

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue18713>
_______________________________________
_______________________________________________
Python-bugs-list mailing list