Re: [Python-Dev] PEP 528: Change Windows console encoding to UTF-8

Steve Dower Mon, 05 Sep 2016 12:57:06 -0700

On 05Sep2016 1234, eryk sun wrote:

Also, the console is UCS-2, which can't be transcoded between UTF-16
and UTF-8. Supporting UCS-2 in the console would integrate nicely with
the filesystem PEP. It makes it always possible to print
os.listdir('.'), copy and paste, and read it back without data loss.

Supporting UTF-8 actually works better for this. We already use
surrogatepass explicitly (on the filesystem side, with PEP 529) and
implicitly (on the console side, using the Windows conversion API).
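
As a rough illustration of the surrogatepass point, the sketch below
round-trips a lone surrogate (the kind of code unit a UCS-2 filename can
contain) through UTF-8 using Python's standard codecs. The variable names
are made up for the example, and it demonstrates only the codec error
handler, not the console or filesystem code paths themselves.

```python
# A lone high surrogate: representable as a single UCS-2 code unit,
# but not well-formed UTF-16 and not encodable as strict UTF-8.
lone = "\ud800"

# Strict UTF-8 encoding rejects it.
try:
    lone.encode("utf-8")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# With "surrogatepass" the surrogate is encoded as a 3-byte sequence...
encoded = lone.encode("utf-8", "surrogatepass")

# ...and decodes back to the same code point, so nothing is lost.
decoded = encoded.decode("utf-8", "surrogatepass")

# The same handler works on the UTF-16 side (illustrative wide-char bytes).
wide = lone.encode("utf-16-le", "surrogatepass")
from_wide = wide.decode("utf-16-le", "surrogatepass")
```

Here `encoded` is `b'\xed\xa0\x80'` and both `decoded` and `from_wide`
compare equal to `lone`.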

It would probably be simpler to use UTF-16 in the main pipeline and
implement Martin's suggestion to mix in a UTF-8 buffer. The UTF-16
buffer could be renamed as "wbuffer", for expert use. However, if
you're fully committed to transcoding in the raw layer, I'm certain
that these problems can be addressed with small buffers and using
Python's codec machinery for a flexible mix of "surrogatepass" and
"replace" error handling.

I don't think it actually makes things simpler. Having two buffers is
generally a bad idea unless they are perfectly synced, which would be
impossible here without data corruption (if you read half a UTF-8
character sequence and then read the wide buffer, do you get that
character or not?).
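
To make the "half a character" state concrete, here is a small sketch
using the standard library's incremental UTF-8 decoder. It is only an
analogy for the situation described above, not the console reader's
actual implementation: after the first two bytes of a three-byte
sequence have been consumed, the character exists only as pending state
in one buffer, which is exactly what a second, wide-character view would
have to account for.

```python
import codecs

# "€" (U+20AC) is three bytes in UTF-8.
data = "\u20ac".encode("utf-8")

decoder = codecs.getincrementaldecoder("utf-8")()

# Feed only the first two bytes: no complete character is available yet.
first = decoder.decode(data[:2])

# The partial sequence is held inside the decoder as pending input.
pending, _flags = decoder.getstate()

# Feeding the final byte completes the character.
second = decoder.decode(data[2:])
```

After the first call `first` is the empty string and `pending` holds the
two unconsumed bytes; only the second call yields the character.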

Writing a partial character is easily avoidable by the user. We can
either fail with an error or print garbage, and currently printing
garbage is the most compatible behaviour. (Also occurs on Linux - I have
a VM running this week for testing this stuff.)
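
One way a caller could avoid writing a partial character is to hold back
any incomplete trailing sequence before handing a chunk to a raw write.
The helper below is a hypothetical sketch (the name
`split_complete_utf8` is invented for this example and is not part of
any PEP or of the io module); it assumes the input is otherwise valid
UTF-8 and only inspects the last few bytes.

```python
def split_complete_utf8(data: bytes):
    """Return (complete, remainder) so that `complete` does not end in
    the middle of a UTF-8 sequence. Hypothetical helper for illustration."""
    # A UTF-8 sequence is at most 4 bytes, so look back at most 4 bytes
    # for the lead byte of the final sequence.
    for back in range(1, min(4, len(data)) + 1):
        byte = data[-back]
        if byte & 0xC0 == 0x80:
            continue  # continuation byte, keep looking for the lead byte
        if byte >= 0xF0:
            need = 4
        elif byte >= 0xE0:
            need = 3
        elif byte >= 0xC0:
            need = 2
        else:
            need = 1  # plain ASCII
        if need > back:
            # The last sequence is incomplete: hold it back.
            return data[:-back], data[-back:]
        return data, b""
    return data, b""


text = "\u00e9\u20ac".encode("utf-8")      # 2-byte char followed by 3-byte char
complete, remainder = split_complete_utf8(text[:4])
```

With the chunk cut one byte short, `complete` contains only the first
character and `remainder` keeps the two bytes that should be written
together with the next chunk.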


Cheers,
Steve
