On 5 September 2016 at 21:40, eryk sun <eryk...@gmail.com> wrote:
> On Mon, Sep 5, 2016 at 7:54 PM, Steve Dower <steve.do...@python.org> wrote:
>> On 05Sep2016 1234, eryk sun wrote:
>>> It would probably be simpler to use UTF-16 in the main pipeline and
>>> implement Martin's suggestion to mix in a UTF-8 buffer. The UTF-16
>>> buffer could be renamed as "wbuffer", for expert use. However, if
>>> you're fully committed to transcoding in the raw layer, I'm certain
>>> that these problems can be addressed with small buffers and using
>>> Python's codec machinery for a flexible mix of "surrogatepass" and
>>> "replace" error handling.
>>
>> I don't think it actually makes things simpler. Having two buffers is
>> generally a bad idea unless they are perfectly synced, which would be
>> impossible here without data corruption (if you read half a utf-8 character
>> sequence and then read the wide buffer, do you get that character or not?).
>
> Martin's idea, as I understand it, is a UTF-8 buffer that reads from
> and writes to the text wrapper.

Yes, that was basically it, though I had only thought as far as simple
encodings like ASCII, where one byte corresponds to one character. I
wonder if you really need UTF-8 support. Are the encodings currently
encountered for Windows consoles all single-byte, or are some of them
more complicated?

> It necessarily consumes at least one
> character and buffers it to allow reading per byte. Likewise for
> writing, it buffers bytes until it can write a character to the text
> wrapper. ISTM, it has to look for incomplete lead-continuation byte
> sequences at the tail end, to hold them until the sequence is
> complete, at which time it either decodes to a valid character or the
> U+FFFD replacement character.

This buffering behaviour would be necessary for a multi-byte encoding
like UTF-8.
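
To illustrate the kind of tail buffering being discussed, here is a minimal
sketch using the standard library's incremental decoder. It is not the
proposed console implementation; the helper name decode_chunks is
hypothetical, and the only assumption is that bytes arrive in arbitrary
chunks that may split a multi-byte sequence. The decoder holds back an
incomplete lead/continuation sequence until the rest arrives, and with
errors="replace" a sequence that never completes becomes U+FFFD when the
stream is finalized.

```python
import codecs


def decode_chunks(chunks, encoding="utf-8"):
    """Decode byte chunks one at a time (hypothetical helper).

    Returns one string per chunk, plus a final string for whatever was
    still buffered when the input ended.
    """
    # The incremental decoder keeps an incomplete trailing byte sequence
    # in its internal buffer instead of raising or replacing it at once.
    decoder = codecs.getincrementaldecoder(encoding)(errors="replace")
    pieces = [decoder.decode(chunk) for chunk in chunks]
    # final=True flushes the buffer; a sequence that is still incomplete
    # is handled by the "replace" error handler at this point.
    pieces.append(decoder.decode(b"", final=True))
    return pieces


# U+20AC (euro sign) is the three bytes E2 82 AC in UTF-8; split it
# across three reads to show that nothing is emitted until it completes.
split_read = decode_chunks([b"a\xe2", b"\x82", b"\xacb"])

# The same character truncated after two bytes never completes.
truncated = decode_chunks([b"\xe2\x82"])
```

For a single-byte encoding the decoder never has anything to hold back, so
every chunk decodes immediately. That is the simpler case I originally had
in mind.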