New submission from John Machin <[EMAIL PROTECTED]>:

Problem in the newline handling in io.py, class IncrementalNewlineDecoder, method decode. It reads text files in 128-byte chunks. Converting CR LF to \n requires special-case handling when '\r' is detected at the end of the decoded chunk, in case there's an LF at the start of the next chunk. It prepends b'\r' (only 1 byte) to the next chunk's raw bytes and decodes that. But \r in UTF-16 takes 2 bytes; we are now 1 byte out of kilter and various failures are possible (including silently producing garbage output from a truncated file with an odd number of bytes).
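A rough sketch of the misalignment follows. This is not the attached py30cr64bug.py and not the actual io.py code: decode_chunks and its buggy flag are made up for illustration, and it assumes UTF-16-LE without a BOM, a 128-byte chunk size, and translation of CR LF only.

```python
import codecs


def decode_chunks(data, encoding, chunk_size, buggy):
    """Decode data chunk by chunk, translating CR LF to LF.

    Hypothetical helper, not the io.py implementation. With buggy=True a
    trailing CR is carried over as the single byte 0x0D in front of the
    next raw chunk; with buggy=False it is carried over as decoded text.
    """
    decoder = codecs.getincrementaldecoder(encoding)()
    pending_cr = False
    out = []
    for start in range(0, len(data), chunk_size):
        raw = data[start:start + chunk_size]
        if buggy and pending_cr:
            # One byte is prepended, but CR occupies two bytes in UTF-16,
            # so every following code unit is shifted by one byte.
            raw = b"\r" + raw
        text = decoder.decode(raw)
        if not buggy and pending_cr:
            # Carry the CR over at the text level instead.
            text = "\r" + text
        # Hold back a trailing CR in case the next chunk starts with LF.
        pending_cr = text.endswith("\r")
        if pending_cr:
            text = text[:-1]
        out.append(text.replace("\r\n", "\n"))
    # Flush the decoder; this raises if a dangling byte is left over.
    tail = decoder.decode(b"", final=True)
    if pending_cr:
        tail = "\r" + tail
    out.append(tail)
    return "".join(out)


# 63 characters, then CR LF: the CR is the 64th character and occupies
# the last two bytes of the first 128-byte chunk.
sample = ("a" * 63 + "\r\nb").encode("utf-16-le")

print(repr(decode_chunks(sample, "utf-16-le", 128, buggy=False)))
try:
    decode_chunks(sample, "utf-16-le", 128, buggy=True)
except UnicodeDecodeError as exc:
    print("byte-level carry-over failed:", exc)
# Truncated input with an odd number of bytes: no error, just garbage.
print(repr(decode_chunks(sample[:-1], "utf-16-le", 128, buggy=True)))
```

With this sample, the text-level carry-over should return the 63 a's followed by '\nb'. The byte-level carry-over should fail with a truncated-data UnicodeDecodeError on the complete input, and silently return two bogus characters in place of '\nb' once the last byte is dropped.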
The attached script illustrates the problems.

----------
components: Interpreter Core
files: py30cr64bug.py
messages: 77219
nosy: sjmachin
severity: normal
status: open
title: reading UTF16-encoded text file crashes if \r on 64-char boundary
type: crash
versions: Python 3.0
Added file: http://bugs.python.org/file12260/py30cr64bug.py

_______________________________________
Python tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue4574>
_______________________________________