[issue4574] reading UTF16-encoded text file crashes if \r on 64-char boundary

John Machin Sun, 07 Dec 2008 03:01:39 -0800

New submission from John Machin <[EMAIL PROTECTED]>:

The problem is in the newline handling in io.py, class
IncrementalNewlineDecoder, method decode. Text files are read in
128-byte chunks, which is 64 characters in UTF-16. Converting CR LF to
\n requires special-case handling when '\r' is detected at the end of
the decoded chunk, in case there's an LF at the start of the next
chunk. The decoder prepends b'\r' (only 1 byte) to the next chunk's raw
bytes and decodes that. But '\r' in UTF-16 takes 2 bytes, so we are now
1 byte out of kilter and various failures are possible (including
silently producing garbage output from a truncated file with an odd
number of bytes).
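
The failure mode described above can be reproduced with a small simulation. This is only a sketch of the strategy as the report describes it, not the actual io.py code: the function names (split_chunks, decode_with_byte_level_cr, decode_with_text_level_cr) are hypothetical, and BOM-less UTF-16-LE input is an assumption made to keep the example short. The second function shows one possible alternative, carrying the held-back '\r' as decoded text instead of as a raw byte.

```python
import codecs

CHUNK_BYTES = 128  # chunk size named in the report (64 UTF-16 code units)


def split_chunks(data, size=CHUNK_BYTES):
    """Split raw bytes into fixed-size chunks, as a chunked reader would."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def decode_with_byte_level_cr(data, encoding="utf-16-le"):
    """Simulate the faulty strategy: a trailing '\\r' is held back and
    re-injected as the single byte b'\\r' in front of the next raw chunk."""
    decoder = codecs.getincrementaldecoder(encoding)()
    pending_cr = False
    parts = []
    for raw in split_chunks(data):
        if pending_cr:
            raw = b"\r" + raw  # 1 byte, but '\r' occupies 2 bytes in UTF-16
            pending_cr = False
        text = decoder.decode(raw)
        if text.endswith("\r"):
            text = text[:-1]
            pending_cr = True
        parts.append(text)
    # Flush the decoder; raises if a partial code unit is left over.
    parts.append(decoder.decode(b"", final=True))
    if pending_cr:
        parts.append("\r")
    return "".join(parts).replace("\r\n", "\n")


def decode_with_text_level_cr(data, encoding="utf-16-le"):
    """One possible alternative: keep the held-back '\\r' as decoded text,
    so the byte stream handed to the decoder is never altered."""
    decoder = codecs.getincrementaldecoder(encoding)()
    pending = ""
    parts = []
    for raw in split_chunks(data):
        text = pending + decoder.decode(raw)
        pending = ""
        if text.endswith("\r"):
            text, pending = text[:-1], "\r"
        parts.append(text)
    parts.append(pending + decoder.decode(b"", final=True))
    return "".join(parts).replace("\r\n", "\n")


if __name__ == "__main__":
    # 63 characters, then '\r' as the 64th, so CR ends the first chunk.
    sample = ("a" * 63 + "\r\n" + "bbb").encode("utf-16-le")

    print(repr(decode_with_text_level_cr(sample)))

    try:
        decode_with_byte_level_cr(sample)
    except UnicodeDecodeError as exc:
        print("byte-level re-injection failed:", exc)

    # Truncated file with an odd byte count: the stray byte realigns the
    # stream, so the faulty strategy returns garbage without any error.
    print(repr(decode_with_byte_level_cr(sample[:-1])))
```

In this simulation the well-formed input makes the byte-level variant raise UnicodeDecodeError when the decoder is flushed, while the truncated input decodes "successfully" into unrelated characters, matching the two kinds of failure mentioned in the report.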


The attached script illustrates the problems.

----------
components: Interpreter Core
files: py30cr64bug.py
messages: 77219
nosy: sjmachin
severity: normal
status: open
title: reading UTF16-encoded text file crashes if \r on 64-char boundary
type: crash
versions: Python 3.0
Added file: http://bugs.python.org/file12260/py30cr64bug.py

_______________________________________
Python tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue4574>
_______________________________________
