[issue4574] reading UTF16-encoded text file crashes if \r on 64-char boundary
Antoine Pitrou pit...@free.fr added the comment:

Committed to py3k and release30-maint in r67760 and r67759. Needs backporting to 2.x.

--
priority: release blocker -> normal

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4574
___
___
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue4574] reading UTF16-encoded text file crashes if \r on 64-char boundary
Antoine Pitrou pit...@free.fr added the comment:

Backported to trunk and 2.6.2 in r67762 and r67764.

--
resolution: -> fixed
status: open -> closed
[issue4574] reading UTF16-encoded text file crashes if \r on 64-char boundary
Antoine Pitrou pit...@free.fr added the comment:

A couple of suggestions:
- if IncrementalNewlineDecoder gets an encoding argument, it can also instantiate the decoder itself; that way the API is a bit simpler
- to encode '\r' without the BOM, you can e.g. use an incremental encoder and encode it twice:

    >>> import codecs
    >>> enc = codecs.getincrementalencoder('utf16')('strict')
    >>> enc.encode('\r')
    b'\xff\xfe\r\x00'
    >>> enc.encode('\r')
    b'\r\x00'

I think breaking the API can be ok since the original API is broken (witness this bug). There can even be a compatibility mode where we check whether `encoding` has an encode() method, and if it has, take it as the decoder object rather than the encoding name.
[issue4574] reading UTF16-encoded text file crashes if \r on 64-char boundary
Changes by Antoine Pitrou pit...@free.fr:
[issue4574] reading UTF16-encoded text file crashes if \r on 64-char boundary
Antoine Pitrou pit...@free.fr added the comment:

Here is a simpler patch with a different approach and a lot of tests. The advantage is that it doesn't break the API.

Added file: http://bugs.python.org/file12344/utf16_newlines.patch
[issue4574] reading UTF16-encoded text file crashes if \r on 64-char boundary
Antoine Pitrou pit...@free.fr added the comment:

This new variant also removes the dangerous hack in getstate / setstate.

Added file: http://bugs.python.org/file12345/utf16_newlines2.patch
[issue4574] reading UTF16-encoded text file crashes if \r on 64-char boundary
Gregory P. Smith g...@krypto.org added the comment:

utf16_newlines2.patch looks good to me.

This is a data corruption issue. If it is deferred for 3.0.1 it must be fixed in 3.0.2. +1 on putting this in 3.0.1.

--
assignee: -> pitrou
nosy: +gregory.p.smith
priority: -> release blocker
[issue4574] reading UTF16-encoded text file crashes if \r on 64-char boundary
STINNER Victor [EMAIL PROTECTED] added the comment:

The bug is in IncrementalNewlineDecoder, not in the codec or in TextIOWrapper.

--
nosy: +haypo
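To see the class in isolation, the following sketch drives io.IncrementalNewlineDecoder directly, without TextIOWrapper, on UTF-16 data split right after the '\r'. It assumes a current Python 3 in which this issue is already fixed; the six-character sample string and the split position are illustrative choices, not taken from the attached scripts.

```python
import codecs
import io

# "ab\r\ncd" as UTF-16-LE without a BOM: every character is 2 bytes.
raw = "ab\r\ncd".encode("utf-16-le")

# Split so that the first chunk ends right after the 2-byte '\r'.
chunks = [raw[:6], raw[6:]]

# Wrap an incremental UTF-16 decoder in the newline decoder (translate=True).
utf16 = codecs.getincrementaldecoder("utf-16-le")()
newline_dec = io.IncrementalNewlineDecoder(utf16, True)

pieces = [
    newline_dec.decode(chunks[0]),              # trailing '\r' is held back
    newline_dec.decode(chunks[1], final=True),  # '\r\n' is completed here
]
result = "".join(pieces)
print(repr(result))
```

On a fixed interpreter the CR LF pair straddling the chunk boundary should come out as a single '\n', which is the behaviour the report expected.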
[issue4574] reading UTF16-encoded text file crashes if \r on 64-char boundary
STINNER Victor [EMAIL PROTECTED] added the comment:

Smaller example to demonstrate the problem.

Added file: http://bugs.python.org/file12295/dec.py
[issue4574] reading UTF16-encoded text file crashes if \r on 64-char boundary
Changes by STINNER Victor [EMAIL PROTECTED]:

Removed file: http://bugs.python.org/file12295/dec.py
[issue4574] reading UTF16-encoded text file crashes if \r on 64-char boundary
Changes by STINNER Victor [EMAIL PROTECTED]:

Added file: http://bugs.python.org/file12296/incremental_newline_decoder_bug.py
[issue4574] reading UTF16-encoded text file crashes if \r on 64-char boundary
STINNER Victor [EMAIL PROTECTED] added the comment:

Here is a patch for test_io.py: it checks for the problem by adding new encodings to TextIOWrapperTest.testNewlines().

--
keywords: +patch

Added file: http://bugs.python.org/file12297/test_io.patch
[issue4574] reading UTF16-encoded text file crashes if \r on 64-char boundary
STINNER Victor [EMAIL PROTECTED] added the comment:

Ugly patch to fix this issue:
- add more regression tests for charsets UTF-16*, UTF-32*
- add a mandatory encoding argument to the io.IncrementalNewlineDecoder constructor => BREAKS THE API
- use the encoding to encode \r
- most ugly hack: strip the BOM for codecs UTF-16 and UTF-32 (when encoding \r to bytes) => I don't know how to encode \r without the BOM

Added file: http://bugs.python.org/file12298/incremental_newline_decoder-2.patch
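The BOM problem in the last item can be illustrated with a short sketch. It is not the code from the attached patch; it only shows why encoding '\r' with the generic 'utf-16' codec yields extra bytes, and two ways around it. Selecting the byte order via sys.byteorder is an assumption made here so the example runs on either kind of machine.

```python
import codecs
import sys

# One-shot encoding with the generic 'utf-16' codec emits a BOM first.
with_bom = "\r".encode("utf-16")

# Pick the BOM that matches this machine's native byte order.
little = sys.byteorder == "little"
bom = codecs.BOM_UTF16_LE if little else codecs.BOM_UTF16_BE

# The hack: slice the BOM off to get the bare 2-byte code unit for '\r'.
stripped = with_bom[len(bom):] if with_bom.startswith(bom) else with_bom

# Alternative without slicing: name the byte order explicitly,
# since the endian-specific codecs never write a BOM.
explicit = "\r".encode("utf-16-le" if little else "utf-16-be")

print(len(with_bom), len(stripped), stripped == explicit)
```

Slicing works but has to know which BOM to expect for each codec, which is presumably why it was called a hack in the first place.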
[issue4574] reading UTF16-encoded text file crashes if \r on 64-char boundary
New submission from John Machin [EMAIL PROTECTED]:

There is a problem in the newline handling in io.py, class IncrementalNewlineDecoder, method decode. It reads text files in 128-byte chunks. Converting CR LF to \n requires special-case handling when '\r' is detected at the end of the decoded chunk, in case there is an LF at the start of the next chunk. It prepends b'\r' (only 1 byte) to the next chunk's raw bytes and decodes that. But \r in UTF-16 takes 2 bytes; we are now 1 byte out of kilter and various failures are possible (including silently producing garbage output from a truncated file with an odd number of bytes).

The attached script illustrates the problems.

--
components: Interpreter Core
files: py30cr64bug.py
messages: 77219
nosy: sjmachin
severity: normal
status: open
title: reading UTF16-encoded text file crashes if \r on 64-char boundary
type: crash
versions: Python 3.0

Added file: http://bugs.python.org/file12260/py30cr64bug.py
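The misalignment described above can be reproduced by hand with the codecs module alone. This is a reconstruction of the mechanism from the description, not the io.py code itself and not the attached py30cr64bug.py; the short sample string stands in for a real 128-byte chunk.

```python
import codecs

# "ab\r\ncd" as UTF-16-LE without a BOM: every character is 2 bytes.
raw = "ab\r\ncd".encode("utf-16-le")

# Pretend the chunk boundary falls right after the 2-byte '\r'.
first, second = raw[:6], raw[6:]
head = codecs.getincrementaldecoder("utf-16-le")().decode(first)
assert head.endswith("\r")

# Faulty pushback: a single byte b'\r' is put in front of the next chunk,
# so every following byte pair is shifted by one.
buggy_input = b"\r" + second
buggy_tail = codecs.getincrementaldecoder("utf-16-le")().decode(buggy_input)

# Pushback that respects the encoding: '\r' as a full UTF-16 code unit.
good_input = "\r".encode("utf-16-le") + second
good_tail = codecs.getincrementaldecoder("utf-16-le")().decode(good_input)

print(ascii(buggy_tail))  # unrelated characters, with one byte left pending
print(ascii(good_tail))   # the expected '\r\ncd'

# At end of input the leftover odd byte surfaces as a decoding error.
try:
    codecs.getincrementaldecoder("utf-16-le")().decode(buggy_input, final=True)
    truncated = False
except UnicodeDecodeError:
    truncated = True
```

Whether the symptom is an exception or silent garbage depends on whether the total byte count ends up odd or even, which matches the range of failures mentioned in the report.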
[issue4574] reading UTF16-encoded text file crashes if \r on 64-char boundary
Changes by Antoine Pitrou [EMAIL PROTECTED]:

--
nosy: +pitrou