[issue4574] reading UTF16-encoded text file crashes if \r on 64-char boundary

2008-12-14 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

Committed to py3k and release30-maint in r67760 and r67759. Needs
backporting to 2.x.

--
priority: release blocker -> normal

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4574
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4574] reading UTF16-encoded text file crashes if \r on 64-char boundary

2008-12-14 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

Backported to trunk and 2.6.2 in r67762 and r67764.

--
resolution:  -> fixed
status: open -> closed




[issue4574] reading UTF16-encoded text file crashes if \r on 64-char boundary

2008-12-13 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

A couple of suggestions:

- if IncrementalNewlineDecoder gets an encoding argument, it can also
instantiate the decoder itself; that way the API is a bit simpler

- to encode '\r' without the BOM, you can e.g. use an incremental
encoder and encode it twice:
>>> import codecs
>>> enc = codecs.getincrementalencoder('utf16')('strict')
>>> enc.encode('\r')
b'\xff\xfe\r\x00'
>>> enc.encode('\r')
b'\r\x00'


I think breaking the API can be ok since the original API is broken
(witness this bug). There can even be a compatibility mode where we
check whether `encoding` has a decode() method, and if it has, take it
as the decoder object rather than the encoding name.
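
A minimal sketch of such a compatibility mode, not taken from any of the
attached patches. The helper name resolve_decoder is hypothetical. It tests
for a decode() method, on the assumption that this is what distinguishes a
decoder object from an encoding name (a Python 3 str has no decode()).

```python
import codecs


def resolve_decoder(encoding, errors="strict"):
    """Hypothetical helper: accept an encoding name or a ready-made decoder."""
    # Anything that already exposes decode() is treated as a decoder object.
    if hasattr(encoding, "decode"):
        return encoding
    # Otherwise treat the argument as an encoding name and build the decoder.
    return codecs.getincrementaldecoder(encoding)(errors)


# New-style call: pass the encoding name.
by_name = resolve_decoder("utf-16-le")

# Old-style call: pass an existing incremental decoder.
existing = codecs.getincrementaldecoder("utf-16-le")()
by_object = resolve_decoder(existing)
```

Both calls yield an object with the same decode() interface, so the rest of
the newline handling would not need to know which form the caller used.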




[issue4574] reading UTF16-encoded text file crashes if \r on 64-char boundary

2008-12-13 Thread Antoine Pitrou

Changes by Antoine Pitrou pit...@free.fr:





[issue4574] reading UTF16-encoded text file crashes if \r on 64-char boundary

2008-12-13 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

Here is a simpler patch with a different approach and a lot of tests.
The advantage is that it doesn't break the API.

Added file: http://bugs.python.org/file12344/utf16_newlines.patch




[issue4574] reading UTF16-encoded text file crashes if \r on 64-char boundary

2008-12-13 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

This new variant also removes the dangerous hack in getstate / setstate.

Added file: http://bugs.python.org/file12345/utf16_newlines2.patch




[issue4574] reading UTF16-encoded text file crashes if \r on 64-char boundary

2008-12-13 Thread Gregory P. Smith

Gregory P. Smith g...@krypto.org added the comment:

utf16_newlines2.patch looks good to me.

This is a data corruption issue.  If it is deferred for 3.0.1 it must be 
fixed in 3.0.2.

+1 on putting this in 3.0.1.

--
assignee:  -> pitrou
nosy: +gregory.p.smith
priority:  -> release blocker




[issue4574] reading UTF16-encoded text file crashes if \r on 64-char boundary

2008-12-08 Thread STINNER Victor

STINNER Victor [EMAIL PROTECTED] added the comment:

The bug is in IncrementalNewlineDecoder, not in the codec or in
TextIOWrapper.

--
nosy: +haypo




[issue4574] reading UTF16-encoded text file crashes if \r on 64-char boundary

2008-12-08 Thread STINNER Victor

STINNER Victor [EMAIL PROTECTED] added the comment:

Smaller example to demonstrate the problem.

Added file: http://bugs.python.org/file12295/dec.py




[issue4574] reading UTF16-encoded text file crashes if \r on 64-char boundary

2008-12-08 Thread STINNER Victor

Changes by STINNER Victor [EMAIL PROTECTED]:


Removed file: http://bugs.python.org/file12295/dec.py




[issue4574] reading UTF16-encoded text file crashes if \r on 64-char boundary

2008-12-08 Thread STINNER Victor

Changes by STINNER Victor [EMAIL PROTECTED]:


Added file: http://bugs.python.org/file12296/incremental_newline_decoder_bug.py




[issue4574] reading UTF16-encoded text file crashes if \r on 64-char boundary

2008-12-08 Thread STINNER Victor

STINNER Victor [EMAIL PROTECTED] added the comment:

Here is a patch for test_io.py: check the problem by adding new 
encodings to TextIOWrapperTest.testNewlines().
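
For illustration only, here is a sketch of the kind of check such a test
could perform. It is not the content of test_io.patch. It uses the public
io.IncrementalNewlineDecoder of a current Python 3, feeds it the encoded
bytes in small chunks so that '\r' and '\n' get split across chunk
boundaries, and compares the result with the expected translated text. The
helper name decode_in_chunks and the sample text are assumptions.

```python
import codecs
import io

ENCODINGS = ("utf-8", "utf-16", "utf-16-le", "utf-16-be",
             "utf-32", "utf-32-le", "utf-32-be")


def decode_in_chunks(data, encoding, chunk_size):
    """Decode `data` piecewise with universal-newline translation enabled."""
    decoder = codecs.getincrementaldecoder(encoding)()
    newline_decoder = io.IncrementalNewlineDecoder(decoder, True)
    pieces = []
    for start in range(0, len(data), chunk_size):
        pieces.append(newline_decoder.decode(data[start:start + chunk_size]))
    # Flush any pending '\r' or buffered partial character.
    pieces.append(newline_decoder.decode(b"", final=True))
    return "".join(pieces)


text = "abc\r\ndef\rghi\n"
expected = "abc\ndef\nghi\n"

results = {
    (encoding, chunk_size): decode_in_chunks(text.encode(encoding),
                                             encoding, chunk_size)
    for encoding in ENCODINGS
    for chunk_size in range(1, 9)
}
```

Every entry in results should equal expected; a decoder with the bug
reported here would fail for the multi-byte encodings.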

--
keywords: +patch
Added file: http://bugs.python.org/file12297/test_io.patch




[issue4574] reading UTF16-encoded text file crashes if \r on 64-char boundary

2008-12-08 Thread STINNER Victor

STINNER Victor [EMAIL PROTECTED] added the comment:

Ugly patch to fix this issue:
 - add more regression tests for the UTF-16* and UTF-32* charsets
 - add a mandatory encoding argument to the io.IncrementalNewlineDecoder
constructor => BREAKS THE API
 - use the encoding to encode \r
 - ugliest hack: strip the BOM for the UTF-16 and UTF-32 codecs (when
encoding \r to bytes) => I don't know how to encode \r without the
BOM
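
The BOM-stripping step in the last item can be sketched roughly as follows.
This is an illustration, not the code from the attached patch; the helper
name encode_cr_without_bom is hypothetical, and only the BOM constants from
the codecs module are relied on.

```python
import codecs

# BOMs that the generic "utf-16" / "utf-32" codecs prepend when encoding.
# The UTF-32 BOMs are listed first because BOM_UTF32_LE begins with the
# same two bytes as BOM_UTF16_LE.
_BOMS = (codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE,
         codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)


def encode_cr_without_bom(encoding):
    """Hypothetical helper: encode '\\r' and drop a leading BOM, if any."""
    data = "\r".encode(encoding)
    for bom in _BOMS:
        if data.startswith(bom):
            return data[len(bom):]
    return data


cr_utf8 = encode_cr_without_bom("utf-8")
cr_utf16 = encode_cr_without_bom("utf-16")
cr_utf16_le = encode_cr_without_bom("utf-16-le")
cr_utf32 = encode_cr_without_bom("utf-32")
```

The byte order of cr_utf16 and cr_utf32 depends on the platform, because
the generic codecs encode in native byte order.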

Added file: http://bugs.python.org/file12298/incremental_newline_decoder-2.patch




[issue4574] reading UTF16-encoded text file crashes if \r on 64-char boundary

2008-12-07 Thread John Machin

New submission from John Machin [EMAIL PROTECTED]:

Problem in the newline handling in io.py, class
IncrementalNewlineDecoder, method decode. It reads text files in 128-
byte chunks. Converting CR LF to \n requires special case handling
when '\r' is detected at the end of the decoded chunk in case
there's an LF at the start of the next chunk. It prepends b'\r' (only 1
byte) to the next chunk's raw bytes and decodes that. But \r in UTF-16
takes 2 bytes; we are now 1 byte out of kilter and various failures are
possible (including silently producing garbage output from a truncated
file with an odd number of bytes).

The attached script illustrates the problems.
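
The misalignment can also be reproduced in isolation. The sketch below is
not the attached script; it only mimics the push-back step described above
using the codecs module, with utf-16-le chosen so that no BOM complicates
the byte arithmetic. All variable names are illustrative.

```python
import codecs

# 63 characters followed by '\r': the CR is the 64th character, so it ends
# the first 128-byte chunk and the matching '\n' starts the next one.
text = "a" * 63 + "\r\n" + "b" * 5
raw = text.encode("utf-16-le")
first_chunk, next_chunk = raw[:128], raw[128:]
assert first_chunk.decode("utf-16-le").endswith("\r")

# Buggy push-back: a single byte b'\r' is prepended to the next chunk.
buggy_input = b"\r" + next_chunk
garbage = codecs.getincrementaldecoder("utf-16-le")().decode(buggy_input)

# The odd byte count also makes a strict one-shot decode fail outright.
try:
    buggy_input.decode("utf-16-le")
    strict_decode_failed = False
except UnicodeDecodeError:
    strict_decode_failed = True

# Pushing back '\r' encoded in the file's own encoding keeps the alignment.
aligned_input = "\r".encode("utf-16-le") + next_chunk
recovered = aligned_input.decode("utf-16-le")
```

Here garbage consists of unrelated characters because every code unit is
shifted by one byte, while recovered is the intended '\r\n' followed by the
rest of the text.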

--
components: Interpreter Core
files: py30cr64bug.py
messages: 77219
nosy: sjmachin
severity: normal
status: open
title: reading UTF16-encoded text file crashes if \r on 64-char boundary
type: crash
versions: Python 3.0
Added file: http://bugs.python.org/file12260/py30cr64bug.py




[issue4574] reading UTF16-encoded text file crashes if \r on 64-char boundary

2008-12-07 Thread Antoine Pitrou

Changes by Antoine Pitrou [EMAIL PROTECTED]:


--
nosy: +pitrou
