[issue26730] SpooledTemporaryFile doesn't correctly preserve data for text (non-binary) SpooledTemporaryFile objects when Unicode characters are written

2019-11-26 Thread James Hennessy


James Hennessy  added the comment:

The quickest fix for the data corruption problem is to delete the line
newfile.seek(file.tell(), 0)
from the rollover() method.

This doesn't fix the inconsistency of tell() and seek(), but it's very low 
risk.  It's technically an API change, since rollover() would no longer 
preserve the seek position, but unless the user was writing only characters 
that encode to a single byte (for example, ISO-8859-1 characters under a 
Latin-1 encoding, or ASCII under UTF-8), it wasn't working properly before anyway.
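The sketch below shows why deleting that one line is enough to stop the corruption. It models the rollover step with the seek removed, using plain io/tempfile objects and a hypothetical helper, rollover_without_seek, rather than the real SpooledTemporaryFile internals. It assumes UTF-8 as the file encoding.

```python
import io
import tempfile


def rollover_without_seek(memory: io.StringIO, encoding: str = "utf-8"):
    """Hypothetical model of rollover() with the seek line deleted.

    Copies the in-memory text to a new text-mode TemporaryFile. Because
    there is no newfile.seek(file.tell(), 0), the disk file's position is
    left at the end of the copied data.
    """
    disk = tempfile.TemporaryFile(mode="w+", encoding=encoding, newline="\n")
    disk.write(memory.getvalue())
    return disk


memory = io.StringIO(newline="\n")
memory.write("\u20ac" * 10)          # 10 characters, 30 bytes in UTF-8

disk = rollover_without_seek(memory)
disk.write("\u20ac" * 10)            # appends after the first 30 bytes
disk.seek(0)
round_trip = disk.read()
disk.close()

print(round_trip == "\u20ac" * 20)   # True
```

The trade-off is that if the caller had seeked memory back to the start before an explicit rollover, the disk file would still be positioned at the end.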

--

___
Python tracker 
<https://bugs.python.org/issue26730>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue26730] SpooledTemporaryFile doesn't correctly preserve data for text (non-binary) SpooledTemporaryFile objects when Unicode characters are written

2019-11-26 Thread James Hennessy


James Hennessy  added the comment:

I don't like the idea of using a TemporaryFile right from the beginning in text 
mode.  You might as well remove text mode support altogether if that's the 
approach you want to take, since it undoes any potential performance benefit of 
using SpooledTemporaryFile in the first place.

--




[issue26730] SpooledTemporaryFile doesn't correctly preserve data for text (non-binary) SpooledTemporaryFile objects when Unicode characters are written

2019-11-26 Thread James Hennessy


James Hennessy  added the comment:

I have to disagree with the idea that SpooledTemporaryFile is not useful.  
Although the file system may appear as fast as memory on some systems, that 
cannot be assumed to be universally true.  I think the idea behind 
SpooledTemporaryFile is completely valid.

--




[issue26730] SpooledTemporaryFile doesn't correctly preserve data for text (non-binary) SpooledTemporaryFile objects when Unicode characters are written

2016-04-10 Thread James Hennessy

New submission from James Hennessy:

The tempfile.SpooledTemporaryFile class doesn't correctly preserve data for 
text (non-binary) SpooledTemporaryFile objects when Unicode characters are 
written.  The attached program demonstrates the failure.  It creates a 
SpooledTemporaryFile object, writes 20 string characters to it, and then tries 
to read them back.  If the SpooledTemporaryFile has rolled over to disk, as it 
does in the demonstration program, the data is not read back correctly.  
Instead, an exception is raised because the data in the SpooledTemporaryFile 
has been corrupted.

The problem is this statement in tempfile.py, in the rollover() method:
newfile.seek(file.tell(), 0)
The "file" variable references a StringIO object, whose tell() and seek() 
methods count in characters, not bytes, yet this value is applied to a 
TemporaryFile object, whose tell() and seek() methods deal in bytes, not 
characters.  The demonstration program first writes 10 characters to the 
SpooledTemporaryFile.  Since 10 exceeds the rollover size of 5, the 
implementation writes the 10 characters to the TemporaryFile and then seeks to 
position 10 in the TemporaryFile, which it takes to be the end of the stream.  
But those 10 characters were encoded to 30 bytes, and seek position 10 is in the 
middle of the UTF-8 sequence for the fourth character.  The next write to the 
SpooledTemporaryFile starts overwriting bytes from there.  The attempt to read 
back the data then fails because the byte stream is no longer valid UTF-8.
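The failing sequence can be replayed without SpooledTemporaryFile at all. This sketch mimics the three steps described above (copy the StringIO contents to disk, seek to file.tell(), write again) using a binary TemporaryFile so the raw bytes are visible. The helper name simulate_buggy_rollover and the choice of U+20AC (the euro sign, three bytes in UTF-8) are illustrative assumptions, since the attached showbug.py is not shown.

```python
import io
import tempfile

ENCODING = "utf-8"


def simulate_buggy_rollover(first: str, second: str) -> bytes:
    """Replay the write / rollover / write sequence described above.

    The disk file is binary so the raw bytes can be inspected; the encoding
    a text-mode TemporaryFile would perform is done by hand.
    """
    memory = io.StringIO(newline="\n")
    memory.write(first)
    with tempfile.TemporaryFile(mode="w+b") as disk:
        disk.write(memory.getvalue().encode(ENCODING))
        # The problematic step: memory.tell() counts characters,
        # but disk.seek() interprets the value as a byte offset.
        disk.seek(memory.tell(), io.SEEK_SET)
        disk.write(second.encode(ENCODING))
        disk.seek(0)
        return disk.read()


chunk = "\u20ac" * 10                     # 10 characters, 30 bytes in UTF-8
raw = simulate_buggy_rollover(chunk, chunk)

try:
    raw.decode(ENCODING)
    corrupted = False
except UnicodeDecodeError:
    corrupted = True

print(len(raw), corrupted)                # 40 True (60 bytes were expected)
```

Only the first nine bytes (three whole characters) survive intact. Byte 9 is the lead byte of the fourth character, and it is immediately followed by the lead byte of the second write.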

A related problem is that tell() and seek() behave inconsistently for text 
(non-binary) SpooledTemporaryFile objects.  If the data hasn't yet rolled over 
to a TemporaryFile, they count in string characters; once the data has rolled 
over, they count in bytes.
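The two position units can be compared directly with the same underlying objects. This is a minimal sketch that assumes UTF-8. A text-mode file's tell() is formally an opaque cookie, and the claim that it equals the byte offset here is an observation about CPython, not a documented guarantee.

```python
import io
import tempfile

text = "\u20ac" * 10  # 10 characters, 30 bytes in UTF-8

# Before rollover the data lives in a StringIO: tell() counts characters.
memory = io.StringIO(newline="\n")
memory.write(text)
in_memory_position = memory.tell()

# After rollover it lives in a text-mode TemporaryFile: tell() returns an
# opaque cookie, which in CPython is the byte offset in this simple case.
with tempfile.TemporaryFile(mode="w+", encoding="utf-8", newline="\n") as disk:
    disk.write(text)
    on_disk_position = disk.tell()

print(in_memory_position, on_disk_position)  # 10 30 (on CPython)
```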

A quick fix for this is to remove the seek() in the rollover() method.  I 
presume it's there to preserve the stream position when rollover() is called 
explicitly, since after an implicit call the position would already be at the 
end of the stream.  This quick fix would therefore change the externally 
visible behavior of an explicit rollover() call.

Another possibility is to never use a StringIO object, but to always buffer 
data in a BytesIO object, as is done for binary SpooledTemporaryFile objects.  
This has the advantage of "fixing" the tell() and seek() inconsistency, making 
them count bytes all the time.  The downside, of course, is that data that 
doesn't end up being rolled over to a TemporaryFile gets encoded and decoded, a 
round trip that could otherwise be avoided.
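The BytesIO alternative can be sketched by wrapping the in-memory buffer in an io.TextIOWrapper, so text is encoded as it is written and every position is a byte offset. The function spool_as_bytes_then_rollover is hypothetical and assumes UTF-8. It only illustrates why a byte offset carried across a rollover stays valid, not how tempfile itself would implement this.

```python
import io
import tempfile

ENCODING = "utf-8"


def spool_as_bytes_then_rollover(first: str, second: str) -> str:
    """Hypothetical sketch of the 'always buffer in BytesIO' approach.

    Text is encoded on write, so positions are byte offsets both before
    and after the rollover.
    """
    memory = io.TextIOWrapper(io.BytesIO(), encoding=ENCODING, newline="\n")
    memory.write(first)
    memory.flush()                         # push encoded bytes into the BytesIO
    byte_position = memory.buffer.tell()   # a byte offset, not a character count
    data = memory.buffer.getvalue()
    memory.close()

    with tempfile.TemporaryFile(mode="w+b") as disk:
        disk.write(data)                   # copy bytes verbatim
        disk.seek(byte_position, io.SEEK_SET)  # same unit on both sides
        disk.write(second.encode(ENCODING))
        disk.seek(0)
        return disk.read().decode(ENCODING)


chunk = "\u20ac" * 10
print(spool_as_bytes_then_rollover(chunk, chunk) == chunk * 2)  # True
```

The cost mentioned above shows up when no rollover happens. Reading the data back then means decoding it from the BytesIO (seek(0) and read() on the wrapper) instead of returning a stored str directly.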

This problem can be circumvented by a user of SpooledTemporaryFile by 
explicitly seeking to the end of the stream after every write to the 
SpooledTemporaryFile object:  spool.seek(0, io.SEEK_END)
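Here is the workaround applied through the public API. write_and_read_back is a hypothetical helper. max_size=5 mirrors the demonstration program's rollover size, and UTF-8 is assumed as the encoding. The extra seek is harmless when rollover() already leaves the position at the end.

```python
import io
import tempfile


def write_and_read_back(chunks, max_size=5):
    """Write text chunks to a text-mode SpooledTemporaryFile, then read them back."""
    with tempfile.SpooledTemporaryFile(
        max_size=max_size, mode="w+", encoding="utf-8"
    ) as spool:
        for chunk in chunks:
            spool.write(chunk)
            # Workaround: if this write triggered a rollover, the position may
            # have been left mid-character; jump back to the true end of data.
            spool.seek(0, io.SEEK_END)
        spool.seek(0)
        return spool.read()


chunks = ["\u20ac" * 10, "\u20ac" * 10]
print(write_and_read_back(chunks) == "".join(chunks))  # True
```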

--
components: Library (Lib)
files: showbug.py
messages: 263147
nosy: James Hennessy
priority: normal
severity: normal
status: open
title: SpooledTemporaryFile doesn't correctly preserve data for text 
(non-binary) SpooledTemporaryFile objects when Unicode characters are written
type: behavior
versions: Python 3.4
Added file: http://bugs.python.org/file42423/showbug.py
