Changes by Lars Gustäbel l...@gustaebel.de:
--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue24259
___
Roundup Robot added the comment:
New changeset 372aa98eb72e by Lars Gustäbel in branch '2.7':
Issue #24259: tarfile now raises a ReadError if an archive is truncated inside
a data segment.
https://hg.python.org/cpython/rev/372aa98eb72e
New changeset c7f4f61697b7 by Lars Gustäbel in branch
Lars Gustäbel added the comment:
Martin, I followed your suggestion to raise ReadError. This needed an
additional change in copyfileobj() because it is used both for adding file data
to an archive and extracting file data from an archive.
But I think the patch is in good shape now.
Martin Panter added the comment:
From the current documentation and limited experience with the module,
ReadError (or a subclass) sounds best. I would expect OSError only for
OS-level things, like file not found, disk error, etc.
The patches look good. One last suggestion is to use
Lars Gustäbel added the comment:
@Martin:
This is actually a nice idea that I hadn't thought of. I updated the Python 3
patch to use a seek() that moves to one byte before the next header block,
reads the remaining byte and raises an error if it hits eof. The code looks
rather clean compared
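The check described above can be sketched roughly as follows. This is a minimal illustration, not the committed patch: check_data_segment() and its parameters are hypothetical names, and it assumes a seekable file object and the standard 512-byte tar block size.

```python
import io
import tarfile

BLOCKSIZE = 512  # tar archives are organised in 512-byte blocks


def check_data_segment(fileobj, data_offset, size):
    """Raise tarfile.ReadError if a member's data segment is cut short.

    Hypothetical helper: data_offset is where the member's data starts,
    size is the member size recorded in its header.
    """
    if size == 0:
        return  # no data blocks follow the header, nothing to verify
    # The data segment is padded to a whole number of blocks, so the next
    # header starts at the following block boundary.
    blocks = (size + BLOCKSIZE - 1) // BLOCKSIZE
    next_header = data_offset + blocks * BLOCKSIZE
    # Seek to one byte before the next header and read that single byte.
    fileobj.seek(next_header - 1)
    if not fileobj.read(1):
        raise tarfile.ReadError("unexpected end of data")
```

Reading a single byte avoids pulling the whole data segment into memory just to find out whether it is complete.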
Changes by Lars Gustäbel l...@gustaebel.de:
Added file: http://bugs.python.org/file39580/issue24259-2.x-2.diff
Thomas Guettler added the comment:
With Python 3.4.0 you get an OSError if you try to extractall() the uploaded
tar_which_is_cut.tar. That's nice.
It seems that only 2.7 is buggy.
=== python3
Python 3.4.0 (default, Apr 11 2014, 13:05:11)
[GCC 4.8.2] on linux
Type "help", "copyright",
Lars Gustäbel added the comment:
@Thomas:
I think your proposal adds a little too much complexity. Also, ExFileObject is
not used during iteration, and we would like to detect broken archives without
unpacking all the data segments first.
I have written patches for Python 2 and 3.
Thomas Guettler added the comment:
I thought about this again.
It could be solved with the help of a ByteCountingStreamReader.
By ByteCountingStreamReader I mean a wrapper around a stream, like
codecs.StreamReader. But the ByteCountingStreamReader should not change the
content, but just
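A pass-through wrapper along these lines might look like the sketch below. The class name and the bytes_read attribute are hypothetical; nothing like this exists in tarfile, and a later comment in the thread considers the approach too complex.

```python
import io


class ByteCountingReader:
    """Hypothetical pass-through wrapper that counts the bytes read."""

    def __init__(self, fileobj):
        self._fileobj = fileobj
        self.bytes_read = 0

    def read(self, size=-1):
        # Hand the data through unchanged and only record its length.
        data = self._fileobj.read(size)
        self.bytes_read += len(data)
        return data
```

A caller could then compare bytes_read with the size recorded in the member's header to notice a short data segment.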
Martin Panter added the comment:
For the record, the difference between Python 2 and 3 is probably a side effect
of revision 050f0f7be11e. Python 2 copies data from the ExFileObject returned
by extractfile(), while Python 3 copies directly from the underlying file.
The patches to the file
Lars Gustäbel added the comment:
I have written a test for the issue, so that we have a basis for discussion.
There are four different scenarios where an unexpected eof can occur: inside a
metadata block, directly after a metadata block, inside a data segment or
directly after a data segment
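The four cut points can be reproduced with a small in-memory experiment. This is only a sketch and not the test from the patch: build_archive(), truncation_points() and probe() are hypothetical helpers, and it assumes a single member stored in USTAR format so that its header occupies exactly one 512-byte block.

```python
import io
import tarfile


def build_archive(payload):
    """Return the bytes of a tar archive holding one member."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w",
                      format=tarfile.USTAR_FORMAT) as tar:
        info = tarfile.TarInfo("member.bin")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def truncation_points(payload_size, blocksize=512):
    """Map each scenario to a byte offset at which to cut the archive."""
    padded = -(-payload_size // blocksize) * blocksize  # round up to a block
    return {
        "inside metadata block": blocksize // 2,
        "directly after metadata block": blocksize,
        "inside data segment": blocksize + payload_size // 2,
        "directly after data segment": blocksize + padded,
    }


def probe(data):
    """Read every member and describe the first error, if any."""
    try:
        with tarfile.open(fileobj=io.BytesIO(data)) as tar:
            for member in tar:
                if member.isfile():
                    tar.extractfile(member).read()
        return "no error"
    except (tarfile.TarError, OSError) as exc:
        return type(exc).__name__ + ": " + str(exc)


if __name__ == "__main__":
    archive = build_archive(b"x" * 1000)
    for name, cut in truncation_points(1000).items():
        print(name, "->", probe(archive[:cut]))
```

What each scenario reports depends on the Python version, which is why the script prints the outcome instead of assuming it.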
Martin Panter added the comment:
If you are already seeking in the file, can’t you seek to the end to determine
the length of the file, and then use that to verify if a data segment is
truncated? And if you can’t seek, I guess you have to read all the bytes anyway.
I guess Ethan’s test was an
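The seek-to-the-end idea could be sketched as below, assuming a seekable file object. data_segment_fits() is a hypothetical helper, not part of tarfile.

```python
import io


def data_segment_fits(fileobj, data_offset, size):
    """Return True if the file is long enough to hold the data segment.

    Hypothetical helper: data_offset is where the member's data starts,
    size is the member size recorded in its header.
    """
    position = fileobj.tell()              # remember where we were
    end = fileobj.seek(0, io.SEEK_END)     # seek() returns the new offset
    fileobj.seek(position)                 # restore the original position
    return data_offset + size <= end
```

For a stream that cannot seek, this shortcut is unavailable and the bytes have to be read, as the comment above notes.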
Thomas Guettler added the comment:
Who has enough knowledge of the tarfile module to create a good patch?
--
nosy: +guettli
Martin Panter added the comment:
I might be able to make a patch, but what should the patch do exactly?
* Raise an exception as soon as something wrong is found
* Defer exceptions until close() is called, to allow partial data recovery
* Add some sort of defects API that you can optionally
Ethan Furman added the comment:
I ran the OP's code in 2.7, 3.3, 3.4, and 3.5, as well as using Ubuntu's GNU
tar 1.27.1:
GNU tar did not report any errors.
Python (all tested versions) did not report any errors (with the errorlevel
parameter missing, set to 1, and set to 2).
Ethan Furman added the comment:
On the SO question [1] the OP stated that he tried errorlevel of both 1 and 2
with no effect... however, he was using Python2.6.
Martin, can you run that same test with 2.6 to verify that errorlevel did not
work there, but does now?
[1]
Martin Panter added the comment:
Actually, looking closer at the module, perhaps you just need to set the
errorlevel=1 option:
>>> with tarfile.open("truncated.tar", errorlevel=1) as tar:
...     tar.extractall("test-dir")
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
Martin Panter added the comment:
I guess it depends on the particular tar file and where it gets truncated. Just
now I tested with a tar file created from Python’s Tools directory:
$ tar c Tools/ > good.tar
$ ls -gG good.tar
-rw-r--r-- 1 17397760 May 28 02:43 good.tar
$ head -c 13 good.tar
Ethan Furman added the comment:
Actually, the OP was using 2.7.6.
Ethan Furman added the comment:
I took an existing tar file and chopped it in half with `head -c`. I was able
to extract half the files, but I didn't check the viability of the last file as
I was looking for tar or python error feedback.
Changes by Ned Deily n...@acm.org:
--
nosy: +lars.gustaebel
New submission from Thomas Güttler:
The Python tarfile library does not detect a broken tar.
user@host$ wc -c good.tar
143360 good.tar
user@host$ head -c 13 good.tar > cut.tar
user@host$ tar -tf cut.tar
...
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
Very
Changes by Ethan Furman et...@stoneleaf.us:
--
nosy: +ethan.furman