New submission from Bart Olsthoorn:

The `gettarinfo` method in CPython's tarfile module uses fstat to determine
the size of a file (via its file object). When that file object was created
with gzip.open (so it is a GzipFile), fstat returns the compressed size of the
file. The addfile method then reads uncompressed data from the GzipFile, but
only as many bytes as the compressed size, so it reads too few bytes and the
resulting tar contains incomplete files.
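
The mismatch can be seen without tarfile at all. This is only a minimal
sketch: the helper name and the throwaway demo file are made up for
illustration, and it assumes the file object comes from gzip.open on a real
file on disk.

```python
import gzip
import os
import tempfile

def compressed_and_uncompressed_size(path):
    """Return (size reported by fstat, size of the decompressed data)."""
    with gzip.open(path) as f:
        # GzipFile.fileno() is the descriptor of the underlying compressed
        # file, so fstat describes the bytes on disk, not the decompressed data.
        fstat_size = os.fstat(f.fileno()).st_size
        real_size = len(f.read())
    return fstat_size, real_size

# Demo on a throwaway, highly compressible file (hypothetical test data).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.txt.gz")
    with gzip.open(demo_path, "wb") as out:
        out.write(b"a" * 100000)
    fstat_size, real_size = compressed_and_uncompressed_size(demo_path)
```

Here `fstat_size` is the number a fstat-based size lookup would see, while
`real_size` is what addfile would actually need to copy.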

I suggest checking the file object class before using fstat to determine the
size, and raising a warning if it's a gzip file.
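
Roughly, the check I have in mind could look like the sketch below. The
helper `fstat_size_with_warning` is hypothetical and is not part of tarfile;
the warning category and message are also just assumptions about what a fix
might use.

```python
import gzip
import os
import warnings

def fstat_size_with_warning(fileobj):
    """Hypothetical helper: fstat-based size, with a warning for GzipFile."""
    if isinstance(fileobj, gzip.GzipFile):
        warnings.warn(
            "fstat() reports the compressed size of a GzipFile; "
            "the archived member may be truncated",
            RuntimeWarning,
            stacklevel=2,
        )
    # Same size lookup as for any other file object with a descriptor.
    return os.fstat(fileobj.fileno()).st_size
```

This keeps the current behaviour for ordinary files and only adds a visible
hint for the gzip case.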

To clarify, this only happens when adding a GzipFile object to a tar. I know
that it's not a very common scenario, and the underlying problem is that the
uncompressed size of a gzip file can only be determined reliably by
decompressing and reading it entirely, but I think it would be better not to
fail without a warning.

Here is an example that fails:
```
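import gzip
import io
import tarfile

def build_tar():
    c = io.BytesIO()
    with tarfile.open(mode='w', fileobj=c) as tar:
        for textfile in ['1.txt.gz', '2.txt.gz']:
            with gzip.open(textfile) as f:
                # gettarinfo() uses fstat, so tarinfo.size is the compressed size
                tarinfo = tar.gettarinfo(fileobj=f)
                # addfile() copies only tarinfo.size bytes of uncompressed data
                tar.addfile(tarinfo=tarinfo, fileobj=f)
    # Read the buffer after the archive has been closed and finalized.
    return c.getvalue()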
```

This version instead reads the correct file size and writes complete files to
the tar:
```
import gzip
import io
import tarfile

def build_tar():
    c = io.BytesIO()
    with tarfile.open(mode='w', fileobj=c) as tar:
        for textfile in ['1.txt.gz', '2.txt.gz']:
            with gzip.open(textfile) as f:
                # Decompress fully so the real size is known up front.
                buff = f.read()
                tarinfo = tarfile.TarInfo(name=f.name)
                tarinfo.size = len(buff)
                tar.addfile(tarinfo=tarinfo, fileobj=io.BytesIO(buff))
    # Read the buffer after the archive has been closed and finalized.
    return c.getvalue()
```
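
If holding each decompressed file in memory is a concern, a streaming variant
of the same workaround might look like the sketch below. The helper
`add_gzip_member` and its `arcname` and `chunk_size` parameters are
hypothetical names; the sketch assumes the GzipFile is opened for reading, in
which case it can be rewound with seek(0).

```python
import gzip
import io
import tarfile

def add_gzip_member(tar, path, arcname=None, chunk_size=64 * 1024):
    """Hypothetical helper: add the decompressed contents of `path` to `tar`."""
    with gzip.open(path) as f:
        # First pass: count the decompressed bytes without keeping them.
        size = 0
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            size += len(chunk)
        # Rewind; GzipFile restarts decompression from the beginning.
        f.seek(0)
        tarinfo = tarfile.TarInfo(name=arcname or path)
        tarinfo.size = size
        # Second pass: addfile() copies exactly `size` decompressed bytes.
        tar.addfile(tarinfo=tarinfo, fileobj=f)
```

The trade-off is that every file is decompressed twice, in exchange for
memory use bounded by `chunk_size`.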

----------
messages: 227328
nosy: bartolsthoorn
priority: normal
severity: normal
status: open
title: Tarfile using fstat on GZip file object
type: behavior
versions: Python 3.4

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue22468>
_______________________________________