Martin Panter added the comment:

I suspect Eric’s file has non-zero, non-gzip garbage bytes appended to the end 
of it. Assuming I am right, here is a way to reproduce that scenario:

>>> from gzip import GzipFile
>>> from io import BytesIO
>>> file = BytesIO()
>>> with GzipFile(fileobj=file, mode="wb") as z:
...     z.write(b"data")
... 
4
>>> file.write(b"garbage")
7
>>> file.seek(0)
0
>>> GzipFile(fileobj=file).read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/proj/python/cpython/Lib/gzip.py", line 274, in read
    return self._buffer.read(size)
  File "/home/proj/python/cpython/Lib/gzip.py", line 461, in read
    if not self._read_gzip_header():
  File "/home/proj/python/cpython/Lib/gzip.py", line 409, in _read_gzip_header
    raise OSError('Not a gzipped file (%r)' % magic)
OSError: Not a gzipped file (b'ga')

This is a bit different to Issue 1508475. That one is about cases where the 
“gzip” trailer has been truncated, although the compressed data is probably 
intact. This case is the converse: extra data has been added.

All of the “gzip”, “bzip2” and XZ Utils (for LZMA) command-line decompressors 
happily extract the compressed data without an error exit status, but emit 
warning messages:

gzip: stdin: decompression OK, trailing garbage ignored
bzip2: (stdin): trailing garbage after EOF ignored
xz: (stdin): Unexpected end of input

In Python, the “bz2” and LZMA modules successfully extract the compressed 
data, and ignore the non-compressed garbage at the end without even a warning. 
On the other hand, the “gzip” module has special code to ignore trailing zero 
bytes (Issue 2846), but treats any other trailing non-gzip data as an error.
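
For comparison, here is a rough sketch of the difference using the one-shot 
decompress() functions rather than the file objects. I am assuming they behave 
the same way as the file classes in this respect; the variable names are just 
for illustration.

```python
import bz2
import gzip
import lzma

payload = b"data"
garbage = b"garbage"  # non-zero bytes that are not a valid compressed stream

# bz2 and lzma stop after the last valid stream and silently drop the rest.
bz2_result = bz2.decompress(bz2.compress(payload) + garbage)
lzma_result = lzma.decompress(lzma.compress(payload) + garbage)

# gzip treats the same trailing bytes as an error. Catching OSError also
# covers more specific subclasses that newer Python versions may raise.
try:
    gzip.decompress(gzip.compress(payload) + garbage)
except OSError as exc:
    gzip_error = exc
else:
    gzip_error = None

print(bz2_result, lzma_result, gzip_error)
```

The first two calls return the original payload; only the gzip call fails.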

So I think a strong argument could be made for the ability to extract all the 
compressed data from a file even if there is garbage appended. The question is, 
how would this support be added? Perhaps the mechanism chosen could also be 
integrated with a fix for Issue 1508475. Some options:

* Silently ignore the condition by default like the other compression modules 
(consistent, but could silently swallow real errors)
* An optional new GzipFile(strict=False) mode
* Perhaps an exception deferred until close() is called
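
In the meantime, a possible workaround is to decode the gzip members with the 
“zlib” module directly and look at what is left over. This is only a sketch: 
decompress_lenient() is a hypothetical helper, not a proposed API, and it 
assumes the whole file fits in memory.

```python
import gzip
import zlib

GZIP_MAGIC = b"\x1f\x8b"

def decompress_lenient(raw):
    """Hypothetical helper: return (payload, trailing_bytes).

    Decodes consecutive gzip members and stops at the first bytes that
    do not start with the gzip magic number, returning them untouched.
    """
    chunks = []
    while raw[:2] == GZIP_MAGIC:
        # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
        member = zlib.decompressobj(16 + zlib.MAX_WBITS)
        chunks.append(member.decompress(raw))
        if not member.eof:
            raise EOFError("compressed stream ended before the gzip trailer")
        # Whatever follows the member's trailer is left in unused_data.
        raw = member.unused_data
    return b"".join(chunks), raw

blob = gzip.compress(b"data") + b"garbage"
payload, trailing = decompress_lenient(blob)
print(payload, trailing)
```

The caller can then decide whether the trailing bytes matter, which is roughly 
what a strict=False mode would have to let users do. Garbage that happens to 
begin with the gzip magic number would still raise zlib.error here.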

----------
nosy: +vadmium

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue24301>
_______________________________________