Martin Panter added the comment:
I suspect Eric’s file has non-zero, non-gzip garbage bytes appended to the end
of it. Assuming I am right, here is a way to reproduce that scenario:

>>> from gzip import GzipFile
>>> from io import BytesIO
>>> file = BytesIO()
>>> with GzipFile(fileobj=file, mode="wb") as z:
...     z.write(b"data")
...
4
>>> file.write(b"garbage")
7
>>> file.seek(0)
0
>>> GzipFile(fileobj=file).read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/proj/python/cpython/Lib/gzip.py", line 274, in read
    return self._buffer.read(size)
  File "/home/proj/python/cpython/Lib/gzip.py", line 461, in read
    if not self._read_gzip_header():
  File "/home/proj/python/cpython/Lib/gzip.py", line 409, in _read_gzip_header
    raise OSError('Not a gzipped file (%r)' % magic)
OSError: Not a gzipped file (b'ga')

This is a bit different to Issue 1508475. That one is about cases where the
“gzip” trailer has been truncated, although the compressed data is probably
intact. This case is the converse: extra data has been added.
All of the “gzip”, “bzip2” and XZ Utils (for LZMA) command-line decompressors
happily extract the compressed data without an error exit status, but emit
warning messages:

    gzip: stdin: decompression OK, trailing garbage ignored
    bzip2: (stdin): trailing garbage after EOF ignored
    xz: (stdin): Unexpected end of input

In Python, the “bz2” and “lzma” modules successfully extract the compressed
data, and ignore the non-compressed garbage at the end without even a warning.
On the other hand, the “gzip” module has special code to ignore trailing zero
bytes (Issue 2846), but treats any other trailing non-gzip data as an error.
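
The difference can be seen side by side with a short script. This is only a
sketch: the payload and garbage bytes are arbitrary, and it shows bz2 and gzip
only. The variable names are made up for illustration.

```python
import bz2
import gzip
import io

payload = b"data"
garbage = b"garbage"  # arbitrary non-zero, non-compressed bytes

# bz2 stops at the end of the valid stream and drops what follows.
bz2_result = bz2.decompress(bz2.compress(payload) + garbage)

# gzip reads the first member, then fails on the trailing bytes.
gzip_blob = gzip.compress(payload) + garbage
try:
    gzip.GzipFile(fileobj=io.BytesIO(gzip_blob)).read()
    gzip_error = None
except OSError as exc:
    gzip_error = exc

print(bz2_result)
print(repr(gzip_error))
```
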
So I think a strong argument could be made for the ability to extract all the
compressed data from a file even if there is garbage appended. The question is, how
would this support be added? Perhaps the mechanism chosen could also be
integrated with a fix for Issue 1508475. Some options:
* Silently ignore the condition by default like the other compression modules
  (consistent, but could silently swallow real errors)
* An optional new GzipFile(strict=False) mode
* Perhaps an exception deferred until close() is called
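
Until something like that exists, a caller can approximate a lenient mode
with the zlib module directly. The sketch below is not a proposed patch and
not gzip module API: decompress_lenient() is a hypothetical helper name. It
relies on zlib.decompressobj() in gzip mode (wbits of MAX_WBITS | 16) and its
eof and unused_data attributes, and it returns the trailing bytes instead of
discarding them so the caller can decide whether they matter. One known
limitation: a truncated second member is treated the same as garbage.

```python
import gzip
import zlib


def decompress_lenient(data):
    """Decompress consecutive gzip members, stopping at the first invalid one.

    Returns (payload, trailing), where trailing holds whatever bytes could
    not be decoded as a further gzip member. Hypothetical helper, for
    illustration only.
    """
    members = []
    while data:
        # MAX_WBITS | 16 tells zlib to expect a gzip header and trailer.
        decomp = zlib.decompressobj(zlib.MAX_WBITS | 16)
        try:
            chunk = decomp.decompress(data)
        except zlib.error:
            if not members:
                raise  # Nothing valid was decoded at all; report the error.
            break  # Leftover data is not a gzip member; hand it back.
        if not decomp.eof:
            if not members:
                raise EOFError("compressed data ended before the "
                               "end-of-stream marker was reached")
            break  # Incomplete trailing member; treat it as garbage.
        members.append(chunk)
        data = decomp.unused_data
    return b"".join(members), data


payload, trailing = decompress_lenient(gzip.compress(b"data") + b"garbage")
print(payload, trailing)
```

Returning the leftover bytes keeps the condition visible to the caller, which
avoids the main drawback of the first option above (silently swallowing real
errors).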
----------
nosy: +vadmium
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue24301>
_______________________________________