[issue24301] gzip module failing to decompress valid compressed file

2021-12-31 Thread Ruben Vorderman


Ruben Vorderman  added the comment:

ping

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue24301] gzip module failing to decompress valid compressed file

2021-11-29 Thread Ruben Vorderman


Change by Ruben Vorderman :


--
keywords: +patch
pull_requests: +28076
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/29847




[issue24301] gzip module failing to decompress valid compressed file

2021-11-29 Thread Ruben Vorderman


Ruben Vorderman  added the comment:

Whoops. Sorry, I spoke too soon. If the gzip command-line tool tolerates 
trailing garbage, it seems only logical that Python's *gzip* module should too. 
I believe it can be fixed quite easily. The code should raise a warning though. 
I will make a PR.

--




[issue24301] gzip module failing to decompress valid compressed file

2021-11-29 Thread Ruben Vorderman


Ruben Vorderman  added the comment:

From the spec:

https://datatracker.ietf.org/doc/html/rfc1952


   2.2. File format

  A gzip file consists of a series of "members" (compressed data
  sets).  The format of each member is specified in the following
  section.  The members simply appear one after another in the file,
  with no additional information before, between, or after them.


Gzip files with garbage after them are corrupted or not spec compliant. 
Therefore the gzip module should raise an error in this case.
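As a small illustration of the "members simply appear one after another" wording quoted above, the sketch below concatenates two gzip members and reads them back with the standard library. The payloads are made up for the example.

```python
import gzip

# Two independent gzip members, written back to back with nothing in between.
first = gzip.compress(b"hello ")
second = gzip.compress(b"world")
combined = first + second

# The reader walks through every member and concatenates the payloads.
payload = gzip.decompress(combined)
print(payload)
```

Anything that is not another member (or zero padding, which the module already skips) is what this issue calls trailing garbage.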

--
nosy: +rhpvorderman




[issue24301] gzip module failing to decompress valid compressed file

2021-11-27 Thread Irit Katriel


Irit Katriel  added the comment:

Reproduced on 3.11:

>>> from gzip import GzipFile
>>> from io import BytesIO
>>> file = BytesIO()
>>> with GzipFile(fileobj=file, mode="wb") as z:
...     z.write(b"data")
... 
4
>>> file.write(b"garbage")
7
>>> file.seek(0)
0
>>> GzipFile(fileobj=file).read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/iritkatriel/src/cpython-654/Lib/gzip.py", line 301, in read
    return self._buffer.read(size)
  File "/Users/iritkatriel/src/cpython-654/Lib/_compression.py", line 118, in readall
    while data := self.read(sys.maxsize):
  File "/Users/iritkatriel/src/cpython-654/Lib/gzip.py", line 499, in read
    if not self._read_gzip_header():
  File "/Users/iritkatriel/src/cpython-654/Lib/gzip.py", line 468, in _read_gzip_header
    last_mtime = _read_gzip_header(self._fp)
  File "/Users/iritkatriel/src/cpython-654/Lib/gzip.py", line 428, in _read_gzip_header
    raise BadGzipFile('Not a gzipped file (%r)' % magic)
gzip.BadGzipFile: Not a gzipped file (b'ga')

--
nosy: +iritkatriel
type:  -> behavior
versions: +Python 3.10, Python 3.11, Python 3.9 -Python 3.4




[issue24301] gzip module failing to decompress valid compressed file

2015-05-27 Thread Ned Deily

Ned Deily added the comment:

Can you add a public copy of a file that fails this way?  There are several 
open issues with gzip, like Issue1159051, that might cover this but it's hard 
to know for sure without a test case.

--
nosy: +ned.deily
type: crash -> 

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue24301
___



[issue24301] gzip module failing to decompress valid compressed file

2015-05-27 Thread Eric Gorr

New submission from Eric Gorr:

I have a file whose first four bytes are 1F 8B 08 00 and if I use gunzip from 
the command line, it outputs:

gzip: zImage_extracted.gz: decompression OK, trailing garbage ignored

and correctly decompresses the file. However, if I use the gzip module to read 
and decompress the data, I get the following exception thrown:

  File "/usr/lib/python3.4/gzip.py", line 360, in read
    while self._read(readsize):
  File "/usr/lib/python3.4/gzip.py", line 433, in _read
    if not self._read_gzip_header():
  File "/usr/lib/python3.4/gzip.py", line 297, in _read_gzip_header
    raise OSError('Not a gzipped file')

I believe the problem I am facing is the same one described here in this SO 
question and answer:

http://stackoverflow.com/questions/4928560/how-can-i-work-with-gzip-files-which-contain-extra-data


This would appear to be a serious bug in the gzip module that needs to be fixed.
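One possible workaround, in the spirit of the Stack Overflow answer linked above, is to drive zlib directly and keep whatever follows the compressed stream. This is only a sketch: the helper name gunzip_first_member is made up for this example, and it deliberately handles just the first gzip member, so a multi-member file would come back with its later members in the leftover bytes.

```python
import gzip
import zlib


def gunzip_first_member(data):
    """Hypothetical helper: return (payload, leftover) for the first member."""
    # Adding 16 to the window size tells zlib to expect a gzip header
    # and trailer instead of a raw zlib stream.
    decomp = zlib.decompressobj(zlib.MAX_WBITS | 16)
    payload = decomp.decompress(data)
    if not decomp.eof:
        # The CRC/length trailer was never reached.
        raise EOFError("gzip member is truncated")
    # Bytes after the end of the member are exposed rather than rejected.
    return payload, decomp.unused_data


# Stand-in for a file like zImage_extracted.gz: one member plus extra bytes.
blob = gzip.compress(b"data") + b"garbage"
payload, leftover = gunzip_first_member(blob)
print(payload, leftover)
```

The caller can then decide whether the leftover bytes matter, which the GzipFile reader does not allow.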

--
components: Extension Modules
messages: 244188
nosy: Eric Gorr
priority: normal
severity: normal
status: open
title: gzip module failing to decompress valid compressed file
type: crash
versions: Python 3.4




[issue24301] gzip module failing to decompress valid compressed file

2015-05-27 Thread Martin Panter

Martin Panter added the comment:

I suspect Eric’s file has non-zero, non-gzip garbage bytes appended to the end 
of it. Assuming I am right, here is a way to reproduce that scenario:

>>> from gzip import GzipFile
>>> from io import BytesIO
>>> file = BytesIO()
>>> with GzipFile(fileobj=file, mode="wb") as z:
...     z.write(b"data")
... 
4
>>> file.write(b"garbage")
7
>>> file.seek(0)
0
>>> GzipFile(fileobj=file).read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/proj/python/cpython/Lib/gzip.py", line 274, in read
    return self._buffer.read(size)
  File "/home/proj/python/cpython/Lib/gzip.py", line 461, in read
    if not self._read_gzip_header():
  File "/home/proj/python/cpython/Lib/gzip.py", line 409, in _read_gzip_header
    raise OSError('Not a gzipped file (%r)' % magic)
OSError: Not a gzipped file (b'ga')

This is a bit different to Issue 1508475. That one is about cases where the 
“gzip” trailer has been truncated, although the compressed data is probably 
intact. This case is the converse: extra data has been added.

All of the “gzip”, “bzip2” and XZ Utils (for LZMA) command-line decompressors 
happily extract the compressed data without an error exit status, but emit 
warning messages:

gzip: stdin: decompression OK, trailing garbage ignored
bzip2: (stdin): trailing garbage after EOF ignored
xz: (stdin): Unexpected end of input

In Python, the “bz2” and “lzma” modules successfully extract the compressed 
data, and ignore the non-compressed garbage at the end without even a warning. 
On the other hand, the “gzip” module has special code to ignore trailing zero 
bytes (Issue 2846), but treats any other trailing non-gzip data as an error.
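The difference described above can be checked with a short script. This is a sketch against the behaviour reported in this thread; the gzip error branch is wrapped in try/except because it depends on the interpreter not having changed that behaviour.

```python
import bz2
import gzip
import io

garbage = b"garbage"

# bz2.decompress() stops quietly at the first chunk that is not a bzip2 stream.
bz2_result = bz2.decompress(bz2.compress(b"data") + garbage)
print("bz2:", bz2_result)

# gzip skips trailing zero bytes after the last member.
padded = gzip.compress(b"data") + b"\x00" * 16
padded_result = gzip.GzipFile(fileobj=io.BytesIO(padded)).read()
print("gzip with zero padding:", padded_result)

# Any other trailing bytes are treated as the start of a broken member.
try:
    gzip.GzipFile(fileobj=io.BytesIO(gzip.compress(b"data") + garbage)).read()
except OSError as exc:  # BadGzipFile is a subclass of OSError
    print("gzip with garbage:", exc)
```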

So I think a strong argument could be made for the ability to extract all the 
compressed data from a file even if there is garbage appended. The question is, 
how would this support be added? Perhaps the mechanism chosen could also be 
integrated with a fix for Issue 1508475. Some options:

* Silently ignore the condition by default like the other compression modules 
(consistent, but could silently swallow real errors)
* An optional new GzipFile(strict=False) mode
* Perhaps an exception deferred until close() is called
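To make the second option more concrete, here is a rough sketch of what a strict flag could look like. It is written as a standalone function on top of zlib, not as a change to GzipFile; the name read_gzip, the warning category and the message text are all assumptions made for illustration.

```python
import gzip
import warnings
import zlib

GZIP_MAGIC = b"\x1f\x8b"


def read_gzip(data, strict=True):
    """Hypothetical helper: decompress every member in data.

    strict=True rejects trailing garbage, strict=False only warns.
    """
    chunks = []
    while data:
        if data[:2] != GZIP_MAGIC:
            # Garbage before the first member is always an error.
            if strict or not chunks:
                raise OSError("Not a gzipped file (%r)" % data[:2])
            warnings.warn("trailing garbage ignored", RuntimeWarning)
            break
        decomp = zlib.decompressobj(zlib.MAX_WBITS | 16)  # 16 selects gzip framing
        chunks.append(decomp.decompress(data))
        if not decomp.eof:
            raise EOFError("gzip member is truncated")
        # Skip zero padding between or after members, as the gzip module does.
        data = decomp.unused_data.lstrip(b"\x00")
    return b"".join(chunks)


blob = gzip.compress(b"data") + b"garbage"
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    lenient = read_gzip(blob, strict=False)
print(lenient, [str(w.message) for w in caught])
```

Keeping strict=True as the default preserves the current error for callers that rely on it, while the lenient mode mirrors the command-line tool's "decompression OK, trailing garbage ignored" behaviour.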

--
nosy: +vadmium
