[issue28436] GzipFile doesn't properly handle short reads and writes on the underlying stream

2016-10-14 Thread Martin Panter

Martin Panter added the comment:

I would fix the documentation to say the underlying stream should do “exact” 
reads and writes, e.g. one that implements io.BufferedIOBase.read(size) or 
write(). In my experience, most APIs in Python’s library assume or require 
this, rather than the “raw” behaviour.
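
As a rough illustration of that suggestion (a sketch under stated assumptions, not a tested fix for this issue): wrapping a raw stream in io.BufferedReader gives GzipFile a read(size) that keeps calling the raw stream until it has size bytes or reaches EOF. OneByteRaw is a hypothetical raw stream invented for this example; it transfers at most one byte per call to emulate short reads.

```python
import gzip
import io


class OneByteRaw(io.RawIOBase):
    """Hypothetical raw stream that yields at most one byte per call."""

    def __init__(self, data):
        super().__init__()
        self._data = data
        self._pos = 0

    def readable(self):
        return True

    def readinto(self, buffer):
        # Emulate a short read: copy a single byte (or nothing at EOF).
        chunk = self._data[self._pos:self._pos + 1]
        buffer[:len(chunk)] = chunk
        self._pos += len(chunk)
        return len(chunk)


payload = b"hello, gzip"
compressed = gzip.compress(payload)

# The buffered wrapper retries the raw stream, so GzipFile sees exact reads.
buffered = io.BufferedReader(OneByteRaw(compressed))
with gzip.GzipFile(fileobj=buffered, mode="rb") as gzf:
    result = gzf.read()

print(result == payload)
```

The raw stream is never handed to GzipFile directly here; only the buffered wrapper is, which matches the "exact" read behaviour described above.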

Is it likely that people are passing raw FileIO or similar objects to GzipFile, 
or is this just a theoretical problem?

Also related: In Issue 24291 and Issue 26721, we realized that all the servers 
based on socketserver could unexpectedly do short writes, which was a practical 
bug (not just theoretical). I changed socketserver over to doing exact writes, 
and added a workaround in the wsgiref module to handle partial writes. See 

 for the altered documentation.

Other APIs that come to mind are shutil.copyfileobj() (documentation proposed 
in Issue 24291), and io.TextIOWrapper (documented as requiring BufferedIOBase). 
Also, the bz2 and lzma modules seem just as affected as gzip.

--
assignee:  -> docs@python
components: +Documentation -Library (Lib)
nosy: +docs@python
stage:  -> needs patch
versions: +Python 3.6, Python 3.7

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28436] GzipFile doesn't properly handle short reads and writes on the underlying stream

2016-10-13 Thread Serhiy Storchaka

Changes by Serhiy Storchaka:


--
nosy: +martin.panter




[issue28436] GzipFile doesn't properly handle short reads and writes on the underlying stream

2016-10-13 Thread Марк Коренберг

Марк Коренберг added the comment:

And also issue26877

--




[issue28436] GzipFile doesn't properly handle short reads and writes on the underlying stream

2016-10-13 Thread Марк Коренберг

Марк Коренберг added the comment:

Also see issue16859

--
nosy: +mmarkk




[issue28436] GzipFile doesn't properly handle short reads and writes on the underlying stream

2016-10-13 Thread Evgeny Kapun

New submission from Evgeny Kapun:

GzipFile's underlying stream can be a raw stream (such as FileIO), and such 
streams can return short reads and writes at any time (e.g. due to signals). 
The correct behavior in case of short read or write is to retry the call to 
read or write the remaining data.
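
The retry behaviour described above could be sketched roughly as follows. This is only an illustration, not code from the gzip module: read_exact, write_all and OneByteRaw are hypothetical names made up for this example, and OneByteRaw transfers at most one byte per call to emulate short reads and writes.

```python
import errno
import io


def read_exact(stream, size):
    """Read `size` bytes, retrying after short reads; stop early only at EOF."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = stream.read(remaining)
        if chunk is None:
            # A non-blocking raw stream had no data available.
            raise BlockingIOError(errno.EAGAIN, "stream is not ready for reading")
        if not chunk:
            break  # EOF
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def write_all(stream, data):
    """Write all of `data`, retrying after short writes."""
    view = memoryview(data)
    total = 0
    while total < len(view):
        written = stream.write(view[total:])
        if written is None:
            # A non-blocking raw stream could not accept any data.
            raise BlockingIOError(errno.EAGAIN, "stream is not ready for writing")
        total += written
    return total


class OneByteRaw(io.RawIOBase):
    """Hypothetical raw stream that transfers at most one byte per call."""

    def __init__(self, initial=b""):
        super().__init__()
        self._data = bytes(initial)
        self._pos = 0
        self.written = bytearray()

    def readable(self):
        return True

    def writable(self):
        return True

    def read(self, size=-1):
        chunk = self._data[self._pos:self._pos + 1]
        self._pos += len(chunk)
        return chunk

    def write(self, data):
        first = bytes(data)[:1]
        self.written += first
        return len(first)


source = OneByteRaw(b"\x1f\x8b\x08")
header = read_exact(source, 2)   # two one-byte reads are joined together
sink = OneByteRaw()
write_all(sink, b"hello")        # five one-byte writes
```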

GzipFile doesn't do this. This program demonstrates the problem with reading:

import io, gzip

class MyFileIO(io.FileIO):
    def read(self, n):
        # Emulate short read
        return super().read(1)

raw = MyFileIO('test.gz', 'rb')
gzf = gzip.open(raw, 'rb')
gzf.read()

Output:

$ gzip -c /dev/null > test.gz
$ python3 test.py
Traceback (most recent call last):
  File "test.py", line 10, in <module>
    gzf.read()
  File "/usr/lib/python3.5/gzip.py", line 274, in read
    return self._buffer.read(size)
  File "/usr/lib/python3.5/gzip.py", line 461, in read
    if not self._read_gzip_header():
  File "/usr/lib/python3.5/gzip.py", line 409, in _read_gzip_header
    raise OSError('Not a gzipped file (%r)' % magic)
OSError: Not a gzipped file (b'\x1f')

And this shows the problem with writing:

import io, gzip

class MyIO(io.RawIOBase):
    def write(self, data):
        print(data)
        # Emulate short write
        return 1

raw = MyIO()
gzf = gzip.open(raw, 'wb')
gzf.close()

Output:

$ python3 test.py 
b'\x1f\x8b'
b'\x08'
b'\x00'
b'\xb9\xea\xffW'
b'\x02'
b'\xff'
b'\x03\x00'
b'\x00\x00\x00\x00'
b'\x00\x00\x00\x00'

It can be seen that there is no attempt to write all the data. Indeed, the 
return value of the write() method is completely ignored.

I think that either the gzip module should be changed to handle short reads and 
writes properly, or its documentation should state that it cannot be used with 
raw streams.

--
components: Library (Lib)
messages: 278606
nosy: abacabadabacaba
priority: normal
severity: normal
status: open
title: GzipFile doesn't properly handle short reads and writes on the 
underlying stream
type: behavior
versions: Python 3.5
