[issue23688] unnecessary copying of memoryview in gzip.GzipFile.write?

Wolfgang Maier Wed, 18 Mar 2015 01:53:08 -0700

Wolfgang Maier added the comment:

Thanks everyone for the lively discussion !


I like Serhiy's idea of making write work with arbitrary objects supporting the 
buffer protocol. In fact, I noticed before that GzipFile.write misbehaves with 
array.array input. It pretends to accept that, but it'll use len(data) for 
calculating the zip file metadata so reading from the file will later fail. I 
was assuming then that fixing that would be too complicated for a rather exotic 
usecase, but now that I see how simple it really is I think it should be done.

As for the concrete implementation, I guess an isinstance(data, bytes) check to 
speed up treatment of the most common input is a good idea, but I am not 
convinced that bytearray deserves the same attention.

Regarding memoryview.cast('B') vs memoryview.nbytes, I see Serhiy's point of 
keeping the patch size smaller. Personally though, I find use of nbytes much 
more self-explanatory than cast('B') the purpose of which was not immediately 
obvious to me. So I would opt for better readability of the final code rather 
than optimizing patch size, but I would be ok with either solution since it is 
not about the essence of the patch anyway.

Finally, the bug I report in issue21560 would be fixed as a side-effect of this 
patch here (because trying to get a memoryview from str would throw an early 
TypeError). Still, I think it would be a good idea to try to write to the 
wrapped fileobj *before* updating self.size and self.crc to be protected from 
unforeseen errors. So maybe we could include that change in the patch here ?

With all that the final code section could look like this:

        if isinstance(data, bytes):
            length = len(data)
        else:
            data = memoryview(data)
            length = data.nbytes

        if length > 0:
            self.fileobj.write( self.compress.compress(data) )
            self.size = self.size + length
            self.crc = zlib.crc32(data, self.crc) & 0xffffffff
            self.offset += length

        return length

One remaining detail then would be whether one would want to catch the 
TypeError possibly raised by the memoryview constructor to turn it into 
something less confusing (after all many users will not know what a memoryview 
has to do with all this). The above code would throw (with str input for 
example):

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/home/wolma/gzip-bug/Lib/gzip.py", line 340, in write
    data = memoryview(data)
TypeError: memoryview: a bytes-like object is required, not 'str'

Maybe, this could be turned into:

TypeError: must be bytes / bytes-like object, not 'str'  ?

to be consistent with the corresponding error in 'wt' mode ?

Let me know which of the above options you favour and I'll provide a new patch.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue23688>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue23688] unnecessary copying of memoryview in gzip.GzipFile.write?

Reply via email to