Wolfgang Maier added the comment:
Thanks everyone for the lively discussion !
I like Serhiy's idea of making write work with arbitrary objects supporting the
buffer protocol. In fact, I noticed before that GzipFile.write misbehaves with
array.array input. It pretends to accept that, but it'll use len(data) for
calculating the zip file metadata so reading from the file will later fail. I
was assuming then that fixing that would be too complicated for a rather exotic
usecase, but now that I see how simple it really is I think it should be done.
As for the concrete implementation, I guess an isinstance(data, bytes) check to
speed up treatment of the most common input is a good idea, but I am not
convinced that bytearray deserves the same attention.
Regarding memoryview.cast('B') vs memoryview.nbytes, I see Serhiy's point of
keeping the patch size smaller. Personally though, I find use of nbytes much
more self-explanatory than cast('B') the purpose of which was not immediately
obvious to me. So I would opt for better readability of the final code rather
than optimizing patch size, but I would be ok with either solution since it is
not about the essence of the patch anyway.
Finally, the bug I report in issue21560 would be fixed as a side-effect of this
patch here (because trying to get a memoryview from str would throw an early
TypeError). Still, I think it would be a good idea to try to write to the
wrapped fileobj *before* updating self.size and self.crc to be protected from
unforeseen errors. So maybe we could include that change in the patch here ?
With all that the final code section could look like this:
if isinstance(data, bytes):
length = len(data)
else:
data = memoryview(data)
length = data.nbytes
if length > 0:
self.fileobj.write( self.compress.compress(data) )
self.size = self.size + length
self.crc = zlib.crc32(data, self.crc) & 0xffffffff
self.offset += length
return length
One remaining detail then would be whether one would want to catch the
TypeError possibly raised by the memoryview constructor to turn it into
something less confusing (after all many users will not know what a memoryview
has to do with all this). The above code would throw (with str input for
example):
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
File "/home/wolma/gzip-bug/Lib/gzip.py", line 340, in write
data = memoryview(data)
TypeError: memoryview: a bytes-like object is required, not 'str'
Maybe, this could be turned into:
TypeError: must be bytes / bytes-like object, not 'str' ?
to be consistent with the corresponding error in 'wt' mode ?
Let me know which of the above options you favour and I'll provide a new patch.
----------
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue23688>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe:
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com