Patches item #914340, was opened at 2004-03-11 19:45 Message generated for change (Comment added) made by loewis You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=914340&group_id=5470
Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Modules Group: Python 2.4 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Igor Belyi (belyi) Assigned to: Nobody/Anonymous (nobody) Summary: gzip.GzipFile to accept stream as fileobj. Initial Comment: When gzip.GzipFile is initialized with a fileobj which does not have tell() and seek() methods (non-rewinding stream) it throws exception. The interesting thing is that it doesn't have to. The following patch updates gzip.py to allow any stream with just a read() method to be used. This is helpful if you want to be able to do something like: gzip.GzipFile(fileobj=urllib.urlopen("file:///README.gz")).readlines() or use GzipFile with sys.stdin stream. But keep in mind that seek() and rewind() methond of the GzipFile() won't for such stream even with the patch. Igor ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2007-03-06 15:51 Message: Logged In: YES user_id=21627 Originator: NO The patch in this form is incomplete: it lacks test suite changes. Can somebody please provide patches to Lib/test/test_gzip.py that exercises this new functionality? ---------------------------------------------------------------------- Comment By: Jakob Truelsen (antialize) Date: 2006-06-19 10:35 Message: Logged In: YES user_id=379876 Is there any reson this patch is not accepted? If this patch is accepted then I have a patch to urlib2 to (automaticaly) accept gzipped content as described here http://www.http-compression.com/#client_request, if there is some reson this patch is not acceptable please detail, so it can be fixed, in tired of using popen and gunzip :) ---------------------------------------------------------------------- Comment By: Igor Belyi (belyi) Date: 2004-03-19 15:14 Message: Logged In: YES user_id=995711 I thought I need to add a little bit more verbose explanation for the changes... Current implementation of GzipFile() uses tell() and seek() to scroll stream of data in the following 2 cases: 1. When EOF is reached and the last 8 bytes of the file contain checksum and uncompress data size 2. When after decompression there's left some 'unused_data' meaning that a stream may contains more than one compressed item. What my change does it introduces 2 helper buffers: 'inputbuf' which keeps read but unused data from the stream and 'last8' which keeps last 8 'used' bytes Plus, my change introduces helper method _read_internal() which is used instead of the direct call to self.fileobj.read(). In this method data from the stream are read as needed with the call to self.fileobj.read() and correct values of 'inputbuf' and ''last8' are maintained. When case 1 above happen we use 'last8' buffer to read checksum and size. When case 2 above happen we add value of the 'unused_data' to inputbuf. There's one more instance of the self.fileobj.seek() call left in rewind() method but it is used only when rewind() or seek() methods of GzipFile class are used. And it won't be logical to expect those methods to work if the underlying fileobj does not support them. Igor ---------------------------------------------------------------------- Comment By: Igor Belyi (belyi) Date: 2004-03-19 05:27 Message: Logged In: YES user_id=995711 I thought I need to add a little bit more verbose explanation for the changes... Current implementation of GzipFile() uses tell() and seek() to scroll stream of data in the following 2 cases: 1. When EOF is reached and the last 8 bytes of the file contain checksum and uncompress data size 2. When after decompression there's left some 'unused_data' meaning that a stream may contains more than one compressed item. What my change does it introduces 2 helper buffers: 'inputbuf' which keeps read but unused data from the stream and 'last8' which keeps last 8 'used' bytes Plus, my change introduces helper method _read_internal() which is used instead of the direct call to self.fileobj.read(). In this method data from the stream are read as needed with the call to self.fileobj.read() and correct values of 'inputbuf' and ''last8' are maintained. When case 1 above happen we use 'last8' buffer to read checksum and size. When case 2 above happen we add value of the 'unused_data' to inputbuf. There's one more instance of the self.fileobj.seek() call left in rewind() method but it is used only when rewind() or seek() methods of GzipFile class are used. And it won't be logical to expect those methods to work if the underlying fileobj does not support them. Igor ---------------------------------------------------------------------- Comment By: Igor Belyi (belyi) Date: 2004-03-11 21:04 Message: Logged In: YES user_id=995711 Previous revision of the patch does not work correctly with mutliple compressed members in one stream. I've updated the patch file. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=914340&group_id=5470 _______________________________________________ Patches mailing list Patches@python.org http://mail.python.org/mailman/listinfo/patches