Re: [gentoo-user] How to copy gzip data from bytestream?
On Mon, Feb 21, 2022 at 8:29 PM Grant Edwards wrote:
> I've got a "raw" USB flash drive containing a large chunk of gzipped
> data. By "raw" I mean no partition table, no filesystem. Think of it
> as a tape (if you're old enough).
>
> gzip -tv is quite happy to validate the data and says it's OK, though
> it says it ignored extra bytes after the end of the "file".
>
> The flash drive size is 128GB, but the gzipped data is only maybe
> 20-30GB.
>
> Question: is there a simple way to copy just the 'gzip' data from the
> drive without copying the extra bytes after the end of the 'gzip'
> data?
>
> The only thing I can think of is:
>
>   $ zcat /dev/sdX | gzip -c > data.gz
>
> But I was trying to figure out a way to do it without uncompressing
> and recompressing the data. I had hoped that the gzip header would
> contain a "length" field (so I would know how many bytes to copy using
> dd), but it does not. Apparently, the only way to find the end of the
> compressed data is to parse it using the proper algorithm (deflate, in
> this case).
>
> -- Grant

Hi Grant,

you could use gzip to tell you the compressed size of the file and then use another method to copy just those bytes (dd for example):

  gzip -clt

This should print the compressed size in bytes, although only by reading through the entire stream once.

-- Felix
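
Once a compressed size is known, the dd step mentioned above amounts to copying a fixed-length prefix of the device. A minimal Python sketch of that step follows; `copy_prefix`, `nbytes` and `chunk_size` are hypothetical names chosen for illustration, not anything gzip or dd provides.

```python
def copy_prefix(src, dst, nbytes, chunk_size=1 << 20):
    """Copy the first `nbytes` bytes of `src` to `dst`.

    `src` and `dst` are binary file objects. Returns the number of bytes
    actually copied, which is smaller than `nbytes` only if `src` ends early.
    """
    remaining = nbytes
    while remaining > 0:
        # Never read past the requested length, even on the last chunk.
        chunk = src.read(min(chunk_size, remaining))
        if not chunk:
            break  # source ended before nbytes were available
        dst.write(chunk)
        remaining -= len(chunk)
    return nbytes - remaining
```

Assuming the size has been obtained some other way, this could be used as `copy_prefix(open("/dev/sdX", "rb"), open("data.gz", "wb"), size)`, which needs read permission on the raw device.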
Re: [gentoo-user] How to copy gzip data from bytestream?
On Mon, Feb 21, 2022 at 8:29 PM Grant Edwards wrote:
>
> But I was trying to figure out a way to do it without uncompressing
> and recompressing the data. I had hoped that the gzip header would
> contain a "length" field (so I would know how many bytes to copy using
> dd), but it does not. Apparently, the only way to find the end of the
> compressed data is to parse it using the proper algorithm (deflate, in
> this case).

I'm guessing that the reason it lacks such a header is precisely so that you can use it in a stream in just this manner. In order to have a length in the header it would need to be able to seek back to the start of the file to modify the header, which isn't always possible. I wouldn't be surprised if it stores some kind of metadata at the end of the file, but of course you can only find that if the end of the file is marked in some way.

Tapes sometimes have ways to seek to the end of a recording - the drive can record a pattern that is detectable while seeking at high speed. Obviously USB drives lack such a mechanism unless provided by a filesystem or whatever application wrote the data.

If you google the details of the gzip file format you might be able to figure out how to identify the end of the file, scan the image to find this marker, and then use dd to extract just the desired range. Unless the file is VERY large I suspect that is going to take you longer than just recompressing it all.

I can't imagine that there is any way around sequentially reading the entire file to find the end, unless you have some mechanism that can read a random block and determine whether it is valid gzip data. If so, you can do a binary search, assuming the data on the drive past the end of the file isn't valid gzip.

-- Rich
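
For reference, the gzip format (RFC 1952) does keep metadata at the end of each member: an 8-byte trailer holding a CRC-32 and the uncompressed size modulo 2^32. The fixed 10-byte header carries no length at all. The following Python sketch decodes both for a single, complete member already held in memory; `describe_member` is a hypothetical helper name, and it assumes the blob contains exactly one member with nothing after it, which is precisely what is not known for the raw drive.

```python
import gzip
import struct


def describe_member(blob):
    """Decode the fixed header and the trailer of one complete gzip member."""
    # Header: ID1 ID2 (magic), CM, FLG, MTIME (4 bytes), XFL, OS; little-endian.
    magic, method, flags, mtime, xfl, os_id = struct.unpack("<HBBIBB", blob[:10])
    if magic != 0x8B1F:  # bytes 0x1f 0x8b read as a little-endian 16-bit value
        raise ValueError("not a gzip member")
    # Trailer: CRC-32 of the uncompressed data, then its size modulo 2**32.
    crc32, isize = struct.unpack("<II", blob[-8:])
    return {"method": method, "flags": flags, "mtime": mtime,
            "crc32": crc32, "isize": isize}


if __name__ == "__main__":
    sample = gzip.compress(b"hello world")
    print(describe_member(sample))
```

Note that the trailer records the uncompressed size, not the compressed length, and it has no distinctive marker, so it cannot be located by scanning the image for a pattern.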
[gentoo-user] How to copy gzip data from bytestream?
I've got a "raw" USB flash drive containing a large chunk of gzipped data. By "raw" I mean no partition table, no filesystem. Think of it as a tape (if you're old enough).

gzip -tv is quite happy to validate the data and says it's OK, though it says it ignored extra bytes after the end of the "file".

The flash drive size is 128GB, but the gzipped data is only maybe 20-30GB.

Question: is there a simple way to copy just the 'gzip' data from the drive without copying the extra bytes after the end of the 'gzip' data?

The only thing I can think of is:

  $ zcat /dev/sdX | gzip -c > data.gz

But I was trying to figure out a way to do it without uncompressing and recompressing the data. I had hoped that the gzip header would contain a "length" field (so I would know how many bytes to copy using dd), but it does not. Apparently, the only way to find the end of the compressed data is to parse it using the proper algorithm (deflate, in this case).

-- Grant
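
One way to act on that last observation is to let zlib's inflate parser walk the stream and report where it stops, discarding the output instead of recompressing it. The Python sketch below is offered under assumptions rather than as a tested tool for this drive: `gzip_data_length` is a hypothetical name, the data still has to be read and inflated once, and any bytes after a member that happen to start with the gzip magic but fail to parse are treated as trailing garbage.

```python
import zlib

GZIP_WBITS = 16 + zlib.MAX_WBITS  # tells zlib to expect a gzip header and trailer


def gzip_data_length(src, chunk_size=1 << 16):
    """Return how many leading bytes of `src` are complete gzip members."""
    end = 0        # offset just past the last complete member
    pos = 0        # offset of the next byte to be handed to the decompressor
    decomp = zlib.decompressobj(GZIP_WBITS)
    pending = b""
    while True:
        if not pending:
            pending = src.read(chunk_size)
            if not pending:
                break  # input ended; a partial member, if any, is not counted
        try:
            decomp.decompress(pending)  # inflated output is thrown away
        except zlib.error:
            break  # bytes after the last member were not valid gzip
        if not decomp.eof:
            pos += len(pending)
            pending = b""
            continue
        # The member ended inside this chunk; unused_data is what follows it.
        rest = decomp.unused_data
        pos += len(pending) - len(rest)
        end = pos
        if len(rest) < 2:
            rest += src.read(chunk_size)
        if rest[:2] != b"\x1f\x8b":
            break  # no further member: the rest is padding or garbage
        pending = rest
        decomp = zlib.decompressobj(GZIP_WBITS)
    if end == 0:
        raise ValueError("no complete gzip member at the start of the input")
    return end
```

The returned offset is the byte count to copy from the start of the device, for example as the `count` of a dd invocation with a block size of 1, or in larger blocks followed by truncating the output file to that length.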