On Nov 29, 2013, at 4:37 AM, Henrik Bengtsson <h...@biostat.ucsf.edu> wrote:
> On Thu, Nov 28, 2013 at 4:48 PM, Simon Urbanek
> <simon.urba...@r-project.org> wrote:
>> On Nov 27, 2013, at 8:30 PM, Murray Stokely <mur...@stokely.org> wrote:
>>
>>> I think none of these examples describe a zlib compressed data block inside
>>> a binary file that the OP asked about, as all of your examples are e.g.
>>> prepending gzip or zip headers.
>>>
>>> Greg, is memDecompress what you are looking for?
>>>
>>
>> I think so.
>>
>> But this is interesting: I think the documentation of
>> memCompress/memDecompress is not quite correct and the parameters are
>> misleading. Although it does mention the gzip headers, it is incorrect since
>> zlib format is not a subset of the gzip format (albeit they use the same
>> compression method), so you cannot extract gzip content using zlib
>> decompression - you’ll get internal error -3 in memDecompress(2) if you try
>> it since it expects the zlib header which is different from the gzip one.
>
> Interesting. Just to make sure: are you 100% certain about this?

Yes, see below.

> From http://svn.r-project.org/R/trunk/src/main/connections.c:
>
>     case 2: /* gzip */
>     {
>         uLong inlen = LENGTH(from), outlen = 3*inlen;
>         int res;
>         Bytef *buf, *p = (Bytef *)RAW(from);
>         /* we check for a file header */
>         if (p[0] == 0x1f && p[1] == 0x8b) { p += 2; inlen -= 2; }
>         while(1) {
>             buf = (Bytef *) R_alloc(outlen, sizeof(Bytef));
>             res = uncompress(buf, &outlen, p, inlen);
>             if(res == Z_BUF_ERROR) { outlen *= 2; continue; }
>             if(res == Z_OK) break;
>             error("internal error %d in memDecompress(%d)", res, type);
>         }
>         ans = allocVector(RAWSXP, outlen);
>         memcpy(RAW(ans), buf, outlen);
>         break;
>     }
>
> That code looks for the 0x1F 0x8B magic number, which is the one for
> gzip [http://www.gzip.org/zlib/rfc-gzip.html#header-trailer]. Or are
> you saying that that if statement is incorrect? (Disclaimer: I don't
> know much about gzip/zlib, but I happen to recognize that gzip magic
> number.)
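To make the question about that `if` statement concrete, here is a minimal C sketch of the two header checks as defined in RFC 1950 (zlib) and RFC 1952 (gzip). The helper names `is_zlib_header` and `is_gzip_header` are illustrative only; they are not part of R or zlib, and nothing here decompresses anything. A zlib stream starts with a CMF byte whose low nibble is 8 (deflate) and whose high nibble is at most 7, and CMF*256 + FLG must be a multiple of 31. A gzip member starts with 0x1f 0x8b 0x08 inside a 10-byte fixed header. Skipping only the two magic bytes, as the quoted code does, leaves `08 00 ...`, which fails the zlib check (0x0800 = 2048, and 2048 mod 31 = 2); zlib reports a bad header as Z_DATA_ERROR, which is -3.

```c
#include <assert.h>
#include <stddef.h>

/* Header checks only; nothing is decompressed.
   is_zlib_header / is_gzip_header are hypothetical helper names. */

/* RFC 1950: CMF (CM in the low nibble, CINFO in the high nibble), then FLG,
   with (CMF * 256 + FLG) a multiple of 31 (the FCHECK bits). */
int is_zlib_header(const unsigned char *p, size_t n) {
    assert(p != NULL || n == 0);
    if (n < 2) return 0;
    unsigned cmf = p[0], flg = p[1];
    if ((cmf & 0x0fu) != 8) return 0;      /* CM must be 8 (deflate) */
    if ((cmf >> 4) > 7) return 0;          /* CINFO: window of at most 32K */
    return ((cmf << 8) | flg) % 31 == 0;   /* FCHECK */
}

/* RFC 1952: ID1 = 0x1f, ID2 = 0x8b, CM = 8, then FLG, MTIME (4 bytes),
   XFL and OS, i.e. a 10-byte fixed header before any optional fields. */
int is_gzip_header(const unsigned char *p, size_t n) {
    assert(p != NULL || n == 0);
    return n >= 10 && p[0] == 0x1f && p[1] == 0x8b && p[2] == 8;
}
```

Fed the byte dumps for "1234" that appear in this thread, `is_zlib_header` accepts the memCompress output (`78 9c ...`) and rejects the gzfile output both as written and with its first two bytes skipped, while `is_gzip_header` accepts only the gzfile output.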
The quoted code assumes that zlib is a subset of gzip, which is *not* true; that was the point I was making. zlib has *different* headers from gzip, not just fewer bytes. gzip has lots of other things in the header, and the two even use different checksums (CRC-32 in gzip, Adler-32 in zlib). To illustrate:

    > writeBin(charToRaw("1234"), f<-gzfile("test.gz","wb"))
    > close(f)
    > readBin("test.gz",raw(),100)
     [1] 1f 8b 08 00 00 00 00 00 00 03 33 34 32 36 01
    [16] 00 a3 e0 e3 9b 04 00 00 00
    > memCompress("1234")
     [1] 78 9c 33 34 32 36 01 00 01 f8 00 cb

As you can see, gzip uses a different header: it starts with 0x1f 0x8b but then has many other fields like the modification time, so the compressed payload starts at byte 11, and the 8-byte trailer holds a CRC-32 followed by the uncompressed length. In contrast, zlib has no magic number, just a two-byte header followed by the payload (starting at byte 3) and a 4-byte Adler-32 checksum. So the two are entirely incompatible: you cannot decompress the gzip format with a zlib parser, or vice versa. The payload is the same, but the headers and trailers are entirely different. That's why Greg was specifically asking about zlib, which does *not* mean gzip.

Cheers,
Simon

> /Henrik
>
>> So “gzip” in type is a misnomer - it should say “zlib” since it can neither
>> read nor write the gzip format. Also the documentation should make it clear
>> since it’s pointless to try to use this on gzip contents. The better
>> alternative would be to support both gzip and zlib since R can deal with
>> both; the issue is that it will break code that used type=“gzip” explicitly
>> to mean “zlib” so I’m not sure there is a good way out.
>>
>> Cheers,
>> Simon
>>
>>
>>>
>>> On Wed, Nov 27, 2013 at 5:22 PM, Dirk Eddelbuettel <e...@debian.org> wrote:
>>>
>>>>
>>>> On 27 November 2013 at 18:38, Dirk Eddelbuettel wrote:
>>>> |
>>>> | On 27 November 2013 at 23:49, Dr Gregory Jefferis wrote:
>>>> | | I have a binary file type that includes a zlib compressed data block (i.e.
>>>> | | not gzip).
>>>> | | Is anyone aware of a way using base R or a CRAN package to
>>>> | | decompress this kind of data (from disk or memory). So far I have found
>>>> | | Rcompression::decompress on omegahat, but I would prefer to keep
>>>> | | dependencies on CRAN (or Bioconductor). I am also trying to avoid
>>>> | | writing yet another C level interface to part of zlib.
>>>> |
>>>> | Unless I am missing something, this is in base R; see help(connections).
>>>> |
>>>> | Here is a quick demo:
>>>> |
>>>> | R> write.csv(trees, file="/tmp/trees.csv")  # data we all have
>>>> | R> system("gzip -v /tmp/trees.csv")         # as I am lazy here
>>>> | /tmp/trees.csv:	 50.5% -- replaced with /tmp/trees.csv.gz
>>>> | R> read.csv(gzfile("/tmp/trees.csv.gz"))    # works out of the box
>>>>
>>>> Oh, and in case you meant a zip file containing a data file, that also works.
>>>>
>>>> First, converting what I did last:
>>>>
>>>> edd@max:/tmp$ gunzip trees.csv.gz
>>>> edd@max:/tmp$ zip trees.zip trees.csv
>>>>   adding: trees.csv (deflated 50%)
>>>> edd@max:/tmp$
>>>>
>>>> Then reading the csv from inside the zip file:
>>>>
>>>> R> read.csv(unz("/tmp/trees.zip", "trees.csv"))
>>>>     X Girth Height Volume
>>>> 1   1   8.3     70   10.3
>>>> 2   2   8.6     65   10.3
>>>> 3   3   8.8     63   10.2
>>>> 4   4  10.5     72   16.4
>>>> 5   5  10.7     81   18.8
>>>> 6   6  10.8     83   19.7
>>>> 7   7  11.0     66   15.6
>>>> 8   8  11.0     75   18.2
>>>> 9   9  11.1     80   22.6
>>>> 10 10  11.2     75   19.9
>>>> 11 11  11.3     79   24.2
>>>> 12 12  11.4     76   21.0
>>>> 13 13  11.4     76   21.4
>>>> 14 14  11.7     69   21.3
>>>> 15 15  12.0     75   19.1
>>>> 16 16  12.9     74   22.2
>>>> 17 17  12.9     85   33.8
>>>> 18 18  13.3     86   27.4
>>>> 19 19  13.7     71   25.7
>>>> 20 20  13.8     64   24.9
>>>> 21 21  14.0     78   34.5
>>>> 22 22  14.2     80   31.7
>>>> 23 23  14.5     74   36.3
>>>> 24 24  16.0     72   38.3
>>>> 25 25  16.3     77   42.6
>>>> 26 26  17.3     81   55.4
>>>> 27 27  17.5     82   55.7
>>>> 28 28  17.9     80   58.3
>>>> 29 29  18.0     80   51.5
>>>> 30 30  18.0     80   51.0
>>>> 31 31  20.6     87   77.0
>>>> R>
>>>>
>>>> Regards, Dirk
>>>>
>>>> --
>>>> Dirk Eddelbuettel | e...@debian.org | http://dirk.eddelbuettel.com
>>>>
>>>> ______________________________________________
>>>> R-devel@r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
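Simon's two byte dumps for "1234" can also be checked against the trailer layouts in RFC 1950 and RFC 1952. The sketch below uses only the C standard library; `adler32_sum`, `crc32_sum`, `read_be32` and `read_le32` are hypothetical helper names, and the CRC is computed bit by bit for clarity rather than with zlib's lookup tables. The zlib trailer `01 f8 00 cb` should equal the big-endian Adler-32 of the input, and the gzip trailer `a3 e0 e3 9b 04 00 00 00` should decode as a little-endian CRC-32 followed by the little-endian input length, 4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Adler-32 (RFC 1950): stored big-endian as the zlib trailer. */
uint32_t adler32_sum(const unsigned char *p, size_t n) {
    assert(p != NULL || n == 0);
    uint32_t a = 1, b = 0;
    for (size_t i = 0; i < n; i++) {
        a = (a + p[i]) % 65521u;   /* 65521: largest prime below 2^16 */
        b = (b + a) % 65521u;
    }
    return (b << 16) | a;
}

/* CRC-32 as used by gzip (RFC 1952; reflected polynomial 0xEDB88320):
   stored little-endian in the trailer, followed by ISIZE.
   Bit-at-a-time for clarity, not speed. */
uint32_t crc32_sum(const unsigned char *p, size_t n) {
    assert(p != NULL || n == 0);
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < n; i++) {
        crc ^= p[i];
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (uint32_t)(0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Decode 4 bytes in network (big-endian) order, as in the zlib trailer. */
uint32_t read_be32(const unsigned char *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Decode 4 bytes in little-endian order, as in the gzip trailer. */
uint32_t read_le32(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

Working the Adler-32 by hand for "1234" (bytes 49, 50, 51, 52) gives a = 203 and b = 504, i.e. 0x01F800CB, which is exactly the last four bytes of the memCompress output. The gzip trailer, by contrast, is eight bytes: four of checksum and four of length.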