What version of xz are you using? The differences between xz and lzma are a bit more involved than one simply being faster than the other. For example, xz is a framed container format that stores a checksum for each “frame” (block), whereas the legacy .lzma format carries no integrity check at all. I would not expect checksum verification to account for all of the difference in decompression time, but it can be disabled to confirm.
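To make that experiment concrete, here is a minimal sketch of how I would toggle the check, assuming XZ for Java 1.6 or later is on the classpath. The class and method names (`XzCheckSketch`, `compress`, `decompress`) are made up for illustration. As far as I know, the library classes are spelled `XZOutputStream` and `XZInputStream`, and the third `int` argument of the `XZOutputStream` constructor is a check type such as `XZ.CHECK_CRC64`, not the uncompressed size that `LZMAOutputStream` takes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;

import org.tukaani.xz.LZMA2Options;
import org.tukaani.xz.XZ;
import org.tukaani.xz.XZInputStream;
import org.tukaani.xz.XZOutputStream;

// Hypothetical helper class, not part of XZ for Java or Gridfour.
final class XzCheckSketch {

    // Compresses one block. checkType is one of the XZ.CHECK_* constants,
    // for example XZ.CHECK_CRC64 (the library default) or XZ.CHECK_NONE.
    static byte[] compress(byte[] input, int checkType) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        // close() also finishes the stream, so no separate finish() call is needed.
        try (XZOutputStream xzOut =
                 new XZOutputStream(baos, new LZMA2Options(), checkType)) {
            xzOut.write(input, 0, input.length);
        }
        return baos.toByteArray();
    }

    // Decompresses one block. verifyCheck = false skips checksum verification
    // even when the stream contains a check; memoryLimit = -1 means "no limit".
    static byte[] decompress(byte[] compressed, int expectedLength,
                             boolean verifyCheck) throws IOException {
        ByteArrayInputStream bais = new ByteArrayInputStream(compressed);
        try (DataInputStream in =
                 new DataInputStream(new XZInputStream(bais, -1, verifyCheck))) {
            byte[] output = new byte[expectedLength];
            in.readFully(output); // throws EOFException if the data is too short
            return output;
        }
    }
}
```

Timing `decompress(block, n, true)` against `decompress(block, n, false)` on existing data, or recompressing with `XZ.CHECK_NONE`, should show how much of the 15.2 s versus 18.9 s gap comes from the check itself.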
On Sat, Jul 9, 2022 at 6:31 AM Gary Lucas <gwluca...@gmail.com> wrote:

> Hi,
>
> Would anyone be able to confirm that I am using the Java library
> xz-java-1.9.zip correctly? If not, could you suggest a better way to
> use it? Code snippets are included below.
>
> I am using the library to compress a public-domain data product called
> ETOPO1. ETOPO1 provides a global-scale grid of 233 million elevation
> and ocean depth samples as integer meters. My implementation
> compresses the data in separate blocks of about 20 thousand values
> each. Previously, I used Huffman coding and Deflate to reduce the size
> of the data to about 4.39 bits per value. With your library, LZMA
> reduces that to 4.14 bits per value and XZ to 4.16. So both techniques
> represent a substantial improvement in compression compared to the
> Huffman/Deflate methods. That improvement comes with a reasonable
> cost. Decompression using LZMA and XZ is slower than Huffman/Deflate.
> The original implementation requires an average of 4.8 seconds to
> decompress the full set of 233 million points. The LZMA version
> requires 15.2 seconds, and the XZ version requires 18.9 seconds.
>
> My understanding is that XZ should perform better than LZMA. Since
> that is not the case, could there be something suboptimal with the way
> my code uses the API?
>
> If you would like more detail about the implementation, please visit
>
> Compression Algorithms for Raster Data:
> https://gwlucastrig.github.io/GridfourDocs/notes/GridfourDataCompressionAlgorithms.html
> Compression using Lagrange Multipliers for Optimal Predictors:
> https://gwlucastrig.github.io/GridfourDocs/notes/CompressionUsingOptimalPredictors.html
> GVRS Frequently asked Questions (FAQ):
> https://github.com/gwlucastrig/gridfour/wiki/A-GVRS-FAQ
>
> Thank you for your great data compression library.
>
> Gary
>
> And here are the Code Snippets:
>
> The Gridfour Virtual Raster Store (GVRS) is a wrapper format that
> stores separate blocks of compressed data to provide random-access by
> application code
>
> LZMA ------------------------------------------
>     // byte [] input is input data
>     ByteArrayOutputStream baos = new ByteArrayOutputStream();
>     lzmaOut = new LZMAOutputStream(baos, new LZMA2Options(),
>         input.length);
>     lzmaOut.write(input, 0, input.length);
>     lzmaOut.finish();
>     lzmaOut.close();
>     return baos.toByteArray(); // return byte[] which is stored to file
>
>
>     // reading the compressed data:
>     ByteArrayInputStream bais = new
>         ByteArrayInputStream(compressedInput, 0, compressedInput.length);
>     LZMAInputStream lzmaIn = new LZMAInputStream(bais);
>     byte[] output = new byte[expectedOutputLength];
>     lzmaIn.read(output, 0, output.length);
>
>
> XZ ----------------------------------------------------
>     // byte [] input is input data
>     ByteArrayOutputStream baos = new ByteArrayOutputStream();
>     xzOut = new XzOutputStream(baos, new LZMA2Options(), input.length);
>     xzOut.write(input, 0, input.length);
>     xzOut.finish();
>     xzOut.close();
>     return baos.toByteArray(); // return byte[] which is stored to file
>
>     // reading the compressed data:
>     ByteArrayInputStream bais = new
>         ByteArrayInputStream(compressedInput, 0, compressedInput.length);
>     XzInputStream xzIn = new XzInputStream(bais);
>     byte[] output = new byte[expectedOutputLength];
>     xzIn.read(output, 0, output.length);
>
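One note on the reading side of both snippets: the general `InputStream` contract allows a single `read(buf, off, len)` call to return fewer bytes than requested, so relying on one call to fill `output` is fragile even if it happens to work with these decoders. A small loop is the safer pattern. The sketch below uses only the JDK; `ReadFullySketch` and `readFully` are hypothetical names of mine, not part of XZ for Java.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, shown only to illustrate the read loop.
final class ReadFullySketch {

    // Fills 'output' completely from 'in'. Loops because one read call
    // may deliver only part of the requested range.
    static void readFully(InputStream in, byte[] output) throws IOException {
        int off = 0;
        while (off < output.length) {
            int n = in.read(output, off, output.length - off);
            if (n < 0) {
                // The stream ended before the expected number of bytes arrived.
                throw new EOFException("Stream ended after " + off
                        + " of " + output.length + " bytes");
            }
            off += n;
        }
    }
}
```

In the snippets above, `readFully(lzmaIn, output)` or `readFully(xzIn, output)` would take the place of the single `read` call. On Java 9 and later, `InputStream.readNBytes(byte[], int, int)` covers the same need, though it returns a short count at end of stream instead of throwing.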