[xz-devel] Question about using Java API for geospatial data

Gary Lucas Sat, 09 Jul 2022 04:31:06 -0700

Hi,

Would anyone be able to confirm that I am using the Java library
xz-java-1.9.zip correctly? If not, could you suggest a better way to
use it? Code snippets are included below.


I am using the library to compress a public-domain data product called
ETOPO1. ETOPO1 provides a global-scale grid of 233 million elevation
and ocean depth samples as integer meters. My implementation
compresses the data in separate blocks of about 20 thousand values
each. Previously, I used Huffman coding and Deflate to reduce the size
of the data to about 4.39 bits per value. With your library, LZMA
reduces that to 4.14 bits per value and XZ to 4.16, so both techniques
improve on the Huffman/Deflate methods by roughly 5 to 6 percent. That
improvement comes at a cost, though a reasonable one: decompression
using LZMA and XZ is slower than Huffman/Deflate.
The original implementation requires an average of 4.8 seconds to
decompress the full set of 233 million points.  The LZMA version
requires 15.2 seconds, and the XZ version requires 18.9 seconds.
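
To put the figures above in more familiar units, here is a rough sketch that converts bits per value into total compressed size and the timings into throughput. It assumes the rounded count of 233 million samples quoted above; the class and method names are made up for illustration and are not part of GVRS or the XZ library.

```java
import java.util.Locale;

/** Back-of-envelope figures derived from the numbers quoted in this message. */
final class Etopo1Figures {
    // Assumption: "233 million" samples, rounded as in the text.
    static final long SAMPLE_COUNT = 233_000_000L;

    /** Total compressed size in MiB for a given average bits-per-value rate. */
    static double compressedMiB(double bitsPerValue) {
        return SAMPLE_COUNT * bitsPerValue / 8.0 / (1024.0 * 1024.0);
    }

    /** Decompression throughput in millions of samples per second. */
    static double millionSamplesPerSecond(double seconds) {
        return SAMPLE_COUNT / seconds / 1.0e6;
    }

    public static void main(String[] args) {
        String[] names = {"Huffman/Deflate", "LZMA", "XZ"};
        double[] bits = {4.39, 4.14, 4.16};   // bits per value, from the text
        double[] secs = {4.8, 15.2, 18.9};    // full-grid decompression times, from the text
        for (int i = 0; i < names.length; i++) {
            System.out.printf(Locale.ROOT, "%-16s %7.1f MiB  %5.1f M samples/s%n",
                    names[i], compressedMiB(bits[i]), millionSamplesPerSecond(secs[i]));
        }
    }
}
```

Seen this way, the size difference between the three methods is a few MiB over the whole grid, while the throughput difference is a factor of three to four.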

My understanding is that XZ should perform better than LZMA. Since
that is not the case, could there be something suboptimal with the way
my code uses the API?

If you would like more detail about the implementation, please visit

        Compression Algorithms for Raster Data:
https://gwlucastrig.github.io/GridfourDocs/notes/GridfourDataCompressionAlgorithms.html
        Compression using Lagrange Multipliers for Optimal Predictors:
https://gwlucastrig.github.io/GridfourDocs/notes/CompressionUsingOptimalPredictors.html
        GVRS Frequently Asked Questions (FAQ):
https://github.com/gwlucastrig/gridfour/wiki/A-GVRS-FAQ

Thank you for your great data compression library.

Gary

And here are the Code Snippets:

The Gridfour Virtual Raster Store (GVRS) is a wrapper format that
stores separate blocks of compressed data so that application code
can access them at random.

LZMA ------------------------------------------
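        // imports: java.io.*, org.tukaani.xz.LZMA2Options,
        //          org.tukaani.xz.LZMAInputStream, org.tukaani.xz.LZMAOutputStream

        // byte[] input is the uncompressed block
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        LZMAOutputStream lzmaOut =
            new LZMAOutputStream(baos, new LZMA2Options(), input.length);
        lzmaOut.write(input, 0, input.length);
        lzmaOut.finish();
        lzmaOut.close();
        return baos.toByteArray();   // return byte[] which is stored to file


        // reading the compressed data:
        ByteArrayInputStream bais =
            new ByteArrayInputStream(compressedInput, 0, compressedInput.length);
        LZMAInputStream lzmaIn = new LZMAInputStream(bais);
        byte[] output = new byte[expectedOutputLength];
        // check the count: the InputStream contract allows short reads
        int offset = 0;
        while (offset < output.length) {
            int n = lzmaIn.read(output, offset, output.length - offset);
            if (n < 0) {
                throw new EOFException("unexpected end of compressed data");
            }
            offset += n;
        }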


XZ ----------------------------------------------------
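        // imports: java.io.*, org.tukaani.xz.LZMA2Options,
        //          org.tukaani.xz.XZInputStream, org.tukaani.xz.XZOutputStream

        // byte[] input is the uncompressed block
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        // unlike LZMAOutputStream, no uncompressed size is passed here
        XZOutputStream xzOut = new XZOutputStream(baos, new LZMA2Options());
        xzOut.write(input, 0, input.length);
        xzOut.finish();
        xzOut.close();
        return baos.toByteArray();   // return byte[] which is stored to file

        // reading the compressed data:
        ByteArrayInputStream bais =
            new ByteArrayInputStream(compressedInput, 0, compressedInput.length);
        XZInputStream xzIn = new XZInputStream(bais);
        byte[] output = new byte[expectedOutputLength];
        // check the count: the InputStream contract allows short reads
        int offset = 0;
        while (offset < output.length) {
            int n = xzIn.read(output, offset, output.length - offset);
            if (n < 0) {
                throw new EOFException("unexpected end of compressed data");
            }
            offset += n;
        }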
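
When filling a buffer of known length from a decompressing stream, the general java.io.InputStream contract allows a single read call to return fewer bytes than requested. The sketch below shows a small helper that loops until the buffer is full, using only the standard library. The names readFully and shortReads are hypothetical, and shortReads is only a stand-in that imitates a stream delivering data in small pieces. It makes no claim about how the XZ for Java streams behave.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

final class ReadFullyDemo {
    /** Reads exactly output.length bytes, looping because read() may return fewer. */
    static void readFully(InputStream in, byte[] output) throws IOException {
        int offset = 0;
        while (offset < output.length) {
            int n = in.read(output, offset, output.length - offset);
            if (n < 0) {
                throw new EOFException(
                        "stream ended after " + offset + " of " + output.length + " bytes");
            }
            offset += n;
        }
    }

    /** Hypothetical stand-in for a decompressor: returns at most 3 bytes per read call. */
    static InputStream shortReads(byte[] data) {
        return new FilterInputStream(new ByteArrayInputStream(data)) {
            @Override
            public int read(byte[] b, int off, int len) throws IOException {
                return super.read(b, off, Math.min(len, 3));
            }
        };
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};

        // A single read call is allowed to stop early.
        byte[] single = new byte[data.length];
        int n = shortReads(data).read(single, 0, single.length);

        // The looping helper keeps reading until the buffer is full.
        byte[] full = new byte[data.length];
        readFully(shortReads(data), full);

        System.out.println("single read returned " + n
                + " bytes; readFully filled " + full.length);
    }
}
```

The same helper could be applied to either the LZMA or the XZ input stream, since both are ordinary InputStream subclasses.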
