[jira] [Commented] (AVRO-2195) Add Zstandard Codec
[ https://issues.apache.org/jira/browse/AVRO-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16563070#comment-16563070 ]

Benson Qiu commented on AVRO-2195:
----------------------------------

Thank you [~nkollar] and [~nielsbasjes]!

> Add Zstandard Codec
> -------------------
>
>                 Key: AVRO-2195
>                 URL: https://issues.apache.org/jira/browse/AVRO-2195
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>    Affects Versions: 1.9.0
>            Reporter: Benson Qiu
>            Assignee: Benson Qiu
>            Priority: Major
>              Labels: patch
>             Fix For: 1.9.0
>
>         Attachments: AVRO-2195.patch, AVRO-2195.patch.v2, AVRO-2195.v3.patch,
> AVRO-2195.v4.patch
>
>
> Inspired by AVRO-1373. The Zstandard algorithm is available in the
> commons-library, which Avro projects already depend on.
> In a quick test that I did, Zstandard had a better compression ratio than
> deflate (compression level 9).
> [https://code.fb.com/core-data/smaller-and-faster-data-compression-with-zstandard/]

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (AVRO-2195) Add Zstandard Codec
[ https://issues.apache.org/jira/browse/AVRO-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16561622#comment-16561622 ]

Nandor Kollar commented on AVRO-2195:
-------------------------------------

Committed to master. Thanks [~benson.qiu] for the contribution, and [~nielsbasjes] for helping to review!
[jira] [Commented] (AVRO-2195) Add Zstandard Codec
[ https://issues.apache.org/jira/browse/AVRO-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16561616#comment-16561616 ]

ASF subversion and git services commented on AVRO-2195:
-------------------------------------------------------

Commit cf2f30336efe0ecc3debc7bede86fde6d23f7c79 in avro's branch refs/heads/master from [~nkollar]
[ https://gitbox.apache.org/repos/asf?p=avro.git;h=cf2f303 ]

AVRO-2195: Add Zstandard Codec (Benson Qiu via Nandor Kollar)
[jira] [Commented] (AVRO-2195) Add Zstandard Codec
[ https://issues.apache.org/jira/browse/AVRO-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16561043#comment-16561043 ]

Niels Basjes commented on AVRO-2195:
------------------------------------

My only concerns were whether this codec can be made available to the other languages and the license of the implementation used. Both of these points check out fine.

I haven't been able to check the code itself, so I trust Nandor's judgement.
[jira] [Commented] (AVRO-2195) Add Zstandard Codec
[ https://issues.apache.org/jira/browse/AVRO-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553554#comment-16553554 ]

Benson Qiu commented on AVRO-2195:
----------------------------------

Thanks [~nkollar] for the review. What are the guidelines for getting this committed? Should we wait for additional comments from [~nielsbasjes]?
[jira] [Commented] (AVRO-2195) Add Zstandard Codec
[ https://issues.apache.org/jira/browse/AVRO-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16547902#comment-16547902 ]

Nandor Kollar commented on AVRO-2195:
-------------------------------------

+1 for AVRO-2195.v4.patch
[jira] [Commented] (AVRO-2195) Add Zstandard Codec
[ https://issues.apache.org/jira/browse/AVRO-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16544726#comment-16544726 ]

Benson Qiu commented on AVRO-2195:
----------------------------------

[~nkollar] [~nielsbasjes] Mind taking a look at the latest patch?
[jira] [Commented] (AVRO-2195) Add Zstandard Codec
[ https://issues.apache.org/jira/browse/AVRO-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539051#comment-16539051 ]

Benson Qiu commented on AVRO-2195:
----------------------------------

[~nielsbasjes] [~nkollar] Uploaded v4 patch. I realized there was some trailing whitespace in lang/java/avro/src/test/java/org/apache/avro/file/TestZstandardCodec.java so I fixed it. Otherwise, everything is the same as v3.
[jira] [Commented] (AVRO-2195) Add Zstandard Codec
[ https://issues.apache.org/jira/browse/AVRO-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16535623#comment-16535623 ]

Benson Qiu commented on AVRO-2195:
----------------------------------

[~nielsbasjes] [~nkollar] Uploaded v3 patch:
- zstd-jni dependency changed to test scope
- Additional unit test for avro/src/main/java/org/apache/avro/file/ZstandardCodec.java
[jira] [Commented] (AVRO-2195) Add Zstandard Codec
[ https://issues.apache.org/jira/browse/AVRO-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16534218#comment-16534218 ]

Niels Basjes commented on AVRO-2195:
------------------------------------

You can set the dependency in the pom.xml with scope set to test to achieve this.
[jira] [Commented] (AVRO-2195) Add Zstandard Codec
[ https://issues.apache.org/jira/browse/AVRO-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16534181#comment-16534181 ]

Benson Qiu commented on AVRO-2195:
----------------------------------

Thanks for the comments on v2. Good to hear that the BSD license doesn't appear to be a blocker.

{quote}I'm in doubt here; should we do the same or just include it as shown in the current patch?
{quote}
I actually did not include zstd-jni in the original v1 patch. However, I had to include it for v2 in order to write unit tests that depend on zstd-jni.

{quote}I've tried the second patch but it misses the lang/java/avro/src/main/java/org/apache/avro/file/ZstandardCodec.java completely.
{quote}
Thanks for catching this! I will look into adding a separate unit test to provide test coverage for this class.
[jira] [Commented] (AVRO-2195) Add Zstandard Codec
[ https://issues.apache.org/jira/browse/AVRO-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16534128#comment-16534128 ]

Niels Basjes commented on AVRO-2195:
------------------------------------

According to this page [https://www.apache.org/legal/resolved.html#category-a] the BSD license is ok to include.
The originating issue came to the same conclusion: COMPRESS-423

Looking at the provided patch I noticed that zstd-jni still needed to be included separately in addition to commons-compress, because it has explicitly been marked as optional there: [https://github.com/apache/commons-compress/blob/master/pom.xml#L87]
COMPRESS-423 does not mention why this was done.
I'm in doubt here; should we do the same or just include it as shown in the current patch?

What I also find remarkable is that zstd-jni is not just a dependency; it is actually shaded into the avro jar.

I've tried the second patch but it misses the lang/java/avro/src/main/java/org/apache/avro/file/ZstandardCodec.java completely.
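To illustrate the optional-dependency point: a Maven dependency marked optional is not pulled in transitively, so an application that wants Zstandard has to declare zstd-jni itself, otherwise the class is simply absent at run time. The following is a minimal sketch, not part of any patch on this issue, of probing for the class before relying on it. `ClasspathProbe` and `isClassAvailable` are hypothetical names; the class name `com.github.luben.zstd.ZstdOutputStream` is the one reported in the initial patch comment.

```java
/** Hypothetical helper: checks whether a class can be located on the classpath. */
final class ClasspathProbe {

    /** Returns true if the named class is visible to this class's class loader. */
    static boolean isClassAvailable(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Not on the classpath, or present but not loadable
            return false;
        }
    }

    public static void main(String[] args) {
        // Class name taken from the ClassNotFoundException reported on this issue
        String zstdStream = "com.github.luben.zstd.ZstdOutputStream";
        System.out.println(zstdStream + " available: " + isClassAvailable(zstdStream));
    }
}
```

Whether the probe reports the class as available depends entirely on whether zstd-jni has been added to the application's own dependencies.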
[jira] [Commented] (AVRO-2195) Add Zstandard Codec
[ https://issues.apache.org/jira/browse/AVRO-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16533923#comment-16533923 ]

Nandor Kollar commented on AVRO-2195:
-------------------------------------

Patch looks good to me; however, {{com.github.luben:zstd-jni}} has a BSD license, and I'm not sure whether this could be a problem. [~nielsbasjes] what do you think? Do you have any objection against committing this?
[jira] [Commented] (AVRO-2195) Add Zstandard Codec
[ https://issues.apache.org/jira/browse/AVRO-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530911#comment-16530911 ]

Benson Qiu commented on AVRO-2195:
----------------------------------

Thanks [~nielsbasjes] and [~nkollar] for the comments.

{quote}
An important feature of Avro is being crossplatform/crosslanguage so I am very curious regarding the portability of this compressor across languages.
Simply put: does it exist outside the Java world?
{quote}
Repeating what I mentioned on the dev mailing list: There are quite a few language bindings for zstandard compression documented [here|https://facebook.github.io/zstd/#other-languages]. Please let me know if that sufficiently addresses your question.

I have uploaded v2 of the patch. It contains the following changes:
* Added unit tests in `TestDataFile` and `TestAvroKeyOutputFormat`.
* Added a dependency on `zstd-jni`.
* Fixed typo in the comment: /** zstandard codec */
[jira] [Commented] (AVRO-2195) Add Zstandard Codec
[ https://issues.apache.org/jira/browse/AVRO-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16529553#comment-16529553 ]

Nandor Kollar commented on AVRO-2195:
-------------------------------------

Avro has several parameterized test cases that take codecs as parameters; how about adding Zstandard as an additional parameter to those tests? {{TestDataFile}} is a good example.
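A rough sketch of the parameterized idea, using only the JDK: every codec in a table goes through the same round-trip check, so covering a new codec means adding one table entry. `CodecRoundTrip`, `ByteCodec`, and the three JDK-based stand-in codecs are assumptions made for illustration; this is not Avro's codec API or the actual {{TestDataFile}} code, and Zstandard itself is left out because it would need zstd-jni.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.InflaterInputStream;

final class CodecRoundTrip {

    /** Hypothetical stand-in for a codec: wraps streams to compress and decompress. */
    interface ByteCodec {
        OutputStream wrap(OutputStream out) throws IOException;
        InputStream unwrap(InputStream in) throws IOException;
    }

    /** The "parameters": one named entry per codec under test. */
    static Map<String, ByteCodec> codecs() {
        Map<String, ByteCodec> table = new LinkedHashMap<>();
        table.put("identity", new ByteCodec() {
            public OutputStream wrap(OutputStream out) { return out; }
            public InputStream unwrap(InputStream in) { return in; }
        });
        table.put("deflate", new ByteCodec() {
            public OutputStream wrap(OutputStream out) { return new DeflaterOutputStream(out); }
            public InputStream unwrap(InputStream in) { return new InflaterInputStream(in); }
        });
        table.put("gzip", new ByteCodec() {
            public OutputStream wrap(OutputStream out) throws IOException { return new GZIPOutputStream(out); }
            public InputStream unwrap(InputStream in) throws IOException { return new GZIPInputStream(in); }
        });
        return table;
    }

    /** Compresses and decompresses the input; true if the bytes survive unchanged. */
    static boolean roundTrips(ByteCodec codec, byte[] original) {
        try {
            ByteArrayOutputStream compressed = new ByteArrayOutputStream();
            try (OutputStream out = codec.wrap(compressed)) {
                out.write(original);
            }
            ByteArrayOutputStream restored = new ByteArrayOutputStream();
            try (InputStream in = codec.unwrap(new ByteArrayInputStream(compressed.toByteArray()))) {
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf)) != -1) {
                    restored.write(buf, 0, n);
                }
            }
            return Arrays.equals(original, restored.toByteArray());
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] sample = "hello avro, hello avro, hello avro".getBytes(StandardCharsets.UTF_8);
        // The same check runs once per codec, as a parameterized test would
        for (Map.Entry<String, ByteCodec> entry : codecs().entrySet()) {
            System.out.println(entry.getKey() + " round trip ok: " + roundTrips(entry.getValue(), sample));
        }
    }
}
```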
[jira] [Commented] (AVRO-2195) Add Zstandard Codec
[ https://issues.apache.org/jira/browse/AVRO-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16529497#comment-16529497 ]

Niels Basjes commented on AVRO-2195:
------------------------------------

Thanks for putting up this contribution.

An important feature of Avro is being crossplatform/crosslanguage so I am very curious regarding the portability of this compressor across languages.
Simply put: does it exist outside the Java world?

I have not yet had a good look at your code.

A few small initial feedback points:
* A tiny typo in the comment /** zstanard codec.*/
* That dependency you mentioned seems to be a known limitation: [https://commons.apache.org/proper/commons-compress/limitations.html]
* I would really like to see the tests.
[jira] [Commented] (AVRO-2195) Add Zstandard Codec
[ https://issues.apache.org/jira/browse/AVRO-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16527317#comment-16527317 ]

Benson Qiu commented on AVRO-2195:
----------------------------------

Initial Patch (AVRO-2195.patch):
* Code is based on AVRO-1373
* Existing tests pass. However, I did not write new unit tests yet.
* I did a manual test by creating a quick Java application that writes Avro files using zstandard compression:

{code:java}
// ...
dataFileWriter.setCodec(CodecFactory.zstandardCodec());
// ...
while (dataFileReader.hasNext()) {
  record = dataFileReader.next();
  dataFileWriter.append(record);
}
// ...
{code}

In my manual test, I observed that my Java application needs a dependency on `zstd-jni`. Without that dependency, we get a `ClassNotFoundException` for `com.github.luben.zstd.ZstdOutputStream`.

{code:xml}
<dependency>
  <groupId>com.github.luben</groupId>
  <artifactId>zstd-jni</artifactId>
  <version>1.3.4-10</version>
</dependency>
{code}

> Add Zstandard Codec
> -------------------
>
>                 Key: AVRO-2195
>                 URL: https://issues.apache.org/jira/browse/AVRO-2195
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.9.0
>            Reporter: Benson Qiu
>            Priority: Major
>              Labels: patch
>         Attachments: AVRO-2195.patch
>
>
> Inspired by AVRO-1373. The Zstandard algorithm is available in the
> commons-library, which Avro projects already depend on.
> In a quick test that I did, Zstandard had a better compression ratio than
> deflate (compression level 9), with significantly faster compression times.
> [https://code.fb.com/core-data/smaller-and-faster-data-compression-with-zstandard/]
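For reference, the deflate side of a comparison like the one in the description can be measured with the JDK alone. This is a minimal sketch under stated assumptions, not the test the reporter ran: `DeflateRatio` is a hypothetical class, the sample data is invented, and it uses `java.util.zip.Deflater` directly rather than Avro's codec classes. Measuring the Zstandard side would additionally need zstd-jni and is not shown.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

/** Hypothetical helper: measures the deflate compression ratio of a byte array. */
final class DeflateRatio {

    /** Compresses the input with the JDK Deflater at the given level (0-9). */
    static byte[] deflate(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end(); // release native resources
        return out.toByteArray();
    }

    /** Ratio of original size to compressed size; higher means better compression. */
    static double ratio(byte[] input, int level) {
        return (double) input.length / deflate(input, level).length;
    }

    public static void main(String[] args) {
        // Invented, repetitive sample data; real Avro records will behave differently
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10_000; i++) {
            sb.append("{\"id\":").append(i).append(",\"name\":\"user\"}\n");
        }
        byte[] sample = sb.toString().getBytes(StandardCharsets.UTF_8);
        // Deflater.BEST_COMPRESSION is level 9, the level named in the description
        System.out.printf("deflate level 9 ratio: %.2f%n", ratio(sample, Deflater.BEST_COMPRESSION));
    }
}
```

Compression ratios depend heavily on the data, so a number from a synthetic sample like this says little about real Avro files.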