[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16166298#comment-16166298 ]

Jason Lowe commented on HADOOP-13578:
-------------------------------------

bq. it finally gets us the promised land of fast, high compression and splittability all in one file format.

To clarify, this ZStandard codec is not splittable. The ZStandard file format is theoretically capable of supporting splits, but that's not an official feature yet. As I understand it, split support is being added via the "seekable" support in the contrib area at https://github.com/facebook/zstd/tree/dev/contrib/seekable_format, but that's not part of the standard zstd library yet. See https://github.com/facebook/zstd/issues/395 for more discussion around support for splits in the zstd codec.

> Add Codec for ZStandard Compression
> -----------------------------------
>
>                 Key: HADOOP-13578
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13578
>             Project: Hadoop Common
>          Issue Type: New Feature
>            Reporter: churro morales
>            Assignee: churro morales
>             Fix For: 2.9.0, 3.0.0-alpha2
>
>         Attachments: HADOOP-13578-branch-2.v9.patch, HADOOP-13578.patch, HADOOP-13578.v1.patch, HADOOP-13578.v2.patch, HADOOP-13578.v3.patch, HADOOP-13578.v4.patch, HADOOP-13578.v5.patch, HADOOP-13578.v6.patch, HADOOP-13578.v7.patch, HADOOP-13578.v8.patch, HADOOP-13578.v9.patch
>
>
> ZStandard (https://github.com/facebook/zstd) has been used in production for 6 months by Facebook now. v1.0 was recently released. Create a codec for this library.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
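[Editorial note on the frame-boundary issue above: a zstd stream is a sequence of frames, each beginning with a fixed 4-byte magic number, but the plain format carries no index of frame offsets, which is why split support needs the separate seekable_format contrib. A minimal sketch of recognizing a frame start follows; the class and method names are illustrative, not from the Hadoop patch.]

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ZstdFrameProbe {
    // Magic number that begins every zstd frame (zstd format spec),
    // stored little-endian on disk as the bytes 28 B5 2F FD.
    static final int ZSTD_MAGIC = 0xFD2FB528;

    /** Returns true if buf begins with a zstd frame header. */
    static boolean startsWithZstdFrame(byte[] buf) {
        if (buf == null || buf.length < 4) {
            return false;
        }
        int magic = ByteBuffer.wrap(buf, 0, 4)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getInt();
        return magic == ZSTD_MAGIC;
    }
}
```

[Knowing that a buffer *starts* with a frame is cheap; knowing where interior frames begin without decompressing is not, since the magic bytes can also occur inside compressed payload — hence the need for an explicit seek table before splits can be supported.]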
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16165480#comment-16165480 ]

Wei-Chiu Chuang commented on HADOOP-13578:
------------------------------------------

-1 for including it in 2.7 or 2.8. I am sorry, but a new feature is not supposed to land in a maintenance release.
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16165440#comment-16165440 ]

Adam Kennedy commented on HADOOP-13578:
---------------------------------------

ZStd support in 2.7.x would be hugely valuable, particularly because it finally gets us the promised land of fast, high compression and splittability all in one file format.
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16165329#comment-16165329 ]

churro morales commented on HADOOP-13578:
-----------------------------------------

This was never backported; we did it internally here but never pushed it upstream. The build is quite different from Hadoop 2.9, and that's where the majority of the changes lie. If there is enough interest I could possibly put up a 2.7.x patch.
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16155295#comment-16155295 ]

Andre F de Miranda commented on HADOOP-13578:
---------------------------------------------

[~churromorales] was this ever backported to 2.7.x? I couldn't find the mentioned JIRA.

Cheers
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146301#comment-16146301 ]

churro morales commented on HADOOP-13578:
-----------------------------------------

The new version of ZStandard, 1.3.1, has a new license (BSD + GPLv2) which removes the PATENTS file: [https://github.com/facebook/zstd/releases/tag/v1.3.1]
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16113710#comment-16113710 ]

Andrew Wang commented on HADOOP-13578:
--------------------------------------

I believe it's fine; see this mailing list thread: https://lists.apache.org/thread.html/55e4ebf60ef0d109c9f74eac92f58a01b2efc551ae321e5444e05772@%3Ccommon-dev.hadoop.apache.org%3E

From Jason:
{quote}
I think we are OK to leave support for the zstd codec in the Hadoop code base. I asked Chris Mattmann for clarification, noting that support for the zstd codec requires the user to install the zstd headers and libraries and then configure it to be included in the native Hadoop build. The Hadoop releases are not shipping any zstd code (e.g. headers or libraries), nor do they require zstd as a mandatory dependency. Here's what he said:
{quote}
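[Editorial note on the opt-in build described above: the zstd headers and libraries must be present on the build machine, and the native profile must be told to require them. A rough sketch of that workflow follows; the flag name `-Drequire.zstd` comes from the BUILDING.txt changes in this patch, so treat the exact options as indicative and check BUILDING.txt for your release.]

```shell
# Install the zstd development package first (name varies by distro), e.g.:
#   RHEL/CentOS:   sudo yum install libzstd-devel
#   Debian/Ubuntu: sudo apt-get install libzstd-dev

# Build the native libraries and fail fast if zstd support cannot be compiled in:
mvn clean package -Pdist,native -DskipTests -Dtar -Drequire.zstd

# Afterwards, confirm the loaded native hadoop library picked up zstd:
hadoop checknative -a
```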
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16113647#comment-16113647 ]

Adam Kennedy commented on HADOOP-13578:
---------------------------------------

Is this impacted by LEGAL-303?
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864240#comment-15864240 ]

churro morales commented on HADOOP-13578:
-----------------------------------------

Hi [~redsky_luan] - I believe I can put up a patch for 2.7; it should be pretty easy. Let's create another JIRA ticket for this and move it out.
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15863517#comment-15863517 ]

James_Luan commented on HADOOP-13578:
-------------------------------------

Hi there. Is there any schedule for backporting zstd to 2.7? I'd be glad to test zstd on version 2.7.
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15798669#comment-15798669 ]

Hudson commented on HADOOP-13578:
---------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11072 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/11072/])
HADOOP-13578. Add Codec for ZStandard Compression. Contributed by churro (jlowe: rev a0a276162147e843a5a4e028abdca5b66f5118da)
* (add) hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/compress/zstd/TestZStandardCompressorDecompressor.java
* (edit) hadoop-project/pom.xml
* (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/NativeLibraryChecker.java
* (add) hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/compress/zstd/ZStandardDecompressor.c
* (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/Decompressor.java
* (add) hadoop-common-project/hadoop-common/src/test/resources/zstd/test_file.txt.zst
* (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonConfigurationKeys.java
* (add) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/ZStandardCodec.java
* (edit) hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/util/NativeCodeLoader.c
* (add) hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/compress/zstd/org_apache_hadoop_io_compress_zstd.h
* (edit) hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/compress/TestCompressionStreamReuse.java
* (edit) hadoop-common-project/hadoop-common/src/main/resources/META-INF/services/org.apache.hadoop.io.compress.CompressionCodec
* (edit) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/pom.xml
* (edit) hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/compress/TestCodec.java
* (edit) hadoop-common-project/hadoop-common/src/CMakeLists.txt
* (add) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zstd/ZStandardCompressor.java
* (add) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zstd/package-info.java
* (edit) dev-support/bin/dist-copynativelibs
* (add) hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/compress/zstd/ZStandardCompressor.c
* (edit) hadoop-common-project/hadoop-common/pom.xml
* (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/NativeCodeLoader.java
* (add) hadoop-common-project/hadoop-common/src/test/resources/zstd/test_file.txt
* (edit) hadoop-project-dist/pom.xml
* (add) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zstd/ZStandardDecompressor.java
* (edit) hadoop-common-project/hadoop-common/src/config.h.cmake
* (edit) BUILDING.txt
* (edit) hadoop-common-project/hadoop-common/src/site/markdown/NativeLibraries.md.vm
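[Editorial note on using the committed codec: the file list above shows it is registered automatically via the edited META-INF/services file, so an explicit core-site.xml entry is typically only needed when io.compression.codecs is already overridden. A hedged sketch of such an entry, with the codec class name taken from the commit above:]

```xml
<!-- core-site.xml: only needed when io.compression.codecs is explicitly set,
     since the patch also registers the codec via META-INF/services. -->
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.ZStandardCodec</value>
</property>
```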
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15796609#comment-15796609 ]

churro morales commented on HADOOP-13578:
-----------------------------------------

[~jlowe] thank you for taking the time to review this patch.
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15796550#comment-15796550 ]

Jason Lowe commented on HADOOP-13578:
-------------------------------------

Sorry for the long delay, as I just got back from vacation. +1 for the v9 trunk and branch-2 patches. The branch-2 test failures and ASF warnings are unrelated. I'll commit this tomorrow if there are no objections.
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15756328#comment-15756328 ]

Hadoop QA commented on HADOOP-13578:
------------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 17s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 5 new or modified test files. |
| 0 | mvndep | 0m 52s | Maven dependency ordering for branch |
| +1 | mvninstall | 6m 31s | branch-2 passed |
| +1 | compile | 5m 30s | branch-2 passed with JDK v1.8.0_111 |
| +1 | compile | 6m 28s | branch-2 passed with JDK v1.7.0_121 |
| +1 | checkstyle | 1m 28s | branch-2 passed |
| +1 | mvnsite | 9m 2s | branch-2 passed |
| +1 | mvneclipse | 0m 48s | branch-2 passed |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-project hadoop-project-dist . |
| +1 | findbugs | 1m 37s | branch-2 passed |
| +1 | javadoc | 4m 26s | branch-2 passed with JDK v1.8.0_111 |
| +1 | javadoc | 6m 26s | branch-2 passed with JDK v1.7.0_121 |
| 0 | mvndep | 0m 18s | Maven dependency ordering for patch |
| +1 | mvninstall | 7m 31s | the patch passed |
| +1 | compile | 5m 30s | the patch passed with JDK v1.8.0_111 |
| +1 | cc | 5m 30s | the patch passed |
| +1 | javac | 5m 30s | the patch passed |
| +1 | compile | 6m 31s | the patch passed with JDK v1.7.0_121 |
| +1 | cc | 6m 31s | the patch passed |
| +1 | javac | 6m 31s | the patch passed |
| -0 | checkstyle | 1m 29s | root: The patch generated 4 new + 161 unchanged - 1 fixed = 165 total (was 162) |
| +1 | mvnsite | 9m 10s | the patch passed |
| +1 | mvneclipse | 0m 52s | the patch passed |
| -1 | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply |
| +1 | xml | 0m 1s | The patch has no ill-formed XML file. |
| 0 | findbugs | 0m 1s | Skipped patched modules with no Java source: hadoop-project hadoop-project-dist . |
| +1 | findbugs | 1m 55s | the patch passed |
| +1 | javadoc | 5m 5s | the patch passed with JDK v1.8.0_111 |
| +1 | javadoc | 6m 54s | the patch passed with JDK v1.7.0_121 |
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15755586#comment-15755586 ]

Hadoop QA commented on HADOOP-13578:
------------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 13m 22s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 4 new or modified test files. |
| 0 | mvndep | 1m 12s | Maven dependency ordering for branch |
| +1 | mvninstall | 6m 30s | branch-2 passed |
| +1 | compile | 6m 5s | branch-2 passed with JDK v1.8.0_111 |
| +1 | compile | 6m 50s | branch-2 passed with JDK v1.7.0_121 |
| +1 | checkstyle | 1m 28s | branch-2 passed |
| +1 | mvnsite | 9m 37s | branch-2 passed |
| +1 | mvneclipse | 0m 56s | branch-2 passed |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-project hadoop-project-dist . |
| +1 | findbugs | 1m 39s | branch-2 passed |
| +1 | javadoc | 5m 10s | branch-2 passed with JDK v1.8.0_111 |
| +1 | javadoc | 6m 7s | branch-2 passed with JDK v1.7.0_121 |
| 0 | mvndep | 0m 15s | Maven dependency ordering for patch |
| +1 | mvninstall | 7m 36s | the patch passed |
| +1 | compile | 5m 43s | the patch passed with JDK v1.8.0_111 |
| -1 | cc | 5m 43s | root-jdk1.8.0_111 with JDK v1.8.0_111 generated 1 new + 6 unchanged - 1 fixed = 7 total (was 7) |
| +1 | javac | 5m 43s | the patch passed |
| +1 | compile | 6m 38s | the patch passed with JDK v1.7.0_121 |
| -1 | cc | 6m 38s | root-jdk1.7.0_121 with JDK v1.7.0_121 generated 1 new + 16 unchanged - 1 fixed = 17 total (was 17) |
| +1 | javac | 6m 38s | the patch passed |
| -0 | checkstyle | 1m 34s | root: The patch generated 4 new + 161 unchanged - 1 fixed = 165 total (was 162) |
| +1 | mvnsite | 10m 1s | the patch passed |
| +1 | mvneclipse | 0m 53s | the patch passed |
| -1 | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply |
| +1 | xml | 0m 1s | The patch has no ill-formed XML file. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-project hadoop-project-dist . |
| +1 | findbugs | 2m 6s | the patch passed |
| +1 | javadoc | 5m 13s | the patch passed with JDK v1.8.0_111 |
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15755172#comment-15755172 ]

Hadoop QA commented on HADOOP-13578:
------------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| -1 | patch | 0m 6s | HADOOP-13578 does not apply to branch-2. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. |

|| Subsystem || Report/Notes ||
| JIRA Issue | HADOOP-13578 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12843629/HADOOP-13578-branch-2.v9.patch |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/11291/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755021#comment-15755021 ] Hadoop QA commented on HADOOP-13578: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 8s{color} | {color:red} HADOOP-13578 does not apply to branch-2. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HADOOP-13578 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12843619/HADOOP-13578-branch-2.v9.patch | | Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/11289/console | | Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated.
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15751842#comment-15751842 ] Jason Lowe commented on HADOOP-13578: - Latest patch looks good. The test failures and ASF warnings are unrelated. I filed HDFS-11251 to track the TestDataNodeHotSwapVolumes failure and HDFS-11252 to track the TestFileTruncate failure. YARN-5934 is tracking the TestTimelineWebServices failure. Please attach an equivalent branch-2 patch, and if that looks good then I'll commit this.
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15750095#comment-15750095 ] Hadoop QA commented on HADOOP-13578: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 5 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 16s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 54s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 9m 42s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 9s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project hadoop-project-dist . 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 55s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 32s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 20s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 9m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 9m 23s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 42s{color} | {color:orange} root: The patch generated 3 new + 159 unchanged - 1 fixed = 162 total (was 160) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 9m 53s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} shellcheck {color} | {color:green} 0m 11s{color} | {color:green} There were no new shellcheck issues. {color} | | {color:green}+1{color} | {color:green} shelldocs {color} | {color:green} 0m 8s{color} | {color:green} There were no new shelldocs issues. {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 5 line(s) that end in whitespace. Use git apply --whitespace=fix <>. 
Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 5s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project hadoop-project-dist . {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 0s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red}111m 34s{color} | {color:red} root in the patch failed. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 44s{color} | {color:red} The patch generated 2 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}209m 40s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes | | | hadoop.hdfs.server.namenode.TestFileTruncate | | | hadoop.yarn.server.timeline.webapp.TestTimelineWebServices | \\ \\ || Subsystem || Report/Notes || | Docker |
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15749728#comment-15749728 ] churro morales commented on HADOOP-13578: - Great, thank you [~jlowe] for your patience and for taking the time to do so many reviews. The branch-2 patch will be mostly the same code-wise, except the code to build and package the libraries is quite different. I'll work on getting a patch ready today.
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15749717#comment-15749717 ] Jason Lowe commented on HADOOP-13578: - We should use this JIRA for the branch-2 patch. Just attach the branch-2 version according to the [patch naming guidelines|https://wiki.apache.org/hadoop/HowToContribute#Naming_your_patch] and the precommit build will run against branch-2 instead of trunk, e.g. HADOOP-13578-branch-2.v9.patch would be the branch-2 version of HADOOP-13578.v9.patch. I'd attach the trunk patch first, and if the precommit looks good, go ahead and attach the branch-2 version so it can run on that. If you attach both at the same time, then precommit will only run on one of them (the first attached, IIRC).
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15749694#comment-15749694 ] churro morales commented on HADOOP-13578: - No problem, will do that and upload the patch. Thanks again for taking the time to review. I know we are interested in backports of this feature to the 2.x branch. Should we create a new ticket for that?
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15749683#comment-15749683 ] Jason Lowe commented on HADOOP-13578: - Yep, that looks good. If you could put the else on the same line as the previous closing bracket to match the coding style, that'd be nice as well.
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15749677#comment-15749677 ] churro morales commented on HADOOP-13578: - [~jlowe], regarding your previous comment about not returning after throwing: I can make the following change:
{code}
size_t size = dlsym_ZSTD_compressStream(stream, &output, &input);
if (dlsym_ZSTD_isError(size)) {
  THROW(env, "java/lang/InternalError", dlsym_ZSTD_getErrorName(size));
  return (jint) 0;
}
if (finish && input.pos == input.size) {
  // end the stream, flush and write the frame epilogue
  size = dlsym_ZSTD_endStream(stream, &output);
  if (!size) {
    (*env)->SetBooleanField(env, this, ZStandardCompressor_finished, JNI_TRUE);
  }
} else {
  // need to flush the output buffer
  // this also updates the output buffer position.
  size = dlsym_ZSTD_flushStream(stream, &output);
}
if (dlsym_ZSTD_isError(size)) {
  THROW(env, "java/lang/InternalError", dlsym_ZSTD_getErrorName(size));
  return (jint) 0;
}
{code}
If you are okay with this I can put up a new patch containing this fix.
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15749643#comment-15749643 ] Hadoop QA commented on HADOOP-13578: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 5 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 55s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 42s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 9m 47s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 9s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project hadoop-project-dist . 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 15s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 36s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 22s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 10m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 10m 11s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 58s{color} | {color:orange} root: The patch generated 3 new + 159 unchanged - 1 fixed = 162 total (was 160) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 10m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} shellcheck {color} | {color:green} 0m 12s{color} | {color:green} There were no new shellcheck issues. {color} | | {color:green}+1{color} | {color:green} shelldocs {color} | {color:green} 0m 9s{color} | {color:green} There were no new shelldocs issues. {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 5 line(s) that end in whitespace. Use git apply --whitespace=fix <>. 
Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 5s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project hadoop-project-dist . {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 55s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 13m 58s{color} | {color:red} root in the patch failed. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 42s{color} | {color:red} The patch generated 2 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}117m 16s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.security.token.delegation.web.TestWebDelegationToken | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | HADOOP-13578 | | JIRA Patch URL |
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15749618#comment-15749618 ] churro morales commented on HADOOP-13578: - Yes, you are correct: ZSTD_compressStream_generic does nothing if the stage is final, so that part looks good to me.
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15749570#comment-15749570 ] Jason Lowe commented on HADOOP-13578: - One more thing I just thought of -- if we end up having to loop around and call endStream more than once, is it OK that we called ZSTD_compressStream between the two calls to ZSTD_endStream? The call will be degenerate with input size == 0, but I'm wondering if that's OK from the zstd lib API perspective. Actually I think we're OK here since ZSTD_endStream will do a similar degenerate ZSTD_compressStream call internally if it isn't in the final compression stage, and it looks like ZSTD_compressStream explicitly does nothing if the compressor is in the final stage. So I think we're good, but it'd be great if you could double-check.
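The "loop around and call endStream more than once" case above arises when the output buffer is too small to hold the whole frame epilogue in one call. The caller-side shape of that loop can be sketched with a small self-contained model -- note this is *not* the real zstd API: `end_stream_stub`, `stream_stub`, and the buffer structs below are hypothetical stand-ins that only mimic the documented contract (endStream returns the number of bytes still to flush, 0 meaning the frame is complete):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptors mirroring ZSTD_inBuffer / ZSTD_outBuffer. */
typedef struct { const char *src; size_t size; size_t pos; } in_buf;
typedef struct { char *dst; size_t size; size_t pos; } out_buf;

/* Stub "stream" state: epilogue bytes that endStream still has to emit. */
typedef struct { size_t pending; int finished; } stream_stub;

/* Mimics the endStream contract: write as much of the epilogue as fits
 * into the output buffer, return the number of bytes remaining to flush.
 * Returning 0 means the frame epilogue is complete. */
static size_t end_stream_stub(stream_stub *s, out_buf *out) {
    size_t room = out->size - out->pos;
    size_t n = s->pending < room ? s->pending : room;
    memset(out->dst + out->pos, 'E', n);  /* epilogue filler bytes */
    out->pos += n;
    s->pending -= n;
    if (s->pending == 0) s->finished = 1;
    return s->pending;
}

/* Caller-side finish loop: keep calling end_stream_stub, draining the
 * output buffer between calls, until nothing is left to flush. */
static int finish_frame(stream_stub *s, out_buf *out, int *calls) {
    while (end_stream_stub(s, out) != 0) {
        out->pos = 0;  /* pretend the consumer drained the output buffer */
        (*calls)++;
    }
    (*calls)++;        /* count the final, successful call */
    return s->finished;
}
```

With a 10-byte epilogue and a 4-byte output buffer, the loop needs three calls before the stub reports the frame complete, which is exactly the multi-call scenario the comment is asking about.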
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15749557#comment-15749557 ] Jason Lowe commented on HADOOP-13578: - Thanks for updating the patch! It's really close now. We need to check for an error if ZSTD_endStream returns a failure, otherwise we could end up looping forever on a persistent error being returned from ZSTD_endStream. The outer compress logic will keep calling compress with finish=true trying to wrap things up, and the JNI code will keep avoiding setting the finished flag because ZSTD_endStream isn't returning 0. There's no return after the THROW macros here, so we can fall through and call ZSTD_flushStream after ZSTD_compressStream returns an error. That means the error being reported from compress might be clobbered by the error being returned from flush if they are different. Also we can just move the error check on "size" up inside the flush block since it only applies to that case, or change "remaining_to_flush" to "size" and let the previous point be fixed by falling through to the common error check on "size".
{code}
size_t size = dlsym_ZSTD_compressStream(stream, &output, &input);
if (dlsym_ZSTD_isError(size)) {
  THROW(env, "java/lang/InternalError", dlsym_ZSTD_getErrorName(size));
}
if (finish && input.pos == input.size) {
  // end the stream, flush and write the frame epilogue
  size_t const remaining_to_flush = dlsym_ZSTD_endStream(stream, &output);
  if (!remaining_to_flush) {
    (*env)->SetBooleanField(env, this, ZStandardCompressor_finished, JNI_TRUE);
  }
} else {
  // need to flush the output buffer
  // this also updates the output buffer position.
  size = dlsym_ZSTD_flushStream(stream, &output);
}
if (dlsym_ZSTD_isError(size)) {
  THROW(env, "java/lang/InternalError", dlsym_ZSTD_getErrorName(size));
}
{code}
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15749292#comment-15749292 ] Hadoop QA commented on HADOOP-13578: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 5 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 16s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 57s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 9m 46s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 10s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project hadoop-project-dist . 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 55s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 29s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 19s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 9m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 9m 22s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 43s{color} | {color:orange} root: The patch generated 3 new + 159 unchanged - 1 fixed = 162 total (was 160) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 10m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} shellcheck {color} | {color:green} 0m 12s{color} | {color:green} There were no new shellcheck issues. {color} | | {color:green}+1{color} | {color:green} shelldocs {color} | {color:green} 0m 8s{color} | {color:green} There were no new shelldocs issues. {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 5 line(s) that end in whitespace. Use git apply --whitespace=fix <>. 
Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 5s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project hadoop-project-dist . {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 38s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 14m 10s{color} | {color:red} root in the patch failed. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 42s{color} | {color:red} The patch generated 2 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}112m 47s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.ha.TestZKFailoverController | | | hadoop.security.token.delegation.web.TestWebDelegationToken | | | hadoop.io.compress.zstd.TestZStandardCompressorDecompressor | | | hadoop.net.TestDNS | \\ \\ || Subsystem ||
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15748972#comment-15748972 ] churro morales commented on HADOOP-13578: - Fair enough, that's an easy check to make. Instead of checking {{if (finish)}} in ZStandardCompressor.c we can just do {{if (finish && input.size == input.pos)}}, which will ensure we have at least compressed everything in the input buffer before writing the final frame. Here is what is included in the updated patch:
- Fixed the zstd buffer constant naming.
- Fixed the constructor for ZStandardCompressor.
- Removed finished from the ZStandardDirectDecompressor.
- Reverted the finished logic in the ZStandardCompressor back to its original state.
- Return after we throw in any C code.
- Parameterized the inflateBytes method so that we don't have to slice the buffer anymore.
- Took care of calling endStream with data still left in the buffer.
- Also fixed the remaining checkstyle issues.
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15746346#comment-15746346 ] Jason Lowe commented on HADOOP-13578: - The thing I'm worried about is that when we call ZSTD_compressStream we are passing descriptors for both the input buffer and the output buffer. When we call ZSTD_endStream we are only passing the descriptor for the output buffer. Therefore I don't know how ZSTD_endStream is supposed to finish consuming any input that ZSTD_compressStream didn't get to if it doesn't have access to that input buffer descriptor. Looking at the zstd code you'll see that when it does call ZSTD_compressStream inside ZSTD_endStream, it's calling it with srcSize == 0. That means there is no more source to consume. So if the last call of the JNI code to ZSTD_compressStream did not fully consume the input buffer's data (i.e.: input pos is not moved to the end of the data) then it looks like calling ZSTD_endStream will simply flush out what input data did make it and then end the frame. That matches what the documentation for ZSTD_endStream says. So I still think we need to make sure we do not call ZSTD_endStream if input.pos is not at the end of the input buffer after we call ZSTD_compressStream, or we risk losing the last chunk of data if the zstd library for some reason cannot fully consume the input buffer when we try to finish. > Add Codec for ZStandard Compression > --- > > Key: HADOOP-13578 > URL: https://issues.apache.org/jira/browse/HADOOP-13578 > Project: Hadoop Common > Issue Type: New Feature >Reporter: churro morales >Assignee: churro morales > Attachments: HADOOP-13578.patch, HADOOP-13578.v1.patch, > HADOOP-13578.v2.patch, HADOOP-13578.v3.patch, HADOOP-13578.v4.patch, > HADOOP-13578.v5.patch, HADOOP-13578.v6.patch > > > ZStandard: https://github.com/facebook/zstd has been used in production for 6 > months by facebook now. v1.0 was recently released. Create a codec for this > library. 
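The guard Jason describes (only call ZSTD_endStream once the input descriptor shows everything consumed) reduces to a small predicate. A minimal Java sketch; the class and method names are illustrative, not the patch's actual JNI code:

```java
public class FinishGuard {
    // Safe to write the final frame only when the caller asked to finish
    // AND ZSTD_compressStream consumed the whole input buffer.
    static boolean mayEndStream(boolean finish, int inputPos, int inputSize) {
        return finish && inputPos == inputSize;
    }

    public static void main(String[] args) {
        System.out.println(mayEndStream(true, 5, 8)); // false: 3 bytes unconsumed
        System.out.println(mayEndStream(true, 8, 8)); // true: input fully consumed
    }
}
```

When the predicate is false the JNI layer would simply return without setting the finished flag, so the outer loop calls compress() again with the remaining input.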
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15743180#comment-15743180 ] churro morales commented on HADOOP-13578: - HI [~jlowe] Thanks for taking the time to review. I agree with all of the above comments and will correct those issues. The last question you had was related to the ZSTD_endStream(). The endStream() finishes the frame and writes the epilogue only if the uncompressed buffer has been fully consumed. Otherwise it basically does the same thing as ZSTD_compressStream(). You are correct, if the output buffer is too small it may not be able to flush. There is a check in ZSTD_endStream() which does this: {code} size_t const notEnded = ZSTD_compressStream_generic(zcs, ostart, , , , zsf_end); size_t const remainingToFlush = zcs->outBuffContentSize - zcs->outBuffFlushedSize; op += sizeWritten; if (remainingToFlush) { output->pos += sizeWritten; return remainingToFlush + ZSTD_BLOCKHEADERSIZE /* final empty block */ + (zcs->checksum * 4); } // Create the epilogue and flush the epilogue {code} so if there is still data to be consumed the library wont finish the frame, thus making it safe to call repeatedly with our framework because we never set the finished flag until the epilogue has been written successfully. The code in the CompressorStream.java which calls our codec simply does this: {code} @Override public void finish() throws IOException { if (!compressor.finished()) { compressor.finish(); while (!compressor.finished()) { compress(); } } } {code} So I believe we wont drop any data with the way things are done. Please let me know if I am missing something obvious here :). 
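The CompressorStream#finish loop quoted in the comment above can be exercised against a toy compressor that, like ZSTD_endStream with a small output buffer, needs several calls before the epilogue is fully flushed. A self-contained sketch; ToyCompressor is invented for illustration and is not Hadoop's Compressor:

```java
public class FinishLoopDemo {
    static class ToyCompressor {
        private int pendingFlushes = 3; // pretend the epilogue takes 3 writes
        private boolean finish;
        private boolean finished;

        void finish() { finish = true; }
        boolean finished() { return finished; }

        // Each call drains one chunk; finished flips only once everything
        // (including the final frame) has been written out.
        void compress() {
            if (finish && pendingFlushes > 0) {
                pendingFlushes--;
            }
            if (finish && pendingFlushes == 0) {
                finished = true;
            }
        }

        int remaining() { return pendingFlushes; }
    }

    public static void main(String[] args) {
        ToyCompressor c = new ToyCompressor();
        if (!c.finished()) {
            c.finish();
            while (!c.finished()) {
                c.compress(); // keep flushing until the epilogue is written
            }
        }
        System.out.println(c.remaining()); // 0: nothing left to flush
    }
}
```

The loop terminates only when the compressor reports finished, which is why the codec must not set the finished flag while data remains buffered.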
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15742894#comment-15742894 ] Jason Lowe commented on HADOOP-13578: - Thanks for updating the patch! Sorry for the delay in re-review, as I was travelling last week. The ASF license warnings appear to be unrelated, as does the unit test failure. Comments on the patch: IO_COMPRESSION_CODEC_ZSTD_BUFFER_SIZE should be IO_COMPRESSION_CODEC_ZSTD_BUFFER_SIZE_DEFAULT The ZStandardCompressor(level, bufferSize) constructor should be implemented in terms of ZStandardCompressor(level, inputBufferSize, outputBufferSize) to reduce the code redundancy and improve maintainability. If inflateBytesDirect took a dest position then we wouldn't need to slice the destination buffer, avoiding a temporary objection allocation. I'm not sure about this snippet in ZStandardDirectDecompressor#decompress: {code} endOfInput = !src.hasRemaining(); super.finished = !src.hasRemaining(); {code} After adding the super.finished change it appears the endOfInput and finished variables will simply reflect each other, so endOfInput would be redundant. Instead I think endOfInput may still be necessary, and the finished variable handling will already be taken care of by the JNI code. In other words I think we don't need the super.finished line that was added. However maybe I'm missing something here. 
I think we need to return after these THROW macros or risk passing null pointers to the zstd library: {code} void * uncompressed_bytes = (*env)->GetDirectBufferAddress(env, uncompressed_direct_buf); if (!uncompressed_bytes) { THROW(env, "java/lang/InternalError", "Undefined memory address for uncompressedDirectBuf"); } // Get the output direct buffer void * compressed_bytes = (*env)->GetDirectBufferAddress(env, compressed_direct_buf); if (!compressed_bytes) { THROW(env, "java/lang/InternalError", "Undefined memory address for compressedDirectBuf"); } {code} Same comments for similar code on the decompressor JNI side. Related to this code: {code} if (remaining_to_flush) { (*env)->SetBooleanField(env, this, ZStandardCompressor_finished, JNI_FALSE); } else { (*env)->SetBooleanField(env, this, ZStandardCompressor_finished, JNI_TRUE); } {code} What I meant by my previous review was to have the JNI layer clear the finished flag in any case we didn't set it, including the case where the finish flag isn't asking us to wrap things up. Actually thinking about this further, I'm not sure we need the case where we set it to false. We just need to set it to true in the JNI layer when the end flush indicates there aren't any more bytes to flush. In every other case finished should already be false, and the user will need to call reset() to clear both the finish and finished flags to continue with more input. So I think we can put this back to the way it was. Sorry for the confusion on my part. If there are still bytes left to consume in the uncompressed buffer then I don't think we want to call ZSTD_endStream. We should only be calling ZSTD_endStream when the uncompressed buffer has been fully consumed, correct? Otherwise we may fail to compress the final chunk of input data when the user sets the finish flag and the zstd library doesn't fully consume the input on the ZSTD_compressStream call. That could result in a "successful" compression that drops data. 
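The "1-byte output buffer" stress case suggested above can be demonstrated today with java.util.zip.Deflater, which follows the same finish()/finished() loop contract as Hadoop's Compressor interface. This is an analogy using the JDK's zlib codec, not the zstd codec itself:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class TinyOutputBufferDemo {
    // Compresses with a pathological 1-byte output buffer, then decompresses
    // and verifies nothing was dropped.
    static boolean roundTrips(byte[] input) {
        Deflater def = new Deflater();
        def.setInput(input);
        def.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] one = new byte[1];         // extreme case: 1-byte output buffer
        while (!def.finished()) {         // keep looping until the epilogue is out
            int n = def.deflate(one);
            if (n > 0) {
                out.write(one, 0, n);
            }
        }
        def.end();

        Inflater inf = new Inflater();
        inf.setInput(out.toByteArray());
        byte[] back = new byte[input.length];
        int total = 0;
        try {
            int n;
            while (!inf.finished()
                    && (n = inf.inflate(back, total, back.length - total)) > 0) {
                total += n;
            }
        } catch (DataFormatException e) {
            return false;
        } finally {
            inf.end();
        }
        return total == input.length && Arrays.equals(input, back);
    }

    public static void main(String[] args) {
        System.out.println(
            roundTrips("hello hello hello hello".getBytes(StandardCharsets.UTF_8)));
    }
}
```

The same loop shape is what CompressorStream#finish relies on: as long as finished() stays false, the caller keeps draining output, no matter how small each drain is.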
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727492#comment-15727492 ] Hadoop QA commented on HADOOP-13578: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 19m 21s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 5 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 40s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 9s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 9m 47s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 18s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project hadoop-project-dist . 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 52s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 25s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 45s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 9m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 9m 21s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 43s{color} | {color:orange} root: The patch generated 64 new + 159 unchanged - 1 fixed = 223 total (was 160) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 10m 4s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} shellcheck {color} | {color:green} 0m 11s{color} | {color:green} There were no new shellcheck issues. {color} | | {color:green}+1{color} | {color:green} shelldocs {color} | {color:green} 0m 9s{color} | {color:green} There were no new shelldocs issues. {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 5 line(s) that end in whitespace. Use git apply --whitespace=fix <>. 
Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 5s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project hadoop-project-dist . {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 20s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red}114m 31s{color} | {color:red} root in the patch failed. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 51s{color} | {color:red} The patch generated 2 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}234m 38s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.timeline.webapp.TestTimelineWebServices | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | HADOOP-13578 | | JIRA Patch URL |
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15726997#comment-15726997 ] churro morales commented on HADOOP-13578: - I'll also take care of the java checkstyle issues and upload a patch provided you are good with this patch. > Add Codec for ZStandard Compression > --- > > Key: HADOOP-13578 > URL: https://issues.apache.org/jira/browse/HADOOP-13578 > Project: Hadoop Common > Issue Type: New Feature >Reporter: churro morales >Assignee: churro morales > Attachments: HADOOP-13578.patch, HADOOP-13578.v1.patch, > HADOOP-13578.v2.patch, HADOOP-13578.v3.patch, HADOOP-13578.v4.patch, > HADOOP-13578.v5.patch, HADOOP-13578.v6.patch > > > ZStandard: https://github.com/facebook/zstd has been used in production for 6 > months by facebook now. v1.0 was recently released. Create a codec for this > library. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15726853#comment-15726853 ] Hadoop QA commented on HADOOP-13578: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 5 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 18s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 10s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 40s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 10m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 35s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project hadoop-project-dist . 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 8s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 10s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 22s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 10m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 10m 44s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 50s{color} | {color:orange} root: The patch generated 64 new + 159 unchanged - 1 fixed = 223 total (was 160) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 10m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} shellcheck {color} | {color:green} 0m 14s{color} | {color:green} There were no new shellcheck issues. {color} | | {color:green}+1{color} | {color:green} shelldocs {color} | {color:green} 0m 10s{color} | {color:green} There were no new shelldocs issues. {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 5 line(s) that end in whitespace. Use git apply --whitespace=fix <>. 
Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 6s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project hadoop-project-dist . {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 58s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 47s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 15m 10s{color} | {color:red} root in the patch failed. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 51s{color} | {color:red} The patch generated 2 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}121m 38s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.io.compress.zstd.TestZStandardCompressorDecompressor | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | HADOOP-13578 | | JIRA Patch URL |
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15713137#comment-15713137 ] Jason Lowe commented on HADOOP-13578: - I forgot to mention it would also be good to address at least some of the checkstyle issues, such as the indentation levels and line-length. > Add Codec for ZStandard Compression > --- > > Key: HADOOP-13578 > URL: https://issues.apache.org/jira/browse/HADOOP-13578 > Project: Hadoop Common > Issue Type: New Feature >Reporter: churro morales >Assignee: churro morales > Attachments: HADOOP-13578.patch, HADOOP-13578.v1.patch, > HADOOP-13578.v2.patch, HADOOP-13578.v3.patch, HADOOP-13578.v4.patch > > > ZStandard: https://github.com/facebook/zstd has been used in production for 6 > months by facebook now. v1.0 was recently released. Create a codec for this > library. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15713129#comment-15713129 ] Jason Lowe commented on HADOOP-13578: - Thanks for updating the patch! I'm torn on the buffer size, whether we should use the generic io.file.buffer.size or have a zst-specific config. If the codec can significantly speed up with a larger default buffer size than the file buffer size then maybe we should consider having a separate, zst-specific buffer size config. I'd recommend a zst-specific config key with a default value of 0 that means use whatever buffer size the zst library wants to use, but the user can override it to a custom size. That way users can change the zst codec buffer size (again, useful for keeping memory usage reasonable for very wide sort factor merges) but not change the buffer sizes of anything else. Similarly, someone changing the default buffer size to something relatively small (e.g.: 4K) for some unrelated use case could unknowingly hamper the performance of the zst codec. Regarding the test input file, is it copyrighted or do we otherwise have the redistribution rights? Also the .zst binary test file is missing from the patch. *ZStandardCodec.java* Nit: should import the specific things needed rather than wildcard imports. *ZStandardCompressor.java* reinit calls reset then init, but reset already is calling init. LOG should be private rather than public. The LOG.isDebugEnabled check is unnecessary. Nit: The VisibleForTesting annotation should be on a separate line like the other annotations. *ZStandardCompressor.c:* Shouldn't we throw if GetDirectBufferAddress returns null? >From the zstd documentation, ZSTD_compressStream may not always fully consume >the input. Therefore if the finish flag is set in deflateBytesDirect I think >we need to avoid calling ZTD_endStream if there is still bytes left in the >input or we risk dropping some of the input. 
We can just return without >setting the finished flag and expect to get called again from the outer >compression loop logic. Similarly the documentation for ZSTD_endStream says that it may not be able to flush the full contents of the data in one call, and if it indicates there is more data left to flush then we should call it again. So if we tried to end the stream and there's more data left then we shouldn't throw an error. Instead we simply leave the finished flag unset and return so the outer logic loop will empty the output buffer and call compress() again. We will then again call ZSTD_endStream to continue attempting to finish the compression. Bonus points for adding a test case that uses a 1-byte output buffer to demonstrate the extreme case is working properly. Is it necessary to call ZSTD_flushStream after every ZSTD_compressStream call? Seems to me we can skip it entirely, but I might be missing something. *ZStandardDecompressor.java:* LOG should be private rather than public. Should there be a checkStream call at the beginning of decompress()? This would mirror the same check in the compressor for the corresponding compress() method. In inflateDirect, the setting of preSliced = dst in the conditional block is redundant since it was just done before the block. Actually I think it would be faster and create less garbage to avoid creating a temporary ByteBuffer and have the JNI code to lookup the output buffer's position and limit and adjust accordingly. That way we aren't creating an object each time. The swapping of variables in inflateDirect is cumbersome. It would be much cleaner if inflateBytesDirect took arguments so we can pass what the JNI needs directly to it rather than shuffling the parameters into the object's fields before we call it. That also cleans up the JNI code a bit since it doesn't need to do many of the field lookups. 
For example: {code} private native int inflateBytesDirect(ByteBuffer src, int srcOffset, int srcLen, ByteBuffer dst, int dstOffset, int dstLen); {code} We could simplify the variables even more if the JNI code called the ByteBuffer methods to update the positions and limits directly rather than poking the values into the Java object and having Java do it when the JNI returns. I'll leave the details up to you, just pointing out that we can clean this up and avoid a lot of swapping. We could do a similar thing on the compressor side. I'm a little confused about the endOfInput flag for the direct decompressor. I assume it's for handling the case where there are multiple frames within the input buffer. The JNI code will set finished = true once it hits the end of the frame, and it appears it will remain that way even if more input is sent (i.e.: finished will go from false to true after the first frame in the input buffer and remain true throughout all subsequent frames). It makes me wonder if we should be
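The parameterized signature above can be illustrated without JNI: when explicit offsets are passed, the caller never needs ByteBuffer#slice(), which allocates a temporary object per call. copyRegion below is a hypothetical stand-in for the native inflateBytesDirect, not the real implementation:

```java
import java.nio.ByteBuffer;

public class NoSliceDemo {
    // Hypothetical stand-in for the native method: consumes srcLen bytes at
    // srcOff and writes them at dstOff, using absolute indexing so neither
    // buffer has to be sliced or repositioned first.
    static int copyRegion(ByteBuffer src, int srcOff, int srcLen,
                          ByteBuffer dst, int dstOff) {
        for (int i = 0; i < srcLen; i++) {
            dst.put(dstOff + i, src.get(srcOff + i));
        }
        return srcLen;
    }

    public static void main(String[] args) {
        ByteBuffer src = ByteBuffer.allocateDirect(8);
        ByteBuffer dst = ByteBuffer.allocateDirect(8);
        for (int i = 0; i < 8; i++) {
            src.put(i, (byte) i);
        }
        // No src.slice()/dst.slice() temporaries: offsets are passed directly.
        int n = copyRegion(src, 2, 4, dst, 0);
        System.out.println(n);          // 4
        System.out.println(dst.get(0)); // 2
    }
}
```

The same idea applies on the native side: GetDirectBufferAddress plus the passed offset yields the working pointer, with no per-call Java object creation.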
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15703449#comment-15703449 ] Hadoop QA commented on HADOOP-13578: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 22s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 5 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 18s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 41s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 9m 48s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 10s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project hadoop-project-dist . 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 55s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 29s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 20s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 9m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 9m 30s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 46s{color} | {color:orange} root: The patch generated 73 new + 159 unchanged - 1 fixed = 232 total (was 160) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 9m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} shellcheck {color} | {color:green} 0m 11s{color} | {color:green} There were no new shellcheck issues. {color} | | {color:green}+1{color} | {color:green} shelldocs {color} | {color:green} 0m 9s{color} | {color:green} There were no new shelldocs issues. {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 2383 line(s) that end in whitespace. Use git apply --whitespace=fix <>. 
Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 5s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project hadoop-project-dist . {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 32s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 13m 59s{color} | {color:red} root in the patch failed. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 44s{color} | {color:red} The patch generated 3 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}113m 44s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.net.TestDNS | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | HADOOP-13578 | | JIRA Patch URL |
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15678157#comment-15678157 ] churro morales commented on HADOOP-13578: -

Hi Jason, thanks for the review. I figured out the memory issue and will submit a patch Monday; it was quite a bit of fun figuring out why the JVM was segfaulting in particular scenarios :).

io.compression.codec.zst.buffersize should be io.compression.codec.zstd.buffersize to have a consistent prefix with io.compression.codec.zstd.level. In ZStandardCodec#createOutputStream and ZStandardCodec#createInputStream I'm curious why we're not honoring the user's specified buffer size and just using the default buffer size. Many codecs use io.file.buffer.size here, but others use a codec-specific config property.

bq. I totally agree, I'm going to just use io.file.buffer.size and that will be the buffer size and we will honor it.

In ZStandardCompressor and ZStandardDecompressor, the buffers should simply be declared as ByteBuffer instead of Buffer. That way the ugly casting when doing bulk get/put with them is unnecessary.

bq. done

In ZStandardDecompressor it's a little confusing that some buffer length variables like userBufLen reflect the number of bytes remaining to be consumed in the buffer, while compressedDirectBufLen reflects the number of bytes originally placed in the buffer, so the real length of data to be consumed is (compressedDirectBufLen - compressedDirectBufOffset). Maybe a variable name change would help notify future maintainers of the semantic difference with similar variables for other buffers, or we should change the semantics so that variable is consistent with the others?

bq. Yeah, that is confusing. I'll change the names of the variables; I was just following how other Decompressors did their naming.

I'm a little wary of the public getBytesWritten and getBytesRead methods in ZStandardDecompressor. getBytesWritten doesn't appear to be all that useful since the caller can trivially compute it based on the amount of data they read from the input stream the decompressor is writing. Also both of these methods reset values to zero at frame boundaries in the input stream, which may confuse callers who are comparing these values to the entire compressed input data size. I'd recommend removing getBytesWritten which is unused and making getBytesRead package-private just for unit testing.

bq. I originally had that for testing purposes and totally forgot to remove it. I'll get rid of those methods entirely.

It would be nice to change the logging from org.apache.commons.logging to org.slf4j. Sorry I didn't catch this earlier. We're supposed to be slowly migrating to SLF4J, and new code should be using it. Fortunately it's an easy change, and we can remove the conditional checks for debug logging.

bq. sure thing

Do we need to have test_file.txt? I'm wondering if simply generating random data (maybe with a constant seed so the test is deterministic) into a byte buffer would be sufficient input for testing. Typically we want to avoid test files in the source tree unless there's something very specific about the file's contents necessary to exercise a unit test case.

bq. Sorry, I forgot to commit the other file that went along with it: test_file.txt.zst. That file was compressed using the command-line utility, and I wanted to make sure we could go back and forth between the command line and Hadoop. I think that is useful to have as a test case: if anyone changes that behavior down the line, tests will break and it should be pretty clear as to why.

I will fix all coding style issues and other minor nits, no problem. Again, thank you so much for taking the time to review this patch. I will get the (hopefully final) patch up Monday.
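The ByteBuffer-vs-Buffer point above is easy to see in isolation. A minimal sketch (illustrative only, not code from the patch; class and field names are made up) of why declaring the field as the Buffer supertype forces casts around bulk get/put:

```java
import java.nio.Buffer;
import java.nio.ByteBuffer;

// Illustrative only -- not code from the patch. Shows why declaring a
// field as ByteBuffer (rather than the Buffer supertype) removes the
// casts around bulk get/put calls that the review mentions.
public class BufferDeclSketch {

  // Declared as the Buffer supertype: bulk operations need a downcast.
  static Buffer asBuffer = ByteBuffer.allocateDirect(16);

  // Declared as ByteBuffer: bulk get/put work directly.
  static ByteBuffer asByteBuffer = ByteBuffer.allocateDirect(16);

  public static byte firstByte(byte[] data) {
    ((ByteBuffer) asBuffer).put(data);  // cast required here
    asByteBuffer.put(data);             // no cast needed
    asByteBuffer.flip();
    return asByteBuffer.get(0);
  }
}
```

Since Buffer itself exposes no bulk get/put, every call site pays the cast; declaring the concrete ByteBuffer type costs nothing and drops the clutter.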
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15677114#comment-15677114 ] Jason Lowe commented on HADOOP-13578: -

Looking forward to the updated patch with the memory fixes. Some comments on the latest patch. I didn't get time to look at the native code or unit test changes in detail, but here are the comments I had thus far:

io.compression.codec.zst.buffersize should be io.compression.codec.zstd.buffersize to have a consistent prefix with io.compression.codec.zstd.level

In ZStandardCodec#createOutputStream and ZStandardCodec#createInputStream I'm curious why we're not honoring the user's specified buffer size and just using the default buffer size. Many codecs use io.file.buffer.size here, but others use a codec-specific config property.

In ZStandardCompressor and ZStandardDecompressor, the buffers should simply be declared as ByteBuffer instead of Buffer. That way the ugly casting when doing bulk get/put with them is unnecessary.

In ZStandardDecompressor it's a little confusing that some buffer length variables like userBufLen reflect the number of bytes remaining to be consumed in the buffer, while compressedDirectBufLen reflects the number of bytes originally placed in the buffer, so the real length of data to be consumed is (compressedDirectBufLen - compressedDirectBufOffset). Maybe a variable name change would help notify future maintainers of the semantic difference with similar variables for other buffers, or we should change the semantics so that variable is consistent with the others?

I'm a little wary of the public getBytesWritten and getBytesRead methods in ZStandardDecompressor. getBytesWritten doesn't appear to be all that useful since the caller can trivially compute it based on the amount of data they read from the input stream the decompressor is writing. Also both of these methods reset values to zero at frame boundaries in the input stream, which may confuse callers who are comparing these values to the entire compressed input data size. I'd recommend removing getBytesWritten which is unused and making getBytesRead package-private just for unit testing.

It would be nice to change the logging from org.apache.commons.logging to org.slf4j. Sorry I didn't catch this earlier. We're supposed to be slowly migrating to SLF4J, and new code should be using it. Fortunately it's an easy change, and we can remove the conditional checks for debug logging.

Do we need to have test_file.txt? I'm wondering if simply generating random data (maybe with a constant seed so the test is deterministic) into a byte buffer would be sufficient input for testing. Typically we want to avoid test files in the source tree unless there's something very specific about the file's contents necessary to exercise a unit test case.

Typos and coding style minor nits:
- ZStandardCodec
-- conf should be private in ZStandardCodec since it isn't accessed outside of the class.
-- "objec." s/b "object." and "ZStandard Decompressor" s/b "ZStandardDecompressor"
- ZStandardCompressor
-- Comment refers to ZLIB format
-- Missing whitespace in "(b== null)" in setInput
-- Missing braces around "keepUncompressedBuf && uncompressedDirectBufLen > 0" conditional block in needsInput
-- 'else' should be on same line as closing brace in compress
-- 'atmost' s/b 'at most'
-- Missing braces around conditional block in checkStream
- ZStandardDecompressor
-- Annotation should appear on line before getDirectBufferSize method.
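The seeded-random-data suggestion above is a one-liner with java.util.Random; a sketch (class and method names are illustrative, not from the patch) showing that a fixed seed makes the generated input deterministic across runs:

```java
import java.util.Arrays;
import java.util.Random;

// Illustrative only: seeded random input, so a test needs no checked-in
// data file yet produces identical bytes on every run.
public class DeterministicTestData {

  public static byte[] generate(int length, long seed) {
    byte[] data = new byte[length];
    new Random(seed).nextBytes(data);  // fixed seed => same bytes each run
    return data;
  }

  public static boolean reproducible(int length, long seed) {
    // Two independent generators with the same seed agree byte-for-byte.
    return Arrays.equals(generate(length, seed), generate(length, seed));
  }
}
```

This covers the pure round-trip case; the CLI-interoperability case discussed in the thread still genuinely needs a file produced by the zstd command-line tool.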
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15671488#comment-15671488 ] churro morales commented on HADOOP-13578: - found a small issue with the latest patch which has a connection leak as well as closing a stream that doesn't exist. I will fix this and put up another patch in a few hours.
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668912#comment-15668912 ] Hadoop QA commented on HADOOP-13578:

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 17s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 25s{color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 6m 30s{color} | {color:red} root in trunk failed. {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 10m 3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 7s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project hadoop-project-dist . {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 43s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 21s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 38s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} cc {color} | {color:red} 9m 38s{color} | {color:red} root generated 5 new + 2 unchanged - 0 fixed = 7 total (was 2) {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 9m 38s{color} | {color:red} root generated 513 new + 179 unchanged - 0 fixed = 692 total (was 179) {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 44s{color} | {color:orange} root: The patch generated 68 new + 91 unchanged - 1 fixed = 159 total (was 92) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 10m 13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} shellcheck {color} | {color:green} 0m 13s{color} | {color:green} There were no new shellcheck issues. {color} |
| {color:green}+1{color} | {color:green} shelldocs {color} | {color:green} 0m 10s{color} | {color:green} There were no new shelldocs issues. {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 2385 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 6s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project hadoop-project-dist . {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 46s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 47m 49s{color} | {color:red} root in the patch failed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 47s{color} | {color:red} The patch generated 3 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}147m 43s{color} | {color:black} {color} |

|| Reason || Tests ||
| Failed junit tests | hadoop.crypto.key.kms.server.TestKMS |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue |
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668282#comment-15668282 ] Jason Lowe commented on HADOOP-13578: - Cool, thanks for the update and looking into things. I was hoping with the discovery that the zstd streaming interface can seemingly take arbitrary input and output buffer sizes that we could avoid the headaches of copying leftover input data, etc. Instead we keep feeding input data to ZSTD_decompressStream in a loop until either we run out of input data or the output buffer is full. As long as there is still input data remaining (i.e.: output buffer filled) then we don't need more input and therefore don't need to shuffle data around in the input buffer. Hopefully that makes implementing the Decompressor less clunky than before.
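The loop shape Jason describes can be sketched abstractly. This is illustrative only: copyStep below is a mock standing in for the native ZSTD_decompressStream call (it just copies a few bytes per invocation), not a real zstd binding, but the control flow is the point, keep looping until input is exhausted or the output buffer fills, so no leftover input ever needs to be shuffled:

```java
import java.nio.ByteBuffer;

// Illustrative only: the loop structure for a streaming decompressor that
// accepts arbitrary buffer sizes. copyStep mocks the native call.
public class DecompressLoopSketch {

  // Mock "native" call: consumes at most 4 input bytes per invocation and
  // writes them to the output buffer.
  static void copyStep(ByteBuffer in, ByteBuffer out) {
    int n = Math.min(4, Math.min(in.remaining(), out.remaining()));
    for (int i = 0; i < n; i++) {
      out.put(in.get());
    }
  }

  public static int decompress(ByteBuffer in, ByteBuffer out) {
    // Loop until we run out of input or the output buffer is full; only
    // then must the caller drain the output or supply more input.
    while (in.hasRemaining() && out.hasRemaining()) {
      copyStep(in, out);
    }
    return out.position();
  }

  // Convenience wrapper so the behavior is easy to check.
  public static int demo(int inputLen, int outputCap) {
    return decompress(ByteBuffer.wrap(new byte[inputLen]),
                      ByteBuffer.allocate(outputCap));
  }
}
```

When the loop exits with input still remaining, the output buffer is necessarily full, which is exactly the "don't need more input" condition the comment relies on.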
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668094#comment-15668094 ] churro morales commented on HADOOP-13578: -

[~jlowe] Thanks for the feedback. As for your concerns:

bq. The problem with ignoring the Codec interface is that code using the Compressor/Decompressor interfaces will break if it happens to get the zstd codec but work for all others. That's not really adding a full Codec for zstd. I agree it's not the most ideal interface for the way zstd wants to be called, but that doesn't change the fact that existing code that should work as-is on a Hadoop-provided codec will not if we don't implement the Compressor/Decompressor interfaces.

As much as it pains me, I agree with your sentiments. I will implement the Compressor/Decompressor interfaces.

bq. It looks like the zstd code already includes a wrapper that translates the zlib API into the zstd API (see https://github.com/facebook/zstd/tree/dev/zlibWrapper) which we can maybe leverage or at least use as a model for the Compressor/Decompressor implementations.

Good point. I did look at using the zlib wrapper originally, but in v1.0.0 (https://github.com/facebook/zstd/tree/v1.0.0/zlibWrapper) they didn't have support for deflateReset and inflateReset, which would have errored out and broken our code. No worries though; I think I have it figured out using the Compressor/Decompressor logic without having to reference the zlibWrapper.

bq. To be clear, I think it's a good thing that the compressor/decompressor streams are using zstd more natively, but I also think it's important to not ignore the non-stream Codec APIs.

I think the reason we need to implement the compressor/decompressor is more for the reuse from the CodecPool. Each compressor/decompressor/codec is very much tied to whether the compression library has been implemented using a stream-based or block-based approach. From what I can see, the API is called as follows:

{code}
Compressor compressor = CodecPool.getCompressor(codec);
CompressionOutputStream out = codec.createOutputStream(outputStream, compressor);
// do work
{code}

And when we look at specific implementations of codec.createOutputStream(), codecs like Snappy always return a Block(Compressor|Decompressor)Stream, and it seems to me that the Snappy(Compressor|Decompressor) can only work on these types of streams. Looks like the way it is used everywhere (including HBase) is that we always get a stream and work on it; the compressor/decompressor are passed in so we have a pool and don't have to constantly re-initialize these native libs. For example I wrote this test, and if you use the Compression(Input|Output)Streams that are native to the codec, the test will pass. But if you create a Compressor/Decompressor and use a Compression(Input|Output)Stream that is not specific to this codec, the test will fail.

{code:java}
@Test
public void testHandlingWithStreams() throws Exception {
  byte[] bytes = BytesGenerator.get(1024 * 64);
  ByteArrayInputStream inputStream = new ByteArrayInputStream(bytes);
  SnappyCodec codec = new SnappyCodec();
  codec.setConf(new Configuration());
  Compressor compressor = codec.createCompressor();
  ByteArrayOutputStream baos = new ByteArrayOutputStream();
  CompressionOutputStream compressorStream = new CompressorStream(baos, compressor);
  // if you replace compressorStream with
  // CompressionOutputStream compressorStream = codec.createOutputStream(baos, compressor);
  // things work fine.
  byte[] buffer = new byte[100];
  int result;
  while ((result = inputStream.read(buffer, 0, buffer.length)) != -1) {
    compressorStream.write(buffer, 0, result);
  }
  compressorStream.flush();
  compressorStream.finish();

  // lets make sure the compressed bytes are able to be decompressed to read
  byte[] compressedBytes = baos.toByteArray();
  ByteArrayInputStream bais = new ByteArrayInputStream(compressedBytes);
  baos = new ByteArrayOutputStream();
  Decompressor decompressor = codec.createDecompressor();
  CompressionInputStream decompressorStream = new DecompressorStream(bais, decompressor);
  // if you replace decompressorStream with
  // CompressionInputStream decompressorStream = codec.createInputStream(bais, decompressor);
  // things work fine.
  byte[] decompressionBuffer = new byte[100];
  while ((result = decompressorStream.read(decompressionBuffer, 0, buffer.length)) != -1) {
    baos.write(decompressionBuffer, 0, result);
  }
  decompressorStream.close();
  byte[] decompressBytes = baos.toByteArray();
  assertArrayEquals(bytes, decompressBytes);
}
{code}

bq. I'm also not so sure about the minimum buffer assertion. I saw in the zstd unit tests there is a byte-by-byte streaming decompression test where it tries to decompress a buffer with a 1-byte input buffer
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15664738#comment-15664738 ] Jason Lowe commented on HADOOP-13578: -

The problem with ignoring the Codec interface is that code using the Compressor/Decompressor interfaces will break if it happens to get the zstd codec but work for all others. That's not really adding a full Codec for zstd. I agree it's not the most ideal interface for the way zstd wants to be called, but that doesn't change the fact that existing code that should work as-is on a Hadoop-provided codec will not if we don't implement the Compressor/Decompressor interfaces. It looks like the zstd code already includes a wrapper that translates the zlib API into the zstd API (see https://github.com/facebook/zstd/tree/dev/zlibWrapper) which we can maybe leverage or at least use as a model for the Compressor/Decompressor implementations.

To be clear, I think it's a good thing that the compressor/decompressor streams are using zstd more natively, but I also think it's important to not ignore the non-stream Codec APIs. I'm also OK if the initial implementation of the zstd Compressor/Decompressor is not super optimal. One tactic would be to always copy the input data to an internal input buffer and track the amount of data zstd wants. Then the needsInput method is easy: just return true when the amount of data in the internal input buffer is less than what zstd last requested. The decompress method would return 0 if needsInput would return true. Not super efficient since we make a copy of all the input data before passing it to zstd, but it should be relatively straightforward to implement.

I'm also not so sure about the minimum buffer assertion. I saw in the zstd unit tests there is a byte-by-byte streaming decompression test where it tries to decompress a buffer with a 1-byte input buffer and a 1-byte output buffer.
I also verified this independently by applying the following changes to the streaming_decompression.c example:

{noformat}
$ diff -u streaming_decompression.c.orig streaming_decompression.c
--- streaming_decompression.c.orig	2016-11-14 18:39:43.423069894 +0000
+++ streaming_decompression.c	2016-11-14 19:08:36.395286748 +0000
@@ -63,10 +63,10 @@
 static void decompressFile_orDie(const char* fname)
 {
     FILE* const fin = fopen_orDie(fname, "rb");
-    size_t const buffInSize = ZSTD_DStreamInSize();
+    size_t const buffInSize = 1;
     void* const buffIn = malloc_orDie(buffInSize);
     FILE* const fout = stdout;
-    size_t const buffOutSize = ZSTD_DStreamOutSize();  /* Guarantee to successfully flush at least one complete compressed block in all circumstances. */
+    size_t const buffOutSize = 1;
     void* const buffOut = malloc_orDie(buffOutSize);
     ZSTD_DStream* const dstream = ZSTD_createDStream();
@@ -80,6 +80,7 @@
     while (input.pos < input.size) {
         ZSTD_outBuffer output = { buffOut, buffOutSize, 0 };
         toRead = ZSTD_decompressStream(dstream, &output, &input);  /* toRead : size of next compressed block */
+        toRead = (toRead > buffInSize) ? buffInSize : toRead;
         if (ZSTD_isError(toRead)) { fprintf(stderr, "ZSTD_decompressStream() error : %s \n", ZSTD_getErrorName(toRead)); exit(12); }
         fwrite_orDie(buffOut, output.pos, fout);
     }
{noformat}

With those changes the streaming_decompression example was able to decompress zstd files even though it was using an extremely small input and output buffer. So even though I'm not giving the zstd library a full block of input it's able to compensate.

bq. The finished field is more to reinitialize the zstd stream than the actual java decompressor stream.

Ah, I see that now. Thanks for the explanation.

bq. So we could allow the user to configure and take the larger of the 2 buffer sizes in case they configure it with a buffer size that is too small.

See input/output buffer discussion above.
I don't think we need to apply safety rails to what the user specifies because the zstd library apparently can accommodate.
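If no safety rails are needed, the configurable-buffer-size lookup reduces to a plain read-with-default. A hedged sketch, using java.util.Properties as a stand-in for Hadoop's Configuration class; the key name matches the one discussed in the thread, but the default value here is hypothetical:

```java
import java.util.Properties;

// Illustrative stand-in for the Configuration lookup being discussed:
// honor a user-supplied buffer size, else fall back to a default.
public class BufferSizeSketch {
  static final String KEY = "io.compression.codec.zstd.buffersize";
  static final int DEFAULT = 128 * 1024;  // hypothetical recommended size

  public static int bufferSize(Properties conf) {
    String configured = conf.getProperty(KEY);
    return (configured == null) ? DEFAULT : Integer.parseInt(configured);
  }

  // Helpers to make the two branches easy to exercise.
  public static int defaultSize() {
    return bufferSize(new Properties());
  }

  public static int withOverride(String value) {
    Properties conf = new Properties();
    conf.setProperty(KEY, value);
    return bufferSize(conf);
  }
}
```

Whatever the user sets is passed straight through to the codec, since the library tolerates arbitrarily small buffers.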
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15657674#comment-15657674 ] churro morales commented on HADOOP-13578: -

bq. I'm confused why the codec returns null for getCompressorType, createCompressor, getDecompressorType, and createDecompressor. This is going to break code that expects to wield the compressor/decompressor rather than streams directly. I would expect it to work more like the DefaultCodec or BZip2Codec where the codec returns appropriate Compressor and Decompressor classes and leverages the existing CompressorStream, DecompressorStream, and CompressionCodec.Util classes to handle the stream interfaces of Codec. For example, the CompressorStream abstract class already handles the single-byte write method, protection from double-finish and double-close, the DecompressorStream already implements skip, etc.

The reason for this came up while testing the code, namely the decompressor logic. The smallest unit of work in ZStandard is a block. I noticed many cases during decompression where I would fill up the output buffer and still have some data left in my input buffer. The data in my input buffer was not a complete block of compressed data, and thus I was having quite a few issues when using the decompressor API, which I felt was really clunky: you have to return special values to see if you need input, and then you have no control over how much of the buffer you fill with compressed data. For example, if I have half a block of compressed data at the end of the input buffer, it was really difficult to state that I need more data in the input buffer: because I don't have a complete compressed block, I have to move that partial compressed block to the beginning of the input buffer, and I also have to ensure that needsInput now returns true, even though I have compressed data to decompress (but not enough for a block). The entire streaming decompressor API is built around Bzip2 and Zlib, which if you look have the exact same C APIs and even return the same values for their respective inflate/deflate methods. I would much prefer to have an input stream / output stream and just have my library compress/decompress on that stream itself, because the zstd API does not match that of zlib or bzip2.

bq. The original approach used direct byte buffers, but the new patch no longer does. All the other codecs leverage direct byte buffers, so I'm curious about the reasoning for that change. I'm not a JVM expert, but I'm wondering if the *PrimitiveArrayCritical methods have unfavorable impacts on other threads in the JVM (e.g.: due to pausing GC or other effects). Given that GetPrimitiveArrayCritical could trigger an array copy in order to perform its task and we have to copy the bytes from the output buffer into the output stream anyway, I would expect direct byte buffers to be faster for at least the output buffer case.

This is from the Oracle documentation. Here is the signature of GetPrimitiveArrayCritical:

{noformat}
void * GetPrimitiveArrayCritical(JNIEnv *env, jarray array, jboolean *isCopy);
{noformat}

Note that GetPrimitiveArrayCritical might still make a copy of the array if the VM internally represents arrays in a different format. Therefore we need to check its return value against NULL for possible out-of-memory situations. Looks like for most of the JVMs on Linux platforms it references a memory address, but we can still use direct byte buffers if you feel strongly about it.
At least for the output buffer case, as you had stated. I didn't notice any performance hit in my local test environment but can look further into this.

bq. Speaking of double-finish, it looks like that could be problematic and lead to double-free's in the native code layer. In addition to leveraging CompressorStream/DecompressorStream as mentioned above to help with this, we could zero the stream field after we are finished and check for a non-null stream context before doing operations. We should throw NPE if the caller passes a null for the source data buffer. Similarly the native code should throw an error if it cannot obtain pointers to the Java buffers. Currently it just silently returns no progress which will result in an infinite loop in practice as it tries to reach the end of the input and never gets there if the error keeps occurring on each JNI invocation.

Totally agree, will fix this.

bq. The documentation for the zstd streaming interface mentions that flushing or ending a stream may require multiple invocations in order to fully accomplish the task, but I don't see corresponding loops in the java code to handle that scenario:

Will make sure to take care of this scenario.

bq. IOUtils.closeQuietly is being called on the output stream which could lead to silent data corruption. If there was an issue writing out the last bits of data as a result of the
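The direct-byte-buffer alternative under discussion can be pictured with a small Java-side sketch (illustrative only, not code from the patch): a direct ByteBuffer lives off-heap, so native code can reach it via GetDirectBufferAddress instead of pinning a heap array with GetPrimitiveArrayCritical, which some VMs service by copying.

```java
import java.nio.ByteBuffer;

// Illustrative only: the Java side of the direct-buffer pattern. The
// put/get calls are the explicit copies between heap arrays and the
// off-heap region that native code would read directly.
public class DirectBufferSketch {

  public static int roundTrip(byte[] src) {
    ByteBuffer direct = ByteBuffer.allocateDirect(src.length);
    direct.put(src);   // one explicit copy into off-heap memory
    direct.flip();
    byte[] dst = new byte[src.length];
    direct.get(dst);   // and one copy back out
    int sum = 0;
    for (byte b : dst) {
      sum += b;
    }
    return sum;
  }

  public static int demo() {
    return roundTrip(new byte[] {1, 2, 3});
  }
}
```

The trade-off both sides of the thread are weighing: the direct buffer makes the copy explicit and predictable, while *PrimitiveArrayCritical may avoid it entirely on some VMs but at the cost of constraining the GC while the array is pinned.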
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15633647#comment-15633647 ] Jason Lowe commented on HADOOP-13578: -

Thanks for updating the patch! It's great to see compatibility between the zstd CLI and the Hadoop codec.

I'm confused why the codec returns null for getCompressorType, createCompressor, getDecompressorType, and createDecompressor. This is going to break code that expects to wield the compressor/decompressor rather than streams directly. I would expect it to work more like the DefaultCodec or BZip2Codec where the codec returns appropriate Compressor and Decompressor classes and leverages the existing CompressorStream, DecompressorStream, and CompressionCodec.Util classes to handle the stream interfaces of Codec. For example, the CompressorStream abstract class already handles the single-byte write method and protection from double-finish and double-close, and the DecompressorStream already implements skip, etc.

The original approach used direct byte buffers, but the new patch no longer does. All the other codecs leverage direct byte buffers, so I'm curious about the reasoning for that change. I'm not a JVM expert, but I'm wondering if the *PrimitiveArrayCritical methods have unfavorable impacts on other threads in the JVM (e.g.: due to pausing GC or other effects). Given that GetPrimitiveArrayCritical could trigger an array copy in order to perform its task and we have to copy the bytes from the output buffer into the output stream anyway, I would expect direct byte buffers to be faster for at least the output buffer case.

Speaking of double-finish, it looks like that could be problematic and lead to double-frees in the native code layer. In addition to leveraging CompressorStream/DecompressorStream as mentioned above to help with this, we could zero the stream field after we are finished and check for a non-null stream context before doing operations.
We should throw NPE if the caller passes a null for the source data buffer. Similarly the native code should throw an error if it cannot obtain pointers to the Java buffers. Currently it just silently returns no progress, which will result in an infinite loop in practice as it tries to reach the end of the input and never gets there if the error keeps occurring on each JNI invocation.

The documentation for the zstd streaming interface mentions that flushing or ending a stream may require multiple invocations in order to fully accomplish the task, but I don't see corresponding loops in the java code to handle that scenario:

{noformat}
 * At any moment, it's possible to flush whatever data remains within buffer, using ZSTD_flushStream().
 * `output->pos` will be updated.
 * Note some content might still be left within internal buffer if `output->size` is too small.
 * @return : nb of bytes still present within internal buffer (0 if it's empty)
 *           or an error code, which can be tested using ZSTD_isError().
 *
 * ZSTD_endStream() instructs to finish a frame.
 * It will perform a flush and write frame epilogue.
 * The epilogue is required for decoders to consider a frame completed.
 * Similar to ZSTD_flushStream(), it may not be able to flush the full content if `output->size` is too small.
 * In which case, call again ZSTD_endStream() to complete the flush.
{noformat}

IOUtils.closeQuietly is being called on the output stream, which could lead to silent data corruption. If there was an issue writing out the last bits of data as a result of the close call then this will silently eat the error. The user is left with a "successful" operation that did not actually succeed.

The getLibraryName native method should do something sane for the non-UNIX case, like returning the Unavailable string as some codecs do when they can't compute it in the UNIX case.
Bonus points for adding a WINDOWS case, and it looks like we can model after the Windows implementations of getLibraryName in the other codecs. In the decompressor, why are we finished when we decode a frame? I thought it was valid for a ZStandard compression stream to be made up of one or more frames, so if there is more input after decoding a frame it seems like the prudent thing to do is try decoding another frame. It would be nice to allow the input and output buffer sizes to be configurable. When used as part of wide merge sorts, there are going to be a lot of codec instances. It may be necessary to use a smaller buffer size per codec to reduce the merge's memory footprint due to all those codec I/O buffers. It makes sense to use the library-recommended size as the default value, but it should be straightforward to allow the user to override this. Nit: srcSize is not really a size in the following code but rather a terminating offset. Renaming it to something like endOffset would be more clear, since
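The configurable-buffer-size suggestion could look roughly like the sketch below. The property key and the recommended-size constant are assumptions for illustration only; a real codec would take the default from the library's recommended size via JNI and read a Hadoop Configuration rather than Properties:

```java
import java.util.Properties;

/**
 * Sketch of a codec buffer size that defaults to the library-recommended
 * value but can be shrunk per-job, e.g. for wide merge sorts with many
 * codec instances. Key name and default are hypothetical.
 */
public class ZstdBufferConfig {
    // Stand-in for the library-recommended buffer size (e.g. ZSTD_CStreamOutSize()).
    static final int RECOMMENDED_BUFFER_SIZE = 128 * 1024;
    static final String BUFFER_SIZE_KEY = "io.compression.codec.zstd.buffersize"; // hypothetical

    static int bufferSize(Properties conf) {
        String v = conf.getProperty(BUFFER_SIZE_KEY);
        return v == null ? RECOMMENDED_BUFFER_SIZE : Integer.parseInt(v);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(bufferSize(conf));          // library-recommended default
        conf.setProperty(BUFFER_SIZE_KEY, "65536");    // wide merge: shrink per-codec buffers
        System.out.println(bufferSize(conf));
    }
}
```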
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15630357#comment-15630357 ] Hadoop QA commented on HADOOP-13578: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 24s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 1s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 58s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 23s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 22s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 51s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 12m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 22s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project hadoop-project-dist . 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 35s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 23s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 7m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 26s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 37s{color} | {color:orange} root: The patch generated 56 new + 91 unchanged - 1 fixed = 147 total (was 92) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 10m 8s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} shellcheck {color} | {color:green} 0m 14s{color} | {color:green} There were no new shellcheck issues. {color} | | {color:green}+1{color} | {color:green} shelldocs {color} | {color:green} 0m 10s{color} | {color:green} There were no new shelldocs issues. {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. 
Refer https://git-scm.com/docs/git-apply {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 132 line(s) with tabs. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 4s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project hadoop-project-dist . {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 33s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 11m 33s{color} | {color:red} root in the patch failed. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 31s{color} | {color:red} The patch generated 3 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}118m 29s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.net.TestDNS | \\ \\ || Subsystem || Report/Notes || | Docker |
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15552135#comment-15552135 ] Jason Lowe commented on HADOOP-13578: - I don't believe the block stuff has anything to do with HDFS blocks. Rather, it describes compression occurring in chunks (blocks) of data at a time. Without the small header at the beginning of each block, it becomes difficult in a general way to know how much data is in the next compressed block when decompressing it. Using the Block codec streams doesn't inherently make the data splittable since one can't easily locate the codec block boundaries at an arbitrary split in the data stream (i.e.: HDFS block boundaries). IMHO if we want to chunk the data for splitting then we can just use a SequenceFile configured for block compression with this codec. Using the Block streams is a big drawback since it makes the format incompatible with the compression standard. This already causes problems with LZ4, see HADOOP-12990. Rather than compressing in blocks that we have to put extra headers on to decode, we can use the zstd streaming APIs to stream the data through the compressor and decompressor. That lets us keep the file format compatible and avoids error scenarios where the codec is configured to use a buffer size that is too small to decompress one of the codec blocks. With the streaming API we are decoupling our buffer size from the size of the data to compress/decompress. > Add Codec for ZStandard Compression > --- > > Key: HADOOP-13578 > URL: https://issues.apache.org/jira/browse/HADOOP-13578 > Project: Hadoop Common > Issue Type: New Feature >Reporter: churro morales >Assignee: churro morales > Attachments: HADOOP-13578.patch, HADOOP-13578.v1.patch > > > ZStandard: https://github.com/facebook/zstd has been used in production for 6 > months by facebook now. v1.0 was recently released. Create a codec for this > library. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15550599#comment-15550599 ] churro morales commented on HADOOP-13578: - @jlowe thank you for the thorough review. The reason that the zstd cli and hadoop can't read each other's compressed / decompressed data is that ZStandardCodec uses the Block(Compressor|Decompressor) stream. I assumed this library would be used to compress large amounts of data. So when you use this stream each block gets a header and some compressed data. I believe the 8 bytes you are referring to are two ints (the size of the compressed and uncompressed block). If you remove these headers then the cli will be able to read the zstd blocks, and if you use the zstd-cli and compress a file (prepend the header for sizes) it will work in hadoop. The snappy compressor / decompressor works in the same way. I do not believe you can compress in snappy format using hadoop, then transfer the file locally and call Snappy.uncompress() without removing the headers. If we do not want this to be compressed at a block level, that is fine. Otherwise we can just add a utility in hadoop to take care of the block headers like they did with hadoop-snappy or some of the CLI libraries for snappy like snzip. As far as the decompressed bytes, I agree. I will check that the size returned by the function that reports how many bytes are necessary to uncompress the buffer is not larger than our buffer size. I can also add the isError and getErrorName to the decompression library. The reason I explicitly checked if the expected size was equal to the desired size is because the error that zstd provided was too vague. But I'll add it in case there are other errors. Yes, I will look at HADOOP-13684. The build of the codec was very similar to snappy because the license was BSD, so we could package it in like snappy. I can also take care of the nits you described as well. 
Are we okay with the compression being at block level? If we are then this implementation will work just like all of the other block compression codecs where it will add / require the header for the hadoop blocks. Thanks again for the review.
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15549007#comment-15549007 ] Jason Lowe commented on HADOOP-13578: - Thanks for updating the patch! bq. what do you think about adding a test that would go through all the codecs and run the M/R job you used as your example with the MRMiniCluster I think this would be overkill. Those tests would not be fast to execute, especially across all codecs. Also they won't run during precommit builds that just touch the codecs since they would necessarily have to be in the mapreduce project to pick up the MR cluster and MR job support code. If this was simply a case of returning the wrong buffer length then I'm wondering why the basic sanity tests of the codec don't catch this. Those tests should be updated to catch this without needing to run the full cluster end-to-end scenario. Most of the tests above now pass, but the codec is still not able to produce output that the zstd CLI utility can understand, nor can the codec parse files that were created by the zstd CLI utility. For example, trying to decode a file created by the zstd CLI utility blows up like this: {noformat} $ zstd < /etc/services | hadoop fs -put - services.zst $ hadoop fs -text services.zst WARNING: HADOOP_PREFIX has been replaced by HADOOP_HOME. Using value of HADOOP_PREFIX. 
2016-10-05 13:21:36,236 INFO [main] compress.CodecPool (CodecPool.java:getDecompressor(181)) - Got brand-new decompressor [.zst] -text: Fatal internal error java.lang.ArrayIndexOutOfBoundsException at org.apache.hadoop.io.compress.zstd.ZStandardDecompressor.setInput(ZStandardDecompressor.java:109) at org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:104) at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:105) at java.io.InputStream.read(InputStream.java:101) at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:91) at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:65) at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:126) at org.apache.hadoop.fs.shell.Display$Cat.printToStdout(Display.java:105) at org.apache.hadoop.fs.shell.Display$Cat.processPath(Display.java:100) at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:321) at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:293) at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:275) at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:259) at org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:119) at org.apache.hadoop.fs.shell.Command.run(Command.java:166) at org.apache.hadoop.fs.FsShell.run(FsShell.java:326) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90) at org.apache.hadoop.fs.FsShell.main(FsShell.java:384) {noformat} Similarly the zstd CLI utility just complains that any output produced by the codec is not in zstd format, e.g.: {noformat} $ hadoop fs -cat wcout-zstandard/part-r-0.zst | zstd -d WARNING: HADOOP_PREFIX has been replaced by HADOOP_HOME. Using value of HADOOP_PREFIX. zstd: stdin: not in zstd format {noformat} Looks like this codec is tacking on an extra 8 bytes, since when I strip those off then the zstd CLI utility is able to decode it. 
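Those extra 8 bytes are, per the discussion in this thread, two big-endian ints (uncompressed and compressed block lengths) that the Block(Compressor|Decompressor) streams prepend. A hedged sketch of framing and stripping that header — the exact field order is an assumption based on the comments, not confirmed from the Hadoop source:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/**
 * Illustrates the assumed 8-byte per-block header: uncompressed length and
 * compressed length as two big-endian ints, followed by the raw compressed
 * payload that a standard zstd decoder could read on its own.
 */
public class BlockHeader {
    static byte[] frame(byte[] payload, int uncompressedLen) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream dos = new DataOutputStream(bos);
        dos.writeInt(uncompressedLen);   // size of the original data
        dos.writeInt(payload.length);    // size of the compressed payload
        dos.write(payload);
        return bos.toByteArray();
    }

    static byte[] strip(byte[] block) throws IOException {
        DataInputStream dis = new DataInputStream(new ByteArrayInputStream(block));
        dis.readInt();                   // uncompressed length (discarded)
        int clen = dis.readInt();        // compressed length
        byte[] payload = new byte[clen];
        dis.readFully(payload);
        return payload;                  // what the zstd CLI could decode
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = {0x28, (byte) 0xb5, 0x2f, (byte) 0xfd}; // zstd magic bytes, for show
        byte[] block = frame(payload, 100);
        System.out.println("framed: " + block.length + " bytes, payload: "
                + strip(block).length + " bytes");
    }
}
```

Stripping this header is what makes the file readable by the standard CLI; conversely, a CLI-produced file lacks it, which is why the codec's setInput blows up above.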
Similarly when the codec tries to read this Hadoop-specific header from a file created by the zstd CLI utility, it blows up since that header is not there. The GzipCodec works without this Hadoop-specific header, so we can model this codec after it to make it compatible with the standard .zst file format. The code still compiles with warnings: {noformat} [INFO] Running make -j 2 VERBOSE=1 [WARNING] /hadoop/y-src/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/compress/zstd/ZStandardDecompressor.c: In function ‘Java_org_apache_hadoop_io_compress_zstd_ZStandardDecompressor_decompressBytes’: [WARNING] /hadoop/y-src/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/compress/zstd/ZStandardDecompressor.c:82: warning: unused variable ‘uncompressed_direct_buf_len’ [WARNING] /hadoop/y-src/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/compress/zstd/ZStandardDecompressor.c: In function ‘Java_org_apache_hadoop_io_compress_zstd_ZStandardDecompressor_decompressBytes’: [WARNING] /hadoop/y-src/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/compress/zstd/ZStandardDecompressor.c:82: warning: unused variable ‘uncompressed_direct_buf_len’ {noformat} Those warnings are a red flag. The code is not checking if the amount of space available in the uncompressed
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15544232#comment-15544232 ] Hadoop QA commented on HADOOP-13578: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 38s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 18s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 50s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 42s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 12m 9s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 18s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project hadoop-project-dist . 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 45s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 22s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 7m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 35s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 35s{color} | {color:orange} root: The patch generated 53 new + 82 unchanged - 1 fixed = 135 total (was 83) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 10m 4s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 6s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} shellcheck {color} | {color:green} 0m 13s{color} | {color:green} There were no new shellcheck issues. {color} | | {color:green}+1{color} | {color:green} shelldocs {color} | {color:green} 0m 9s{color} | {color:green} There were no new shelldocs issues. {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 7 line(s) that end in whitespace. Use git apply --whitespace=fix <>. 
Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 5s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project hadoop-project-dist . {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 8s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 99m 50s{color} | {color:red} root in the patch failed. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 43s{color} | {color:red} The patch generated 2 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}205m 48s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency | | | hadoop.yarn.server.applicationhistoryservice.webapp.TestAHSWebServices | | |
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15543553#comment-15543553 ] churro morales commented on HADOOP-13578: - [~jlowe] Great catch! I figured out the issue: I was returning the wrong buffer length in the decompressBytes function in ZStandardDecompressor.c. Made the change locally and everything works. I'll fix the warnings, make sure all your test cases work, and incorporate your earlier comments as well. Thanks again for the review, I really appreciate you taking the time to take a look.
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15543405#comment-15543405 ] Jason Lowe commented on HADOOP-13578: - bq. As far as the first issue were you using the hadoop-mapreduce-native-task code for intermediate compression for MR jobs? No, I don't believe so unless trunk uses that native code by default. This was just a straightforward mapreduce wordcount job with just the two settings I mentioned above, mapreduce.map.output.compress=true and mapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.ZStandardCodec.
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15543332#comment-15543332 ] churro morales commented on HADOOP-13578: - [~jlowe] Thanks for the update. I will look into the issues. As far as the first issue, were you using the hadoop-mapreduce-native-task code for intermediate compression for MR jobs? If so, I didn't implement that feature because I didn't know if enough people were interested. As far as the warnings, I will clean them up first and will run a word count job to ensure that the decompression / compression works correctly. I'll see if I can reproduce your word count results; that is a bit concerning. Thanks for the review, I'll take a look at it today / tomorrow.
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15542571#comment-15542571 ] Jason Lowe commented on HADOOP-13578: - Sorry for the delay in getting a more detailed review. Before I delved deep into the code I ran the codec through some basic tests and found a number of problems. The native code compiles with warnings that should be cleaned up: {noformat} [WARNING] /hadoop/y-src/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/compress/zstd/ZStandardDecompressor.c: In function ‘Java_org_apache_hadoop_io_compress_zstd_ZStandardDecompressor_decompressBytes’: [WARNING] /hadoop/y-src/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/compress/zstd/ZStandardDecompressor.c:110: warning: format ‘%d’ expects type ‘int’, but argument 5 has type ‘size_t’ [WARNING] /hadoop/y-src/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/compress/zstd/ZStandardDecompressor.c:110: warning: format ‘%d’ expects type ‘int’, but argument 5 has type ‘size_t’ [WARNING] /hadoop/y-src/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/compress/zstd/ZStandardDecompressor.c: In function ‘Java_org_apache_hadoop_io_compress_zstd_ZStandardDecompressor_decompressBytes’: [WARNING] /hadoop/y-src/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/compress/zstd/ZStandardDecompressor.c:110: warning: format ‘%d’ expects type ‘int’, but argument 5 has type ‘size_t’ [WARNING] /hadoop/y-src/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/compress/zstd/ZStandardDecompressor.c:110: warning: format ‘%d’ expects type ‘int’, but argument 5 has type ‘size_t’ {noformat} The codec is not working as an intermediate codec for MapReduce jobs. 
Running a wordcount job with -Dmapreduce.map.output.compress=true -Dmapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.GzipCodec works, but specifying -Dmapreduce.map.output.compress=true -Dmapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.ZStandardCodec causes the reducers to fail while fetching outputs complaining about premature EOF: {noformat} 2016-10-03 13:51:32,140 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#5 about to shuffle output of map attempt_1475501532481_0007_m_00_0 decomp: 323113 len: 93339 to MEMORY 2016-10-03 13:51:32,149 WARN [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.Fetcher: Failed to shuffle for fetcher#5 java.io.IOException: Premature EOF from inputStream at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:209) at org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput.doShuffle(InMemoryMapOutput.java:90) at org.apache.hadoop.mapreduce.task.reduce.IFileWrappedMapOutput.shuffle(IFileWrappedMapOutput.java:63) at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:536) at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:336) at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193) {noformat} The codec also has some issues with MapReduce jobs when reading input from a previous job's output that has been zstd compressed. 
For example, this sequence of steps generates output one would expect, where we're effectively word counting the output of wordcount on /etc/services (just some sample input for wordcount): {noformat} $ hadoop fs -put /etc/services wcin $ hadoop jar $HADOOP_PREFIX/share/hadoop/mapreduce/hadoop-mapreduce-examples*.jar wordcount -Dmapreduce.map.output.compress=true -Dmapreduce.output.fileoutputformat.compress=true -Dmapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec wcin wcout-gzip $ hadoop jar $HADOOP_PREFIX/share/hadoop/mapreduce/hadoop-mapreduce-examples*.jar wordcount wcout-gzip wcout-gzip2 {noformat} But if we do the same with org.apache.hadoop.io.compress.ZStandardCodec there's an odd record consisting of about 25K of NULs (i.e.: 0x00 bytes) in the output of the second job. The output of the ZStandardCodec is not readable by the zstd CLI utility, nor is output generated by the zstd CLI utility readable by ZStandardCodec.
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15536534#comment-15536534 ] churro morales commented on HADOOP-13578: - [~jlowe] Just wanted to let you know, I have incorporated your changes, but was waiting for a more detailed review before putting a patch up. Don't want to spam you with too many patches :)
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15524323#comment-15524323 ] churro morales commented on HADOOP-13578: - [~jlowe] Thank you for taking a look. bq. The BUILDING.txt addition must have been copied from snappy, since a lot of it still refers to snappy. Yes, did a copy and paste with the BUILDING.txt, will take a look. bq. There's a TODO in the code that should get sorted out: Yes, I think I have it figured out, had to read up a bit :). Since we compress the entire buffer at once, it's transformed into a single ZStandard frame. So the overhead is a magic number (4 bytes) + header (14 bytes max) + checksum (optional 4 bytes). So I believe that can be a constant of 22 bytes because of the way we compress in hadoop. I'll update that as well. bq. Shouldn't this come from IO_COMPRESSION_CODEC_ZSTD_LEVEL_DEFAULT that's already in the config keys? Correct! Will change that back to 3. Also will use the constant. I'll incorporate your fixes. Comments much appreciated.
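The 22-byte overhead arithmetic above can be captured as a small sketch. The expansion term in `maxCompressedLength` is a simplification of zstd's compressBound formula, not the exact macro; treat it as an illustration of the constant, not the codec's actual bound function:

```java
/**
 * Worst-case single-frame overhead as described in the comment above:
 * magic number (4) + maximum frame header (14) + optional checksum (4) = 22.
 */
public class ZstdFrameOverhead {
    static final int MAGIC_BYTES = 4;
    static final int MAX_FRAME_HEADER_BYTES = 14;
    static final int CHECKSUM_BYTES = 4;
    static final int MAX_OVERHEAD = MAGIC_BYTES + MAX_FRAME_HEADER_BYTES + CHECKSUM_BYTES;

    /** Rough output-buffer bound when the whole input becomes one frame. */
    static int maxCompressedLength(int srcLen) {
        // Incompressible data can expand slightly; the constant covers the framing.
        return srcLen + (srcLen >>> 8) + MAX_OVERHEAD;
    }

    public static void main(String[] args) {
        System.out.println(MAX_OVERHEAD);                 // 22
        System.out.println(maxCompressedLength(1 << 20)); // bound for a 1 MiB buffer
    }
}
```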
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514052#comment-15514052 ] churro morales commented on HADOOP-13578: - I was wondering if people were interested in getting zstandard integrated into the hadoop-mapreduce-client-nativetask? That is my only task left for trunk (since that code only resides there). I have backports ready for the 2.7 and 2.6 branches as well. Maybe we can split that feature up into a separate jira ticket? If that's the case I can have patches up very soon. Was just wondering everyone's thoughts here?