[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16166298#comment-16166298 ]

Jason Lowe commented on HADOOP-13578:
-------------------------------------

bq. it finally gets us the promised land of fast, high compression and splittability all in one file format.

To clarify, this ZStandard codec is not splittable. The ZStandard file format is theoretically capable of supporting splits, but that's not an official feature yet. As I understand it, split support is being added via the "seekable" support in the contrib area at https://github.com/facebook/zstd/tree/dev/contrib/seekable_format, but that's not part of the standard zstd library yet. See https://github.com/facebook/zstd/issues/395 for more discussion around support for splits in the zstd codec.

> Add Codec for ZStandard Compression
> -----------------------------------
>
>                 Key: HADOOP-13578
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13578
>             Project: Hadoop Common
>          Issue Type: New Feature
>            Reporter: churro morales
>            Assignee: churro morales
>             Fix For: 2.9.0, 3.0.0-alpha2
>
>         Attachments: HADOOP-13578-branch-2.v9.patch, HADOOP-13578.patch, HADOOP-13578.v1.patch, HADOOP-13578.v2.patch, HADOOP-13578.v3.patch, HADOOP-13578.v4.patch, HADOOP-13578.v5.patch, HADOOP-13578.v6.patch, HADOOP-13578.v7.patch, HADOOP-13578.v8.patch, HADOOP-13578.v9.patch
>
>
> ZStandard (https://github.com/facebook/zstd) has been used in production for 6 months by Facebook now. v1.0 was recently released. Create a codec for this library.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
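[Editorial note on the frame-boundary issue above: a zstd stream is a sequence of frames, each beginning with a fixed 4-byte magic number, but the plain format carries no index of frame offsets, which is why split support needs the separate seekable_format contrib. A minimal sketch of recognizing a frame start follows; the class and method names are illustrative, not from the Hadoop patch.]

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ZstdFrameProbe {
    // Magic number that begins every zstd frame (zstd format spec),
    // stored little-endian on disk as the bytes 28 B5 2F FD.
    static final int ZSTD_MAGIC = 0xFD2FB528;

    /** Returns true if buf begins with a zstd frame header. */
    static boolean startsWithZstdFrame(byte[] buf) {
        if (buf == null || buf.length < 4) {
            return false;
        }
        int magic = ByteBuffer.wrap(buf, 0, 4)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getInt();
        return magic == ZSTD_MAGIC;
    }
}
```

[Knowing that a buffer *starts* with a frame is cheap; knowing where interior frames begin without decompressing is not, since the magic bytes can also occur inside compressed payload — hence the need for an explicit seek table before splits can be supported.]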
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16165480#comment-16165480 ]

Wei-Chiu Chuang commented on HADOOP-13578:
------------------------------------------

-1 for including it in 2.7 or 2.8. I am sorry, but a new feature is not supposed to land in a maintenance release.
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16165440#comment-16165440 ]

Adam Kennedy commented on HADOOP-13578:
---------------------------------------

ZStd support in 2.7.x would be hugely valuable, particularly because it finally gets us the promised land of fast, high compression and splittability all in one file format.
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16165329#comment-16165329 ]

churro morales commented on HADOOP-13578:
-----------------------------------------

This was never backported; we did it internally here but never pushed it upstream. The build is quite different from Hadoop 2.9, and that's where the majority of the changes lie. If there is enough interest I could possibly put up a 2.7.x patch.
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16155295#comment-16155295 ]

Andre F de Miranda commented on HADOOP-13578:
---------------------------------------------

[~churromorales] was this ever backported to 2.7.x? I couldn't find the mentioned JIRA.

Cheers
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146301#comment-16146301 ]

churro morales commented on HADOOP-13578:
-----------------------------------------

The new version of ZStandard, 1.3.1, has a new license (BSD + GPLv2) which removes the PATENTS file: [https://github.com/facebook/zstd/releases/tag/v1.3.1]
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16113710#comment-16113710 ]

Andrew Wang commented on HADOOP-13578:
--------------------------------------

I believe it's fine; see this mailing list thread: https://lists.apache.org/thread.html/55e4ebf60ef0d109c9f74eac92f58a01b2efc551ae321e5444e05772@%3Ccommon-dev.hadoop.apache.org%3E

From Jason:
{quote}
I think we are OK to leave support for the zstd codec in the Hadoop code base. I asked Chris Mattmann for clarification, noting that support for the zstd codec requires the user to install the zstd headers and libraries and then configure it to be included in the native Hadoop build. The Hadoop releases are not shipping any zstd code (e.g. headers or libraries), nor do they require zstd as a mandatory dependency. Here's what he said:
{quote}
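[Editorial note on the opt-in build described above: the zstd headers and libraries must be present on the build machine, and the native profile must be told to require them. A rough sketch of that workflow follows; the flag name `-Drequire.zstd` comes from the BUILDING.txt changes in this patch, so treat the exact options as indicative and check BUILDING.txt for your release.]

```shell
# Install the zstd development package first (name varies by distro), e.g.:
#   RHEL/CentOS:   sudo yum install libzstd-devel
#   Debian/Ubuntu: sudo apt-get install libzstd-dev

# Build the native libraries and fail fast if zstd support cannot be compiled in:
mvn clean package -Pdist,native -DskipTests -Dtar -Drequire.zstd

# Afterwards, confirm the loaded native hadoop library picked up zstd:
hadoop checknative -a
```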
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16113647#comment-16113647 ]

Adam Kennedy commented on HADOOP-13578:
---------------------------------------

Is this impacted by LEGAL-303?
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864240#comment-15864240 ]

churro morales commented on HADOOP-13578:
-----------------------------------------

Hi [~redsky_luan] - I believe I can put up a patch for 2.7; it should be pretty easy. Let's create another JIRA ticket for this and move it out.
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15863517#comment-15863517 ]

James_Luan commented on HADOOP-13578:
-------------------------------------

Hi there. Is there any schedule for backporting zstd to 2.7? I'd be glad to test zstd on version 2.7.
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15798669#comment-15798669 ]

Hudson commented on HADOOP-13578:
---------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11072 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/11072/])
HADOOP-13578. Add Codec for ZStandard Compression. Contributed by churro (jlowe: rev a0a276162147e843a5a4e028abdca5b66f5118da)
* (add) hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/compress/zstd/TestZStandardCompressorDecompressor.java
* (edit) hadoop-project/pom.xml
* (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/NativeLibraryChecker.java
* (add) hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/compress/zstd/ZStandardDecompressor.c
* (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/Decompressor.java
* (add) hadoop-common-project/hadoop-common/src/test/resources/zstd/test_file.txt.zst
* (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonConfigurationKeys.java
* (add) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/ZStandardCodec.java
* (edit) hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/util/NativeCodeLoader.c
* (add) hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/compress/zstd/org_apache_hadoop_io_compress_zstd.h
* (edit) hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/compress/TestCompressionStreamReuse.java
* (edit) hadoop-common-project/hadoop-common/src/main/resources/META-INF/services/org.apache.hadoop.io.compress.CompressionCodec
* (edit) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/pom.xml
* (edit) hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/compress/TestCodec.java
* (edit) hadoop-common-project/hadoop-common/src/CMakeLists.txt
* (add) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zstd/ZStandardCompressor.java
* (add) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zstd/package-info.java
* (edit) dev-support/bin/dist-copynativelibs
* (add) hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/compress/zstd/ZStandardCompressor.c
* (edit) hadoop-common-project/hadoop-common/pom.xml
* (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/NativeCodeLoader.java
* (add) hadoop-common-project/hadoop-common/src/test/resources/zstd/test_file.txt
* (edit) hadoop-project-dist/pom.xml
* (add) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zstd/ZStandardDecompressor.java
* (edit) hadoop-common-project/hadoop-common/src/config.h.cmake
* (edit) BUILDING.txt
* (edit) hadoop-common-project/hadoop-common/src/site/markdown/NativeLibraries.md.vm
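[Editorial note on using the committed codec: the file list above shows it is registered automatically via the edited META-INF/services file, so an explicit core-site.xml entry is typically only needed when io.compression.codecs is already overridden. A hedged sketch of such an entry, with the codec class name taken from the commit above:]

```xml
<!-- core-site.xml: only needed when io.compression.codecs is explicitly set,
     since the patch also registers the codec via META-INF/services. -->
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.ZStandardCodec</value>
</property>
```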
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15796609#comment-15796609 ]

churro morales commented on HADOOP-13578:
-----------------------------------------

[~jlowe] thank you for taking the time to review this patch.
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15796550#comment-15796550 ]

Jason Lowe commented on HADOOP-13578:
-------------------------------------

Sorry for the long delay, as I just got back from vacation. +1 for the v9 trunk and branch-2 patches. The branch-2 test failures and ASF warnings are unrelated. I'll commit this tomorrow if there are no objections.
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15756328#comment-15756328 ]

Hadoop QA commented on HADOOP-13578:
------------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 17s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 5 new or modified test files. |
| 0 | mvndep | 0m 52s | Maven dependency ordering for branch |
| +1 | mvninstall | 6m 31s | branch-2 passed |
| +1 | compile | 5m 30s | branch-2 passed with JDK v1.8.0_111 |
| +1 | compile | 6m 28s | branch-2 passed with JDK v1.7.0_121 |
| +1 | checkstyle | 1m 28s | branch-2 passed |
| +1 | mvnsite | 9m 2s | branch-2 passed |
| +1 | mvneclipse | 0m 48s | branch-2 passed |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-project hadoop-project-dist . |
| +1 | findbugs | 1m 37s | branch-2 passed |
| +1 | javadoc | 4m 26s | branch-2 passed with JDK v1.8.0_111 |
| +1 | javadoc | 6m 26s | branch-2 passed with JDK v1.7.0_121 |
| 0 | mvndep | 0m 18s | Maven dependency ordering for patch |
| +1 | mvninstall | 7m 31s | the patch passed |
| +1 | compile | 5m 30s | the patch passed with JDK v1.8.0_111 |
| +1 | cc | 5m 30s | the patch passed |
| +1 | javac | 5m 30s | the patch passed |
| +1 | compile | 6m 31s | the patch passed with JDK v1.7.0_121 |
| +1 | cc | 6m 31s | the patch passed |
| +1 | javac | 6m 31s | the patch passed |
| -0 | checkstyle | 1m 29s | root: The patch generated 4 new + 161 unchanged - 1 fixed = 165 total (was 162) |
| +1 | mvnsite | 9m 10s | the patch passed |
| +1 | mvneclipse | 0m 52s | the patch passed |
| -1 | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply |
| +1 | xml | 0m 1s | The patch has no ill-formed XML file. |
| 0 | findbugs | 0m 1s | Skipped patched modules with no Java source: hadoop-project hadoop-project-dist . |
| +1 | findbugs | 1m 55s | the patch passed |
| +1 | javadoc | 5m 5s | the patch passed with JDK v1.8.0_111 |
| +1 | javadoc | 6m 54s | the patch passed with JDK v1.7.0_121 |
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15755586#comment-15755586 ]

Hadoop QA commented on HADOOP-13578:
------------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 13m 22s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 4 new or modified test files. |
| 0 | mvndep | 1m 12s | Maven dependency ordering for branch |
| +1 | mvninstall | 6m 30s | branch-2 passed |
| +1 | compile | 6m 5s | branch-2 passed with JDK v1.8.0_111 |
| +1 | compile | 6m 50s | branch-2 passed with JDK v1.7.0_121 |
| +1 | checkstyle | 1m 28s | branch-2 passed |
| +1 | mvnsite | 9m 37s | branch-2 passed |
| +1 | mvneclipse | 0m 56s | branch-2 passed |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-project hadoop-project-dist . |
| +1 | findbugs | 1m 39s | branch-2 passed |
| +1 | javadoc | 5m 10s | branch-2 passed with JDK v1.8.0_111 |
| +1 | javadoc | 6m 7s | branch-2 passed with JDK v1.7.0_121 |
| 0 | mvndep | 0m 15s | Maven dependency ordering for patch |
| +1 | mvninstall | 7m 36s | the patch passed |
| +1 | compile | 5m 43s | the patch passed with JDK v1.8.0_111 |
| -1 | cc | 5m 43s | root-jdk1.8.0_111 with JDK v1.8.0_111 generated 1 new + 6 unchanged - 1 fixed = 7 total (was 7) |
| +1 | javac | 5m 43s | the patch passed |
| +1 | compile | 6m 38s | the patch passed with JDK v1.7.0_121 |
| -1 | cc | 6m 38s | root-jdk1.7.0_121 with JDK v1.7.0_121 generated 1 new + 16 unchanged - 1 fixed = 17 total (was 17) |
| +1 | javac | 6m 38s | the patch passed |
| -0 | checkstyle | 1m 34s | root: The patch generated 4 new + 161 unchanged - 1 fixed = 165 total (was 162) |
| +1 | mvnsite | 10m 1s | the patch passed |
| +1 | mvneclipse | 0m 53s | the patch passed |
| -1 | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply |
| +1 | xml | 0m 1s | The patch has no ill-formed XML file. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-project hadoop-project-dist . |
| +1 | findbugs | 2m 6s | the patch passed |
| +1 | javadoc | 5m 13s | the patch passed with JDK v1.8.0_111 |
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15755172#comment-15755172 ]

Hadoop QA commented on HADOOP-13578:
------------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| -1 | patch | 0m 6s | HADOOP-13578 does not apply to branch-2. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. |

|| Subsystem || Report/Notes ||
| JIRA Issue | HADOOP-13578 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12843629/HADOOP-13578-branch-2.v9.patch |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/11291/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755021#comment-15755021 ] Hadoop QA commented on HADOOP-13578: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 8s{color} | {color:red} HADOOP-13578 does not apply to branch-2. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HADOOP-13578 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12843619/HADOOP-13578-branch-2.v9.patch | | Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/11289/console | | Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated.
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15751842#comment-15751842 ] Jason Lowe commented on HADOOP-13578: - Latest patch looks good. The test failures and ASF warnings are unrelated. I filed HDFS-11251 to track the TestDataNodeHotSwapVolumes failure and HDFS-11252 to track the TestFileTruncate failure. YARN-5934 is tracking the TestTimelineWebServices failure. Please attach an equivalent branch-2 patch, and if that looks good then I'll commit this.
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15750095#comment-15750095 ] Hadoop QA commented on HADOOP-13578: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 5 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 16s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 54s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 9m 42s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 9s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project hadoop-project-dist . 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 55s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 32s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 20s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 9m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 9m 23s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 42s{color} | {color:orange} root: The patch generated 3 new + 159 unchanged - 1 fixed = 162 total (was 160) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 9m 53s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} shellcheck {color} | {color:green} 0m 11s{color} | {color:green} There were no new shellcheck issues. {color} | | {color:green}+1{color} | {color:green} shelldocs {color} | {color:green} 0m 8s{color} | {color:green} There were no new shelldocs issues. {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 5 line(s) that end in whitespace. Use git apply --whitespace=fix <>. 
Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 5s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project hadoop-project-dist . {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 0s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red}111m 34s{color} | {color:red} root in the patch failed. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 44s{color} | {color:red} The patch generated 2 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}209m 40s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes | | | hadoop.hdfs.server.namenode.TestFileTruncate | | | hadoop.yarn.server.timeline.webapp.TestTimelineWebServices | \\ \\ || Subsystem || Report/Notes || | Docker |
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15749728#comment-15749728 ] churro morales commented on HADOOP-13578: - Great, thank you [~jlowe] for your patience and for taking the time to do so many reviews. The branch-2 patch will be mostly the same code-wise, except the code to build and package the libraries is quite different. I'll work on getting a patch ready today.
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15749717#comment-15749717 ] Jason Lowe commented on HADOOP-13578: - We should use this JIRA for the branch-2 patch. Just attach the branch-2 version according to the [patch naming guidelines|https://wiki.apache.org/hadoop/HowToContribute#Naming_your_patch] and the precommit build will run against branch-2 instead of trunk, e.g. HADOOP-13578-branch-2.v9.patch would be the branch-2 version of HADOOP-13578.v9.patch. I'd attach the trunk patch first, and if the precommit looks good, go ahead and attach the branch-2 version so it can run on that. If you attach both at the same time, then precommit will only run on one of them (the first attached, IIRC).
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15749694#comment-15749694 ] churro morales commented on HADOOP-13578: - No problem, will do that and upload the patch. Thanks again for taking the time to review. I know we are interested in backports of this feature to the 2.x branch. Should we create a new ticket for that?
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15749683#comment-15749683 ] Jason Lowe commented on HADOOP-13578: - Yep, that looks good. If you could put the else on the same line as the previous closing bracket to match the coding style, that'd be nice as well.
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15749677#comment-15749677 ] churro morales commented on HADOOP-13578: - [~jlowe], regarding your previous comment about not returning after throwing: I can make the following change:
{code}
size_t size = dlsym_ZSTD_compressStream(stream, &output, &input);
if (dlsym_ZSTD_isError(size)) {
  THROW(env, "java/lang/InternalError", dlsym_ZSTD_getErrorName(size));
  return (jint) 0;
}
if (finish && input.pos == input.size) {
  // end the stream, flush and write the frame epilogue
  size = dlsym_ZSTD_endStream(stream, &output);
  if (!size) {
    (*env)->SetBooleanField(env, this, ZStandardCompressor_finished, JNI_TRUE);
  }
} else {
  // need to flush the output buffer
  // this also updates the output buffer position.
  size = dlsym_ZSTD_flushStream(stream, &output);
}
if (dlsym_ZSTD_isError(size)) {
  THROW(env, "java/lang/InternalError", dlsym_ZSTD_getErrorName(size));
  return (jint) 0;
}
{code}
If you are okay with this I can put up a new patch containing this fix.
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15749643#comment-15749643 ] Hadoop QA commented on HADOOP-13578: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 5 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 55s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 42s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 9m 47s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 9s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project hadoop-project-dist . 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 15s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 36s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 22s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 10m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 10m 11s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 58s{color} | {color:orange} root: The patch generated 3 new + 159 unchanged - 1 fixed = 162 total (was 160) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 10m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} shellcheck {color} | {color:green} 0m 12s{color} | {color:green} There were no new shellcheck issues. {color} | | {color:green}+1{color} | {color:green} shelldocs {color} | {color:green} 0m 9s{color} | {color:green} There were no new shelldocs issues. {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 5 line(s) that end in whitespace. Use git apply --whitespace=fix <>. 
Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 5s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project hadoop-project-dist . {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 55s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 13m 58s{color} | {color:red} root in the patch failed. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 42s{color} | {color:red} The patch generated 2 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}117m 16s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.security.token.delegation.web.TestWebDelegationToken | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | HADOOP-13578 | | JIRA Patch URL |
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15749618#comment-15749618 ] churro morales commented on HADOOP-13578: - Yes, you are correct: ZSTD_compressStream_generic does nothing if the stage is final, so that part looks good to me.
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15749570#comment-15749570 ] Jason Lowe commented on HADOOP-13578: - One more thing I just thought of -- if we end up having to loop around and call endStream more than once, is it OK that we called ZSTD_compressStream between the two calls to ZSTD_endStream? The call will be degenerate with input size == 0, but I'm wondering if that's OK from the zstd lib API perspective. Actually I think we're OK here since ZSTD_endStream will do a similar degenerate ZSTD_compressStream call internally if it isn't in the final compression stage, and it looks like ZSTD_compressStream explicitly does nothing if the compressor is in the final stage. So I think we're good, but it'd be great if you could double-check.
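The "loop around and call endStream more than once" case above arises when the output buffer is too small to hold the whole frame epilogue in one call. The caller-side shape of that loop can be sketched with a small self-contained model -- note this is *not* the real zstd API: `end_stream_stub`, `stream_stub`, and the buffer structs below are hypothetical stand-ins that only mimic the documented contract (endStream returns the number of bytes still to flush, 0 meaning the frame is complete):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptors mirroring ZSTD_inBuffer / ZSTD_outBuffer. */
typedef struct { const char *src; size_t size; size_t pos; } in_buf;
typedef struct { char *dst; size_t size; size_t pos; } out_buf;

/* Stub "stream" state: epilogue bytes that endStream still has to emit. */
typedef struct { size_t pending; int finished; } stream_stub;

/* Mimics the endStream contract: write as much of the epilogue as fits
 * into the output buffer, return the number of bytes remaining to flush.
 * Returning 0 means the frame epilogue is complete. */
static size_t end_stream_stub(stream_stub *s, out_buf *out) {
    size_t room = out->size - out->pos;
    size_t n = s->pending < room ? s->pending : room;
    memset(out->dst + out->pos, 'E', n);  /* epilogue filler bytes */
    out->pos += n;
    s->pending -= n;
    if (s->pending == 0) s->finished = 1;
    return s->pending;
}

/* Caller-side finish loop: keep calling end_stream_stub, draining the
 * output buffer between calls, until nothing is left to flush. */
static int finish_frame(stream_stub *s, out_buf *out, int *calls) {
    while (end_stream_stub(s, out) != 0) {
        out->pos = 0;  /* pretend the consumer drained the output buffer */
        (*calls)++;
    }
    (*calls)++;        /* count the final, successful call */
    return s->finished;
}
```

With a 10-byte epilogue and a 4-byte output buffer, the loop needs three calls before the stub reports the frame complete, which is exactly the multi-call scenario the comment is asking about.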
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15749557#comment-15749557 ] Jason Lowe commented on HADOOP-13578: - Thanks for updating the patch! It's really close now. We need to check for an error if ZSTD_endStream returns a failure, otherwise we could end up looping forever on a persistent error being returned from ZSTD_endStream. The outer compress logic will keep calling compress with finish=true trying to wrap things up, and the JNI code will keep avoiding setting the finished flag because ZSTD_endStream isn't returning 0. There's no return after the THROW macros here, so we can fall through and call ZSTD_flushStream after ZSTD_compressStream returns an error. That means the error being reported from compress might be clobbered by the error being returned from flush if they are different. Also we can just move the error check on "size" up inside the flush block since it only applies to that case, or change "remaining_to_flush" to "size" and let the previous point be fixed by falling through to the common error check on "size".
{code}
size_t size = dlsym_ZSTD_compressStream(stream, &output, &input);
if (dlsym_ZSTD_isError(size)) {
  THROW(env, "java/lang/InternalError", dlsym_ZSTD_getErrorName(size));
}
if (finish && input.pos == input.size) {
  // end the stream, flush and write the frame epilogue
  size_t const remaining_to_flush = dlsym_ZSTD_endStream(stream, &output);
  if (!remaining_to_flush) {
    (*env)->SetBooleanField(env, this, ZStandardCompressor_finished, JNI_TRUE);
  }
} else {
  // need to flush the output buffer
  // this also updates the output buffer position.
  size = dlsym_ZSTD_flushStream(stream, &output);
}
if (dlsym_ZSTD_isError(size)) {
  THROW(env, "java/lang/InternalError", dlsym_ZSTD_getErrorName(size));
}
{code}
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15749292#comment-15749292 ] Hadoop QA commented on HADOOP-13578: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 5 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 16s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 57s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 9m 46s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 10s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project hadoop-project-dist . 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 55s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 29s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 19s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 9m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 9m 22s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 43s{color} | {color:orange} root: The patch generated 3 new + 159 unchanged - 1 fixed = 162 total (was 160) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 10m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} shellcheck {color} | {color:green} 0m 12s{color} | {color:green} There were no new shellcheck issues. {color} | | {color:green}+1{color} | {color:green} shelldocs {color} | {color:green} 0m 8s{color} | {color:green} There were no new shelldocs issues. {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 5 line(s) that end in whitespace. Use git apply --whitespace=fix <>. 
Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 5s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project hadoop-project-dist . {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 38s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 14m 10s{color} | {color:red} root in the patch failed. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 42s{color} | {color:red} The patch generated 2 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}112m 47s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.ha.TestZKFailoverController | | | hadoop.security.token.delegation.web.TestWebDelegationToken | | | hadoop.io.compress.zstd.TestZStandardCompressorDecompressor | | | hadoop.net.TestDNS | \\ \\ || Subsystem ||
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15748972#comment-15748972 ] churro morales commented on HADOOP-13578: - Fair enough, that's an easy check to make. Instead of checking {{if (finish)}} in ZStandardCompressor.c we can just do {{if (finish && input.size == input.pos)}}, which will ensure we have at least compressed everything in the input buffer before writing the final frame. Here is what is included in the updated patch:
- Fixed the zstd buffer constant naming.
- Fixed the constructor for ZStandardCompressor.
- Removed finished from the ZStandardDirectDecompressor.
- Reverted the finished logic in the ZStandardCompressor back to its original state.
- Return after we throw in any C code.
- Parameterized the inflateBytes method so that we don't have to slice the buffer anymore.
- Took care of calling endStream with data still left in the buffer.
- Also fixed the remaining checkstyle issues.
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15746346#comment-15746346 ] Jason Lowe commented on HADOOP-13578: - The thing I'm worried about is that when we call ZSTD_compressStream we are passing descriptors for both the input buffer and the output buffer. When we call ZSTD_endStream we are only passing the descriptor for the output buffer. Therefore I don't know how ZSTD_endStream is supposed to finish consuming any input that ZSTD_compressStream didn't get to if it doesn't have access to that input buffer descriptor. Looking at the zstd code you'll see that when it does call ZSTD_compressStream inside ZSTD_endStream, it's calling it with srcSize == 0. That means there is no more source to consume. So if the last call of the JNI code to ZSTD_compressStream did not fully consume the input buffer's data (i.e.: input pos is not moved to the end of the data) then it looks like calling ZSTD_endStream will simply flush out what input data did make it and then end the frame. That matches what the documentation for ZSTD_endStream says. So I still think we need to make sure we do not call ZSTD_endStream if input.pos is not at the end of the input buffer after we call ZSTD_compressStream, or we risk losing the last chunk of data if the zstd library for some reason cannot fully consume the input buffer when we try to finish. > Add Codec for ZStandard Compression > --- > > Key: HADOOP-13578 > URL: https://issues.apache.org/jira/browse/HADOOP-13578 > Project: Hadoop Common > Issue Type: New Feature >Reporter: churro morales >Assignee: churro morales > Attachments: HADOOP-13578.patch, HADOOP-13578.v1.patch, > HADOOP-13578.v2.patch, HADOOP-13578.v3.patch, HADOOP-13578.v4.patch, > HADOOP-13578.v5.patch, HADOOP-13578.v6.patch > > > ZStandard: https://github.com/facebook/zstd has been used in production for 6 > months by facebook now. v1.0 was recently released. Create a codec for this > library. 
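The guard Jason describes (only call ZSTD_endStream once the input descriptor shows everything consumed) reduces to a small predicate. A minimal Java sketch; the class and method names are illustrative, not the patch's actual JNI code:

```java
public class FinishGuard {
    // Safe to write the final frame only when the caller asked to finish
    // AND ZSTD_compressStream consumed the whole input buffer.
    static boolean mayEndStream(boolean finish, int inputPos, int inputSize) {
        return finish && inputPos == inputSize;
    }

    public static void main(String[] args) {
        System.out.println(mayEndStream(true, 5, 8)); // false: 3 bytes unconsumed
        System.out.println(mayEndStream(true, 8, 8)); // true: input fully consumed
    }
}
```

When the predicate is false the JNI layer would simply return without setting the finished flag, so the outer loop calls compress() again with the remaining input.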
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15743180#comment-15743180 ] churro morales commented on HADOOP-13578: - HI [~jlowe] Thanks for taking the time to review. I agree with all of the above comments and will correct those issues. The last question you had was related to the ZSTD_endStream(). The endStream() finishes the frame and writes the epilogue only if the uncompressed buffer has been fully consumed. Otherwise it basically does the same thing as ZSTD_compressStream(). You are correct, if the output buffer is too small it may not be able to flush. There is a check in ZSTD_endStream() which does this: {code} size_t const notEnded = ZSTD_compressStream_generic(zcs, ostart, , , , zsf_end); size_t const remainingToFlush = zcs->outBuffContentSize - zcs->outBuffFlushedSize; op += sizeWritten; if (remainingToFlush) { output->pos += sizeWritten; return remainingToFlush + ZSTD_BLOCKHEADERSIZE /* final empty block */ + (zcs->checksum * 4); } // Create the epilogue and flush the epilogue {code} so if there is still data to be consumed the library wont finish the frame, thus making it safe to call repeatedly with our framework because we never set the finished flag until the epilogue has been written successfully. The code in the CompressorStream.java which calls our codec simply does this: {code} @Override public void finish() throws IOException { if (!compressor.finished()) { compressor.finish(); while (!compressor.finished()) { compress(); } } } {code} So I believe we wont drop any data with the way things are done. Please let me know if I am missing something obvious here :). 
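The CompressorStream#finish loop quoted in the comment above can be exercised against a toy compressor that, like ZSTD_endStream with a small output buffer, needs several calls before the epilogue is fully flushed. A self-contained sketch; ToyCompressor is invented for illustration and is not Hadoop's Compressor:

```java
public class FinishLoopDemo {
    static class ToyCompressor {
        private int pendingFlushes = 3; // pretend the epilogue takes 3 writes
        private boolean finish;
        private boolean finished;

        void finish() { finish = true; }
        boolean finished() { return finished; }

        // Each call drains one chunk; finished flips only once everything
        // (including the final frame) has been written out.
        void compress() {
            if (finish && pendingFlushes > 0) {
                pendingFlushes--;
            }
            if (finish && pendingFlushes == 0) {
                finished = true;
            }
        }

        int remaining() { return pendingFlushes; }
    }

    public static void main(String[] args) {
        ToyCompressor c = new ToyCompressor();
        if (!c.finished()) {
            c.finish();
            while (!c.finished()) {
                c.compress(); // keep flushing until the epilogue is written
            }
        }
        System.out.println(c.remaining()); // 0: nothing left to flush
    }
}
```

The loop terminates only when the compressor reports finished, which is why the codec must not set the finished flag while data remains buffered.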
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15742894#comment-15742894 ] Jason Lowe commented on HADOOP-13578: - Thanks for updating the patch! Sorry for the delay in re-review, as I was travelling last week. The ASF license warnings appear to be unrelated, as does the unit test failure. Comments on the patch: IO_COMPRESSION_CODEC_ZSTD_BUFFER_SIZE should be IO_COMPRESSION_CODEC_ZSTD_BUFFER_SIZE_DEFAULT The ZStandardCompressor(level, bufferSize) constructor should be implemented in terms of ZStandardCompressor(level, inputBufferSize, outputBufferSize) to reduce the code redundancy and improve maintainability. If inflateBytesDirect took a dest position then we wouldn't need to slice the destination buffer, avoiding a temporary objection allocation. I'm not sure about this snippet in ZStandardDirectDecompressor#decompress: {code} endOfInput = !src.hasRemaining(); super.finished = !src.hasRemaining(); {code} After adding the super.finished change it appears the endOfInput and finished variables will simply reflect each other, so endOfInput would be redundant. Instead I think endOfInput may still be necessary, and the finished variable handling will already be taken care of by the JNI code. In other words I think we don't need the super.finished line that was added. However maybe I'm missing something here. 
I think we need to return after these THROW macros or risk passing null pointers to the zstd library: {code} void * uncompressed_bytes = (*env)->GetDirectBufferAddress(env, uncompressed_direct_buf); if (!uncompressed_bytes) { THROW(env, "java/lang/InternalError", "Undefined memory address for uncompressedDirectBuf"); } // Get the output direct buffer void * compressed_bytes = (*env)->GetDirectBufferAddress(env, compressed_direct_buf); if (!compressed_bytes) { THROW(env, "java/lang/InternalError", "Undefined memory address for compressedDirectBuf"); } {code} Same comments for similar code on the decompressor JNI side. Related to this code: {code} if (remaining_to_flush) { (*env)->SetBooleanField(env, this, ZStandardCompressor_finished, JNI_FALSE); } else { (*env)->SetBooleanField(env, this, ZStandardCompressor_finished, JNI_TRUE); } {code} What I meant by my previous review was to have the JNI layer clear the finished flag in any case we didn't set it, including the case where the finish flag isn't asking us to wrap things up. Actually thinking about this further, I'm not sure we need the case where we set it to false. We just need to set it to true in the JNI layer when the end flush indicates there aren't any more bytes to flush. In every other case finished should already be false, and the user will need to call reset() to clear both the finish and finished flags to continue with more input. So I think we can put this back to the way it was. Sorry for the confusion on my part. If there are still bytes left to consume in the uncompressed buffer then I don't think we want to call ZSTD_endStream. We should only be calling ZSTD_endStream when the uncompressed buffer has been fully consumed, correct? Otherwise we may fail to compress the final chunk of input data when the user sets the finish flag and the zstd library doesn't fully consume the input on the ZSTD_compressStream call. That could result in a "successful" compression that drops data. 
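The "1-byte output buffer" stress case suggested above can be demonstrated today with java.util.zip.Deflater, which follows the same finish()/finished() loop contract as Hadoop's Compressor interface. This is an analogy using the JDK's zlib codec, not the zstd codec itself:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class TinyOutputBufferDemo {
    // Compresses with a pathological 1-byte output buffer, then decompresses
    // and verifies nothing was dropped.
    static boolean roundTrips(byte[] input) {
        Deflater def = new Deflater();
        def.setInput(input);
        def.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] one = new byte[1];         // extreme case: 1-byte output buffer
        while (!def.finished()) {         // keep looping until the epilogue is out
            int n = def.deflate(one);
            if (n > 0) {
                out.write(one, 0, n);
            }
        }
        def.end();

        Inflater inf = new Inflater();
        inf.setInput(out.toByteArray());
        byte[] back = new byte[input.length];
        int total = 0;
        try {
            int n;
            while (!inf.finished()
                    && (n = inf.inflate(back, total, back.length - total)) > 0) {
                total += n;
            }
        } catch (DataFormatException e) {
            return false;
        } finally {
            inf.end();
        }
        return total == input.length && Arrays.equals(input, back);
    }

    public static void main(String[] args) {
        System.out.println(
            roundTrips("hello hello hello hello".getBytes(StandardCharsets.UTF_8)));
    }
}
```

The same loop shape is what CompressorStream#finish relies on: as long as finished() stays false, the caller keeps draining output, no matter how small each drain is.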
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727492#comment-15727492 ] Hadoop QA commented on HADOOP-13578: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 19m 21s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 5 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 40s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 9s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 9m 47s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 18s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project hadoop-project-dist . 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 52s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 25s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 45s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 9m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 9m 21s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 43s{color} | {color:orange} root: The patch generated 64 new + 159 unchanged - 1 fixed = 223 total (was 160) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 10m 4s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} shellcheck {color} | {color:green} 0m 11s{color} | {color:green} There were no new shellcheck issues. {color} | | {color:green}+1{color} | {color:green} shelldocs {color} | {color:green} 0m 9s{color} | {color:green} There were no new shelldocs issues. {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 5 line(s) that end in whitespace. Use git apply --whitespace=fix <>. 
Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 5s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project hadoop-project-dist . {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 20s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red}114m 31s{color} | {color:red} root in the patch failed. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 51s{color} | {color:red} The patch generated 2 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}234m 38s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.timeline.webapp.TestTimelineWebServices | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | HADOOP-13578 | | JIRA Patch URL |
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15726997#comment-15726997 ] churro morales commented on HADOOP-13578: - I'll also take care of the java checkstyle issues and upload a patch provided you are good with this patch. > Add Codec for ZStandard Compression > --- > > Key: HADOOP-13578 > URL: https://issues.apache.org/jira/browse/HADOOP-13578 > Project: Hadoop Common > Issue Type: New Feature >Reporter: churro morales >Assignee: churro morales > Attachments: HADOOP-13578.patch, HADOOP-13578.v1.patch, > HADOOP-13578.v2.patch, HADOOP-13578.v3.patch, HADOOP-13578.v4.patch, > HADOOP-13578.v5.patch, HADOOP-13578.v6.patch > > > ZStandard: https://github.com/facebook/zstd has been used in production for 6 > months by facebook now. v1.0 was recently released. Create a codec for this > library. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15726853#comment-15726853 ] Hadoop QA commented on HADOOP-13578: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 5 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 18s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 10s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 40s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 10m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 35s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project hadoop-project-dist . 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 8s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 10s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 22s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 10m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 10m 44s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 50s{color} | {color:orange} root: The patch generated 64 new + 159 unchanged - 1 fixed = 223 total (was 160) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 10m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} shellcheck {color} | {color:green} 0m 14s{color} | {color:green} There were no new shellcheck issues. {color} | | {color:green}+1{color} | {color:green} shelldocs {color} | {color:green} 0m 10s{color} | {color:green} There were no new shelldocs issues. {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 5 line(s) that end in whitespace. Use git apply --whitespace=fix <>. 
Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 6s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project hadoop-project-dist . {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 58s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 47s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 15m 10s{color} | {color:red} root in the patch failed. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 51s{color} | {color:red} The patch generated 2 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}121m 38s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.io.compress.zstd.TestZStandardCompressorDecompressor | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | HADOOP-13578 | | JIRA Patch URL |
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15713137#comment-15713137 ] Jason Lowe commented on HADOOP-13578: - I forgot to mention it would also be good to address at least some of the checkstyle issues, such as the indentation levels and line-length. > Add Codec for ZStandard Compression > --- > > Key: HADOOP-13578 > URL: https://issues.apache.org/jira/browse/HADOOP-13578 > Project: Hadoop Common > Issue Type: New Feature >Reporter: churro morales >Assignee: churro morales > Attachments: HADOOP-13578.patch, HADOOP-13578.v1.patch, > HADOOP-13578.v2.patch, HADOOP-13578.v3.patch, HADOOP-13578.v4.patch > > > ZStandard: https://github.com/facebook/zstd has been used in production for 6 > months by facebook now. v1.0 was recently released. Create a codec for this > library. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15713129#comment-15713129 ] Jason Lowe commented on HADOOP-13578: - Thanks for updating the patch! I'm torn on the buffer size, whether we should use the generic io.file.buffer.size or have a zst-specific config. If the codec can significantly speed up with a larger default buffer size than the file buffer size then maybe we should consider having a separate, zst-specific buffer size config. I'd recommend a zst-specific config key with a default value of 0 that means use whatever buffer size the zst library wants to use, but the user can override it to a custom size. That way users can change the zst codec buffer size (again, useful for keeping memory usage reasonable for very wide sort factor merges) but not change the buffer sizes of anything else. Similarly, someone changing the default buffer size to something relatively small (e.g.: 4K) for some unrelated use case could unknowingly hamper the performance of the zst codec. Regarding the test input file, is it copyrighted or do we otherwise have the redistribution rights? Also the .zst binary test file is missing from the patch. *ZStandardCodec.java* Nit: should import the specific things needed rather than wildcard imports. *ZStandardCompressor.java* reinit calls reset then init, but reset already is calling init. LOG should be private rather than public. The LOG.isDebugEnabled check is unnecessary. Nit: The VisibleForTesting annotation should be on a separate line like the other annotations. *ZStandardCompressor.c:* Shouldn't we throw if GetDirectBufferAddress returns null? >From the zstd documentation, ZSTD_compressStream may not always fully consume >the input. Therefore if the finish flag is set in deflateBytesDirect I think >we need to avoid calling ZTD_endStream if there is still bytes left in the >input or we risk dropping some of the input. 
We can just return without >setting the finished flag and expect to get called again from the outer >compression loop logic. Similarly the documentation for ZSTD_endStream says that it may not be able to flush the full contents of the data in one call, and if it indicates there is more data left to flush then we should call it again. So if we tried to end the stream and there's more data left then we shouldn't throw an error. Instead we simply leave the finished flag unset and return so the outer logic loop will empty the output buffer and call compress() again. We will then again call ZSTD_endStream to continue attempting to finish the compression. Bonus points for adding a test case that uses a 1-byte output buffer to demonstrate the extreme case is working properly. Is it necessary to call ZSTD_flushStream after every ZSTD_compressStream call? Seems to me we can skip it entirely, but I might be missing something. *ZStandardDecompressor.java:* LOG should be private rather than public. Should there be a checkStream call at the beginning of decompress()? This would mirror the same check in the compressor for the corresponding compress() method. In inflateDirect, the setting of preSliced = dst in the conditional block is redundant since it was just done before the block. Actually I think it would be faster and create less garbage to avoid creating a temporary ByteBuffer and have the JNI code to lookup the output buffer's position and limit and adjust accordingly. That way we aren't creating an object each time. The swapping of variables in inflateDirect is cumbersome. It would be much cleaner if inflateBytesDirect took arguments so we can pass what the JNI needs directly to it rather than shuffling the parameters into the object's fields before we call it. That also cleans up the JNI code a bit since it doesn't need to do many of the field lookups. 
For example: {code} private native int inflateBytesDirect(ByteBuffer src, int srcOffset, int srcLen, ByteBuffer dst, int dstOffset, int dstLen); {code} We could simplify the variables even more if the JNI code called the ByteBuffer methods to update the positions and limits directly rather than poking the values into the Java object and having Java do it when the JNI returns. I'll leave the details up to you, just pointing out that we can clean this up and avoid a lot of swapping. We could do a similar thing on the compressor side. I'm a little confused about the endOfInput flag for the direct decompressor. I assume it's for handling the case where there are multiple frames within the input buffer. The JNI code will set finished = true once it hits the end of the frame, and it appears it will remain that way even if more input is sent (i.e.: finished will go from false to true after the first frame in the input buffer and remain true throughout all subsequent frames). It makes me wonder if we should be
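The parameterized signature above can be illustrated without JNI: when explicit offsets are passed, the caller never needs ByteBuffer#slice(), which allocates a temporary object per call. copyRegion below is a hypothetical stand-in for the native inflateBytesDirect, not the real implementation:

```java
import java.nio.ByteBuffer;

public class NoSliceDemo {
    // Hypothetical stand-in for the native method: consumes srcLen bytes at
    // srcOff and writes them at dstOff, using absolute indexing so neither
    // buffer has to be sliced or repositioned first.
    static int copyRegion(ByteBuffer src, int srcOff, int srcLen,
                          ByteBuffer dst, int dstOff) {
        for (int i = 0; i < srcLen; i++) {
            dst.put(dstOff + i, src.get(srcOff + i));
        }
        return srcLen;
    }

    public static void main(String[] args) {
        ByteBuffer src = ByteBuffer.allocateDirect(8);
        ByteBuffer dst = ByteBuffer.allocateDirect(8);
        for (int i = 0; i < 8; i++) {
            src.put(i, (byte) i);
        }
        // No src.slice()/dst.slice() temporaries: offsets are passed directly.
        int n = copyRegion(src, 2, 4, dst, 0);
        System.out.println(n);          // 4
        System.out.println(dst.get(0)); // 2
    }
}
```

The same idea applies on the native side: GetDirectBufferAddress plus the passed offset yields the working pointer, with no per-call Java object creation.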
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15703449#comment-15703449 ] Hadoop QA commented on HADOOP-13578: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 22s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 5 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 18s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 41s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 9m 48s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 10s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project hadoop-project-dist . 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 55s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 29s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 20s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 9m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 9m 30s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 46s{color} | {color:orange} root: The patch generated 73 new + 159 unchanged - 1 fixed = 232 total (was 160) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 9m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} shellcheck {color} | {color:green} 0m 11s{color} | {color:green} There were no new shellcheck issues. {color} | | {color:green}+1{color} | {color:green} shelldocs {color} | {color:green} 0m 9s{color} | {color:green} There were no new shelldocs issues. {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 2383 line(s) that end in whitespace. Use git apply --whitespace=fix <>. 
Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 5s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project hadoop-project-dist . {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 32s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 13m 59s{color} | {color:red} root in the patch failed. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 44s{color} | {color:red} The patch generated 3 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}113m 44s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.net.TestDNS | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | HADOOP-13578 | | JIRA Patch URL |
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15678157#comment-15678157 ] churro morales commented on HADOOP-13578: -

Hi Jason, thanks for the review. I figured out the memory issue and will submit a patch Monday; it was quite a bit of fun figuring out why the JVM was segfaulting in particular scenarios :).

io.compression.codec.zst.buffersize should be io.compression.codec.zstd.buffersize to have a consistent prefix with io.compression.codec.zstd.level. In ZStandardCodec#createOutputStream and ZStandardCodec#createInputStream I'm curious why we're not honoring the user's specified buffer size and just using the default buffer size. Many codecs use io.file.buffer.size here, but others use a codec-specific config property.

bq. I totally agree, I'm going to just use io.file.buffer.size and that will be the buffer size and we will honor it.

In ZStandardCompressor and ZStandardDecompressor, the buffers should simply be declared as ByteBuffer instead of Buffer. That way the ugly casting when doing bulk get/put with them is unnecessary.

bq. done

In ZStandardDecompressor it's a little confusing that some buffer length variables like userBufLen reflect the number of bytes remaining to be consumed in the buffer, while compressedDirectBufLen reflects the number of bytes originally placed in the buffer, so the real length of data to be consumed is (compressedDirectBufLen - compressedDirectBufOffset). Maybe a variable name change would help notify future maintainers of the semantic difference with similar variables for other buffers, or we should change the semantics so that variable is consistent with the others?

bq. Yeah, that is confusing. I'll change the names of the variables; I was just following how other Decompressors did their naming.

I'm a little wary of the public getBytesWritten and getBytesRead methods in ZStandardDecompressor. getBytesWritten doesn't appear to be all that useful since the caller can trivially compute it based on the amount of data they read from the input stream the decompressor is writing. Also both of these methods reset values to zero at frame boundaries in the input stream, which may confuse callers who are comparing these values to the entire compressed input data size. I'd recommend removing getBytesWritten which is unused and making getBytesRead package-private just for unit testing.

bq. I originally had that for testing purposes and totally forgot to remove it. I'll get rid of those methods entirely.

It would be nice to change the logging from org.apache.commons.logging to org.slf4j. Sorry I didn't catch this earlier. We're supposed to be slowly migrating to SLF4J, and new code should be using it. Fortunately it's an easy change, and we can remove the conditional checks for debug logging.

bq. sure thing

Do we need to have test_file.txt? I'm wondering if simply generating random data (maybe with a constant seed so the test is deterministic) into a byte buffer would be sufficient input for testing. Typically we want to avoid test files in the source tree unless there's something very specific about the file's contents necessary to exercise a unit test case.

bq. Sorry, I forgot to commit the other file that went along with it: test_file.txt.zst. That file was compressed using the command-line utility, and I wanted to make sure we could go back and forth between the command line and Hadoop. I think that is useful to have as a test case: if anyone changes that behavior down the line, tests will break and it should be pretty clear as to why.

I will fix all coding style issues and other minor nits, no problem. Again, thank you so much for taking the time to review this patch. I will get the (hopefully final) patch up Monday.
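The ByteBuffer-vs-Buffer point above is easy to see in isolation. A minimal sketch (illustrative only, not code from the patch; class and field names are made up) of why declaring the field as the Buffer supertype forces casts around bulk get/put:

```java
import java.nio.Buffer;
import java.nio.ByteBuffer;

// Illustrative only -- not code from the patch. Shows why declaring a
// field as ByteBuffer (rather than the Buffer supertype) removes the
// casts around bulk get/put calls that the review mentions.
public class BufferDeclSketch {

  // Declared as the Buffer supertype: bulk operations need a downcast.
  static Buffer asBuffer = ByteBuffer.allocateDirect(16);

  // Declared as ByteBuffer: bulk get/put work directly.
  static ByteBuffer asByteBuffer = ByteBuffer.allocateDirect(16);

  public static byte firstByte(byte[] data) {
    ((ByteBuffer) asBuffer).put(data);  // cast required here
    asByteBuffer.put(data);             // no cast needed
    asByteBuffer.flip();
    return asByteBuffer.get(0);
  }
}
```

Since Buffer itself exposes no bulk get/put, every call site pays the cast; declaring the concrete ByteBuffer type costs nothing and drops the clutter.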
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15677114#comment-15677114 ] Jason Lowe commented on HADOOP-13578: -

Looking forward to the updated patch with the memory fixes. Some comments on the latest patch. I didn't get time to look at the native code or unit test changes in detail, but here are the comments I had thus far:

io.compression.codec.zst.buffersize should be io.compression.codec.zstd.buffersize to have a consistent prefix with io.compression.codec.zstd.level

In ZStandardCodec#createOutputStream and ZStandardCodec#createInputStream I'm curious why we're not honoring the user's specified buffer size and just using the default buffer size. Many codecs use io.file.buffer.size here, but others use a codec-specific config property.

In ZStandardCompressor and ZStandardDecompressor, the buffers should simply be declared as ByteBuffer instead of Buffer. That way the ugly casting when doing bulk get/put with them is unnecessary.

In ZStandardDecompressor it's a little confusing that some buffer length variables like userBufLen reflect the number of bytes remaining to be consumed in the buffer, while compressedDirectBufLen reflects the number of bytes originally placed in the buffer, so the real length of data to be consumed is (compressedDirectBufLen - compressedDirectBufOffset). Maybe a variable name change would help notify future maintainers of the semantic difference with similar variables for other buffers, or we should change the semantics so that variable is consistent with the others?

I'm a little wary of the public getBytesWritten and getBytesRead methods in ZStandardDecompressor. getBytesWritten doesn't appear to be all that useful since the caller can trivially compute it based on the amount of data they read from the input stream the decompressor is writing. Also both of these methods reset values to zero at frame boundaries in the input stream, which may confuse callers who are comparing these values to the entire compressed input data size. I'd recommend removing getBytesWritten which is unused and making getBytesRead package-private just for unit testing.

It would be nice to change the logging from org.apache.commons.logging to org.slf4j. Sorry I didn't catch this earlier. We're supposed to be slowly migrating to SLF4J, and new code should be using it. Fortunately it's an easy change, and we can remove the conditional checks for debug logging.

Do we need to have test_file.txt? I'm wondering if simply generating random data (maybe with a constant seed so the test is deterministic) into a byte buffer would be sufficient input for testing. Typically we want to avoid test files in the source tree unless there's something very specific about the file's contents necessary to exercise a unit test case.

Typos and coding style minor nits:
- ZStandardCodec
-- conf should be private in ZStandardCodec since it isn't accessed outside of the class.
-- "objec." s/b "object." and "ZStandard Decompressor" s/b "ZStandardDecompressor"
- ZStandardCompressor
-- Comment refers to ZLIB format
-- Missing whitespace in "(b== null)" in setInput
-- Missing braces around "keepUncompressedBuf && uncompressedDirectBufLen > 0" conditional block in needsInput
-- 'else' should be on same line as closing brace in compress
-- 'atmost' s/b 'at most'
-- Missing braces around conditional block in checkStream
- ZStandardDecompressor
-- Annotation should appear on line before getDirectBufferSize method.
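The seeded-random-data suggestion above is a one-liner with java.util.Random; a sketch (class and method names are illustrative, not from the patch) showing that a fixed seed makes the generated input deterministic across runs:

```java
import java.util.Arrays;
import java.util.Random;

// Illustrative only: seeded random input, so a test needs no checked-in
// data file yet produces identical bytes on every run.
public class DeterministicTestData {

  public static byte[] generate(int length, long seed) {
    byte[] data = new byte[length];
    new Random(seed).nextBytes(data);  // fixed seed => same bytes each run
    return data;
  }

  public static boolean reproducible(int length, long seed) {
    // Two independent generators with the same seed agree byte-for-byte.
    return Arrays.equals(generate(length, seed), generate(length, seed));
  }
}
```

This covers the pure round-trip case; the CLI-interoperability case discussed in the thread still genuinely needs a file produced by the zstd command-line tool.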
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15671488#comment-15671488 ] churro morales commented on HADOOP-13578: - found a small issue with the latest patch which has a connection leak as well as closing a stream that doesn't exist. I will fix this and put up another patch in a few hours.
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668912#comment-15668912 ] Hadoop QA commented on HADOOP-13578:

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 17s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 25s{color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 6m 30s{color} | {color:red} root in trunk failed. {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 10m 3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 7s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project hadoop-project-dist . {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 43s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 21s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 38s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} cc {color} | {color:red} 9m 38s{color} | {color:red} root generated 5 new + 2 unchanged - 0 fixed = 7 total (was 2) {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 9m 38s{color} | {color:red} root generated 513 new + 179 unchanged - 0 fixed = 692 total (was 179) {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 44s{color} | {color:orange} root: The patch generated 68 new + 91 unchanged - 1 fixed = 159 total (was 92) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 10m 13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} shellcheck {color} | {color:green} 0m 13s{color} | {color:green} There were no new shellcheck issues. {color} |
| {color:green}+1{color} | {color:green} shelldocs {color} | {color:green} 0m 10s{color} | {color:green} There were no new shelldocs issues. {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 2385 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 6s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project hadoop-project-dist . {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 46s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 47m 49s{color} | {color:red} root in the patch failed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 47s{color} | {color:red} The patch generated 3 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}147m 43s{color} | {color:black} {color} |

|| Reason || Tests ||
| Failed junit tests | hadoop.crypto.key.kms.server.TestKMS |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue |
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668282#comment-15668282 ] Jason Lowe commented on HADOOP-13578: - Cool, thanks for the update and looking into things. I was hoping with the discovery that the zstd streaming interface can seemingly take arbitrary input and output buffer sizes that we could avoid the headaches of copying leftover input data, etc. Instead we keep feeding input data to ZSTD_decompressStream in a loop until either we run out of input data or the output buffer is full. As long as there is still input data remaining (i.e.: output buffer filled) then we don't need more input and therefore don't need to shuffle data around in the input buffer. Hopefully that makes implementing the Decompressor less clunky than before.
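The loop shape Jason describes can be sketched abstractly. This is illustrative only: copyStep below is a mock standing in for the native ZSTD_decompressStream call (it just copies a few bytes per invocation), not a real zstd binding, but the control flow is the point, keep looping until input is exhausted or the output buffer fills, so no leftover input ever needs to be shuffled:

```java
import java.nio.ByteBuffer;

// Illustrative only: the loop structure for a streaming decompressor that
// accepts arbitrary buffer sizes. copyStep mocks the native call.
public class DecompressLoopSketch {

  // Mock "native" call: consumes at most 4 input bytes per invocation and
  // writes them to the output buffer.
  static void copyStep(ByteBuffer in, ByteBuffer out) {
    int n = Math.min(4, Math.min(in.remaining(), out.remaining()));
    for (int i = 0; i < n; i++) {
      out.put(in.get());
    }
  }

  public static int decompress(ByteBuffer in, ByteBuffer out) {
    // Loop until we run out of input or the output buffer is full; only
    // then must the caller drain the output or supply more input.
    while (in.hasRemaining() && out.hasRemaining()) {
      copyStep(in, out);
    }
    return out.position();
  }

  // Convenience wrapper so the behavior is easy to check.
  public static int demo(int inputLen, int outputCap) {
    return decompress(ByteBuffer.wrap(new byte[inputLen]),
                      ByteBuffer.allocate(outputCap));
  }
}
```

When the loop exits with input still remaining, the output buffer is necessarily full, which is exactly the "don't need more input" condition the comment relies on.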
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668094#comment-15668094 ] churro morales commented on HADOOP-13578: -

[~jlowe] Thanks for the feedback. As for your concerns:

bq. The problem with ignoring the Codec interface is that code using the Compressor/Decompressor interfaces will break if it happens to get the zstd codec but work for all others. That's not really adding a full Codec for zstd. I agree it's not the most ideal interface for the way zstd wants to be called, but that doesn't change the fact that existing code that should work as-is on a Hadoop-provided codec will not if we don't implement the Compressor/Decompressor interfaces.

As much as it pains me, I agree with your sentiments. I will implement the Compressor/Decompressor interfaces.

bq. It looks like the zstd code already includes a wrapper that translates the zlib API into the zstd API (see https://github.com/facebook/zstd/tree/dev/zlibWrapper) which we can maybe leverage or at least use as a model for the Compressor/Decompressor implementations.

Good point. I did look at using the zlib wrapper originally, but in v1.0.0 (https://github.com/facebook/zstd/tree/v1.0.0/zlibWrapper) they didn't have support for deflateReset and inflateReset, which would have errored out and broken our code. No worries though; I think I have it figured out using the Compressor/Decompressor logic without having to reference the zlibWrapper.

bq. To be clear, I think it's a good thing that the compressor/decompressor streams are using zstd more natively, but I also think it's important to not ignore the non-stream Codec APIs.

I think the reason we need to implement the compressor/decompressor is more for the reuse from the CodecPool. Each compressor/decompressor/codec is very much tied to whether the compression library has been implemented using a stream-based or block-based approach. From what I can see, the API is called as follows:

{code}
Compressor compressor = CodecPool.getCompressor(codec);
CompressionOutputStream out = codec.createOutputStream(outputStream, compressor);
// do work
{code}

And when we look at specific implementations of codec.createOutputStream(), codecs like Snappy always return a Block(Compressor|Decompressor)Stream, and it seems to me that the Snappy(Compressor|Decompressor) can only work on these types of streams. Looks like the way it is used everywhere (including HBase) is that we always get a stream and work on it; the compressor/decompressor are passed in so we have a pool and don't have to constantly re-initialize these native libs. For example I wrote this test, and if you use the Compression(Input|Output)Streams that are native to the codec, the test will pass. But if you create a Compressor/Decompressor and use a Compression(Input|Output)Stream that is not specific to this codec, the test will fail.

{code:java}
@Test
public void testHandlingWithStreams() throws Exception {
  byte[] bytes = BytesGenerator.get(1024 * 64);
  ByteArrayInputStream inputStream = new ByteArrayInputStream(bytes);
  SnappyCodec codec = new SnappyCodec();
  codec.setConf(new Configuration());
  Compressor compressor = codec.createCompressor();
  ByteArrayOutputStream baos = new ByteArrayOutputStream();
  CompressionOutputStream compressorStream = new CompressorStream(baos, compressor);
  // if you replace compressorStream with
  // CompressionOutputStream compressorStream = codec.createOutputStream(baos, compressor);
  // things work fine.
  byte[] buffer = new byte[100];
  int result;
  while ((result = inputStream.read(buffer, 0, buffer.length)) != -1) {
    compressorStream.write(buffer, 0, result);
  }
  compressorStream.flush();
  compressorStream.finish();

  // lets make sure the compressed bytes are able to be decompressed to read
  byte[] compressedBytes = baos.toByteArray();
  ByteArrayInputStream bais = new ByteArrayInputStream(compressedBytes);
  baos = new ByteArrayOutputStream();
  Decompressor decompressor = codec.createDecompressor();
  CompressionInputStream decompressorStream = new DecompressorStream(bais, decompressor);
  // if you replace decompressorStream with
  // CompressionInputStream decompressorStream = codec.createInputStream(bais, decompressor);
  // things work fine.
  byte[] decompressionBuffer = new byte[100];
  while ((result = decompressorStream.read(decompressionBuffer, 0, buffer.length)) != -1) {
    baos.write(decompressionBuffer, 0, result);
  }
  decompressorStream.close();
  byte[] decompressBytes = baos.toByteArray();
  assertArrayEquals(bytes, decompressBytes);
}
{code}

bq. I'm also not so sure about the minimum buffer assertion. I saw in the zstd unit tests there is a byte-by-byte streaming decompression test where it tries to decompress a buffer with a 1-byte input buffer
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15664738#comment-15664738 ] Jason Lowe commented on HADOOP-13578: -

The problem with ignoring the Codec interface is that code using the Compressor/Decompressor interfaces will break if it happens to get the zstd codec but work for all others. That's not really adding a full Codec for zstd. I agree it's not the most ideal interface for the way zstd wants to be called, but that doesn't change the fact that existing code that should work as-is on a Hadoop-provided codec will not if we don't implement the Compressor/Decompressor interfaces. It looks like the zstd code already includes a wrapper that translates the zlib API into the zstd API (see https://github.com/facebook/zstd/tree/dev/zlibWrapper) which we can maybe leverage or at least use as a model for the Compressor/Decompressor implementations.

To be clear, I think it's a good thing that the compressor/decompressor streams are using zstd more natively, but I also think it's important to not ignore the non-stream Codec APIs. I'm also OK if the initial implementation of the zstd Compressor/Decompressor is not super optimal. One tactic would be to always copy the input data to an internal input buffer and track the amount of data zstd wants. Then the needsInput method is easy: just return true when the amount of data in the internal input buffer is less than what zstd last requested. The decompress method would return 0 if needsInput would return true. Not super efficient since we make a copy of all the input data before passing it to zstd, but it should be relatively straightforward to implement.

I'm also not so sure about the minimum buffer assertion. I saw in the zstd unit tests there is a byte-by-byte streaming decompression test where it tries to decompress a buffer with a 1-byte input buffer and a 1-byte output buffer.
I also verified this independently by applying the following changes to the streaming_decompression.c example:

{noformat}
$ diff -u streaming_decompression.c.orig streaming_decompression.c
--- streaming_decompression.c.orig	2016-11-14 18:39:43.423069894 +0000
+++ streaming_decompression.c	2016-11-14 19:08:36.395286748 +0000
@@ -63,10 +63,10 @@
 static void decompressFile_orDie(const char* fname)
 {
     FILE* const fin = fopen_orDie(fname, "rb");
-    size_t const buffInSize = ZSTD_DStreamInSize();
+    size_t const buffInSize = 1;
     void* const buffIn = malloc_orDie(buffInSize);
     FILE* const fout = stdout;
-    size_t const buffOutSize = ZSTD_DStreamOutSize();  /* Guarantee to successfully flush at least one complete compressed block in all circumstances. */
+    size_t const buffOutSize = 1;
     void* const buffOut = malloc_orDie(buffOutSize);
     ZSTD_DStream* const dstream = ZSTD_createDStream();
@@ -80,6 +80,7 @@
     while (input.pos < input.size) {
         ZSTD_outBuffer output = { buffOut, buffOutSize, 0 };
         toRead = ZSTD_decompressStream(dstream, &output, &input);  /* toRead : size of next compressed block */
+        toRead = (toRead > buffInSize) ? buffInSize : toRead;
         if (ZSTD_isError(toRead)) { fprintf(stderr, "ZSTD_decompressStream() error : %s \n", ZSTD_getErrorName(toRead)); exit(12); }
         fwrite_orDie(buffOut, output.pos, fout);
     }
{noformat}

With those changes the streaming_decompression example was able to decompress zstd files even though it was using an extremely small input and output buffer. So even though I'm not giving the zstd library a full block of input it's able to compensate.

bq. The finished field is more to reinitialize the zstd stream than the actual java decompressor stream.

Ah, I see that now. Thanks for the explanation.

bq. So we could allow the user to configure and take the larger of the 2 buffer sizes in case they configure it with a buffer size that is too small.

See input/output buffer discussion above.
I don't think we need to apply safety rails to what the user specifies because the zstd library apparently can accommodate.
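If no safety rails are needed, the configurable-buffer-size lookup reduces to a plain read-with-default. A hedged sketch, using java.util.Properties as a stand-in for Hadoop's Configuration class; the key name matches the one discussed in the thread, but the default value here is hypothetical:

```java
import java.util.Properties;

// Illustrative stand-in for the Configuration lookup being discussed:
// honor a user-supplied buffer size, else fall back to a default.
public class BufferSizeSketch {
  static final String KEY = "io.compression.codec.zstd.buffersize";
  static final int DEFAULT = 128 * 1024;  // hypothetical recommended size

  public static int bufferSize(Properties conf) {
    String configured = conf.getProperty(KEY);
    return (configured == null) ? DEFAULT : Integer.parseInt(configured);
  }

  // Helpers to make the two branches easy to exercise.
  public static int defaultSize() {
    return bufferSize(new Properties());
  }

  public static int withOverride(String value) {
    Properties conf = new Properties();
    conf.setProperty(KEY, value);
    return bufferSize(conf);
  }
}
```

Whatever the user sets is passed straight through to the codec, since the library tolerates arbitrarily small buffers.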
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15657674#comment-15657674 ] churro morales commented on HADOOP-13578: -

bq. I'm confused why the codec returns null for getCompressorType, createCompressor, getDecompressorType, and createDecompressor. This is going to break code that expects to wield the compressor/decompressor rather than streams directly. I would expect it to work more like the DefaultCodec or BZip2Codec where the codec returns appropriate Compressor and Decompressor classes and leverages the existing CompressorStream, DecompressorStream, and CompressionCodec.Util classes to handle the stream interfaces of Codec. For example, the CompressorStream abstract class already handles the single-byte write method, protection from double-finish and double-close, the DecompressorStream already implements skip, etc.

The reason for this came up while testing the code, namely the decompressor logic. The smallest unit of work in ZStandard is a block. I noticed many cases during decompression where I would fill up the output buffer and still have some data left in my input buffer. The data in my input buffer was not a complete block of compressed data, and thus I was having quite a few issues when using the decompressor API, which I felt was really clunky: you have to return special values to see if you need input, and then you have no control over how much of the buffer you fill with compressed data. For example, if I have half a block of compressed data at the end of the input buffer, it was really difficult to state that I need more data in the input buffer: because I don't have a complete compressed block, I have to move that partial compressed block to the beginning of the input buffer, and I also have to ensure that needsInput now returns true, even though I have compressed data to decompress (but not enough for a block). The entire streaming decompressor API is built around Bzip2 and Zlib, which if you look have the exact same C APIs and even return the same values for their respective inflate/deflate methods. I would much prefer to have an input stream / output stream and just have my library compress/decompress on that stream itself, because the zstd API does not match that of zlib or bzip2.

bq. The original approach used direct byte buffers, but the new patch no longer does. All the other codecs leverage direct byte buffers, so I'm curious about the reasoning for that change. I'm not a JVM expert, but I'm wondering if the *PrimitiveArrayCritical methods have unfavorable impacts on other threads in the JVM (e.g.: due to pausing GC or other effects). Given that GetPrimitiveArrayCritical could trigger an array copy in order to perform its task and we have to copy the bytes from the output buffer into the output stream anyway, I would expect direct byte buffers to be faster for at least the output buffer case.

This is from the Oracle documentation. Here is the signature of GetPrimitiveArrayCritical:

{noformat}
void * GetPrimitiveArrayCritical(JNIEnv *env, jarray array, jboolean *isCopy);
{noformat}

Note that GetPrimitiveArrayCritical might still make a copy of the array if the VM internally represents arrays in a different format. Therefore we need to check its return value against NULL for possible out-of-memory situations. Looks like for most of the JVMs on Linux platforms it references a memory address, but we can still use direct byte buffers if you feel strongly about it.
At least for the output buffer case, as you had stated. I didn't notice any performance hit in my local test environment but can look further into this.

bq. Speaking of double-finish, it looks like that could be problematic and lead to double-free's in the native code layer. In addition to leveraging CompressorStream/DecompressorStream as mentioned above to help with this, we could zero the stream field after we are finished and check for a non-null stream context before doing operations. We should throw NPE if the caller passes a null for the source data buffer. Similarly the native code should throw an error if it cannot obtain pointers to the Java buffers. Currently it just silently returns no progress which will result in an infinite loop in practice as it tries to reach the end of the input and never gets there if the error keeps occurring on each JNI invocation.

Totally agree, will fix this.

bq. The documentation for the zstd streaming interface mentions that flushing or ending a stream may require multiple invocations in order to fully accomplish the task, but I don't see corresponding loops in the java code to handle that scenario:

Will make sure to take care of this scenario.

bq. IOUtils.closeQuietly is being called on the output stream which could lead to silent data corruption. If there was an issue writing out the last bits of data as a result of the
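The direct-byte-buffer alternative under discussion can be pictured with a small Java-side sketch (illustrative only, not code from the patch): a direct ByteBuffer lives off-heap, so native code can reach it via GetDirectBufferAddress instead of pinning a heap array with GetPrimitiveArrayCritical, which some VMs service by copying.

```java
import java.nio.ByteBuffer;

// Illustrative only: the Java side of the direct-buffer pattern. The
// put/get calls are the explicit copies between heap arrays and the
// off-heap region that native code would read directly.
public class DirectBufferSketch {

  public static int roundTrip(byte[] src) {
    ByteBuffer direct = ByteBuffer.allocateDirect(src.length);
    direct.put(src);   // one explicit copy into off-heap memory
    direct.flip();
    byte[] dst = new byte[src.length];
    direct.get(dst);   // and one copy back out
    int sum = 0;
    for (byte b : dst) {
      sum += b;
    }
    return sum;
  }

  public static int demo() {
    return roundTrip(new byte[] {1, 2, 3});
  }
}
```

The trade-off both sides of the thread are weighing: the direct buffer makes the copy explicit and predictable, while *PrimitiveArrayCritical may avoid it entirely on some VMs but at the cost of constraining the GC while the array is pinned.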
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15633647#comment-15633647 ] Jason Lowe commented on HADOOP-13578: -

Thanks for updating the patch! It's great to see compatibility between the zstd CLI and the Hadoop codec.

I'm confused why the codec returns null for getCompressorType, createCompressor, getDecompressorType, and createDecompressor. This is going to break code that expects to wield the compressor/decompressor rather than streams directly. I would expect it to work more like the DefaultCodec or BZip2Codec where the codec returns appropriate Compressor and Decompressor classes and leverages the existing CompressorStream, DecompressorStream, and CompressionCodec.Util classes to handle the stream interfaces of Codec. For example, the CompressorStream abstract class already handles the single-byte write method and protection from double-finish and double-close, and the DecompressorStream already implements skip, etc.

The original approach used direct byte buffers, but the new patch no longer does. All the other codecs leverage direct byte buffers, so I'm curious about the reasoning for that change. I'm not a JVM expert, but I'm wondering if the *PrimitiveArrayCritical methods have unfavorable impacts on other threads in the JVM (e.g.: due to pausing GC or other effects). Given that GetPrimitiveArrayCritical could trigger an array copy in order to perform its task and we have to copy the bytes from the output buffer into the output stream anyway, I would expect direct byte buffers to be faster for at least the output buffer case.

Speaking of double-finish, it looks like that could be problematic and lead to double-frees in the native code layer. In addition to leveraging CompressorStream/DecompressorStream as mentioned above to help with this, we could zero the stream field after we are finished and check for a non-null stream context before doing operations.
We should throw NPE if the caller passes a null for the source data buffer. Similarly the native code should throw an error if it cannot obtain pointers to the Java buffers. Currently it just silently returns no progress, which will result in an infinite loop in practice as it tries to reach the end of the input and never gets there if the error keeps occurring on each JNI invocation.

The documentation for the zstd streaming interface mentions that flushing or ending a stream may require multiple invocations in order to fully accomplish the task, but I don't see corresponding loops in the java code to handle that scenario:

{noformat}
 * At any moment, it's possible to flush whatever data remains within buffer, using ZSTD_flushStream().
 * `output->pos` will be updated.
 * Note some content might still be left within internal buffer if `output->size` is too small.
 * @return : nb of bytes still present within internal buffer (0 if it's empty)
 *           or an error code, which can be tested using ZSTD_isError().
 *
 * ZSTD_endStream() instructs to finish a frame.
 * It will perform a flush and write frame epilogue.
 * The epilogue is required for decoders to consider a frame completed.
 * Similar to ZSTD_flushStream(), it may not be able to flush the full content if `output->size` is too small.
 * In which case, call again ZSTD_endStream() to complete the flush.
{noformat}

IOUtils.closeQuietly is being called on the output stream, which could lead to silent data corruption. If there was an issue writing out the last bits of data as a result of the close call then this will silently eat the error. The user is left with a "successful" operation that did not actually succeed.

The getLibraryName native method should do something sane for the non-UNIX case, like returning the Unavailable string as some codecs do when they can't compute it in the UNIX case.
Bonus points for adding a WINDOWS case, and it looks like we can model after the Windows implementations of getLibraryName in the other codecs. In the decompressor, why are we finished when we decode a frame? I thought it was valid for a ZStandard compression stream to be made up of one or more frames, so if there is more input after decoding a frame it seems like the prudent thing to do is try decoding another frame. It would be nice to allow the input and output buffer sizes to be configurable. When used as part of wide merge sorts, there are going to be a lot of codec instances. It may be necessary to use a smaller buffer size per codec to reduce the merge's memory footprint due to all those codec I/O buffers. It makes sense to use the library-recommended size as the default value, but it should be straightforward to allow the user to override this. Nit: srcSize is not really a size in the following code but rather a terminating offset. Renaming it to something like endOffset would be more clear, since
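The configurable-buffer-size suggestion could look roughly like the sketch below. The property key and the recommended-size constant are assumptions for illustration only; a real codec would take the default from the library's recommended size via JNI and read a Hadoop Configuration rather than Properties:

```java
import java.util.Properties;

/**
 * Sketch of a codec buffer size that defaults to the library-recommended
 * value but can be shrunk per-job, e.g. for wide merge sorts with many
 * codec instances. Key name and default are hypothetical.
 */
public class ZstdBufferConfig {
    // Stand-in for the library-recommended buffer size (e.g. ZSTD_CStreamOutSize()).
    static final int RECOMMENDED_BUFFER_SIZE = 128 * 1024;
    static final String BUFFER_SIZE_KEY = "io.compression.codec.zstd.buffersize"; // hypothetical

    static int bufferSize(Properties conf) {
        String v = conf.getProperty(BUFFER_SIZE_KEY);
        return v == null ? RECOMMENDED_BUFFER_SIZE : Integer.parseInt(v);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(bufferSize(conf));          // library-recommended default
        conf.setProperty(BUFFER_SIZE_KEY, "65536");    // wide merge: shrink per-codec buffers
        System.out.println(bufferSize(conf));
    }
}
```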
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15630357#comment-15630357 ] Hadoop QA commented on HADOOP-13578: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 24s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 1s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 58s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 23s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 22s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 51s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 12m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 22s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project hadoop-project-dist . 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 35s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 23s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 7m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 26s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 37s{color} | {color:orange} root: The patch generated 56 new + 91 unchanged - 1 fixed = 147 total (was 92) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 10m 8s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} shellcheck {color} | {color:green} 0m 14s{color} | {color:green} There were no new shellcheck issues. {color} | | {color:green}+1{color} | {color:green} shelldocs {color} | {color:green} 0m 10s{color} | {color:green} There were no new shelldocs issues. {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. 
Refer https://git-scm.com/docs/git-apply {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 132 line(s) with tabs. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 4s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project hadoop-project-dist . {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 33s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 11m 33s{color} | {color:red} root in the patch failed. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 31s{color} | {color:red} The patch generated 3 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}118m 29s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.net.TestDNS | \\ \\ || Subsystem || Report/Notes || | Docker |
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15552135#comment-15552135 ] Jason Lowe commented on HADOOP-13578: - I don't believe the block stuff has anything to do with HDFS blocks. Rather, it describes compression occurring in chunks (blocks) of data at a time. Without the small header at the beginning of each block, it becomes difficult in a general way to know how much data is in the next compressed block when decompressing it. Using the Block codec streams doesn't inherently make the data splittable since one can't easily locate the codec block boundaries at an arbitrary split in the data stream (i.e.: HDFS block boundaries). IMHO if we want to chunk the data for splitting then we can just use a SequenceFile configured for block compression with this codec. Using the Block streams is a big drawback since it makes the format incompatible with the compression standard. This already causes problems with LZ4, see HADOOP-12990. Rather than compressing in blocks that we have to put extra headers on to decode, we can use the zstd streaming APIs to stream the data through the compressor and decompressor. That lets us keep the file format compatible and avoids error scenarios where the codec is configured to use a buffer size that is too small to decompress one of the codec blocks. With the streaming API we are decoupling our buffer size from the size of the data to compress/decompress. > Add Codec for ZStandard Compression > --- > > Key: HADOOP-13578 > URL: https://issues.apache.org/jira/browse/HADOOP-13578 > Project: Hadoop Common > Issue Type: New Feature >Reporter: churro morales >Assignee: churro morales > Attachments: HADOOP-13578.patch, HADOOP-13578.v1.patch > > > ZStandard: https://github.com/facebook/zstd has been used in production for 6 > months by facebook now. v1.0 was recently released. Create a codec for this > library. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15550599#comment-15550599 ] churro morales commented on HADOOP-13578: - @jlowe thank you for the thorough review. The reason that the zstd cli and hadoop can't read each other's compressed / decompressed data is that ZStandardCodec uses the Block(Compressor|Decompressor) stream. I assumed this library would be used to compress large amounts of data. So when you use this stream each block gets a header and some compressed data. I believe the 8 bytes you are referring to are two ints (the size of the compressed and uncompressed block). If you remove these headers then the cli will be able to read the zstd blocks, and if you use the zstd-cli and compress a file (prepend the header for sizes) it will work in hadoop. The snappy compressor / decompressor works in the same way. I do not believe you can compress in snappy format using hadoop, then transfer the file locally and call Snappy.uncompress() without removing the headers. If we do not want this to be compressed at a block level, that is fine. Otherwise we can just add a utility in hadoop to take care of the block headers like they did with hadoop-snappy or some of the CLI libraries for snappy like snzip. As far as the decompressed bytes, I agree. I will check that the size returned by the function that reports how many bytes are necessary to uncompress the buffer is not larger than our buffer size. I can also add the isError and getErrorName to the decompression library. The reason I explicitly checked if the expected size was equal to the desired size is because the error that zstd provided was too vague. But I'll add it in case there are other errors. Yes, I will look at HADOOP-13684. The build of the codec was very similar to snappy because the license was BSD, so we could package it in like snappy. I can also take care of the nits you described as well. 
Are we okay with the compression being at block level? If we are then this implementation will work just like all of the other block compression codecs where it will add / require the header for the hadoop blocks. Thanks again for the review.
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15549007#comment-15549007 ] Jason Lowe commented on HADOOP-13578: - Thanks for updating the patch! bq. what do you think about adding a test that would go through all the codecs and run the M/R job you used as your example with the MRMiniCluster I think this would be overkill. Those tests would not be fast to execute, especially across all codecs. Also they won't run during precommit builds that just touch the codecs since they would necessarily have to be in the mapreduce project to pick up the MR cluster and MR job support code. If this was simply a case of returning the wrong buffer length then I'm wondering why the basic sanity tests of the codec don't catch this. Those tests should be updated to catch this without needing to run the full cluster end-to-end scenario. Most of the tests above now pass, but the codec is still not able to produce output that the zstd CLI utility can understand, nor can the codec parse files that were created by the zstd CLI utility. For example, trying to decode a file created by the zstd CLI utility blows up like this: {noformat} $ zstd < /etc/services | hadoop fs -put - services.zst $ hadoop fs -text services.zst WARNING: HADOOP_PREFIX has been replaced by HADOOP_HOME. Using value of HADOOP_PREFIX. 
2016-10-05 13:21:36,236 INFO [main] compress.CodecPool (CodecPool.java:getDecompressor(181)) - Got brand-new decompressor [.zst] -text: Fatal internal error java.lang.ArrayIndexOutOfBoundsException at org.apache.hadoop.io.compress.zstd.ZStandardDecompressor.setInput(ZStandardDecompressor.java:109) at org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:104) at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:105) at java.io.InputStream.read(InputStream.java:101) at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:91) at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:65) at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:126) at org.apache.hadoop.fs.shell.Display$Cat.printToStdout(Display.java:105) at org.apache.hadoop.fs.shell.Display$Cat.processPath(Display.java:100) at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:321) at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:293) at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:275) at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:259) at org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:119) at org.apache.hadoop.fs.shell.Command.run(Command.java:166) at org.apache.hadoop.fs.FsShell.run(FsShell.java:326) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90) at org.apache.hadoop.fs.FsShell.main(FsShell.java:384) {noformat} Similarly the zstd CLI utility just complains that any output produced by the codec is not in zstd format, e.g.: {noformat} $ hadoop fs -cat wcout-zstandard/part-r-0.zst | zstd -d WARNING: HADOOP_PREFIX has been replaced by HADOOP_HOME. Using value of HADOOP_PREFIX. zstd: stdin: not in zstd format {noformat} Looks like this codec is tacking on an extra 8 bytes, since when I strip those off then the zstd CLI utility is able to decode it. 
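Those extra 8 bytes are, per the discussion in this thread, two big-endian ints (uncompressed and compressed block lengths) that the Block(Compressor|Decompressor) streams prepend. A hedged sketch of framing and stripping that header — the exact field order is an assumption based on the comments, not confirmed from the Hadoop source:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/**
 * Illustrates the assumed 8-byte per-block header: uncompressed length and
 * compressed length as two big-endian ints, followed by the raw compressed
 * payload that a standard zstd decoder could read on its own.
 */
public class BlockHeader {
    static byte[] frame(byte[] payload, int uncompressedLen) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream dos = new DataOutputStream(bos);
        dos.writeInt(uncompressedLen);   // size of the original data
        dos.writeInt(payload.length);    // size of the compressed payload
        dos.write(payload);
        return bos.toByteArray();
    }

    static byte[] strip(byte[] block) throws IOException {
        DataInputStream dis = new DataInputStream(new ByteArrayInputStream(block));
        dis.readInt();                   // uncompressed length (discarded)
        int clen = dis.readInt();        // compressed length
        byte[] payload = new byte[clen];
        dis.readFully(payload);
        return payload;                  // what the zstd CLI could decode
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = {0x28, (byte) 0xb5, 0x2f, (byte) 0xfd}; // zstd magic bytes, for show
        byte[] block = frame(payload, 100);
        System.out.println("framed: " + block.length + " bytes, payload: "
                + strip(block).length + " bytes");
    }
}
```

Stripping this header is what makes the file readable by the standard CLI; conversely, a CLI-produced file lacks it, which is why the codec's setInput blows up above.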
Similarly when the codec tries to read this Hadoop-specific header from a file created by the zstd CLI utility, it blows up since that header is not there. The GzipCodec works without this Hadoop-specific header, so we can model this codec after it to make it compatible with the standard .zst file format. The code still compiles with warnings: {noformat} [INFO] Running make -j 2 VERBOSE=1 [WARNING] /hadoop/y-src/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/compress/zstd/ZStandardDecompressor.c: In function ‘Java_org_apache_hadoop_io_compress_zstd_ZStandardDecompressor_decompressBytes’: [WARNING] /hadoop/y-src/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/compress/zstd/ZStandardDecompressor.c:82: warning: unused variable ‘uncompressed_direct_buf_len’ [WARNING] /hadoop/y-src/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/compress/zstd/ZStandardDecompressor.c: In function ‘Java_org_apache_hadoop_io_compress_zstd_ZStandardDecompressor_decompressBytes’: [WARNING] /hadoop/y-src/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/compress/zstd/ZStandardDecompressor.c:82: warning: unused variable ‘uncompressed_direct_buf_len’ {noformat} Those warnings are a red flag. The code is not checking if the amount of space available in the uncompressed
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15544232#comment-15544232 ] Hadoop QA commented on HADOOP-13578: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 38s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 18s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 50s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 42s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 12m 9s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 18s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project hadoop-project-dist . 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 45s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 22s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 7m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 35s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 35s{color} | {color:orange} root: The patch generated 53 new + 82 unchanged - 1 fixed = 135 total (was 83) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 10m 4s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 6s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} shellcheck {color} | {color:green} 0m 13s{color} | {color:green} There were no new shellcheck issues. {color} | | {color:green}+1{color} | {color:green} shelldocs {color} | {color:green} 0m 9s{color} | {color:green} There were no new shelldocs issues. {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 7 line(s) that end in whitespace. Use git apply --whitespace=fix <>. 
Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 5s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project hadoop-project-dist . {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 8s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 99m 50s{color} | {color:red} root in the patch failed. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 43s{color} | {color:red} The patch generated 2 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}205m 48s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency | | | hadoop.yarn.server.applicationhistoryservice.webapp.TestAHSWebServices | | |
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15543553#comment-15543553 ] churro morales commented on HADOOP-13578: - [~jlowe] Great catch! I figured out the issue: I was returning the wrong buffer length in the decompressBytes function in ZStandardDecompressor.c. Made the change locally and everything works. I'll fix the warnings, make sure all your test cases work, and incorporate your earlier comments as well. Thanks again for the review, I really appreciate you taking the time to take a look.
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15543405#comment-15543405 ] Jason Lowe commented on HADOOP-13578: - bq. As far as the first issue were you using the hadoop-mapreduce-native-task code for intermediate compression for MR jobs? No, I don't believe so unless trunk uses that native code by default. This was just a straightforward mapreduce wordcount job with just the two settings I mentioned above, mapreduce.map.output.compress=true and mapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.ZStandardCodec.
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15543332#comment-15543332 ] churro morales commented on HADOOP-13578: - [~jlowe] Thanks for the update. I will look into the issues. As far as the first issue, were you using the hadoop-mapreduce-native-task code for intermediate compression for MR jobs? If so, I didn't implement that feature because I didn't know if enough people were interested. As far as the warnings, I will clean them up first and will run a word count job to ensure that the decompression / compression works correctly. I'll see if I can reproduce your word count results; that is a bit concerning. Thanks for the review, I'll take a look at it today / tomorrow.
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15542571#comment-15542571 ] Jason Lowe commented on HADOOP-13578: - Sorry for the delay in getting a more detailed review. Before I delved deep into the code I ran the codec through some basic tests and found a number of problems. The native code compiles with warnings that should be cleaned up: {noformat} [WARNING] /hadoop/y-src/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/compress/zstd/ZStandardDecompressor.c: In function ‘Java_org_apache_hadoop_io_compress_zstd_ZStandardDecompressor_decompressBytes’: [WARNING] /hadoop/y-src/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/compress/zstd/ZStandardDecompressor.c:110: warning: format ‘%d’ expects type ‘int’, but argument 5 has type ‘size_t’ [WARNING] /hadoop/y-src/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/compress/zstd/ZStandardDecompressor.c:110: warning: format ‘%d’ expects type ‘int’, but argument 5 has type ‘size_t’ [WARNING] /hadoop/y-src/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/compress/zstd/ZStandardDecompressor.c: In function ‘Java_org_apache_hadoop_io_compress_zstd_ZStandardDecompressor_decompressBytes’: [WARNING] /hadoop/y-src/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/compress/zstd/ZStandardDecompressor.c:110: warning: format ‘%d’ expects type ‘int’, but argument 5 has type ‘size_t’ [WARNING] /hadoop/y-src/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/compress/zstd/ZStandardDecompressor.c:110: warning: format ‘%d’ expects type ‘int’, but argument 5 has type ‘size_t’ {noformat} The codec is not working as an intermediate codec for MapReduce jobs. 
Running a wordcount job with -Dmapreduce.map.output.compress=true -Dmapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.GzipCodec works, but specifying -Dmapreduce.map.output.compress=true -Dmapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.ZStandardCodec causes the reducers to fail while fetching outputs complaining about premature EOF: {noformat} 2016-10-03 13:51:32,140 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#5 about to shuffle output of map attempt_1475501532481_0007_m_00_0 decomp: 323113 len: 93339 to MEMORY 2016-10-03 13:51:32,149 WARN [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.Fetcher: Failed to shuffle for fetcher#5 java.io.IOException: Premature EOF from inputStream at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:209) at org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput.doShuffle(InMemoryMapOutput.java:90) at org.apache.hadoop.mapreduce.task.reduce.IFileWrappedMapOutput.shuffle(IFileWrappedMapOutput.java:63) at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:536) at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:336) at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193) {noformat} The codec also has some issues with MapReduce jobs when reading input from a previous job's output that has been zstd compressed. 
For example, this sequence of steps generates output one would expect, where we're effectively word counting the output of wordcount on /etc/services (just some sample input for wordcount): {noformat} $ hadoop fs -put /etc/services wcin $ hadoop jar $HADOOP_PREFIX/share/hadoop/mapreduce/hadoop-mapreduce-examples*.jar wordcount -Dmapreduce.map.output.compress=true -Dmapreduce.output.fileoutputformat.compress=true -Dmapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec wcin wcout-gzip $ hadoop jar $HADOOP_PREFIX/share/hadoop/mapreduce/hadoop-mapreduce-examples*.jar wordcount wcout-gzip wcout-gzip2 {noformat} But if we do the same with org.apache.hadoop.io.compress.ZStandardCodec there's an odd record consisting of about 25K of NULs (i.e.: 0x00 bytes) in the output of the second job. The output of the ZStandardCodec is not readable by the zstd CLI utility, nor is output generated by the zstd CLI utility readable by ZStandardCodec.
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15536534#comment-15536534 ] churro morales commented on HADOOP-13578: - [~jlowe] Just wanted to let you know, I have incorporated your changes, but was waiting for a more detailed review before putting a patch up. Don't want to spam you with too many patches :)
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15524323#comment-15524323 ] churro morales commented on HADOOP-13578: - [~jlowe] Thank you for taking a look. bq. The BUILDING.txt addition must have been copied from snappy, since a lot of it still refers to snappy. Yes, did a copy and paste with the BUILDING.txt, will take a look. bq. There's a TODO in the code that should get sorted out: Yes, I think I have it figured out, had to read up a bit :). Since we compress the entire buffer at once, it's transformed into a single ZStandard frame. So the overhead is a magic number (4 bytes) + header (14 bytes max) + checksum (optional 4 bytes). So I believe that can be a constant of 22 bytes because of the way we compress in hadoop. I'll update that as well. bq. Shouldn't this come from IO_COMPRESSION_CODEC_ZSTD_LEVEL_DEFAULT that's already in the config keys? Correct! Will change that back to 3. Also will use the constant. I'll incorporate your fixes. Comments much appreciated.
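The 22-byte overhead arithmetic above can be captured as a small sketch. The expansion term in `maxCompressedLength` is a simplification of zstd's compressBound formula, not the exact macro; treat it as an illustration of the constant, not the codec's actual bound function:

```java
/**
 * Worst-case single-frame overhead as described in the comment above:
 * magic number (4) + maximum frame header (14) + optional checksum (4) = 22.
 */
public class ZstdFrameOverhead {
    static final int MAGIC_BYTES = 4;
    static final int MAX_FRAME_HEADER_BYTES = 14;
    static final int CHECKSUM_BYTES = 4;
    static final int MAX_OVERHEAD = MAGIC_BYTES + MAX_FRAME_HEADER_BYTES + CHECKSUM_BYTES;

    /** Rough output-buffer bound when the whole input becomes one frame. */
    static int maxCompressedLength(int srcLen) {
        // Incompressible data can expand slightly; the constant covers the framing.
        return srcLen + (srcLen >>> 8) + MAX_OVERHEAD;
    }

    public static void main(String[] args) {
        System.out.println(MAX_OVERHEAD);                 // 22
        System.out.println(maxCompressedLength(1 << 20)); // bound for a 1 MiB buffer
    }
}
```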
[jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514052#comment-15514052 ] churro morales commented on HADOOP-13578: - I was wondering if people were interested in getting zstandard integrated into the hadoop-mapreduce-client-nativetask? That is my only task left for trunk (since that code only resides there). I have backports ready for the 2.7 and 2.6 branches as well. Maybe we can split that feature up into a separate jira ticket? If that's the case I can have patches up very soon. Was just wondering everyone's thoughts here?