[jira] [Updated] (PARQUET-1866) Replace Hadoop ZSTD with JNI-ZSTD

Xinli Shang (Jira) Thu, 21 May 2020 16:35:31 -0700


     [ 
https://issues.apache.org/jira/browse/PARQUET-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Xinli Shang updated PARQUET-1866:
---------------------------------
    Description: 
The parquet-mr repo has been using 
[ZSTD-JNI|https://github.com/luben/zstd-jni/tree/master/src/main/java/com/github/luben/zstd]
 for the parquet-cli project. Using this JNI binding is a cleaner approach than 
using Hadoop ZSTD compression, because 1) installing Hadoop on a development 
box is cumbersome, and 2) older versions of Hadoop do not support ZSTD, and 
upgrading Hadoop is painful in its own right. This Jira is to replace Hadoop 
ZSTD with ZSTD-JNI in the parquet-hadoop project. 

According to the author of ZSTD-JNI, Flink, Spark, and Cassandra all use 
ZSTD-JNI for ZSTD.

Another approach is to use https://github.com/airlift/aircompressor, which is a 
pure Java implementation. However, the compression level does not appear to be 
adjustable in aircompressor. 


  was:
The parquet-mr repo has been using 
[ZSTD-JNI|https://github.com/luben/zstd-jni/tree/master/src/main/java/com/github/luben/zstd]
 for the parquet-cli project. Using this JNI binding is a cleaner approach than 
using Hadoop ZSTD compression, because 1) installing Hadoop on a development 
box is cumbersome, and 2) older versions of Hadoop do not support ZSTD, and 
upgrading Hadoop is painful in its own right. This Jira is to replace Hadoop 
ZSTD with ZSTD-JNI in the parquet-hadoop project. 

According to the author of ZSTD-JNI, Flink, Spark, and Cassandra all use 
ZSTD-JNI for ZSTD.



> Replace Hadoop ZSTD with JNI-ZSTD
> ---------------------------------
>
>                 Key: PARQUET-1866
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1866
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-mr
>    Affects Versions: 1.12.0
>            Reporter: Xinli Shang
>            Assignee: Xinli Shang
>            Priority: Major
>             Fix For: 1.12.0
>
>
> The parquet-mr repo has been using 
> [ZSTD-JNI|https://github.com/luben/zstd-jni/tree/master/src/main/java/com/github/luben/zstd]
>  for the parquet-cli project. Using this JNI binding is a cleaner approach than 
> using Hadoop ZSTD compression, because 1) installing Hadoop on a development 
> box is cumbersome, and 2) older versions of Hadoop do not support ZSTD, and 
> upgrading Hadoop is painful in its own right. This Jira is to replace Hadoop 
> ZSTD with ZSTD-JNI in the parquet-hadoop project. 
> According to the author of ZSTD-JNI, Flink, Spark, and Cassandra all use 
> ZSTD-JNI for ZSTD.
> Another approach is to use https://github.com/airlift/aircompressor, which is 
> a pure Java implementation. However, the compression level does not appear to 
> be adjustable in aircompressor. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
