[jira] [Commented] (PARQUET-1866) Replace Hadoop ZSTD with JNI-ZSTD

2020-05-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17117370#comment-17117370
 ] 

ASF GitHub Bot commented on PARQUET-1866:
-

shangxinli edited a comment on pull request #793:
URL: https://github.com/apache/parquet-mr/pull/793#issuecomment-634104930


   @luben, do you have time to review the code?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Replace Hadoop ZSTD with JNI-ZSTD
> -
>
>                 Key: PARQUET-1866
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1866
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-mr
>    Affects Versions: 1.12.0
>            Reporter: Xinli Shang
>            Assignee: Xinli Shang
>            Priority: Major
>             Fix For: 1.12.0
>
>
> The parquet-mr repo already uses
> [ZSTD-JNI|https://github.com/luben/zstd-jni/tree/master/src/main/java/com/github/luben/zstd]
> in the parquet-cli project. Using this JNI binding is a cleaner approach than
> using Hadoop ZSTD compression, because 1) installing Hadoop on a development
> machine is cumbersome, and 2) older versions of Hadoop do not support ZSTD,
> and upgrading Hadoop is painful in its own right. This Jira is to replace
> Hadoop ZSTD with ZSTD-JNI in the parquet-hadoop project.
> According to the author of ZSTD-JNI, Flink, Spark, and Cassandra all use
> ZSTD-JNI for ZSTD.
> Another approach is to use https://github.com/airlift/aircompressor, which is
> a pure Java implementation, but the compression level does not appear to be
> adjustable in aircompressor.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [parquet-mr] shangxinli edited a comment on pull request #793: PARQUET-1866: Replace Hadoop ZSTD with JNI-ZSTD

2020-05-26 Thread GitBox


shangxinli edited a comment on pull request #793:
URL: https://github.com/apache/parquet-mr/pull/793#issuecomment-634104930


   @luben, do you have time to review the code?







[GitHub] [parquet-mr] luben commented on pull request #793: PARQUET-1866: Replace Hadoop ZSTD with JNI-ZSTD

2020-05-26 Thread GitBox


luben commented on pull request #793:
URL: https://github.com/apache/parquet-mr/pull/793#issuecomment-634146458


   LGTM







[jira] [Commented] (PARQUET-1866) Replace Hadoop ZSTD with JNI-ZSTD

2020-05-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17117352#comment-17117352
 ] 

ASF GitHub Bot commented on PARQUET-1866:
-

luben commented on pull request #793:
URL: https://github.com/apache/parquet-mr/pull/793#issuecomment-634146458


   LGTM







[jira] [Commented] (PARQUET-1866) Replace Hadoop ZSTD with JNI-ZSTD

2020-05-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17117298#comment-17117298
 ] 

ASF GitHub Bot commented on PARQUET-1866:
-

shangxinli commented on pull request #793:
URL: https://github.com/apache/parquet-mr/pull/793#issuecomment-634104930


   @karavelov, do you have time to review the code?







[GitHub] [parquet-mr] shangxinli commented on pull request #793: PARQUET-1866: Replace Hadoop ZSTD with JNI-ZSTD

2020-05-26 Thread GitBox


shangxinli commented on pull request #793:
URL: https://github.com/apache/parquet-mr/pull/793#issuecomment-634104930


   @karavelov, do you have time to review the code?







[jira] [Commented] (PARQUET-1684) [parquet-protobuf] default protobuf field values are stored as nulls

2020-05-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17117275#comment-17117275
 ] 

ASF GitHub Bot commented on PARQUET-1684:
-

sarathdeevi commented on pull request #702:
URL: https://github.com/apache/parquet-mr/pull/702#issuecomment-634162166


   Hey, wassup??





> [parquet-protobuf] default protobuf field values are stored as nulls
> 
>
>                 Key: PARQUET-1684
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1684
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>    Affects Versions: 1.10.0, 1.11.0
>            Reporter: George Haddad
>            Priority: Major
>              Labels: pull-request-available
>
> When the source is a protobuf3 message, and the target file is Parquet, all 
> the default values are stored in the output parquet as {{`null`}} instead of 
> the actual type's default value.
> For example, if the field is of type `int32`, `double` or `enum` and it 
> hasn't been set, the parquet value is {{`null`}} instead of `0`. When the 
> field's type is a `string` that hasn't been set, the parquet value is 
> {{`null`}} instead of an empty string.
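
The difference between the reported and the expected behaviour can be sketched in plain Java. This is only an illustration under stated assumptions, not the parquet-protobuf code or the change in pull request #702: the class `Proto3DefaultsSketch`, the `Kind` enum, and all three methods are hypothetical names, and an unset field is modelled simply as a Java `null`.

```java
public class Proto3DefaultsSketch {

    /** Field kinds named in the issue (hypothetical stand-in for protobuf's own type model). */
    enum Kind { INT32, DOUBLE, ENUM, STRING }

    /** Default value per kind, as described in the issue: 0 for numbers and enums, "" for strings. */
    static Object defaultFor(Kind kind) {
        switch (kind) {
            case INT32:  return 0;
            case DOUBLE: return 0.0;
            case ENUM:   return 0;   // enum number 0
            case STRING: return "";
            default: throw new IllegalArgumentException("unknown kind: " + kind);
        }
    }

    /** Reported behaviour: an unset field (modelled as null) is passed through as null. */
    static Object valueAsReported(Object valueOrNull) {
        return valueOrNull;
    }

    /** Expected behaviour: an unset field is replaced by the default for its kind. */
    static Object valueWithDefault(Kind kind, Object valueOrNull) {
        return valueOrNull != null ? valueOrNull : defaultFor(kind);
    }

    public static void main(String[] args) {
        // Print both behaviours for an unset field of each kind.
        for (Kind kind : Kind.values()) {
            System.out.println(kind
                + ": reported=" + valueAsReported(null)
                + ", expected=" + valueWithDefault(kind, null));
        }
    }
}
```

A value that was explicitly set passes through unchanged in both variants; only the unset case differs.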





[GitHub] [parquet-mr] sarathdeevi commented on pull request #702: PARQUET-1684: dont store default protobuf values as null for proto3

2020-05-26 Thread GitBox


sarathdeevi commented on pull request #702:
URL: https://github.com/apache/parquet-mr/pull/702#issuecomment-634162166


   Hey, wassup??







Today's Parquet sync Zoom meeting

2020-05-26 Thread Xinli shang
Hi all,

You should have received the password in the meeting invitation. For your
convenience, I have pasted it below. See you in 30 minutes!

https://uber.zoom.us/j/3523778975


Meeting ID: 352 377 8975
Password: 030115

-- 
Xinli Shang