Re: Parquet - 41

2020-04-20 Thread Junjie Chen
As far as I know, it is not implemented yet. The thrift is up-to-date now; would you like to contribute? Things we need are: 1. xxhash C++ implementation 2. reader and writer for the bloom filter 3. filtering logic for row groups. Implementing the reader would be a good start. On Tue, Apr 21, 2020
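
For readers unfamiliar with how the three items listed above fit together, here is a minimal sketch in Python rather than C++. It is not the Parquet design: it uses a plain bit-array Bloom filter instead of the split-block layout in the Parquet format, and `hashlib.blake2b` stands in for xxHash. The names `SimpleBloomFilter` and `row_groups_to_read` are hypothetical and exist only for this illustration.

```python
import hashlib
import struct


class SimpleBloomFilter:
    """Plain bit-array Bloom filter (assumption: NOT Parquet's split-block layout)."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, value: bytes):
        # Stand-in hash: blake2b from the standard library, not xxHash.
        digest = hashlib.blake2b(value, digest_size=16).digest()
        h1, h2 = struct.unpack("<QQ", digest)
        # Double hashing: derive the bit positions from two 64-bit halves.
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def insert(self, value: bytes) -> None:
        # "Writer" side: set every bit the value maps to.
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, value: bytes) -> bool:
        # "Reader" side: False means definitely absent, True means possibly present.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(value)
        )


def row_groups_to_read(filters, value: bytes):
    """Return indices of row groups that may contain `value`.

    A row group whose filter is None (no filter written) is always kept,
    because nothing can be ruled out for it.
    """
    return [
        i for i, f in enumerate(filters)
        if f is None or f.might_contain(value)
    ]
```

A filter can only rule a row group out, never confirm a match, so any row group that survives the check still has to be scanned.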

Re: Parquet - 41

2020-04-20 Thread ARL122
Hi, is the C++ version of the bloom filter implemented in Arrow Parquet C++? https://issues.apache.org/jira/browse/PARQUET-41 [PARQUET-41] Add bloom filters to parquet statistics - ASF JIRA For row groups with no dictionary, we could still produce a

[jira] [Commented] (PARQUET-1381) Add merge blocks command to parquet-tools

2020-04-20 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17088145#comment-17088145 ] ASF GitHub Bot commented on PARQUET-1381: - brimzi commented on a change in pull request #775:

[GitHub] [parquet-mr] brimzi commented on a change in pull request #775: PARQUET-1381: add parquet block merging feature

2020-04-20 Thread GitBox
brimzi commented on a change in pull request #775: URL: https://github.com/apache/parquet-mr/pull/775#discussion_r411741293 ## File path: parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileWriter.java ## @@ -919,6 +895,59 @@ public void

[jira] [Commented] (PARQUET-1381) Add merge blocks command to parquet-tools

2020-04-20 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17088105#comment-17088105 ] ASF GitHub Bot commented on PARQUET-1381: - brimzi commented on a change in pull request #775:

[GitHub] [parquet-mr] brimzi commented on a change in pull request #775: PARQUET-1381: add parquet block merging feature

2020-04-20 Thread GitBox
brimzi commented on a change in pull request #775: URL: https://github.com/apache/parquet-mr/pull/775#discussion_r411711834 ## File path: parquet-hadoop/src/main/java/org/apache/parquet/hadoop/RowGroupMerger.java ## @@ -0,0 +1,634 @@ +/* Review comment: Same issue

[jira] [Commented] (PARQUET-1381) Add merge blocks command to parquet-tools

2020-04-20 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17088102#comment-17088102 ] ASF GitHub Bot commented on PARQUET-1381: - brimzi commented on a change in pull request #775:

[GitHub] [parquet-mr] brimzi commented on a change in pull request #775: PARQUET-1381: add parquet block merging feature

2020-04-20 Thread GitBox
brimzi commented on a change in pull request #775: URL: https://github.com/apache/parquet-mr/pull/775#discussion_r411711834 ## File path: parquet-hadoop/src/main/java/org/apache/parquet/hadoop/RowGroupMerger.java ## @@ -0,0 +1,634 @@ +/* Review comment: Same issue

[GitHub] [parquet-mr] brimzi commented on a change in pull request #775: PARQUET-1381: add parquet block merging feature

2020-04-20 Thread GitBox
brimzi commented on a change in pull request #775: URL: https://github.com/apache/parquet-mr/pull/775#discussion_r411708296 ## File path: parquet-hadoop/src/main/java/org/apache/parquet/hadoop/RowGroupMerger.java ## @@ -0,0 +1,634 @@ +/* + * Licensed to the Apache Software

[jira] [Commented] (PARQUET-1381) Add merge blocks command to parquet-tools

2020-04-20 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17088099#comment-17088099 ] ASF GitHub Bot commented on PARQUET-1381: - brimzi commented on a change in pull request #775:

[jira] [Commented] (PARQUET-1381) Add merge blocks command to parquet-tools

2020-04-20 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17088088#comment-17088088 ] ASF GitHub Bot commented on PARQUET-1381: - brimzi commented on a change in pull request #775:

[GitHub] [parquet-mr] brimzi commented on a change in pull request #775: PARQUET-1381: add parquet block merging feature

2020-04-20 Thread GitBox
brimzi commented on a change in pull request #775: URL: https://github.com/apache/parquet-mr/pull/775#discussion_r411696939 ## File path: parquet-hadoop/src/main/java/org/apache/parquet/hadoop/RowGroupMerger.java ## @@ -0,0 +1,634 @@ +/* + * Licensed to the Apache Software

[GitHub] [parquet-mr] brimzi commented on a change in pull request #775: PARQUET-1381: add parquet block merging feature

2020-04-20 Thread GitBox
brimzi commented on a change in pull request #775: URL: https://github.com/apache/parquet-mr/pull/775#discussion_r411692398 ## File path: parquet-hadoop/src/main/java/org/apache/parquet/hadoop/RowGroupMerger.java ## @@ -0,0 +1,634 @@ +/* + * Licensed to the Apache Software

[jira] [Commented] (PARQUET-1381) Add merge blocks command to parquet-tools

2020-04-20 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17088085#comment-17088085 ] ASF GitHub Bot commented on PARQUET-1381: - brimzi commented on a change in pull request #775:

[GitHub] [parquet-mr] shangxinli commented on issue #776: PARQUET-1229: Parquet MR encryption

2020-04-20 Thread GitBox
shangxinli commented on issue #776: URL: https://github.com/apache/parquet-mr/pull/776#issuecomment-616599564 Is this ready to review? Since there is no comment yet, can you squash it to one single commit to make the review easier?

[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support

2020-04-20 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17087801#comment-17087801 ] ASF GitHub Bot commented on PARQUET-1229: - shangxinli commented on issue #776: URL:

[jira] [Resolved] (PARQUET-1699) Could not resolve org.apache.yetus:audience-annotations:0.11.0

2020-04-20 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky resolved PARQUET-1699. --- Resolution: Fixed > Could not resolve org.apache.yetus:audience-annotations:0.11.0

[jira] [Assigned] (PARQUET-1699) Could not resolve org.apache.yetus:audience-annotations:0.11.0

2020-04-20 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky reassigned PARQUET-1699: - Assignee: Priyank Bagrecha > Could not resolve

Filtering GitBox e-mails out of dev@?

2020-04-20 Thread Wes McKinney
Infra made some changes to ensure that GitHub notifications are archived, but that has resulted in new e-mails being sent to dev@. In Arrow, we didn't want these, so we have * https://issues.apache.org/jira/browse/INFRA-20149 * https://issues.apache.org/jira/browse/ARROW-8520 * Final solution:

[jira] [Commented] (PARQUET-1841) [C++] Experiment to see if using SIMD shuffle operations for DecodeSpaced improves performance

2020-04-20 Thread Frank Du (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17087475#comment-17087475 ] Frank Du commented on PARQUET-1841: --- I wrote a draft implementation for AVX512 int32_t/int64_t path