Re: Add FilteredPageReader to filter rows based on page statistics

2022-10-31 Thread Micah Kornfield
Hi Fatemah, I think there are likely two things to consider here: 1. How will expressions be modeled? There are already some examples of using expressions in Arrow for pruning predicates [1]. Do you plan to re-use them? 2. Along these lines is the proposed approach taken because the API to

[jira] [Assigned] (PARQUET-2188) Add SkipRecords API to RecordReader

2022-10-31 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou reassigned PARQUET-2188: --- Assignee: fatemah > Add SkipRecords API to RecordReader >

[jira] [Resolved] (PARQUET-2188) Add SkipRecords API to RecordReader

2022-10-31 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou resolved PARQUET-2188. - Fix Version/s: cpp-11.0.0 Resolution: Fixed Issue resolved by pull request

Add FilteredPageReader to filter rows based on page statistics

2022-10-31 Thread Fatemah Panahi
-- Sending as an email in case Jira messages are filtered out. Please let me know your thoughts on this. Thanks! Jira ticket: https://issues.apache.org/jira/browse/PARQUET-2210 Currently, we do not use the statistics that are stored in the page headers for pruning the rows that we read. Row group

[jira] [Created] (PARQUET-2210) Add FilteredPageReader to filter rows based on page statistics

2022-10-31 Thread fatemah (Jira)
fatemah created PARQUET-2210: Summary: Add FilteredPageReader to filter rows based on page statistics Key: PARQUET-2210 URL: https://issues.apache.org/jira/browse/PARQUET-2210 Project: Parquet

[jira] [Updated] (PARQUET-2209) [C++] Optimize skip for the case that number of values to skip equals page size

2022-10-31 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated PARQUET-2209: Summary: [C++] Optimize skip for the case that number of values to skip equals page size

[jira] [Updated] (PARQUET-2209) [C++] Optimize skip for the case that number of values to skip equals page size

2022-10-31 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated PARQUET-2209: Component/s: parquet-cpp > [C++] Optimize skip for the case that number of values to

[jira] [Updated] (PARQUET-2209) Optimize skip for the case that number of values to skip equals page size

2022-10-31 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated PARQUET-2209: Labels: pull-request-available (was: ) > Optimize skip for the case that number of

[jira] [Created] (PARQUET-2209) Optimize skip for the case that number of values to skip equals page size

2022-10-31 Thread fatemah (Jira)
fatemah created PARQUET-2209: Summary: Optimize skip for the case that number of values to skip equals page size Key: PARQUET-2209 URL: https://issues.apache.org/jira/browse/PARQUET-2209 Project: Parquet

[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type

2022-10-31 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1762#comment-1762 ] ASF GitHub Bot commented on PARQUET-1711: - jinyius commented on PR #995: URL:

[GitHub] [parquet-mr] jinyius commented on pull request #995: PARQUET-1711: support recursive proto schemas by limiting recursion depth

2022-10-31 Thread GitBox
jinyius commented on PR #995: URL: https://github.com/apache/parquet-mr/pull/995#issuecomment-1297295385 @ggershinsky i'd love to just hit the button. i don't see it. the workflow for travis ci had a failure due to a transient connection issue, and so it wasn't giving me the

[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type

2022-10-31 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17626665#comment-17626665 ] ASF GitHub Bot commented on PARQUET-1711: - jinyius opened a new pull request, #995: URL:

[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type

2022-10-31 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17626663#comment-17626663 ] ASF GitHub Bot commented on PARQUET-1711: - jinyius commented on PR #995: URL:

[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type

2022-10-31 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17626664#comment-17626664 ] ASF GitHub Bot commented on PARQUET-1711: - jinyius closed pull request #995: PARQUET-1711:

[GitHub] [parquet-mr] jinyius commented on pull request #995: PARQUET-1711: support recursive proto schemas by limiting recursion depth

2022-10-31 Thread GitBox
jinyius commented on PR #995: URL: https://github.com/apache/parquet-mr/pull/995#issuecomment-1297292700 > -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [parquet-mr] jinyius closed pull request #995: PARQUET-1711: support recursive proto schemas by limiting recursion depth

2022-10-31 Thread GitBox
jinyius closed pull request #995: PARQUET-1711: support recursive proto schemas by limiting recursion depth URL: https://github.com/apache/parquet-mr/pull/995 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

JIRA issue tracker registration

2022-10-31 Thread Antoine Pitrou
Hello, I don't know if everyone here is already aware, but the Apache Software Foundation has decided that user registration on JIRA will very soon be moderated in order to fight against issue/comment spam. Concretely, user creation requests will soon have to be approved by a project's PMC

[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type

2022-10-31 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17626624#comment-17626624 ] ASF GitHub Bot commented on PARQUET-1711: - ggershinsky commented on PR #995: URL:

[GitHub] [parquet-mr] ggershinsky commented on pull request #995: PARQUET-1711: support recursive proto schemas by limiting recursion depth

2022-10-31 Thread GitBox
ggershinsky commented on PR #995: URL: https://github.com/apache/parquet-mr/pull/995#issuecomment-1297193514 yep, just the squash/merge button. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

Re: Modular encryption to support arrays and nested arrays

2022-10-31 Thread Gidon Gershinsky
Parquet columnar encryption supports these types. Currently, it requires an explicit full path for each column to be encrypted. Your sample will work with *spark.sparkContext.hadoopConfiguration.set("parquet.encryption.column.keys", "k2:rider.list.element.foo,rider.list.element.bar")* Having said
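
As a rough illustration of the explicit-path requirement described above, the sketch below builds the string passed as the value of "parquet.encryption.column.keys". The helper names (list_element_paths, column_keys_value) are hypothetical and not part of Parquet or Spark. It assumes the nested fields sit under the "list.element" path segments shown in the message, and that entries for different keys are joined with ";", which is an assumption beyond the single-key example quoted here.

```python
def list_element_paths(column, fields):
    """Expand a list-of-struct column into full per-field paths.

    Hypothetical helper: assumes the 'list.element' layout used in the
    message above (e.g. rider.list.element.foo).
    """
    return [f"{column}.list.element.{field}" for field in fields]


def column_keys_value(keys_to_columns):
    """Format {key_id: [column paths]} as 'keyId:col,col;keyId:col'.

    The ';' separator between keys is an assumption; the message only
    shows a single key.
    """
    parts = []
    for key_id, columns in keys_to_columns.items():
        if not columns:
            raise ValueError(f"no columns listed for key {key_id!r}")
        parts.append(f"{key_id}:{','.join(columns)}")
    return ";".join(parts)


# Reproduce the value from the message: both nested fields use key k2.
value = column_keys_value({"k2": list_element_paths("rider", ["foo", "bar"])})
print(value)
```

The resulting string can be supplied as the second argument of the hadoopConfiguration.set call quoted in the message.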

[jira] [Created] (PARQUET-2208) Add details to nested column encryption config doc and exception text

2022-10-31 Thread Gidon Gershinsky (Jira)
Gidon Gershinsky created PARQUET-2208: - Summary: Add details to nested column encryption config doc and exception text Key: PARQUET-2208 URL: https://issues.apache.org/jira/browse/PARQUET-2208