Re: Add FilteredPageReader to filter rows based on page statistics

2022-11-02 Thread Fatemah Panahi
What do we think about introducing a dependency on arrow::compute::Expression for specifying and evaluating the filter? Is that acceptable?

On Tue, Nov 1, 2022 at 10:42 AM Fatemah Panahi wrote:
> Hi Micah,
>
> Answers inline.
>
> Another item that we need to think about is

Re: Add FilteredPageReader to filter rows based on page statistics

2022-11-01 Thread Fatemah Panahi
[1]
> > https://github.com/apache/arrow/blob/5e49174d69deb9d1cbbdf82bc8041b90098f560b/cpp/src/arrow/dataset/file_parquet.cc
> >
> > On Mon, Oct 31, 2022 at 9:50 AM Fatemah Panahi wrote:
> >
> > > -- Sending as an email in case Jira messages are filtered out. Please let
> > > me know your thoughts

Add FilteredPageReader to filter rows based on page statistics

2022-10-31 Thread Fatemah Panahi
-- Sending as an email in case Jira messages are filtered out. Please let me know your thoughts on this. Thanks!

Jira ticket: https://issues.apache.org/jira/browse/PARQUET-2210

Currently, we do not use the statistics that are stored in the page headers for pruning the rows that we read. Row group
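
For readers skimming the thread, the general idea of pruning on page statistics can be sketched as follows. This is only an illustration under stated assumptions, not the proposed FilteredPageReader and not the Parquet C++ API: `PageStats`, `page_may_match`, and `pages_to_read` are hypothetical names, the column is assumed to hold integers, and the filter is assumed to be a closed range `[lower, upper]`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PageStats:
    """Hypothetical stand-in for the min/max statistics in a page header."""
    first_row: int
    num_rows: int
    min_value: Optional[int]  # None means the statistic is absent
    max_value: Optional[int]


def page_may_match(stats: PageStats, lower: int, upper: int) -> bool:
    """Return False only if no row in the page can fall in [lower, upper]."""
    if stats.min_value is None or stats.max_value is None:
        # Without statistics nothing can be ruled out, so the page is read.
        return True
    return stats.max_value >= lower and stats.min_value <= upper


def pages_to_read(pages: List[PageStats], lower: int, upper: int) -> List[PageStats]:
    """Keep the pages whose value range overlaps the filter range."""
    return [p for p in pages if page_may_match(p, lower, upper)]


# Made-up statistics for four pages of 100 rows each.
pages = [
    PageStats(first_row=0, num_rows=100, min_value=1, max_value=50),
    PageStats(first_row=100, num_rows=100, min_value=51, max_value=120),
    PageStats(first_row=200, num_rows=100, min_value=None, max_value=None),
    PageStats(first_row=300, num_rows=100, min_value=200, max_value=300),
]
selected = pages_to_read(pages, lower=100, upper=150)
print([p.first_row for p in selected])
```

Statistics can only prove that a page contains no matching rows. A page that survives the check still has to be decoded and filtered row by row.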