Dong Chen created HIVE-10252:
--------------------------------

             Summary: Make PPD work for Parquet in row group level
                 Key: HIVE-10252
                 URL: https://issues.apache.org/jira/browse/HIVE-10252
             Project: Hive
          Issue Type: Sub-task
            Reporter: Dong Chen
            Assignee: Dong Chen


In Hive, predicate pushdown (PPD) extracts the search condition from the HQL 
query, serializes it, and pushes it down to the file format. ORC can use the 
predicate to filter stripes. Similarly, Parquet should use the statistics 
stored in each row group to skip row groups that cannot match. But this does 
not work.

{{ParquetRecordReaderWrapper}} gets splits that contain all row groups (client 
side) and pushes the filter to Parquet for further processing (Parquet side). 
But in {{ParquetRecordReader.initializeInternalReader()}}, if the splits have 
already been selected on the client side, the filter is not applied again.

We should make the behavior consistent in Hive. One option is to get the 
splits, filter them, and then pass them to Parquet. This means using the 
client-side strategy.
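
As a rough illustration of the client-side strategy suggested above, the 
following sketch drops row groups whose min/max statistics prove that a 
predicate of the form "col > threshold" cannot match. It does not use the 
Parquet or Hive APIs: {{RowGroupStats}}, {{filterGreaterThan}} and the sample 
values are hypothetical names and data introduced only for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RowGroupFilterSketch {

    /** Hypothetical stand-in for the min/max statistics of one column in a row group. */
    static final class RowGroupStats {
        final int index;   // position of the row group in the file
        final long min;    // smallest value of the column in this row group
        final long max;    // largest value of the column in this row group

        RowGroupStats(int index, long min, long max) {
            this.index = index;
            this.min = min;
            this.max = max;
        }
    }

    /**
     * Keeps only the row groups that may contain a row with col > threshold.
     * A row group is dropped when its max is <= threshold, because then no
     * value in it can satisfy the predicate.
     */
    static List<RowGroupStats> filterGreaterThan(List<RowGroupStats> groups, long threshold) {
        List<RowGroupStats> kept = new ArrayList<>();
        for (RowGroupStats g : groups) {
            if (g.max > threshold) {
                kept.add(g);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Assumed sample statistics for three row groups.
        List<RowGroupStats> groups = Arrays.asList(
                new RowGroupStats(0, 1, 10),
                new RowGroupStats(1, 11, 20),
                new RowGroupStats(2, 21, 30));

        // Predicate: col > 15. Row group 0 (max = 10) cannot match and is skipped.
        for (RowGroupStats g : filterGreaterThan(groups, 15)) {
            System.out.println("read row group " + g.index);
        }
    }
}
```

In this sketch only the surviving row groups would be handed to the reader, 
so the Parquet side would not need to apply the filter to row groups again.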



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
