Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15403 )

Change subject: IMPALA-6505: Min-Max predicate push down in ORC scanner
......................................................................


Patch Set 2:

Some additional design choices for ORC.

1. Query option minmax_filter_threshold controls when to apply a filter.
That is, if the overlap between a filter's range and that of the data unit
(i.e. a Parquet page) is over the threshold, the filter is not applied. In
the case of ORC, the min and max values in the column stats (such as
IntegerColumnStatistics) could be used to facilitate similar pruning.

2. Min/max filters for Parquet are turned on only for the leading lexically
sorted column, every Z-order sorted column, or partition columns, since that
is where the performance benefit is greatest. I wonder if we should be doing
the same for ORC. The relevant controls are MINMAX_FILTER_SORTED_COLUMNS and
MINMAX_FILTER_PARTITION_COLUMNS.
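
To make the threshold check in point 1 concrete, here is a rough sketch of
how the overlap test might look against a data unit's min/max stats. The
Range struct and the OverlapRatio and ShouldApplyFilter helpers are
hypothetical names for illustration only, not Impala or ORC APIs, and the
ratio definition (overlap width over the data unit's width, on inclusive
integer ranges) is an assumption on my part.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical inclusive integer range, e.g. taken from a min/max filter
// or from a data unit's column statistics. Not an Impala or ORC type.
struct Range {
  int64_t min;
  int64_t max;
};

// Fraction of the data unit's [min, max] range that the filter's range
// covers. Returns 0.0 when the two ranges do not intersect.
double OverlapRatio(const Range& filter, const Range& stats) {
  assert(filter.min <= filter.max && stats.min <= stats.max);
  const int64_t lo = std::max(filter.min, stats.min);
  const int64_t hi = std::min(filter.max, stats.max);
  if (lo > hi) return 0.0;
  // Convert before subtracting so extreme int64 values cannot overflow.
  const double overlap =
      static_cast<double>(hi) - static_cast<double>(lo) + 1.0;
  const double total =
      static_cast<double>(stats.max) - static_cast<double>(stats.min) + 1.0;
  return overlap / total;
}

// Assumed reading of the threshold: once the overlap is over the threshold
// the filter would reject too few rows to be worth evaluating, so skip it.
bool ShouldApplyFilter(const Range& filter, const Range& stats,
                       double threshold) {
  return OverlapRatio(filter, stats) <= threshold;
}
```

With a threshold of 0.5, a filter covering a quarter of the stats range
would be applied, while one covering 90% of it would not.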

It would be nice to have some consistency between ORC and Parquet as far as
min/max filtering is concerned.


--
To view, visit http://gerrit.cloudera.org:8080/15403
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I136622413db21e0941d238ab6aeea901a6464845
Gerrit-Change-Number: 15403
Gerrit-PatchSet: 2
Gerrit-Owner: Norbert Luksa <[email protected]>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Norbert Luksa <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Thu, 19 Aug 2021 15:54:40 +0000
Gerrit-HasComments: No
