Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/15403 )
Change subject: IMPALA-6505: Min-Max predicate push down in ORC scanner
......................................................................

Patch Set 2:

Some additional design choices for ORC.

1. Query option minmax_filter_threshold controls when to apply a filter. That is, if the overlap of a filter with the value range of the data unit (i.e. a Parquet page) is over the threshold, the filter is not applied. In the case of ORC, the min and max values in the column stats (such as IntegerColumnStatistics) could be used to facilitate similar pruning.

2. Min/max filters for Parquet are turned on only for the leading lexically sorted column, every Z-order sorted column, or partition columns, mainly because that is the sweet spot for performance benefits. I wonder if we should be doing the same for ORC. The relevant controls are MINMAX_FILTER_SORTED_COLUMNS and MINMAX_FILTER_PARTITION_COLUMNS.

It would be nice to have a level of consistency between ORC and Parquet as far as min/max filtering is concerned.

--
To view, visit http://gerrit.cloudera.org:8080/15403
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I136622413db21e0941d238ab6aeea901a6464845
Gerrit-Change-Number: 15403
Gerrit-PatchSet: 2
Gerrit-Owner: Norbert Luksa <[email protected]>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Norbert Luksa <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Thu, 19 Aug 2021 15:54:40 +0000
Gerrit-HasComments: No
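
For illustration only, the threshold check from point 1 of the comment above could be sketched roughly as follows. This is not Impala's actual code: `Range`, `OverlapRatio`, and `ShouldApplyFilter` are hypothetical names, and defining the overlap as the fraction of the data unit's [min, max] range covered by the filter's range is an assumption made for this sketch. For ORC, the data unit's range would presumably come from column statistics such as IntegerColumnStatistics.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical closed integer range [min, max]; not an Impala or ORC type.
struct Range {
  int64_t min;
  int64_t max;
};

// Assumed definition of "overlap": the fraction of the data unit's value range
// (taken from its column stats) that is also covered by the filter's range.
// Returns 0.0 when the two ranges are disjoint.
double OverlapRatio(const Range& filter, const Range& stats) {
  const int64_t lo = std::max(filter.min, stats.min);
  const int64_t hi = std::min(filter.max, stats.max);
  if (lo > hi) return 0.0;
  // Convert to double before subtracting to avoid int64 overflow on wide ranges.
  // The +1 accounts for both ends of an integer range being inclusive.
  const double overlap = static_cast<double>(hi) - static_cast<double>(lo) + 1.0;
  const double width =
      static_cast<double>(stats.max) - static_cast<double>(stats.min) + 1.0;
  return overlap / width;
}

// Apply the filter only if the overlap does not exceed the threshold,
// mirroring the rule "over the threshold, do not apply the filter".
bool ShouldApplyFilter(const Range& filter, const Range& stats, double threshold) {
  return OverlapRatio(filter, stats) <= threshold;
}
```

In this sketch a filter covering the whole data unit has an overlap of 1.0 and is skipped for any threshold below 1.0, since evaluating it is unlikely to prune anything. A disjoint filter has an overlap of 0.0 and is always applied.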
