Qifan Chen has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/17478 )

Change subject: IMPALA-10709: Min/max filters should be enabled for joins into sorted columns in Parquet tables
......................................................................

IMPALA-10709: Min/max filters should be enabled for joins into sorted columns in Parquet tables

This patch enables min/max filters by default for equi-joins into
sort-by columns of a Parquet table. This takes advantage of the
column values being fully sorted within each data file of the table,
which keeps the min/max ranges of its pages tight and largely
non-overlapping. When there are multiple sort-by columns in the table,
only the leading column is assigned a min/max filter. The control knob
is the query option minmax_filter_sorted_columns, which defaults to true.
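The following Python sketch illustrates, under simplified assumptions, why sorted data makes min/max filters effective and how the leading sort-by column might be chosen as the filter target. It models each Parquet page only by the (min, max) pair of one column. The functions `pages_to_read` and `minmax_target_column` are hypothetical and are not Impala's HdfsScanNode or RuntimeFilterGenerator code.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical model: a page is represented only by the (min, max) of one column.
PageStats = Tuple[int, int]


def pages_to_read(pages: Sequence[PageStats], filter_min: int, filter_max: int) -> List[int]:
    """Return the indexes of pages whose [min, max] range overlaps the filter range.

    Pages that do not overlap [filter_min, filter_max] can be skipped.
    """
    return [i for i, (lo, hi) in enumerate(pages) if hi >= filter_min and lo <= filter_max]


def minmax_target_column(sort_by_cols: Sequence[str],
                         join_cols: Sequence[str],
                         minmax_filter_sorted_columns: bool) -> Optional[str]:
    """Hypothetical rule: target only the leading sort-by column, and only if
    it is an equi-join column and the knob is enabled."""
    if not minmax_filter_sorted_columns or not sort_by_cols:
        return None
    leading = sort_by_cols[0]
    return leading if leading in join_cols else None


if __name__ == "__main__":
    # Values 1..40 written in sorted order, 10 per page.
    sorted_pages = [(1, 10), (11, 20), (21, 30), (31, 40)]
    # The same values scattered across pages, so every page spans a wide range.
    unsorted_pages = [(1, 40), (2, 39), (3, 38), (4, 37)]
    print(pages_to_read(sorted_pages, 12, 18))    # [1]
    print(pages_to_read(unsorted_pages, 12, 18))  # [0, 1, 2, 3]
    print(minmax_target_column(["a", "b"], ["a", "b"], True))  # a
```

With sorted data the filter range [12, 18] overlaps a single page, while the same range overlaps every page when the values are scattered, so no page can be skipped.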

When query option minmax_filter_sorted_columns is true and query
option minmax_filter_threshold is 0, the patch automatically assigns
a reasonable value to the threshold and sets the filtering level,
normally specified via query option minmax_filtering_level, to PAGE.
When minmax_filter_threshold is greater than 0, neither option
(minmax_filter_threshold nor minmax_filtering_level) is adjusted.
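A minimal sketch of the option-adjustment rule described above, assuming it can be modeled as a pure function of the three query options. The actual threshold value the patch assigns is not stated in this message, so it is passed in as the hypothetical parameter `auto_threshold`; the level name ROW_GROUP used in the usage lines is only illustrative.

```python
from typing import Tuple


def adjust_minmax_options(minmax_filter_sorted_columns: bool,
                          minmax_filter_threshold: float,
                          minmax_filtering_level: str,
                          auto_threshold: float) -> Tuple[float, str]:
    """Return the (threshold, filtering level) pair after the adjustment.

    auto_threshold is a hypothetical stand-in for the value the patch picks.
    """
    # A user-supplied positive threshold disables any adjustment of both options.
    if minmax_filter_threshold > 0:
        return minmax_filter_threshold, minmax_filtering_level
    # Threshold is 0 and sorted-column filtering is on: pick a threshold and PAGE.
    if minmax_filter_sorted_columns:
        return auto_threshold, "PAGE"
    # Otherwise leave both options as they are.
    return minmax_filter_threshold, minmax_filtering_level


if __name__ == "__main__":
    print(adjust_minmax_options(True, 0.0, "ROW_GROUP", 0.5))   # (0.5, 'PAGE')
    print(adjust_minmax_options(True, 0.3, "ROW_GROUP", 0.5))   # (0.3, 'ROW_GROUP')
```

Checking the positive-threshold case first mirrors the message: an explicit threshold wins over the automatic choice, whatever the value of minmax_filter_sorted_columns.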

When minmax_filter_sorted_columns is set to false, no min/max filter
is specifically assigned to the leading sort-by column.

Testing:
  1). Added two new tests in overlap_min_max_filters.test to verify that
      a) min/max filters are created only for the leading sort-by column;
      b) query option minmax_filter_sorted_columns works.
  2). Core [TBD]
  3). Performance [TBD]

Change-Id: I28c19c4b39b01ffa7d275fb245be85c28e9b2963
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test
10 files changed, 125 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/17478/7
--
To view, visit http://gerrit.cloudera.org:8080/17478

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I28c19c4b39b01ffa7d275fb245be85c28e9b2963
Gerrit-Change-Number: 17478
Gerrit-PatchSet: 7
Gerrit-Owner: Qifan Chen <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
