Qifan Chen has uploaded a new patch set (#12). (
http://gerrit.cloudera.org:8080/17075 )
Change subject: IMPALA-10494: Making use of the min/max column stats to improve min/max filters
......................................................................
IMPALA-10494: Making use of the min/max column stats to improve min/max filters
This patch adds the ability to compute the minimum and maximum
values of integer, float, and double columns in Parquet tables,
and uses these new stats to discard min/max filters whose range
is too close to the column's actual value range (such filters
would eliminate few rows).
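The following C++ sketch illustrates the two steps described above: computing a column's low/high values, then deciding whether a min/max filter is worth keeping. It assumes, without confirmation from this patch, that "too close" is measured as the fraction of the column's [low, high] range covered by the filter's [min, max] and compared against a threshold. The names `ComputeMinMax`, `OverlapRatio`, `ShouldKeepFilter`, `ColumnMinMax`, and the `threshold` parameter are hypothetical and do not correspond to Impala's actual code or query options.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical column-level stats: the minimal and maximal value of a column.
template <typename T>
struct ColumnMinMax {
  T low;
  T high;
};

// Computes the low/high values of a numeric column in one pass.
// Returns std::nullopt for an empty column. Assumes NaN-free input for
// float/double, since NaN breaks the ordering minmax_element relies on.
template <typename T>
std::optional<ColumnMinMax<T>> ComputeMinMax(const std::vector<T>& values) {
  if (values.empty()) return std::nullopt;
  auto [lo, hi] = std::minmax_element(values.begin(), values.end());
  return ColumnMinMax<T>{*lo, *hi};
}

// Fraction of the column range [col_low, col_high] that the filter range
// [filter_min, filter_max] overlaps. 0.0 means no overlap; 1.0 means the
// filter covers the whole column range and, assuming accurate stats,
// cannot eliminate any row.
double OverlapRatio(double filter_min, double filter_max,
                    double col_low, double col_high) {
  assert(col_low <= col_high);
  double lo = std::max(filter_min, col_low);
  double hi = std::min(filter_max, col_high);
  if (hi < lo) return 0.0;              // disjoint ranges
  if (col_high == col_low) return 1.0;  // single-valued column, fully covered
  return (hi - lo) / (col_high - col_low);
}

// Hypothetical policy: keep a min/max filter only if its coverage of the
// column range is below `threshold` (a hypothetical tuning knob, e.g. 0.5).
bool ShouldKeepFilter(double filter_min, double filter_max,
                      const ColumnMinMax<double>& stats, double threshold) {
  return OverlapRatio(filter_min, filter_max, stats.low, stats.high) < threshold;
}
```

For example, with the `ss_store_sk` stats shown in the table (low 1, high 10), a filter on [1, 9] covers 8/9 of the range and would be discarded at a threshold of 0.5, while a filter on [2, 3] covers 1/9 and would be kept.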
Only the min/max values of non-partition columns are stored in HMS;
the min/max values of partition columns are computed in the
coordinator.
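Because min/max values for partition columns are not kept in HMS, the coordinator has to derive them. The sketch below is a minimal illustration under two assumptions not stated in this patch: that the coordinator sees each partition's key value as a string (as in a partition name like `ss_sold_date_sk=2450816`), and that NULL keys use Hive's default partition name `__HIVE_DEFAULT_PARTITION__`. The helper `PartitionKeyMinMax` and the `KeyRange` type are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <charconv>
#include <cstdint>
#include <optional>
#include <string>
#include <system_error>
#include <vector>

// Hypothetical low/high range of an integer partition column.
struct KeyRange {
  int64_t low;
  int64_t high;
};

// Assumption: NULL partition keys appear under Hive's default partition name.
const std::string kNullPartitionValue = "__HIVE_DEFAULT_PARTITION__";

// Hypothetical coordinator-side helper: derives the low/high values of an
// integer partition column from its partitions' key values (for example
// "2450816" from ss_sold_date_sk=2450816). NULL partitions are skipped.
// Returns std::nullopt if there is no non-NULL key or any key fails to parse.
std::optional<KeyRange> PartitionKeyMinMax(const std::vector<std::string>& key_values) {
  std::optional<KeyRange> range;
  for (const std::string& v : key_values) {
    if (v == kNullPartitionValue) continue;
    int64_t key = 0;
    const char* end = v.data() + v.size();
    auto [ptr, ec] = std::from_chars(v.data(), end, key);
    if (ec != std::errc() || ptr != end) return std::nullopt;  // not an integer
    if (!range) {
      range = KeyRange{key, key};
    } else {
      range->low = std::min(range->low, key);
      range->high = std::max(range->high, key);
    }
  }
  // Invariant: low never exceeds high.
  assert(!range || range->low <= range->high);
  return range;
}
```

Giving up on any unparseable value, rather than skipping it, is a conservative choice: the caller then simply has no stats for the column instead of a possibly wrong range.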
Two new columns, 'Min' and 'Max', are added to the output of the
SHOW COLUMN STATS command, as shown below.
show column stats tpcds_parquet.store_sales
+-----------------------+--------------+-...-------+---------+---------+
| Column                | Type         | #Falses   | Min     | Max     |
+-----------------------+--------------+-...-------+---------+---------+
| ss_sold_time_sk       | INT          | -1        | 28800   | 75599   |
| ss_item_sk            | BIGINT       | -1        | 1       | 18000   |
| ss_customer_sk        | INT          | -1        | 1       | 100000  |
| ss_cdemo_sk           | INT          | -1        | 15      | 1920797 |
| ss_hdemo_sk           | INT          | -1        | 1       | 7200    |
| ss_addr_sk            | INT          | -1        | 1       | 50000   |
| ss_store_sk           | INT          | -1        | 1       | 10      |
| ss_promo_sk           | INT          | -1        | 1       | 300     |
| ss_ticket_number      | BIGINT       | -1        | 1       | 240000  |
| ss_quantity           | INT          | -1        | 1       | 100     |
| ss_wholesale_cost     | DECIMAL(7,2) | -1        | -1      | -1      |
| ss_list_price         | DECIMAL(7,2) | -1        | -1      | -1      |
| ss_sales_price        | DECIMAL(7,2) | -1        | -1      | -1      |
| ss_ext_discount_amt   | DECIMAL(7,2) | -1        | -1      | -1      |
| ss_ext_sales_price    | DECIMAL(7,2) | -1        | -1      | -1      |
| ss_ext_wholesale_cost | DECIMAL(7,2) | -1        | -1      | -1      |
| ss_ext_list_price     | DECIMAL(7,2) | -1        | -1      | -1      |
| ss_ext_tax            | DECIMAL(7,2) | -1        | -1      | -1      |
| ss_coupon_amt         | DECIMAL(7,2) | -1        | -1      | -1      |
| ss_net_paid           | DECIMAL(7,2) | -1        | -1      | -1      |
| ss_net_paid_inc_tax   | DECIMAL(7,2) | -1        | -1      | -1      |
| ss_net_profit         | DECIMAL(7,2) | -1        | -1      | -1      |
| ss_sold_date_sk       | INT          | -1        | 2450816 | 2452642 |
+-----------------------+--------------+-...-------+---------+---------+
Testing:
- Added TestLowAndHighValueShort and TestLowAndHighValueInt to
IncrStatsUtilTest
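The tests above target the low/high value logic in incr-stats-util. Incremental stats are computed per partition and merged into table-level stats; the templated sketch below, which is an assumption rather than the patch's actual implementation, shows how per-partition low/high values could be merged for, say, SMALLINT (`int16_t`) and INT (`int32_t`) columns (mirroring the Short/Int test names is itself a guess). A partition with no non-NULL values is represented as `std::nullopt` and contributes nothing. `LowHigh`, `MergeLowHigh`, and `MergeAll` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical per-partition summary of a column's low/high values.
template <typename T>
struct LowHigh {
  T low;
  T high;
};

// Merges two partial summaries; std::nullopt (no values) acts as identity.
template <typename T>
std::optional<LowHigh<T>> MergeLowHigh(const std::optional<LowHigh<T>>& a,
                                       const std::optional<LowHigh<T>>& b) {
  if (!a) return b;
  if (!b) return a;
  return LowHigh<T>{std::min(a->low, b->low), std::max(a->high, b->high)};
}

// Folds all per-partition summaries into one table-level summary.
template <typename T>
std::optional<LowHigh<T>> MergeAll(const std::vector<std::optional<LowHigh<T>>>& parts) {
  std::optional<LowHigh<T>> result;
  for (const auto& p : parts) {
    assert(!p || p->low <= p->high);  // each partition summary must be well-formed
    result = MergeLowHigh(result, p);
  }
  return result;
}
```

Because min and max are associative and commutative and `std::nullopt` acts as an identity element, partitions can be merged in any order, which suits stats that are updated one partition at a time.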
TODO:
1. Test compute stats for timestamp and date columns;
2. Test filters being disabled at the scan node;
3. Add logic to disable min/max filters inside the hash join (HJ)
builder using the column stats.
Change-Id: I08581b44419bb8da5940cbf98502132acd1c86df
---
M be/src/exec/catalog-op-executor.cc
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hdfs-scanner.h
M be/src/exec/incr-stats-util-test.cc
M be/src/exec/incr-stats-util.cc
M be/src/exec/incr-stats-util.h
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/service/hs2-util.cc
M be/src/service/hs2-util.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/min-max-filter.h
M common/thrift/CatalogObjects.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/catalog/ColumnStats.java
M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test
26 files changed, 847 insertions(+), 68 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/17075/12
--
To view, visit http://gerrit.cloudera.org:8080/17075
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I08581b44419bb8da5940cbf98502132acd1c86df
Gerrit-Change-Number: 17075
Gerrit-PatchSet: 12
Gerrit-Owner: Qifan Chen
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Qifan Chen