Janaki Lahorani has uploaded a new patch set (#14). ( 
http://gerrit.cloudera.org:8080/12113 )

Change subject: IMPALA-6533: Add min-max filter for decimal types on kudu 
tables.
......................................................................

IMPALA-6533: Add min-max filter for decimal types on kudu tables.

The code mimics the code written for other min-max filters.  Decimal data
can be stored using 4 bytes, 8 bytes and 16 bytes.  The code respectively
handles these 3 storage configurations.  The column definition states the
precision and the precision determines the storage size.

The minimum and maximum values are stored in a union.  The precision from
the column will come in as an input.  Based on the precision the size will be
found, and depending on the size appropriate variable will be used.

The code in min-max-filter* follows the general convention of the file, hence
uses macros.

The test includes 24 decimal columns (as listed below) with the following joins:
1.  Inner Join with broadcast (2 tables)
  1a. 1 predicate
  1b. 4 predicates - all results in decimal min-max filter
  1c. 4 predicates - 3 results in decimal min=max filter; 1 doesn't
2.  Inner Join with Shuffle (3 tables)
3.  Right outer join (2 tables)
4.  Left Semi join (2 tables)
5.  Right Semi join (2 tables)

Decimal Columns:
4bytes:
(5,0), (5,1), (5,3), (5,5)
(9,0), (9,1), (9,5), (9,9)
8 bytes:
(14,0), (14,1), (14,7), (14,14)
(18,0), (18,1), (18,9), (18,18)
16 bytes:
(28,0), (28,1), (28,14), (28,28)
(38,0), (38,1), (38,19), (38,38)

The test aggregates the count of probe rows.  This shows that the min-max filter
is exercised, because the number of probe rows is less than the total number
of rows in the probe side table.  The count of probe rows is considered to be
deterministic.  But, it will be beneficial to look out for changes in Kudu that 
can
change the way data is partitioned.  Such a change could change the probe row 
count
and in that case, the test will have to be updated.

impala_test_suite.py and test_result_verifier.py are enhanced to support saving
of aggregation using update_results.

Change-Id: Ib7e7278e902160d7060f8097290bc172d9031f94
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/runtime/coordinator.cc
M be/src/runtime/decimal-value.h
M be/src/runtime/decimal-value.inline.h
M be/src/util/min-max-filter-ir.cc
M be/src/util/min-max-filter-test.cc
M be/src/util/min-max-filter.cc
M be/src/util/min-max-filter.h
M bin/rat_exclude_files.txt
M common/thrift/Data.thrift
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M testdata/data/README
A testdata/data/decimal_rtf_tbl.txt
A testdata/data/decimal_rtf_tiny_tbl.txt
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A 
testdata/workloads/functional-query/queries/QueryTest/decimal_min_max_filters.test
M testdata/workloads/functional-query/queries/QueryTest/min_max_filters.test
M tests/common/impala_test_suite.py
M tests/common/test_result_verifier.py
M tests/query_test/test_runtime_filters.py
21 files changed, 7,282 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/12113/14
--
To view, visit http://gerrit.cloudera.org:8080/12113
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib7e7278e902160d7060f8097290bc172d9031f94
Gerrit-Change-Number: 12113
Gerrit-PatchSet: 14
Gerrit-Owner: Janaki Lahorani <jan...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Janaki Lahorani <jan...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <thomasmarsh...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>

Reply via email to