Impala Public Jenkins has submitted this change and it was merged. (
http://gerrit.cloudera.org:8080/15462 )
Change subject: IMPALA-9183: Convert disjunctive predicates to conjunctive
normal form
......................................................................
IMPALA-9183: Convert disjunctive predicates to conjunctive normal form
Added an expression rewrite rule that converts a disjunctive predicate to
conjunctive normal form (CNF). Converting to CNF allows a multi-table
predicate that could previously be evaluated only by a Join operator to be
split into single-table conjuncts that are eligible for predicate pushdown
to the scan operator, or into other multi-table conjuncts that are eligible
to be pushed to a Join below. This improves performance for such queries
because rows are filtered earlier in the plan.
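To illustrate why the CNF form helps, here is a minimal Java sketch that sorts already-converted conjuncts by how many tables they reference. Everything in it is hypothetical (the class ConjunctPlacementSketch, the Conjunct type, the table aliases t1 and t2, and the sample predicate); Impala's planner makes this decision on its own Expr and tuple-id structures, which this sketch does not model.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative only: groups CNF conjuncts by where they could be evaluated. */
final class ConjunctPlacementSketch {
  /** A conjunct plus the (hypothetical) table aliases it references. */
  static final class Conjunct {
    final String text;
    final Set<String> tables;

    Conjunct(String text, String... tables) {
      this.text = text;
      this.tables = new LinkedHashSet<>(Arrays.asList(tables));
    }
  }

  /** Conjuncts that reference exactly one table, grouped by that table. */
  static Map<String, List<String>> scanConjuncts(List<Conjunct> cnf) {
    Map<String, List<String>> byTable = new LinkedHashMap<>();
    for (Conjunct c : cnf) {
      if (c.tables.size() == 1) {
        String table = c.tables.iterator().next();
        byTable.computeIfAbsent(table, k -> new ArrayList<>()).add(c.text);
      }
    }
    return byTable;
  }

  /** Conjuncts that reference more than one table and so stay at a join. */
  static List<String> joinConjuncts(List<Conjunct> cnf) {
    List<String> out = new ArrayList<>();
    for (Conjunct c : cnf) {
      if (c.tables.size() > 1) out.add(c.text);
    }
    return out;
  }

  /**
   * CNF of the sample predicate
   *   (t1.a = 1 AND t2.b = 2) OR (t1.a = 3 AND t2.b = 4)
   */
  static List<Conjunct> example() {
    return Arrays.asList(
        new Conjunct("(t1.a = 1 OR t1.a = 3)", "t1"),
        new Conjunct("(t1.a = 1 OR t2.b = 4)", "t1", "t2"),
        new Conjunct("(t2.b = 2 OR t1.a = 3)", "t2", "t1"),
        new Conjunct("(t2.b = 2 OR t2.b = 4)", "t2"));
  }

  public static void main(String[] args) {
    System.out.println("scan: " + scanConjuncts(example()));
    System.out.println("join: " + joinConjuncts(example()));
  }
}
```

In this sample, the original disjunction references both tables and can only be evaluated at the join, whereas two of the four CNF conjuncts reference a single table each and become candidates for scan-level pushdown.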
Since converting to CNF expands the number of expressions, we place a
limit on the maximum number of CNF exprs (each AND is counted as 1 CNF expr)
that are considered. Once the MAX_CNF_EXPRS limit (unlimited by default) is
exceeded, whatever expression was supplied to the rule is returned without
further transformation. A setting of -1 or 0 allows an unlimited number of
CNF exprs to be created, up to int32 max. Another option, ENABLE_CNF_REWRITES,
enables or disables the entire rewrite. It is False by default until we
have done more thorough functional testing (tracked in IMPALA-9539).
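The interaction of the two options can be sketched in Java as follows. This is a hypothetical model, not the code in ConvertToCNFRule.java: the class CnfLimitSketch, its fields, and the apply method are invented for illustration, and the sketch assumes the budget is checked before a rewrite is performed, which the description above does not specify.

```java
import java.util.function.Supplier;

/** Illustrative model of ENABLE_CNF_REWRITES and MAX_CNF_EXPRS. */
final class CnfLimitSketch {
  private final boolean enableCnfRewrites;  // models ENABLE_CNF_REWRITES
  private final int maxCnfExprs;            // models MAX_CNF_EXPRS; -1 or 0 = unlimited
  private int numCnfExprs = 0;              // AND exprs created so far (1 per AND)

  CnfLimitSketch(boolean enableCnfRewrites, int maxCnfExprs) {
    this.enableCnfRewrites = enableCnfRewrites;
    this.maxCnfExprs = maxCnfExprs;
  }

  boolean isUnlimited() { return maxCnfExprs <= 0; }

  int used() { return numCnfExprs; }

  /**
   * Applies `rewrite` if it is allowed, otherwise returns `original` unchanged.
   * `newAnds` is the number of AND exprs the rewrite would create.
   * Assumption: the budget is checked before the rewrite runs.
   */
  <T> T apply(T original, int newAnds, Supplier<T> rewrite) {
    if (!enableCnfRewrites) return original;
    long wouldBe = (long) numCnfExprs + newAnds;
    if (!isUnlimited() && wouldBe > maxCnfExprs) return original;
    // Even when unlimited, the counter is capped at int32 max.
    numCnfExprs = (int) Math.min(wouldBe, Integer.MAX_VALUE);
    return rewrite.get();
  }

  public static void main(String[] args) {
    CnfLimitSketch limited = new CnfLimitSketch(true, 2);
    // Creates 1 AND; fits within the limit of 2.
    System.out.println(
        limited.apply("(a AND b) OR c", 1, () -> "(a OR c) AND (b OR c)"));
    // Would create 3 more ANDs; the limit would be exceeded, so the
    // original expression is returned as-is.
    System.out.println(
        limited.apply("(a AND b) OR (c AND d)", 3,
            () -> "(a OR c) AND (a OR d) AND (b OR c) AND (b OR d)"));
  }
}
```

Returning the untouched input when the budget runs out keeps the rule safe to apply anywhere: the worst case is a missed optimization, never a changed predicate.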
Examples of rewrites:
original: (a AND b) OR c
rewritten: (a OR c) AND (b OR c)
original: (a AND b) OR (c AND d)
rewritten: (a OR c) AND (a OR d) AND (b OR c) AND (b OR d)
original: NOT(a OR b)
rewritten: NOT(a) AND NOT(b)
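The three rewrites above follow from two textbook identities: De Morgan's laws and the distribution of OR over AND. The Java sketch below applies them to a toy expression tree. All names in it (CnfSketch, Expr, leaf, toCnf, distribute, render) are hypothetical and are not taken from ConvertToCNFRule.java, which works on Impala's own Expr classes and may treat NOT and the expression limit differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy CNF conversion; illustrative only, not Impala's ConvertToCNFRule. */
final class CnfSketch {
  enum Kind { VAR, NOT, AND, OR }

  /** Immutable boolean expression node (stand-in for a real Expr). */
  static final class Expr {
    final Kind kind;
    final String name;  // set only for VAR
    final Expr left;    // operand of NOT, or left child of AND/OR
    final Expr right;   // right child of AND/OR

    Expr(Kind kind, String name, Expr left, Expr right) {
      this.kind = kind; this.name = name; this.left = left; this.right = right;
    }

    @Override public String toString() {
      switch (kind) {
        case VAR: return name;
        case NOT: return "NOT(" + left + ")";
        case AND: return "(" + left + " AND " + right + ")";
        default:  return "(" + left + " OR " + right + ")";
      }
    }
  }

  static Expr leaf(String n) { return new Expr(Kind.VAR, n, null, null); }
  static Expr not(Expr e) { return new Expr(Kind.NOT, null, e, null); }
  static Expr and(Expr l, Expr r) { return new Expr(Kind.AND, null, l, r); }
  static Expr or(Expr l, Expr r) { return new Expr(Kind.OR, null, l, r); }

  /** Pushes NOT inward (De Morgan), then distributes OR over AND. */
  static Expr toCnf(Expr e) {
    switch (e.kind) {
      case NOT: {
        Expr c = e.left;
        if (c.kind == Kind.NOT) return toCnf(c.left);  // NOT(NOT(x)) = x
        if (c.kind == Kind.OR) return and(toCnf(not(c.left)), toCnf(not(c.right)));
        if (c.kind == Kind.AND) return toCnf(or(not(c.left), not(c.right)));
        return e;  // NOT of a leaf is already a literal
      }
      case AND: return and(toCnf(e.left), toCnf(e.right));
      case OR:  return distribute(toCnf(e.left), toCnf(e.right));
      default:  return e;
    }
  }

  /** Computes (l OR r) for two CNF inputs, keeping the result in CNF. */
  static Expr distribute(Expr l, Expr r) {
    if (l.kind == Kind.AND) return and(distribute(l.left, r), distribute(l.right, r));
    if (r.kind == Kind.AND) return and(distribute(l, r.left), distribute(l, r.right));
    return or(l, r);
  }

  /** Flattens nested ANDs into a left-to-right list of conjuncts. */
  static List<Expr> conjuncts(Expr e) {
    List<Expr> out = new ArrayList<>();
    collect(e, out);
    return out;
  }

  private static void collect(Expr e, List<Expr> out) {
    if (e.kind == Kind.AND) { collect(e.left, out); collect(e.right, out); }
    else out.add(e);
  }

  /** Prints the top-level conjuncts joined by AND. */
  static String render(Expr e) {
    List<String> parts = new ArrayList<>();
    for (Expr c : conjuncts(e)) parts.add(c.toString());
    return String.join(" AND ", parts);
  }

  public static void main(String[] args) {
    Expr a = leaf("a"), b = leaf("b"), c = leaf("c"), d = leaf("d");
    System.out.println(render(toCnf(or(and(a, b), c))));
    // prints: (a OR c) AND (b OR c)
    System.out.println(render(toCnf(or(and(a, b), and(c, d)))));
    // prints: (a OR c) AND (a OR d) AND (b OR c) AND (b OR d)
    System.out.println(render(toCnf(not(or(a, b)))));
    // prints: NOT(a) AND NOT(b)
  }
}
```

The second example shows why a cap on the number of generated expressions is useful: distributing an OR over two conjunctions of sizes m and n yields m * n disjunctions, so repeated distribution can grow the predicate quickly.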
Testing:
- Added new unit tests with variations of disjunctive predicates
and verified their Explain plans
- Manually verified result correctness in impala-shell by running
these queries with ENABLE_CNF_REWRITES both enabled and disabled
- Added TPC-H q7, q19 and TPC-DS q13 with the CNF rewrite enabled
- Preliminary performance testing of TPC-DS q13 at the 10TB scale factor
shows about a 5x improvement:
Original baseline: 47.5 sec
With this patch and CNF rewrite enabled: 9.4 sec
Change-Id: I5a03cd7239333aaf375416ef5f2b7608fcd4a072
Reviewed-on: http://gerrit.cloudera.org:8080/15462
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins
---
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
A fe/src/main/java/org/apache/impala/rewrite/ConvertToCNFRule.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/convert-to-cnf.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test
12 files changed, 831 insertions(+), 2 deletions(-)
Approvals:
Impala Public Jenkins: Looks good to me, approved; Verified
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I5a03cd7239333aaf375416ef5f2b7608fcd4a072
Gerrit-Change-Number: 15462
Gerrit-PatchSet: 10
Gerrit-Owner: Aman Sinha
Gerrit-Reviewer: Aman Sinha
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong