Aman Sinha has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/19682


Change subject: IMPALA-12006: Improve cardinality estimation for joins 
involving multiple conjuncts
......................................................................

IMPALA-12006: Improve cardinality estimation for joins involving multiple 
conjuncts

When an inner or outer join involves conjuncts such as the following:
 SELECT * FROM t1 inner join
  (SELECT a2, MAX(b2) as max_b2 FROM t2 GROUP BY a2) s2
  ON t1.a1 = s2.a2 AND t1.b1 = s2.max_b2
the stats for the second conjunct involving the MAX expression gets
added to the 'otherEqJoinStats' list. These stats were being used
only when no other equijoin conjunct (involving base columns) was
present but not for the above case, thus leading to over-estimation.

The main change in this patch is an improvement to the cardinality
estimate for such cases by considering the conjuncts in the
'otherEqJoinStats list' in combination with the equijoin conjuncts.

A second change is the addition of a new boolean query option:
   use_multiplied_selectivity_for_join
It defaults to false. Given multiple independent join conjuncts
C1,C2..Cn having comparable selectivity, we sometimes see over-estimation
by several orders of magnitude. Setting the above query option to
TRUE allows the individual selectivities (fraction between 0 and 1)
of the conjuncts to be multiplied instead of taking the minimum. Note
that this applies to all the joins in the query. In the future we can
consider per join hints but in a complex query (or a view that is
shared among users), it is cumbersome to identify exactly where to
set hints.

Testing:
 - Added planner test with a combination of outer and inner join
  (with use_multiplied_selectivity_for_join enabled and disabled)
 - Ran PlannerTest and TpcdsPlannerTest
 TODO: run additional tests

Change-Id: I845d778a58404af834f7501fc8157a5a4b4bcc35
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/card-outer-join.test
6 files changed, 137 insertions(+), 11 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/19682/1
--
To view, visit http://gerrit.cloudera.org:8080/19682
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I845d778a58404af834f7501fc8157a5a4b4bcc35
Gerrit-Change-Number: 19682
Gerrit-PatchSet: 1
Gerrit-Owner: Aman Sinha <[email protected]>

Reply via email to