Aman Sinha has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/19682 )

Change subject: IMPALA-12006: Improve cardinality estimation for joins 
involving multiple conjuncts
......................................................................

IMPALA-12006: Improve cardinality estimation for joins involving multiple 
conjuncts

When an inner or outer join involves conjuncts such as the following:
 SELECT * FROM t1 inner join
  (SELECT a2, MAX(b2) as max_b2 FROM t2 GROUP BY a2) s2
  ON t1.a1 = s2.a2 AND t1.b1 = s2.max_b2
the stats for the second conjunct involving the MAX expression gets
added to the 'otherEqJoinStats' list. These stats were being used
only when no other equijoin conjunct (involving base columns) was
present but not for the above case, thus leading to over-estimation.

The main change in this patch is an improvement to the cardinality
estimate for such cases by considering the conjuncts in the
'otherEqJoinStats list' in combination with the equijoin conjuncts.

A second change is the addition of a new query option which is a
floating point number between 0 to 1 inclusive:
  join_selectivity_correlation_factor
It defaults to 0.0 which preserves the existing behavior of using
the Minimum selectivity of the conjuncts. Given multiple
join conjuncts C1,C2..Cn having comparable selectivity, we sometimes
see over-estimation by several orders of magnitude. Setting the above
query option to a value higher than 0 first computes the product of the
selectivities: sel(C1) * sel(C2) .. sel(Cn)  and then scales it by
dividing by the join_selectivity_correlation_factor.

Note that this setting applies to all the joins in the query so it has
it's trade offs. Another approach is to use per join hints but there
are limitations of the hints approach too.

Testing:
 - Added planner tests with a combination of outer and inner join with
   different values of join_selectivity_correlation_factor
 - Ran PlannerTest and TpcdsPlannerTest

Change-Id: I845d778a58404af834f7501fc8157a5a4b4bcc35
Reviewed-on: http://gerrit.cloudera.org:8080/19682
Tested-by: Impala Public Jenkins <[email protected]>
Reviewed-by: Kurt Deschler <[email protected]>
Reviewed-by: Quanlong Huang <[email protected]>
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/card-outer-join.test
6 files changed, 297 insertions(+), 17 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Kurt Deschler: Looks good to me, but someone else must approve
  Quanlong Huang: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/19682
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I845d778a58404af834f7501fc8157a5a4b4bcc35
Gerrit-Change-Number: 19682
Gerrit-PatchSet: 9
Gerrit-Owner: Aman Sinha <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Kurt Deschler <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>

Reply via email to