Hello Zoltan Borok-Nagy, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit
    http://gerrit.cloudera.org:8080/17387

to look at the new patch set (#2).

Change subject: IMPALA-10681: Improve join cardinality estimates
......................................................................

IMPALA-10681: Improve join cardinality estimates

During cardinality estimation for inner joins, if the join conjunct involves a scan slot on the left side and a function (e.g. MAX) on the right, we currently determine that the NDV stats of neither side are useful and return the left side's cardinality, even though it may be a significant over-estimate.

In this patch, we handle join conjuncts of this type by keeping them in an 'other' eligible conjuncts list, as long as the NDV of the expressions on both sides of the join can be reasonably estimated and the input cardinality is also available. For example, if the conjunct is int_col = MAX(int_col) and the right input does not have a group-by, the right NDV is 1 and can be safely used. If it has a group-by and the group-by columns already have associated NDVs, we can still derive the combined NDV. Other such examples exist.

An auxiliary struct is introduced to keep track of the NDV and row count. Once these 'other' eligible conjuncts are populated, we do the join cardinality estimation in the same manner as for the normal join conjuncts, fetching the stats from the auxiliary struct.

Testing:
- Added new planner tests for inner join cardinality.
- Modified the expected plans for certain tests, including TPC-DS queries, and ran end-to-end TPC-DS queries.
- Since TPC-DS plans are complex, I checked the cardinality changes for some of the hash joins but not the changes in the shape of a plan (e.g. whether the join order changed).

TODO: We should run a performance test to validate the plan changes for TPC-DS at a sufficiently high scale factor.
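For reviewers, a minimal Java sketch of the kind of estimate described above may help. It is not the code in JoinNode.java: the class JoinCardinalitySketch, the NdvAndRowCount holder (standing in for the auxiliary struct), and aggExprNdv are hypothetical names. It assumes the textbook per-conjunct formula |L| * |R| / max(NDV(lhs), NDV(rhs)), takes the minimum across conjuncts, and assumes that group-by column NDVs combine as a product capped at the aggregation's output row count.

```java
import java.util.List;

// Hypothetical sketch, not Impala's JoinNode implementation.
public class JoinCardinalitySketch {

  /** Hypothetical stand-in for the auxiliary struct: NDV and row count per side. */
  public static final class NdvAndRowCount {
    final double lhsNdv, lhsNumRows, rhsNdv, rhsNumRows;

    public NdvAndRowCount(double lhsNdv, double lhsNumRows,
                          double rhsNdv, double rhsNumRows) {
      this.lhsNdv = lhsNdv;
      this.lhsNumRows = lhsNumRows;
      this.rhsNdv = rhsNdv;
      this.rhsNumRows = rhsNumRows;
    }
  }

  /**
   * NDV of an aggregate output such as MAX(int_col). Without a GROUP BY there is
   * one output row, so NDV = 1. With a GROUP BY (assumption) the NDV is bounded by
   * the product of the group-by columns' NDVs, capped at the aggregate's row count.
   */
  public static double aggExprNdv(double[] groupByNdvs, double aggOutputRows) {
    if (groupByNdvs.length == 0) return 1;
    double combined = 1;
    for (double ndv : groupByNdvs) combined *= ndv;
    return Math.min(combined, aggOutputRows);
  }

  /**
   * Per conjunct: |L| * |R| / max(NDV(lhs), NDV(rhs)); returns the minimum over
   * all conjuncts, or -1 if the list is empty.
   */
  public static long estimate(List<NdvAndRowCount> conjuncts, long lhsCard, long rhsCard) {
    long result = -1;
    for (NdvAndRowCount c : conjuncts) {
      // If filters shrank an input below the row count the NDV was computed on,
      // scale that NDV down proportionally (never up).
      double lhsNdv = c.lhsNdv;
      if (c.lhsNumRows > lhsCard) lhsNdv *= lhsCard / c.lhsNumRows;
      double rhsNdv = c.rhsNdv;
      if (c.rhsNumRows > rhsCard) rhsNdv *= rhsCard / c.rhsNumRows;
      // A floor of 1 keeps the estimate from exceeding |L| * |R|.
      double maxNdv = Math.max(1.0, Math.max(lhsNdv, rhsNdv));
      long card = Math.round(lhsCard / maxNdv * rhsCard);
      result = (result == -1) ? card : Math.min(result, card);
    }
    return result;
  }

  public static void main(String[] args) {
    // t1.int_col = MAX(t2.int_col) with no GROUP BY on the right:
    // left has 1,000,000 rows with NDV 10,000; right is a single row with NDV 1.
    double rhsNdv = aggExprNdv(new double[0], 1);
    List<NdvAndRowCount> conjuncts =
        List.of(new NdvAndRowCount(10_000, 1_000_000, rhsNdv, 1));
    System.out.println(estimate(conjuncts, 1_000_000, 1)); // prints 100
  }
}
```

With these made-up numbers the sketch estimates 100 output rows, whereas falling back to the left side's cardinality would give 1,000,000. The usable right-side NDV is what makes the tighter bound possible.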
Change-Id: I8aa9d3b8f3c4848b3e9414fe19ad7ad348d12ecc
---
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/card-inner-join.test
M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/partition-key-scans-default.test
M testdata/workloads/functional-planner/queries/PlannerTest/partition-key-scans.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q05.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q11.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q71.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q74.test
M testdata/workloads/functional-planner/queries/PlannerTest/views.test
15 files changed, 3,675 insertions(+), 3,331 deletions(-)

    git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/17387/2

--
To view, visit http://gerrit.cloudera.org:8080/17387
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8aa9d3b8f3c4848b3e9414fe19ad7ad348d12ecc
Gerrit-Change-Number: 17387
Gerrit-PatchSet: 2
Gerrit-Owner: Aman Sinha <amsi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>