Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18731 )
Change subject: IMPALA-11424: Support pushdown non-equi join predicate from OUTER/INNER JOIN to SCANNODE ...................................................................... Patch Set 6: (23 comments) Thanks for your contribution, Baike! This is an important improvement. I still need some time to finish my first round of review. Left some comments first. http://gerrit.cloudera.org:8080/#/c/18731/6//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/18731/6//COMMIT_MSG@9 PS6, Line 9: In order to reduce the amount of data read and transmitted, the non-equivalent condition of Join can be pushed to SCAN_NODE. nit: each line of the commit message should have 72 or fewer characters. The commit title is ok. http://gerrit.cloudera.org:8080/#/c/18731/6//COMMIT_MSG@16 PS6, Line 16: Please introduce the new query option, ENABLE_NONE_EQUAL_PREDICATE_PUSH_DOWN, in the commit message. http://gerrit.cloudera.org:8080/#/c/18731/6/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java File fe/src/main/java/org/apache/impala/planner/HashJoinNode.java: http://gerrit.cloudera.org:8080/#/c/18731/6/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@142 PS6, Line 142: pushdownNonEquiConjunct(analyzer); Should we move these before computeStats() at line 136 to have better cardinality? http://gerrit.cloudera.org:8080/#/c/18731/6/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@463 PS6, Line 463: nit: 4 spaces indent http://gerrit.cloudera.org:8080/#/c/18731/6/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@507 PS6, Line 507: nit: 4 spaces indent http://gerrit.cloudera.org:8080/#/c/18731/6/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@580 PS6, Line 580: nit: 4 spaces indent http://gerrit.cloudera.org:8080/#/c/18731/6/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@592 PS6, Line 592: if (!(predicate.getChild(0) instanceof LiteralExpr : && !(predicate.getChild(0) instanceof NullLiteral)) : && !(predicate.getChild(1) instanceof LiteralExpr : && !(predicate.getChild(1) instanceof NullLiteral))) { This is not that readable. We can simplify it to if (!Expr.IS_NON_NULL_LITERAL.apply(predicate.getChild(0)) && !Expr.IS_NON_NULL_LITERAL.apply(predicate.getChild(1))) { continue; } http://gerrit.cloudera.org:8080/#/c/18731/6/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@631 PS6, Line 631: groupOtherJoinConjunctsAccordingToSlotRef This function is similar to the above one. Can we refactor them into one? http://gerrit.cloudera.org:8080/#/c/18731/6/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@632 PS6, Line 632: nit: 4 spaces indent http://gerrit.cloudera.org:8080/#/c/18731/6/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@675 PS6, Line 675: nit: 4 spaces indent http://gerrit.cloudera.org:8080/#/c/18731/6/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@731 PS6, Line 731: nit: 4 spaces indent http://gerrit.cloudera.org:8080/#/c/18731/6/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@746 PS6, Line 746: if (child instanceof LiteralExpr) { : LiteralExpr currentValue = (LiteralExpr) child; : if (minValue == null) { : minValue = currentValue; : } else { : if (minValue.compareTo(currentValue) > 0) { : minValue = currentValue; : } : } : } This is a common patten of the code. We can extract this into a method to deduplicate some codes. EDIT: the code structure in getMaxLiteralFromPredicates() looks better. It'd be nice if we can refactor them into one method. http://gerrit.cloudera.org:8080/#/c/18731/6/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@775 PS6, Line 775: < 0 Shouldn't this be "> 0" ? http://gerrit.cloudera.org:8080/#/c/18731/6/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@784 PS6, Line 784: i = 1 Could you explain why we don't need "i = 0" ? http://gerrit.cloudera.org:8080/#/c/18731/6/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@785 PS6, Line 785: !(predicate.getChild(i) instanceof LiteralExpr) : || (predicate.getChild(i) instanceof NullLiteral) This can be simplified to !Expr.IS_NON_NULL_LITERAL.apply(predicate.getChild(i)) http://gerrit.cloudera.org:8080/#/c/18731/6/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@796 PS6, Line 796: minValue = literalValues.get(0); Shouldn't we update 'minValue' instead of replacing it? There could be more than one IN-list, e.g. "x in [1, 2, 3] and x in [0, 1]". It seems that we should get the intersect of them first. "x >= 1" is a better result. I think the current code gets "x >= 0". http://gerrit.cloudera.org:8080/#/c/18731/6/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@806 PS6, Line 806: nit: 4 spaces indent http://gerrit.cloudera.org:8080/#/c/18731/6/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java: http://gerrit.cloudera.org:8080/#/c/18731/6/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@2024 PS6, Line 2024: nit: 4 spaces indent here http://gerrit.cloudera.org:8080/#/c/18731/6/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@2031 PS6, Line 2031: nit: 4 spaces indent here http://gerrit.cloudera.org:8080/#/c/18731/6/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@2034 PS6, Line 2034: // conjuncts to produce correct results. Could you update the comment for the new case? It's not quite clear to me why we need the above change. http://gerrit.cloudera.org:8080/#/c/18731/6/testdata/workloads/functional-planner/queries/PlannerTest/none-equal-predicate-push-down.test File testdata/workloads/functional-planner/queries/PlannerTest/none-equal-predicate-push-down.test: http://gerrit.cloudera.org:8080/#/c/18731/6/testdata/workloads/functional-planner/queries/PlannerTest/none-equal-predicate-push-down.test@63 PS6, Line 63: nit: trailing space http://gerrit.cloudera.org:8080/#/c/18731/6/testdata/workloads/functional-planner/queries/PlannerTest/none-equal-predicate-push-down.test@265 PS6, Line 265: nit: trailing space http://gerrit.cloudera.org:8080/#/c/18731/6/tests/query_test/test_none_equi_predicate_pushdown.py File tests/query_test/test_none_equi_predicate_pushdown.py: http://gerrit.cloudera.org:8080/#/c/18731/6/tests/query_test/test_none_equi_predicate_pushdown.py@25 PS6, Line 25: nit: we use 2 spaces indention even in Python codes. -- To view, visit http://gerrit.cloudera.org:8080/18731 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie3ce23cbd7522a209c830504f329b972d67bc263 Gerrit-Change-Number: 18731 Gerrit-PatchSet: 6 Gerrit-Owner: Baike Xia <[email protected]> Gerrit-Reviewer: Aman Sinha <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Comment-Date: Fri, 19 Aug 2022 08:46:16 +0000 Gerrit-HasComments: Yes
