Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18731 )

Change subject: IMPALA-11424: Support pushdown non-equi join predicate from 
OUTER/INNER JOIN to SCANNODE
......................................................................


Patch Set 6:

(23 comments)

Thanks for your contribution, Baike! This is an important improvement. I still 
need some time to finish my first round of review. Left some comments first.

http://gerrit.cloudera.org:8080/#/c/18731/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18731/6//COMMIT_MSG@9
PS6, Line 9: In order to reduce the amount of data read and transmitted, the 
non-equivalent condition of Join can be pushed to SCAN_NODE.
nit: each line of the commit message should have 72 or fewer characters. The 
commit title is ok.


http://gerrit.cloudera.org:8080/#/c/18731/6//COMMIT_MSG@16
PS6, Line 16:
Please introduce the new query option, ENABLE_NONE_EQUAL_PREDICATE_PUSH_DOWN, 
in the commit message.


http://gerrit.cloudera.org:8080/#/c/18731/6/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
File fe/src/main/java/org/apache/impala/planner/HashJoinNode.java:

http://gerrit.cloudera.org:8080/#/c/18731/6/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@142
PS6, Line 142:         pushdownNonEquiConjunct(analyzer);
Should we move these before computeStats() at line 136 to have better 
cardinality?


http://gerrit.cloudera.org:8080/#/c/18731/6/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@463
PS6, Line 463:
nit: 4 spaces indent


http://gerrit.cloudera.org:8080/#/c/18731/6/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@507
PS6, Line 507:
nit: 4 spaces indent


http://gerrit.cloudera.org:8080/#/c/18731/6/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@580
PS6, Line 580:
nit: 4 spaces indent


http://gerrit.cloudera.org:8080/#/c/18731/6/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@592
PS6, Line 592:       if (!(predicate.getChild(0) instanceof LiteralExpr
             :               && !(predicate.getChild(0) instanceof NullLiteral))
             :             && !(predicate.getChild(1) instanceof LiteralExpr
             :               && !(predicate.getChild(1) instanceof 
NullLiteral))) {
This is not that readable. We can simplify it to

      if (!Expr.IS_NON_NULL_LITERAL.apply(predicate.getChild(0))
          && !Expr.IS_NON_NULL_LITERAL.apply(predicate.getChild(1))) {
        continue;
      }


http://gerrit.cloudera.org:8080/#/c/18731/6/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@631
PS6, Line 631: groupOtherJoinConjunctsAccordingToSlotRef
This function is similar to the above one. Can we refactor them into one?


http://gerrit.cloudera.org:8080/#/c/18731/6/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@632
PS6, Line 632:
nit: 4 spaces indent


http://gerrit.cloudera.org:8080/#/c/18731/6/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@675
PS6, Line 675:
nit: 4 spaces indent


http://gerrit.cloudera.org:8080/#/c/18731/6/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@731
PS6, Line 731:
nit: 4 spaces indent


http://gerrit.cloudera.org:8080/#/c/18731/6/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@746
PS6, Line 746:           if (child instanceof LiteralExpr) {
             :             LiteralExpr currentValue = (LiteralExpr) child;
             :             if (minValue == null) {
             :               minValue = currentValue;
             :             } else {
             :               if (minValue.compareTo(currentValue) > 0) {
             :                 minValue = currentValue;
             :               }
             :             }
             :           }
This is a common patten of the code. We can extract this into a method to 
deduplicate some codes.

EDIT: the code structure in getMaxLiteralFromPredicates() looks better. It'd be 
nice if we can refactor them into one method.


http://gerrit.cloudera.org:8080/#/c/18731/6/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@775
PS6, Line 775: < 0
Shouldn't this be "> 0" ?


http://gerrit.cloudera.org:8080/#/c/18731/6/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@784
PS6, Line 784: i = 1
Could you explain why we don't need "i = 0" ?


http://gerrit.cloudera.org:8080/#/c/18731/6/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@785
PS6, Line 785: !(predicate.getChild(i) instanceof LiteralExpr)
             :                   || (predicate.getChild(i) instanceof 
NullLiteral)
This can be simplified to

 !Expr.IS_NON_NULL_LITERAL.apply(predicate.getChild(i))


http://gerrit.cloudera.org:8080/#/c/18731/6/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@796
PS6, Line 796: minValue = literalValues.get(0);
Shouldn't we update 'minValue' instead of replacing it? There could be more 
than one IN-list, e.g. "x in [1, 2, 3] and x in [0, 1]". It seems that we 
should get the intersect of them first. "x >= 1" is a better result. I think 
the current code gets "x >= 0".


http://gerrit.cloudera.org:8080/#/c/18731/6/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@806
PS6, Line 806:
nit: 4 spaces indent


http://gerrit.cloudera.org:8080/#/c/18731/6/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

http://gerrit.cloudera.org:8080/#/c/18731/6/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@2024
PS6, Line 2024:
nit: 4 spaces indent here


http://gerrit.cloudera.org:8080/#/c/18731/6/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@2031
PS6, Line 2031:
nit: 4 spaces indent here


http://gerrit.cloudera.org:8080/#/c/18731/6/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@2034
PS6, Line 2034:       // conjuncts to produce correct results.
Could you update the comment for the new case? It's not quite clear to me why 
we need the above change.


http://gerrit.cloudera.org:8080/#/c/18731/6/testdata/workloads/functional-planner/queries/PlannerTest/none-equal-predicate-push-down.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/none-equal-predicate-push-down.test:

http://gerrit.cloudera.org:8080/#/c/18731/6/testdata/workloads/functional-planner/queries/PlannerTest/none-equal-predicate-push-down.test@63
PS6, Line 63:
nit: trailing space


http://gerrit.cloudera.org:8080/#/c/18731/6/testdata/workloads/functional-planner/queries/PlannerTest/none-equal-predicate-push-down.test@265
PS6, Line 265:
nit: trailing space


http://gerrit.cloudera.org:8080/#/c/18731/6/tests/query_test/test_none_equi_predicate_pushdown.py
File tests/query_test/test_none_equi_predicate_pushdown.py:

http://gerrit.cloudera.org:8080/#/c/18731/6/tests/query_test/test_none_equi_predicate_pushdown.py@25
PS6, Line 25:
nit: we use 2 spaces indention even in Python codes.



--
To view, visit http://gerrit.cloudera.org:8080/18731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie3ce23cbd7522a209c830504f329b972d67bc263
Gerrit-Change-Number: 18731
Gerrit-PatchSet: 6
Gerrit-Owner: Baike Xia <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Comment-Date: Fri, 19 Aug 2022 08:46:16 +0000
Gerrit-HasComments: Yes

Reply via email to