Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/16915 )
Change subject: IMPALA-9356: fix the predicate with empty tuple id invalid pushed down ...................................................................... Patch Set 2: (7 comments) Thanks for the patch ! I have been meaning to review it sooner. I left some comments about the plan correctness in the JIRA. It might be easier to discuss it there. http://gerrit.cloudera.org:8080/#/c/16915/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16915/2//COMMIT_MSG@7 PS2, Line 7: IMPALA-9356: fix the predicate with empty tuple id invalid pushed down The 'empty tuple id' is an internal detail of the implementation. Since this issue can occur for any non-deterministic predicate such as rand(), can we call it 'Fix the predicate pushdown for non-deterministic predicates for outer joins' http://gerrit.cloudera.org:8080/#/c/16915/2/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/16915/2/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2655 PS2, Line 2655: unssigned typo: unssigned-> unassigned http://gerrit.cloudera.org:8080/#/c/16915/2/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2655 PS2, Line 2655: can reassigned nit: 'can be' http://gerrit.cloudera.org:8080/#/c/16915/2/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2655 PS2, Line 2655: node nit: 'nodes' http://gerrit.cloudera.org:8080/#/c/16915/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java: http://gerrit.cloudera.org:8080/#/c/16915/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1767 PS2, Line 1767: if (analyzer.isOuterJoined(tid)) { Could you move this check into a separate Analyzer method ? http://gerrit.cloudera.org:8080/#/c/16915/2/testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test File testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test: http://gerrit.cloudera.org:8080/#/c/16915/2/testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test@1608 PS2, Line 1608: | predicates: rand() = 2 I think for non-deterministic functions such as rand(), it would be wrong to push to both the scans in case of the Full OJ. I think it should only be evaluated by the Hash Join above. I left some comments in the JIRA instead of re-typing it here. http://gerrit.cloudera.org:8080/#/c/16915/2/testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test@1744 PS2, Line 1744: | predicates: rand() = 2 For ROJ or even for LOJ, for reasons mentioned above, it would be wrong to push to both sides of the join. Either we push only to 1 side (the non-null producing side) or keep it at the outer join node as other predicate. Specifically for this query, since ROJ will produce non-null rows from the right child, we should only push to the right child which is the 00:SCAN HDFS node. -- To view, visit http://gerrit.cloudera.org:8080/16915 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I50ae71ef31ff138adeeb57008a01eb02e7bb976f Gerrit-Change-Number: 16915 Gerrit-PatchSet: 2 Gerrit-Owner: Xianqing He <[email protected]> Gerrit-Reviewer: Aman Sinha <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Comment-Date: Sat, 20 Feb 2021 23:51:30 +0000 Gerrit-HasComments: Yes
