Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16915 )

Change subject: IMPALA-9356: fix the predicate with empty tuple id invalid 
pushed down
......................................................................


Patch Set 2:

(7 comments)

Thanks for the patch ! I have been meaning to review it sooner. I left some 
comments about the plan correctness in the JIRA. It might be easier to discuss 
it there.

http://gerrit.cloudera.org:8080/#/c/16915/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16915/2//COMMIT_MSG@7
PS2, Line 7: IMPALA-9356: fix the predicate with empty tuple id invalid pushed 
down
The 'empty tuple id' is an internal detail of the implementation.  Since this 
issue can occur for any non-deterministic predicate such as rand(),  can we 
call it 'Fix the predicate pushdown for non-deterministic predicates for outer 
joins'


http://gerrit.cloudera.org:8080/#/c/16915/2/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/16915/2/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2655
PS2, Line 2655: unssigned
typo: unssigned-> unassigned


http://gerrit.cloudera.org:8080/#/c/16915/2/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2655
PS2, Line 2655: can reassigned
nit: 'can be'


http://gerrit.cloudera.org:8080/#/c/16915/2/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2655
PS2, Line 2655: node
nit: 'nodes'


http://gerrit.cloudera.org:8080/#/c/16915/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

http://gerrit.cloudera.org:8080/#/c/16915/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1767
PS2, Line 1767:     if (analyzer.isOuterJoined(tid)) {
Could you move this check into a separate Analyzer method ?


http://gerrit.cloudera.org:8080/#/c/16915/2/testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test:

http://gerrit.cloudera.org:8080/#/c/16915/2/testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test@1608
PS2, Line 1608: |     predicates: rand() = 2
I think for non-deterministic functions such as rand(), it would be wrong to 
push to both the scans in case of the Full OJ.  I think it should only be 
evaluated by the Hash Join above.  I left some comments in the JIRA instead of 
re-typing it here.


http://gerrit.cloudera.org:8080/#/c/16915/2/testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test@1744
PS2, Line 1744: |     predicates: rand() = 2
For ROJ or even for LOJ, for reasons mentioned above,  it would be wrong to 
push to both sides of the join.  Either we push only to 1 side (the non-null 
producing side) or keep it at the outer join node as other predicate.  
Specifically for this query, since ROJ will produce non-null rows from the 
right child, we should only push to the right child which is the 00:SCAN HDFS 
node.



--
To view, visit http://gerrit.cloudera.org:8080/16915
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I50ae71ef31ff138adeeb57008a01eb02e7bb976f
Gerrit-Change-Number: 16915
Gerrit-PatchSet: 2
Gerrit-Owner: Xianqing He <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Sat, 20 Feb 2021 23:51:30 +0000
Gerrit-HasComments: Yes

Reply via email to