[GitHub] [spark] shardulm94 commented on pull request #38676: [SPARK-41162][SQL] Do not push down anti-join predicates that become ambiguous

GitBox Fri, 16 Dec 2022 20:34:32 -0800


shardulm94 commented on PR #38676:
URL: https://github.com/apache/spark/pull/38676#issuecomment-1356030772


   I tried looking into this a bit.

   > @EnricoMi @cloud-fan Could we fix the DeduplicateRelations? It did not generate different expression IDs for all conflicting attributes:
   
   As @EnricoMi said, `DeduplicateRelations` only considers the output attributes of the left and right sides, which do not conflict here. Also, the `Project` case in `PushDownLeftSemiAntiJoin` calls [this method](https://github.com/apache/spark/blob/45d9daa2ecf6081ef1d031065a9c0e9a3a7f7a58/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/PushDownLeftSemiAntiJoin.scala#L119), which seems to check for the self-join case based on conflicting expression IDs. This makes me believe the duplicate expression IDs are expected here, and hence `DeduplicateRelations` may not be at fault.
   
   Similar to the `Project` case, should we add a check like `canPushThroughCondition(Seq(agg.child), joinCond, rightOp)` to ensure that it is also safe to push the join down through an `Aggregate` node?
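   To make the proposed guard concrete, here is a toy model of the idea, assuming a simplified world where an attribute's identity is its expression ID (not its name) and a plan node is reduced to its output attributes. `Attr`, `Plan`, and this `canPushThroughCondition` are hypothetical stand-ins written for illustration, not Spark's Catalyst classes or the actual helper linked above. The sketch only shows the shape of the check: pushing the semi/anti join below a node is treated as unsafe when an attribute referenced by the join condition carries an expression ID that appears both on the right side and in the plans the join would be pushed into (here, the `Aggregate`'s child).

```scala
// Hypothetical, simplified attribute: identity is the exprId, which is how a
// self-join can produce the same ID on both sides of the join.
final case class Attr(name: String, exprId: Long)

// Hypothetical plan node, reduced to the attributes it outputs.
final case class Plan(output: Seq[Attr])

// Sketch of the guard (assumption: modeled on the self-join check described
// above, not copied from Spark). Returns false when some attribute referenced
// by the condition has an exprId present both in `right` and in `plans`,
// i.e. the reference would become ambiguous after the push-down.
def canPushThroughCondition(
    plans: Seq[Plan],
    conditionRefs: Option[Set[Attr]],
    right: Plan): Boolean = {
  val pushedIntoIds = plans.flatMap(_.output).map(_.exprId).toSet
  val rightIds = right.output.map(_.exprId).toSet
  conditionRefs match {
    case None => true // no join condition, nothing can become ambiguous
    case Some(refs) =>
      refs.map(_.exprId).forall(id => !(rightIds.contains(id) && pushedIntoIds.contains(id)))
  }
}

// Usage: a self-join where the Aggregate's child and the right side share exprId 1.
val id = Attr("id", 1L)
val aggChild = Plan(Seq(id))
val right = Plan(Seq(id))
val selfJoinSafe = canPushThroughCondition(Seq(aggChild), Some(Set(id)), right)
// selfJoinSafe == false: pushing down would make `id` ambiguous

// Same condition, but the Aggregate's child has a distinct exprId for `id`.
val distinctSafe = canPushThroughCondition(Seq(Plan(Seq(Attr("id", 2L)))), Some(Set(id)), right)
// distinctSafe == true
```

   The check compares expression IDs rather than names because, in this model as in the discussion above, the problem only arises when the same ID legitimately appears on both sides, which a check over the join's left and right outputs alone would not catch.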


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.