shardulm94 commented on PR #38676: URL: https://github.com/apache/spark/pull/38676#issuecomment-1356030772
I tried looking into this a bit.

> @EnricoMi @cloud-fan Could we fix the DeduplicateRelations? It did not generate different expression IDs for all conflicting attributes:

As @EnricoMi said, `DeduplicateRelations` only considers the output attributes of the left and right sides, which do not conflict here. Also, the `Project` case in `PushDownLeftSemiAntiJoin` calls [this method](https://github.com/apache/spark/blob/45d9daa2ecf6081ef1d031065a9c0e9a3a7f7a58/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/PushDownLeftSemiAntiJoin.scala#L119), which seems to check for the self-join case based on conflicting expression IDs. This makes me believe the duplicate expression IDs are expected here, and hence `DeduplicateRelations` may not be at fault.

Similar to the `Project` case, should we add a check like `canPushThroughCondition(Seq(agg.child), joinCond, rightOp)` to ensure that it is safe to push the join down through an `Aggregate` node too?
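
To make the proposed guard concrete, here is a toy sketch of what I understand that kind of check to do. It is not Spark code: `Attr`, `Plan`, and `SelfJoinCheck` are hypothetical stand-ins, and the condition is reduced to the set of expression IDs it references. It assumes the check should reject the pushdown whenever the join condition references an expression ID that appears both in the right side's output and in the output of the plans we want to push below.

```scala
// Toy model, not Spark's classes: an attribute is identified by its expression ID.
final case class Attr(name: String, exprId: Long)

// Hypothetical stand-in for a logical plan: only its output attributes matter here.
final case class Plan(output: Seq[Attr]) {
  def outputIds: Set[Long] = output.map(_.exprId).toSet
}

object SelfJoinCheck {
  // Returns false when the join condition references an expression ID that is
  // produced by both the right side and one of the given left-side plans.
  // A missing condition (None) never blocks the pushdown.
  def canPushThroughCondition(
      plans: Seq[Plan],
      conditionRefs: Option[Set[Long]],
      rightOp: Plan): Boolean = {
    val leftIds = plans.flatMap(_.output.map(_.exprId)).toSet
    conditionRefs.forall { refs =>
      refs.intersect(rightOp.outputIds).intersect(leftIds).isEmpty
    }
  }
}
```

In this model, a self-join where the aggregate's child and the right side both expose `id` with the same expression ID, and the condition references that ID, would be rejected, while a right side with a distinct ID would be allowed through.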
