cloud-fan commented on PR #37074: URL: https://github.com/apache/spark/pull/37074#issuecomment-1181935798
After more thought, I think the optimizer rules should treat a correlated subquery as a join. In this case, once we remove the `Project`, the plan becomes invalid, because the subquery's outer reference, which will end up in the join condition, becomes ambiguous.

I think your initial approach is the right direction, but let's make it more precise. We should keep the `Project` only if:
1. the filter condition contains correlated subqueries, and
2. the subquery's outer references would exist on both join sides if we removed the `Project`.
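
The two conditions can be read as a single predicate. Below is a minimal sketch in plain Python, not Catalyst code: `CorrelatedSubquery`, `must_keep_project`, and the string attribute IDs such as `"a#1"` are hypothetical names for illustration only. It assumes that "ambiguous" means the same attribute would be output by both join sides once the `Project` is gone, which is one possible reading of the comment.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class CorrelatedSubquery:
    """Toy stand-in for a correlated subquery expression (hypothetical type)."""
    # IDs of the outer-query attributes this subquery refers to.
    outer_references: FrozenSet[str]


def must_keep_project(
    filter_subqueries: Iterable[CorrelatedSubquery],
    left_output: Iterable[str],
    right_output: Iterable[str],
) -> bool:
    """Return True if the Project should be kept.

    filter_subqueries: correlated subqueries found in the filter condition.
    left_output / right_output: attribute IDs each join side would expose
    if the Project were removed (an assumption of this sketch).
    """
    left = set(left_output)
    right = set(right_output)
    # Condition 1: the filter contains at least one correlated subquery
    # (the loop body never runs otherwise).
    for subquery in filter_subqueries:
        # Condition 2: some outer reference is visible on both join sides.
        if any(ref in left and ref in right for ref in subquery.outer_references):
            return True
    return False
```

For example, with a self-join where both sides output `a#1`, a subquery referencing `a#1` forces the `Project` to stay, while a subquery referencing an attribute that appears on only one side does not.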
