cloud-fan commented on pull request #30555: URL: https://github.com/apache/spark/pull/30555#issuecomment-764811456
UPDATE/DELETE/MERGE are just logical plans in Spark; we need third-party libraries or vendors to provide proper implementations. So this is not about fully supporting the feature in Spark (Spark can't), but about what Spark can do to make it easier for others to support it.

Before this PR, the UPDATE/DELETE/MERGE implementation (the physical plans) was fully responsible for handling correlated subqueries, because correlated subqueries inside UPDATE/DELETE/MERGE were not decorrelated. For example, in the physical plan's `doExecute` method, people can put the UPDATE/DELETE/MERGE condition in a filter, build a DataFrame to evaluate the condition, and collect the result.

After this PR, correlated subqueries inside UPDATE/DELETE/MERGE are half-decorrelated. I don't know how an UPDATE/DELETE/MERGE implementation can handle that. At the very least, our internal UPDATE/DELETE/MERGE implementation is broken after this commit. If you guys have a good idea about how to handle half-decorrelated correlated subqueries, let's document it so that others can follow.
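
As a rough illustration of the pre-PR approach described above (use the command's condition as a filter, collect the matching rows, then apply the command), here is a toy model in plain Python. It is not Spark code and does not use any Spark API. The names `exists_in` and `execute_delete` are hypothetical, and the condition stands in for something like `DELETE FROM target t WHERE EXISTS (SELECT 1 FROM source s WHERE s.id = t.id)`.

```python
from typing import Callable, Dict, List

Row = Dict[str, int]
Condition = Callable[[Row], bool]


def exists_in(source: List[Row], key: str) -> Condition:
    """Toy correlated subquery (hypothetical helper):
    EXISTS (SELECT 1 FROM source s WHERE s.key = outer.key)."""

    def condition(outer: Row) -> bool:
        # The subquery refers to the outer row, so it is evaluated once per row.
        return any(s[key] == outer[key] for s in source)

    return condition


def execute_delete(target: List[Row], condition: Condition) -> List[Row]:
    """Toy stand-in for a DELETE implementation (hypothetical helper)."""
    # Step 1: treat the condition as a filter over the target and collect matches.
    matched = [row for row in target if condition(row)]
    # Step 2: apply the command, here by keeping only the rows that did not match.
    return [row for row in target if row not in matched]


target = [{"id": 1, "v": 10}, {"id": 2, "v": 20}, {"id": 3, "v": 30}]
source = [{"id": 2}, {"id": 3}]
remaining = execute_delete(target, exists_in(source, "id"))
print(remaining)  # [{'id': 1, 'v': 10}]
```

The point of the sketch is only that the implementation receives the condition with the correlated subquery intact and decides for itself how to evaluate it. It does not model what a half-decorrelated plan looks like.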
