cloud-fan commented on issue #25626: [SPARK-28892][SQL] Add UPDATE support for DataSource V2
URL: https://github.com/apache/spark/pull/25626#issuecomment-529913436

> we would add an implementation that reads the rows that might match the where query, finds all the rows that actually match, updates those rows, and saves the changed rows back to the data source.

This applies to DELETE as well. Spark should be responsible for finding the matched rows and telling the data source which rows need to be deleted/updated. Think about `DELETE FROM t1 WHERE t1.col IN (SELECT col FROM t2)`. It's not a metadata-only operation, as Spark needs to scan `t2` to find the matched rows.

We need both APIs: one for simple DELETE/UPDATE, which is metadata-only, and one for general DELETE/UPDATE, which needs to notify the data source about deleted/updated rows.

I don't have a strong preference on which version should be done first. But since this PR is already here, I'm OK with having the simpler version first.
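
A minimal sketch of the two API shapes described above, written in Python against a toy in-memory table. Every name here (`ToyPartitionedTable`, `EqualTo`, `can_delete_where`, `delete_where`, `scan`, `delete_rows`, `delete_in_subquery`) is hypothetical and only illustrates the split; none of it is Spark's actual DataSource V2 interface. The first path lets the source drop whole partitions from a simple filter without reading rows. The second has the engine find the matching rows and then tell the source which ones to remove.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, object]


@dataclass
class EqualTo:
    """A simple filter the source can evaluate itself: column = value."""
    column: str
    value: object


class ToyPartitionedTable:
    """Toy source partitioned by one column (hypothetical, not a Spark API)."""

    def __init__(self, partition_column: str, rows: Iterable[Row]):
        self.partition_column = partition_column
        # partition value -> {row id -> row}
        self.partitions: Dict[object, Dict[int, Row]] = {}
        for row_id, row in enumerate(rows):
            self.partitions.setdefault(row[partition_column], {})[row_id] = dict(row)

    # API 1: simple, metadata-only delete driven by pushed-down filters.
    def can_delete_where(self, filters: List[EqualTo]) -> bool:
        # Only filters on the partition column can be handled without a scan.
        return all(f.column == self.partition_column for f in filters)

    def delete_where(self, filters: List[EqualTo]) -> None:
        if not self.can_delete_where(filters):
            raise ValueError("filters must reference only the partition column")
        # Filters are ANDed; matching partitions are dropped without reading rows.
        for key in list(self.partitions):
            if all(key == f.value for f in filters):
                del self.partitions[key]

    # API 2: general delete, where the engine names the rows to remove.
    def scan(self) -> Iterator[Tuple[int, Row]]:
        for part in self.partitions.values():
            yield from part.items()

    def delete_rows(self, row_ids: Iterable[int]) -> None:
        ids = set(row_ids)
        for part in self.partitions.values():
            for row_id in ids & part.keys():
                del part[row_id]


def delete_in_subquery(target: ToyPartitionedTable, column: str,
                       subquery_values: Iterable[object]) -> int:
    """Engine side of DELETE FROM t1 WHERE t1.col IN (SELECT col FROM t2)."""
    matched = set(subquery_values)  # stands in for the engine scanning t2
    row_ids = [rid for rid, row in target.scan() if row[column] in matched]
    target.delete_rows(row_ids)     # notify the source about the matched rows
    return len(row_ids)
```

In this sketch, `delete_where([EqualTo("day", 1)])` never touches row data, while `delete_in_subquery` has to scan the target and pass explicit row ids back to the source, which is the distinction the comment draws between the two kinds of DELETE/UPDATE.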
