xianyinxin commented on issue #25115: [SPARK-28351][SQL] Support DELETE in DataSource V2
URL: https://github.com/apache/spark/pull/25115#issuecomment-510722944

> I have no idea what is the meaning of "maintenance" here. Could you elaborate a bit? UPDATE and DELETE are just DMLs.
> (UPSERT would be needed for streaming query to restore UPDATE mode in Structured Streaming, so we may add it eventually, then for me it's unclear where we can add SupportUpsert, directly, or under maintenance.)
>
> Sorry for the dumb question if it's just obvious one for others as well.

Thank you for the comments, @HeartSaVioR. Maybe "maintenance" is not a good word here. The reason I propose introducing a maintenance interface is that it is hard to fit UPDATE/DELETE, or UPSERT/MERGE, into the current `SupportsWrite` framework. `SupportsWrite` is designed for insert/overwrite/append, which are backed by Spark's RDD distributed execution framework, i.e., by submitting a Spark job. That pattern is fixed, explicit, and suitable for insert/overwrite/append. However, UPDATE/DELETE and UPSERT/MERGE are different:

1. For a simple case like the DELETE by filters in this PR, just passing the filter to the data source is more suitable; a Spark job is not needed.
2. For a complicated case like UPSERT or MERGE, one Spark job is not enough. This kind of work needs to be split into multiple steps, and ensuring the atomicity of the whole operation is beyond what the current commit protocol for insert/overwrite/append can do.

As for why we implement DELETE/UPDATE and not just UPSERT: we want to introduce Kudu as a data source, so that Spark SQL can handle workloads that update or delete data at high frequency. Some other data sources, like Delta, also support DELETE/UPDATE, and we need a SQL entry point for them.

I'm not sure if I answered your question, @HeartSaVioR.
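To make the simple DELETE-by-filter case concrete, here is a minimal, self-contained sketch of what "just pass the filter to the data source" could look like. Everything in it is an assumption made for illustration: `SupportsDeleteByFilter`, `InMemoryTable`, and `planDelete` are hypothetical names, not the interface proposed in this PR and not an existing Spark API. The in-memory list only stands in for a mutable store.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch: none of these names are Spark APIs.
public class DeleteByFilterSketch {

    /** Hypothetical mix-in for a table that can delete matching rows by itself. */
    interface SupportsDeleteByFilter {
        /** Deletes every row matching the filter; returns how many rows were removed. */
        long deleteWhere(Predicate<Map<String, Object>> filter);
    }

    /** Toy in-memory table standing in for a mutable data source. */
    static final class InMemoryTable implements SupportsDeleteByFilter {
        private final List<Map<String, Object>> rows = new ArrayList<>();

        void insert(Map<String, Object> row) {
            rows.add(row);
        }

        int size() {
            return rows.size();
        }

        @Override
        public long deleteWhere(Predicate<Map<String, Object>> filter) {
            int before = rows.size();
            rows.removeIf(filter); // the source applies the filter itself
            return before - rows.size();
        }
    }

    /**
     * Driver-side handling of DELETE: if the table accepts a filter,
     * hand it over directly. No distributed job is submitted here.
     */
    static long planDelete(Object table, Predicate<Map<String, Object>> filter) {
        if (table instanceof SupportsDeleteByFilter) {
            return ((SupportsDeleteByFilter) table).deleteWhere(filter);
        }
        throw new UnsupportedOperationException("table does not support DELETE by filter");
    }

    public static void main(String[] args) {
        InMemoryTable t = new InMemoryTable();
        t.insert(Map.of("id", 1, "region", "eu"));
        t.insert(Map.of("id", 2, "region", "us"));
        t.insert(Map.of("id", 3, "region", "eu"));

        long deleted = planDelete(t, row -> "eu".equals(row.get("region")));
        System.out.println(deleted + " deleted, " + t.size() + " left"); // prints: 2 deleted, 1 left
    }
}
```

The sketch only covers the simple case. The multi-step UPSERT/MERGE case would need some form of coordination across steps, which a single `deleteWhere` call does not model.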
