xianyinxin commented on issue #25115: [SPARK-28351][SQL] Support DELETE in DataSource V2
URL: https://github.com/apache/spark/pull/25115#issuecomment-510722944
 
 
   > I have no idea what is the meaning of "maintenance" here. Could you elaborate a bit? UPDATE and DELETE are just DMLs.
   > (UPSERT would be needed for streaming query to restore UPDATE mode in Structured Streaming, so we may add it eventually, then for me it's unclear where we can add SupportUpsert, directly, or under maintenance.)
   > 
   > Sorry for the dumb question if it's just obvious one for others as well.
   
   Thank you for the comments, @HeartSaVioR. Maybe "maintenance" is not a good word here. I propose introducing a maintenance interface because it is hard to fit UPDATE/DELETE, or UPSERTS/MERGE, into the current `SupportsWrite` framework. `SupportsWrite` was designed for insert/overwrite/append, which is backed by Spark's distributed RDD execution, i.e., by submitting a Spark job. That pattern is fixed, explicit, and well suited to insert/overwrite/append. UPDATE/DELETE and UPSERTS/MERGE are different:
   1. For a simple case like DELETE by filters, as in this PR, it is more suitable to just pass the filters to the data source; a Spark job is not needed.
   2. For a complicated case like UPSERTS or MERGE, one Spark job is not enough. The work has to be split into multiple steps, and guaranteeing the atomicity of the whole operation goes beyond what the current commit protocol for insert/overwrite/append provides.
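   To make point 1 concrete, here is a minimal Java sketch of a source that handles DELETE by filters itself, without submitting any job. Every name in it (`EqualTo`, `DeletableByFilters`, `InMemoryTable`, `deleteWhere`) is hypothetical and chosen for illustration only; it is not meant to reproduce the interface in this PR or any Spark API. Filters are assumed to be combined with AND, and only equality filters are modelled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical pushed-down filter: column = value.
final class EqualTo {
    final String column;
    final Object value;

    EqualTo(String column, Object value) {
        this.column = column;
        this.value = value;
    }

    boolean matches(Map<String, Object> row) {
        return Objects.equals(row.get(column), value);
    }
}

// Hypothetical capability: the source deletes rows matching ALL given filters itself.
interface DeletableByFilters {
    void deleteWhere(EqualTo[] filters);
}

// Toy in-memory source standing in for something like Kudu (assumption, for illustration).
final class InMemoryTable implements DeletableByFilters {
    final List<Map<String, Object>> rows = new ArrayList<>();

    @Override
    public void deleteWhere(EqualTo[] filters) {
        // The source evaluates the filters locally; no distributed job is involved.
        // Note: an empty filter array matches (and deletes) every row.
        rows.removeIf(row -> {
            for (EqualTo f : filters) {
                if (!f.matches(row)) {
                    return false; // one filter fails, so the row is kept
                }
            }
            return true; // all filters match, so the row is deleted
        });
    }
}
```

   Usage, assuming the classes above: after `table.rows.add(Map.of("id", 2, "region", "us"))` and similar inserts, `table.deleteWhere(new EqualTo[] { new EqualTo("region", "eu") })` removes every row whose `region` is `eu`. A MERGE or UPSERTS cannot be expressed as a single call like this, since it has to read, match and rewrite data in several steps, which is the situation described in point 2.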
   As for why we implement DELETE/UPDATE rather than only UPSERTS: we want to introduce Kudu as a data source, so that Spark SQL can handle workloads that update or delete data at high frequency. Some other data sources, such as Delta, also support DELETE/UPDATE, and they need a SQL entry point for it.
   I'm not sure whether I answered your question, @HeartSaVioR.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
