rdblue commented on issue #25115: [SPARK-28351][SQL] Support DELETE in DataSource V2
URL: https://github.com/apache/spark/pull/25115#issuecomment-512569938
 
 
   @xianyinxin, I think we should consider what kind of delete support you're 
proposing to add, and whether we need to add a new builder pattern.
   
   I don't think that we need one for `DELETE FROM`. Above, you commented:
   
   > for simple case like DELETE by filters in this pr, just pass the filter to 
datasource is more suitable, a 'spark job' is not needed.
   
   I think we may need a builder for more complex row-level deletes, but if the 
intent here is to pass filters to a data source and delete if those filters are 
supported, then we can add a more direct trait to the table, `SupportsDelete`. 
I have an open PR that takes this approach: 
https://github.com/apache/spark/pull/21308.
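
To make the idea concrete, here is a minimal, self-contained sketch of what a filter-based delete trait could look like. Everything in it is an assumption for illustration: `RowFilter`, `EqualTo`, `InMemoryTable`, and the `deleteWhere` signature are hypothetical stand-ins and are not taken from Spark or from the linked PR. The point is only that the table receives the filters directly and no Spark job is involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a pushed-down filter (not Spark's Filter class).
interface RowFilter {
    boolean matches(Map<String, Object> row);
}

// Hypothetical equality filter: column = value.
final class EqualTo implements RowFilter {
    private final String column;
    private final Object value;

    EqualTo(String column, Object value) {
        this.column = column;
        this.value = value;
    }

    @Override
    public boolean matches(Map<String, Object> row) {
        return value.equals(row.get(column));
    }
}

// Hypothetical mix-in trait: a table that can delete by filter implements it.
interface SupportsDelete {
    // Deletes every row that matches ALL of the given filters.
    void deleteWhere(RowFilter[] filters);
}

// Toy table used only to exercise the trait.
final class InMemoryTable implements SupportsDelete {
    private final List<Map<String, Object>> rows = new ArrayList<>();

    void append(Map<String, Object> row) {
        rows.add(row);
    }

    int rowCount() {
        return rows.size();
    }

    @Override
    public void deleteWhere(RowFilter[] filters) {
        // Filters are combined with AND; an empty array therefore matches every row.
        rows.removeIf(row -> {
            for (RowFilter filter : filters) {
                if (!filter.matches(row)) {
                    return false;
                }
            }
            return true;
        });
    }
}
```

In this sketch a source that cannot honor a filter has no way to say so; a real trait would probably need to reject unsupported filters, for example by throwing.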
   
   Alternatively, we could support deletes using 
[`SupportsOverwrite`](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/sources/v2/writer/SupportsOverwrite.java#L39),
 which allows passing delete filters. An overwrite with no appended data is the 
same as a delete. The drawback is that a source would have to implement 
`SupportsOverwrite` even if it only supports delete. We could handle this by using 
separate [table 
capabilities](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/sources/v2/TableCapability.java).
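
A rough sketch of that alternative, again with hypothetical names (`Capability`, `FilterOverwriteTable`, and the `DELETE_BY_FILTER` constant are assumptions for illustration, not Spark's `TableCapability` values): delete is an overwrite-by-filter that appends nothing, and separate capabilities let a source advertise delete without advertising full overwrite.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical capability flags, separate for overwrite and delete.
enum Capability { OVERWRITE_BY_FILTER, DELETE_BY_FILTER }

// Toy table of integer ids used only to illustrate the idea.
final class FilterOverwriteTable {
    private final List<Integer> ids = new ArrayList<>();
    private final Set<Capability> capabilities;

    FilterOverwriteTable(List<Integer> initial, Set<Capability> capabilities) {
        this.ids.addAll(initial);
        this.capabilities = capabilities;
    }

    boolean supports(Capability capability) {
        return capabilities.contains(capability);
    }

    // Overwrite: drop rows matching the filter, then append the replacement rows.
    void overwrite(Predicate<Integer> filter, List<Integer> replacement) {
        require(Capability.OVERWRITE_BY_FILTER);
        replaceWhere(filter, replacement);
    }

    // Delete: the same operation with no appended data.
    void delete(Predicate<Integer> filter) {
        require(Capability.DELETE_BY_FILTER);
        replaceWhere(filter, List.of());
    }

    List<Integer> ids() {
        return new ArrayList<>(ids);
    }

    private void replaceWhere(Predicate<Integer> filter, List<Integer> replacement) {
        ids.removeIf(filter);
        ids.addAll(replacement);
    }

    private void require(Capability capability) {
        if (!supports(capability)) {
            throw new UnsupportedOperationException("Table does not support " + capability);
        }
    }
}
```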
   
   If either of those approaches would work, then we don't need to add a new 
builder or make decisions that would affect the future design of `MERGE INTO` 
or `UPSERT`. For row-level operations like those, we need to have a clear 
design doc. But **if the need here is to be able to pass a set of delete 
filters, then that is a much smaller change and we can move forward with a 
simple trait**.
   
   What do you think? Would you like to discuss this in the next DSv2 sync in a 
week? I can add this to the topics.

