rdblue commented on issue #25115: [SPARK-28351][SQL] Support DELETE in DataSource V2
URL: https://github.com/apache/spark/pull/25115#issuecomment-512569938

@xianyinxin, I think we should consider what kind of delete support you're proposing to add, and whether we need to add a new builder pattern. I don't think that we need one for `DELETE FROM`. Above, you commented:

> for simple case like DELETE by filters in this pr, just pass the filter to datasource is more suitable, a 'spark job' is not needed.

I think we may need a builder for more complex row-level deletes, but if the intent here is to pass filters to a data source and delete if those filters are supported, then we can add a more direct trait to the table, `SupportsDelete`. I have an open PR that takes this approach: https://github.com/apache/spark/pull/21308.

Alternatively, we could support deletes using [`SupportsOverwrite`](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/sources/v2/writer/SupportsOverwrite.java#L39), which allows passing delete filters. An overwrite with no appended data is the same as a delete. The drawback to this is that the source would use `SupportsOverwrite` but may only support delete. We could handle this by using separate [table capabilities](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/sources/v2/TableCapability.java).

If either of those approaches would work, then we don't need to add a new builder or make decisions that would affect the future design of `MERGE INTO` or `UPSERT`. For row-level operations like those, we need to have a clear design doc. But **if the need here is to be able to pass a set of delete filters, then that is a much smaller change and we can move forward with a simple trait**.

What do you think? Would you like to discuss this in the next DSv2 sync in a week? I can add this to the topics.
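
To make the "simple trait" option concrete, here is a rough, self-contained sketch of how a filter-based delete could look. Everything in it is an assumption for illustration: the `Filter`, `EqualTo`, and `GreaterThan` types are stand-ins rather than Spark's classes, the method name `deleteWhere` is a guess at the trait's shape, and `InMemoryTable` and `tryDelete` are hypothetical helpers. It is not the code from either PR.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: none of these types are Spark's real classes.
class DeleteByFilterSketch {

  /** Hypothetical stand-in for a pushed-down filter on a single int column. */
  interface Filter {
    boolean matches(int value);
  }

  static final class EqualTo implements Filter {
    final int expected;
    EqualTo(int expected) { this.expected = expected; }
    public boolean matches(int value) { return value == expected; }
  }

  static final class GreaterThan implements Filter {
    final int bound;
    GreaterThan(int bound) { this.bound = bound; }
    public boolean matches(int value) { return value > bound; }
  }

  /** Hypothetical mix-in trait: the table receives delete filters directly. */
  interface SupportsDelete {
    /** Deletes rows matching ALL filters, or throws if a filter is unsupported. */
    void deleteWhere(Filter[] filters);
  }

  /** Toy table that stores one int column and only accepts EqualTo filters. */
  static final class InMemoryTable implements SupportsDelete {
    final List<Integer> rows = new ArrayList<>();

    InMemoryTable(int... values) {
      for (int v : values) {
        rows.add(v);
      }
    }

    @Override
    public void deleteWhere(Filter[] filters) {
      // Validate every filter before mutating, so a rejected delete changes nothing.
      for (Filter f : filters) {
        if (!(f instanceof EqualTo)) {
          throw new IllegalArgumentException("Unsupported delete filter: " + f.getClass().getSimpleName());
        }
      }
      rows.removeIf(row -> {
        for (Filter f : filters) {
          if (!f.matches(row)) {
            return false;
          }
        }
        return true;
      });
    }
  }

  /** Planner-side check: no job is launched, the filters go straight to the table. */
  static boolean tryDelete(Object table, Filter... filters) {
    if (table instanceof SupportsDelete) {
      ((SupportsDelete) table).deleteWhere(filters);
      return true;
    }
    return false; // the table does not advertise delete support
  }

  public static void main(String[] args) {
    InMemoryTable table = new InMemoryTable(1, 2, 2, 3);
    tryDelete(table, new EqualTo(2));
    System.out.println(table.rows);
  }
}
```

The point of the sketch is that the table either accepts the whole filter set or rejects it up front, which is why no builder or row-level plan is involved. The `SupportsOverwrite` alternative would reach the same place by treating the delete as an overwrite that appends no data.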
