HeartSaVioR edited a comment on pull request #30521:
URL: https://github.com/apache/spark/pull/30521#issuecomment-737505823
> The complete mode doesn't require truncate + insert. It's just telling the sink to overwrite the table entirely, and overwrite doesn't have to be implemented as truncate + insert.

truncate is defined as overwrite with a where condition that is literally true. We are talking about the same thing, and my point is that the availability is checked by Spark. If that's not a big deal, OK.

> IMO, table can be viewed as an alias of a path.

That is limited to file-based tables - with DSv2 you should be able to match anything feasible with a table. I see some efforts have been made in the community to support a JDBC-specific catalog, and there was a talk about applying DSv2 to Cassandra. We could even create a Kafka-specific catalog, which I considered a bit but got stuck on the schema, as we wouldn't want to keep providing just key and value in binary form even for a Kafka table. For me, `table is an alias of a path` isn't correct, at least for DSv2.

> I doubt we can support v2 table perfectly in the existing DataStreamWriter. It's likely we would need to add DataStreamWriterV2 similar to DataFrameWriterV2.

That has been the main concern. The `saveAsTable` API was initially proposed to be added to DataStreamWriterV2, but that was rejected as not being worth enough, so it was added to DataStreamWriter, unlike my initial intention. This would have been resolved if we had just followed the same path for DataStreamWriterV2 as we did for DataFrameWriterV2. DataFrameWriterV2 should be able to deal with v1 tables, so this wouldn't be a problem for the streaming case either, and it would enable us to focus on "table" with full v2 table support as a requirement. @cloud-fan Can we please consider this again?

> I prefer to focus on making v1 table work since the file format in streaming doesn't support DSv2

IMO, supporting DSv2 for file formats is what we need to spend effort to fix ASAP. If DSv2 lacks something so that this cannot be done, we should identify what is missing and fix that as well.
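The truncate-as-overwrite relationship mentioned above can be sketched with a simplified, hypothetical model of the DSv2 write-builder mixins. The real interfaces live under `org.apache.spark.sql.connector.write`; the types below are illustrative stand-ins, not Spark's classes:

```java
// Hypothetical, simplified model of the DSv2 write-builder mixins, only to
// illustrate that truncate == overwrite with an always-true condition.
public class TruncateAsOverwrite {
    interface Filter {}
    static final class AlwaysTrue implements Filter {}

    interface WriteBuilder {}

    interface SupportsTruncate extends WriteBuilder {
        WriteBuilder truncate();
    }

    // Overwrite subsumes truncate: truncating the table is just overwriting
    // with a condition that is literally true.
    interface SupportsOverwrite extends SupportsTruncate {
        WriteBuilder overwrite(Filter[] filters);

        @Override
        default WriteBuilder truncate() {
            return overwrite(new Filter[] { new AlwaysTrue() });
        }
    }

    // Toy sink that records which filters it was asked to overwrite with.
    static final class RecordingSink implements SupportsOverwrite {
        Filter[] lastFilters;

        @Override
        public WriteBuilder overwrite(Filter[] filters) {
            this.lastFilters = filters;
            return this;
        }
    }

    public static void main(String[] args) {
        RecordingSink sink = new RecordingSink();
        sink.truncate(); // routed through overwrite(AlwaysTrue)
        System.out.println(sink.lastFilters[0] instanceof AlwaysTrue); // prints "true"
    }
}
```

This is why a sink implementing overwrite gets truncate "for free", and why Spark can check the availability of the capability before planning the write.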
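The point that a DSv2 table need not alias a path can be shown with a toy catalog model (all names below are made up for illustration; Spark's real abstraction is the `TableCatalog` plugin API): a catalog resolves an identifier to a table, and whether the backend is files, JDBC, or Kafka is entirely up to the catalog implementation.

```java
import java.util.Map;

// Toy, hypothetical model: nothing in the catalog contract ties a "table"
// to a filesystem path.
public class CatalogSketch {
    interface Table { String describe(); }

    interface TableCatalog { Table loadTable(String ident); }

    // File-backed catalog: here the identifier happens to alias a path.
    static final class PathCatalog implements TableCatalog {
        public Table loadTable(String ident) {
            return () -> "parquet files at /warehouse/" + ident;
        }
    }

    // JDBC-backed catalog: same interface, no path involved at all.
    static final class JdbcCatalog implements TableCatalog {
        public Table loadTable(String ident) {
            return () -> "JDBC table " + ident;
        }
    }

    public static void main(String[] args) {
        // Identifiers resolve through whichever catalog owns the namespace.
        Map<String, TableCatalog> catalogs = Map.of(
            "files", new PathCatalog(),
            "pg", new JdbcCatalog());
        System.out.println(catalogs.get("pg").loadTable("orders").describe());
    }
}
```

A Kafka-backed catalog would fit the same shape; as noted above, the hard part there is the table schema, not the catalog plumbing.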
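For concreteness, a purely hypothetical sketch of what a fluent, table-first DataStreamWriterV2 might look like, loosely modeled on DataFrameWriterV2's shape. No such class exists in Spark at the time of this discussion, and every name here is invented:

```java
// Purely hypothetical: a fluent streaming writer keyed on a table identifier,
// mirroring DataFrameWriterV2's table-first style. Not a Spark API.
public class StreamWriterSketch {
    static final class DataStreamWriterV2 {
        private final String table;
        private String trigger = "default";

        DataStreamWriterV2(String table) { this.table = table; }

        DataStreamWriterV2 trigger(String t) { this.trigger = t; return this; }

        // In a real API this would start the query and return a handle.
        String start() {
            return "streaming into " + table + " (trigger=" + trigger + ")";
        }
    }

    // Stand-in for a Dataset method such as df.writeTo(...) on the batch side.
    static DataStreamWriterV2 writeStreamTo(String table) {
        return new DataStreamWriterV2(table);
    }

    public static void main(String[] args) {
        System.out.println(writeStreamTo("cat.db.events").trigger("1 minute").start());
    }
}
```

The design point is that the table identifier is the primary input, so full v2 table resolution can be a hard requirement of the new API while DataStreamWriter keeps its current behavior.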
DSv1 streaming API isn't officially supported - it's behind the private package. That said, we are not dogfooding with DSv2, which is the only official way to implement a data source from the ecosystem. With documentation that DataStreamWriter doesn't fully support DSv2 and a "promise" of DataStreamWriterV2 (a TODO comment in the codebase, a JIRA issue, etc.), I'm OK with tolerating it for now. As I said, DataFrameWriter is already in that state. It's just that there's an alternative for the batch query, whereas there's currently no alternative for the streaming query.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
