zsxwing commented on pull request #30521: URL: https://github.com/apache/spark/pull/30521#issuecomment-737488404
> but at least complete mode DSv2 enforces truncate.

The `complete` mode doesn't require `truncate + insert`. It's just telling the sink to overwrite the table entirely, and `overwrite` doesn't have to be implemented as `truncate + insert`. I haven't tracked the work of DSv2 for a while, please correct me if I miss any new work in DSv2.

> We strongly recommend (known to be a best practice if I'm not mistaken) to create a Kafka topic in prior to run the streaming query as in many cases creating topic by default configuration is tend to be not sufficient (most probably num of partitions), and same here I haven't heard complaints about this. Fair enough?

I don't remember this recommendation. Did I miss any document? Kafka's `auto.create.topics.enable` is [true](https://github.com/apache/kafka/blob/2.6.0/core/src/main/scala/kafka/server/KafkaConfig.scala#L130) by default by the way.

> I don't think both needs to be consistent, otherwise we should just remove append mode in batch query on saveAsTable. If we think about consistency with batch path, it should be just possible to not create table even if the table doesn't exist.

I was talking about the streaming case. IMO, table can be viewed as an alias of a path. I think making the streaming write for both table and path consistent makes more sense.

> In overall, I don't see any deep consideration about v2 table here, whereas my initial rationalization of adding the API was to enable support v2 table. Can we please stop thinking only on v1 table and ensure we also cover v2 table?

I doubt we can support v2 table perfectly in the existing DataStreamWriter. It's likely we would need to add DataStreamWriterV2 similar to DataFrameWriterV2. I prefer to focus on making v1 table work since the file format in streaming doesn't support DSv2, and the file format is the most common case for tables.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
