[GitHub] [spark] zsxwing commented on pull request #30521: [SPARK-33577][SS] Add support for V1Table in stream writer table API

GitBox Wed, 02 Dec 2020 12:52:50 -0800


zsxwing commented on pull request #30521:
URL: https://github.com/apache/spark/pull/30521#issuecomment-737488404

> but at least complete mode DSv2 enforces truncate.

The `complete` mode doesn't require `truncate + insert`. It just tells
the sink to overwrite the table entirely, and `overwrite` doesn't have to be
implemented as `truncate + insert`. I haven't tracked the DSv2 work for a
while, so please correct me if I missed anything new there.
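
To illustrate the distinction, here is a minimal toy sketch in plain Python. It is not Spark or DSv2 code: `TruncateInsertSink`, `SwapSink`, and `run_complete_mode` are hypothetical names invented for this example. It only shows that two different `overwrite` implementations can satisfy the same "replace the whole table with the full result" contract.

```python
from typing import Dict, List

Row = Dict[str, int]


class TruncateInsertSink:
    """Hypothetical sink: overwrite is truncate followed by insert."""

    def __init__(self) -> None:
        self.rows: List[Row] = []

    def overwrite(self, new_rows: List[Row]) -> None:
        self.rows.clear()           # truncate: the table is briefly empty
        self.rows.extend(new_rows)  # insert the full result


class SwapSink:
    """Hypothetical sink: overwrite builds a new snapshot and swaps it in."""

    def __init__(self) -> None:
        self.rows: List[Row] = []

    def overwrite(self, new_rows: List[Row]) -> None:
        snapshot = list(new_rows)   # build the replacement off to the side
        self.rows = snapshot        # one reference swap, no truncate step


def run_complete_mode(sink, batches: List[List[Row]]) -> List[Row]:
    """Toy model of complete mode: every batch hands over the full result."""
    for full_result in batches:
        sink.overwrite(full_result)
    return sink.rows


batches = [
    [{"key": 1, "count": 1}],
    [{"key": 1, "count": 3}, {"key": 2, "count": 1}],
]
truncate_result = run_complete_mode(TruncateInsertSink(), batches)
swap_result = run_complete_mode(SwapSink(), batches)
print(truncate_result == swap_result)
```

Both sinks end up holding the last full result, so from the writer's point of view the only requirement is "overwrite"; how the sink achieves it is an implementation detail.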

> We strongly recommend (known to be a best practice if I'm not mistaken) to
create a Kafka topic in prior to run the streaming query as in many cases
creating topic by default configuration is tend to be not sufficient (most
probably num of partitions), and same here I haven't heard complaints about
this. Fair enough?

I don't remember this recommendation. Is there a document I missed? By the
way, Kafka's `auto.create.topics.enable` is
[true](https://github.com/apache/kafka/blob/2.6.0/core/src/main/scala/kafka/server/KafkaConfig.scala#L130)
by default.

> I don't think both needs to be consistent, otherwise we should just remove
append mode in batch query on saveAsTable. If we think about consistency with
batch path, it should be just possible to not create table even if the table
doesn't exist.

I was talking about the streaming case. IMO, a table can be viewed as an
alias of a path, so I think it makes more sense for streaming writes to a
table and to a path to behave consistently.

> In overall, I don't see any deep consideration about v2 table here,
whereas my initial rationalization of adding the API was to enable support v2
table. Can we please stop thinking only on v1 table and ensure we also cover v2
table?

I doubt we can support v2 tables perfectly in the existing `DataStreamWriter`.
We would likely need to add a `DataStreamWriterV2`, similar to
`DataFrameWriterV2`. I prefer to focus on making v1 tables work, since the
file format in streaming doesn't support DSv2 and the file format is the most
common case for tables.
