HeartSaVioR commented on a change in pull request #30521:
URL: https://github.com/apache/spark/pull/30521#discussion_r532267715
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala
##########
@@ -304,46 +308,68 @@ final class DataStreamWriter[T] private[sql](ds:
Dataset[T]) {
* @since 3.1.0
*/
@throws[TimeoutException]
- def saveAsTable(tableName: String): StreamingQuery = {
- this.source = SOURCE_NAME_TABLE
+ def table(tableName: String): StreamingQuery = {
Review comment:
Even in DataStreamReader, `table` works differently with others.
There're json, orc, parquet, etc which trigger load directly, but they're just
shorthand forms of `format(...).load()`. For the lifecycle of DataStreamReader,
the original form of usage is `format(...).options(...).load()`, and `table`
becomes a thing which cannot do with `format(...).load()` (It's not even a
specialized form of format. It just doesn't follow the path of
`format(...).load()`.)
I don't think we should make consistent with `DataStreamReader.table`. I
think we should also fix `DataStreamReader.table` as well to make consistent
with others. After then, `DataStreamWriter.table` should be special form of
`format(...)`, like we did for `DataStreamWriter.foreach`, as I initially
proposed in #29767 .
(`saveAsTable` was just an output of compromise, and I regret I shouldn't
compromise for the public API.)
Both DataStreamReader and DataStreamWriter, the original form of usage has
been `format(...).options(...).load()` and `format(...).options(...).start()`
for years and it's not ideal to make branches. It doesn't look consistent to
have a new form of `readStream.option(...).option(...).table(...)` and
`writeStream.option(...).option(...).saveAsTable(...)`. In former, it may look
less odd in practice, as in many case table can be read without specifying
options. (applies to other shorthands as well) But in latter, we will most
likely always want to specify options (as SS itself requires options like
checkpoint) which ends up always have such form.
That warrants new path of API, like we did for DataFrameWriterV2. If we are
strongly insist to not add them to the part of new `format`, let's revert both
and discuss the design again in dev@ mailing list.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]