HeartSaVioR commented on a change in pull request #30521:
URL: https://github.com/apache/spark/pull/30521#discussion_r532267715



##########
File path: 
sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala
##########
@@ -304,46 +308,68 @@ final class DataStreamWriter[T] private[sql](ds: 
Dataset[T]) {
    * @since 3.1.0
    */
   @throws[TimeoutException]
-  def saveAsTable(tableName: String): StreamingQuery = {
-    this.source = SOURCE_NAME_TABLE
+  def table(tableName: String): StreamingQuery = {

Review comment:
       Even in DataStreamReader, `table` works differently with others. 
There're json, orc, parquet, etc which trigger load directly, but they're just 
shorthand forms of `format(...).load()`. For the lifecycle of DataStreamReader, 
the original form of usage is `format(...).options(...).load()`, and `table` 
becomes a thing which cannot do with `format(...).load()` (It's not even a 
specialized form of format. It just doesn't follow the path of 
`format(...).load()`.)
   
   I don't think we should make consistent with `DataStreamReader.table`. I 
think we should also fix `DataStreamReader.table` as well to make consistent 
with others. After then, `DataStreamWriter.table` should be special form of 
`format(...)`, like we did for `DataStreamWriter.foreach`, as I initially 
proposed in #29767 .
   (`saveAsTable` was just an output of compromise, and I regret I shouldn't 
compromise for the public API.)
   
   Both DataStreamReader and DataStreamWriter, the original form of usage has 
been `format(...).options(...).load()` and `format(...).options(...).start()` 
for years and it's not ideal to make branches. It doesn't look consistent to 
have a new form of `readStream.option(...).option(...).table(...)` and 
`writeStream.option(...).option(...).saveAsTable(...)`. In former, it may look 
less odd in practice, as in many case table can be read without specifying 
options. (applies to other shorthands as well) But in latter, we will most 
likely always want to specify options (as SS itself requires options like 
checkpoint) which ends up always have such form.
   
   That warrants new path of API, like we did for DataFrameWriterV2. If we are 
strongly insist to not add them to the part of new `format`, let's revert both 
and discuss the design again in dev@ mailing list.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to