HeartSaVioR commented on a change in pull request #29767:
URL: https://github.com/apache/spark/pull/29767#discussion_r500188785
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala
##########
@@ -457,6 +470,17 @@ final class DataStreamWriter[T] private[sql](ds: Dataset[T]) {
     foreachBatch((batchDs: Dataset[T], batchId: Long) => function.call(batchDs, batchId))
   }
+ /**
+ * Specifies the underlying output table.
+ *
+ * @since 3.1.0
+ */
+ def table(tableName: String): DataStreamWriter[T] = {
Review comment:
I have a slightly different view on DataStreamWriter (and probably
DataFrameWriter as well):
While we don't restrict the order, I think it's quite natural to have a flow
like `define a sink` -> `set options on the sink` -> `set options on the
streaming query` -> `start the query` (a couple of the parts can be
consolidated, or the sequence can be swapped):
```
df.writeStream
.format("...")
.option("...")
.outputMode(...)
.trigger(...)
.start()
```
Right now it looks fairly arbitrary and things have gotten mixed up:
`checkpointLocation` isn't tied to the sink, yet we let end users pass it
through `option`, which is also used for sink options. The same goes for
`queryName`.
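For example, sink options and query-level settings currently end up going
through the same `option()` call (the paths and query name below are just
placeholders):
```
df.writeStream
  .format("parquet")
  .option("path", "/tmp/out")                 // sink-specific option
  .option("checkpointLocation", "/tmp/chk")   // query-level setting, passed the same way
  .option("queryName", "my_query")            // likewise query-level
  .outputMode("append")
  .start()
```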
I intended the addition of the `table` method as `defining a sink`, but if
we'd like to treat tables specially, `DataFrameWriter.insertInto` would match
that intention and I can change the method name to `insertInto` here as well.
WDYT?
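For context, this is roughly how I read `table(...)` as `defining a sink`
(the table name and checkpoint path below are made up):
```
df.writeStream
  .table("catalog.db.events")                 // define the sink: the target table
  .option("checkpointLocation", "/tmp/chk")   // query-level setting
  .outputMode("append")
  .start()
```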