[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42430: [SPARK-44761][CONNECT] Support DataStreamWriter.foreachBatch(VoidFunction2)

via GitHub Thu, 10 Aug 2023 18:28:17 -0700


HyukjinKwon commented on code in PR #42430:
URL: https://github.com/apache/spark/pull/42430#discussion_r1290817310



##########
connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala:
##########
@@ -247,6 +247,24 @@ final class DataStreamWriter[T] private[sql] (ds: Dataset[T]) extends Logging {
     this
   }
 
+  /**
+   * :: Experimental ::
+   *
+   * (Java-specific) Sets the output of the streaming query to be processed using the provided
+   * function. This is supported only in the micro-batch execution modes (that is, when the
+   * trigger is not continuous). In every micro-batch, the provided function will be called in
+   * every micro-batch with (i) the output rows as a Dataset and (ii) the batch identifier. The
+   * batchId can be used to deduplicate and transactionally write the output (that is, the
+   * provided Dataset) to external systems. The output Dataset is guaranteed to be exactly the
+   * same for the same batchId (assuming all operations are deterministic in the query).
+   *
+   * @since 2.5.0

Review Comment:
   ```suggestion
      * @since 3.5.0
   ```
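
For readers of this thread, the deduplication role of `batchId` described in the Scaladoc above can be sketched without Spark. This is only an illustration under stated assumptions: `BatchHandler`, `IdempotentSink`, and `ForeachBatchDedupDemo` are hypothetical names, a plain `List<String>` stands in for the output Dataset, and an in-memory set stands in for whatever durable record of committed batch ids a real external system would keep.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a two-argument void callback; this is NOT Spark's VoidFunction2.
interface BatchHandler<T> {
    void call(List<T> batch, long batchId) throws Exception;
}

// Hypothetical in-memory sink that remembers which batch ids it has already written.
final class IdempotentSink implements BatchHandler<String> {
    private final Set<Long> committedBatchIds = new HashSet<>();
    private final List<String> rows = new ArrayList<>();

    @Override
    public synchronized void call(List<String> batch, long batchId) {
        // Set.add returns false when the id is already present, so a replayed batch is skipped.
        if (!committedBatchIds.add(batchId)) {
            return;
        }
        // A real sink would record the batch id and write the rows in one transaction.
        rows.addAll(batch);
    }

    // Returns a copy so callers cannot mutate the sink's internal state.
    synchronized List<String> rows() {
        return new ArrayList<>(rows);
    }
}

final class ForeachBatchDedupDemo {
    public static void main(String[] args) {
        IdempotentSink sink = new IdempotentSink();
        sink.call(List.of("a", "b"), 0L);
        sink.call(List.of("a", "b"), 0L); // simulated replay of batch 0 after a retry
        sink.call(List.of("c"), 1L);
        System.out.println(sink.rows()); // prints [a, b, c]
    }
}
```

The sketch relies on the guarantee quoted in the Scaladoc: because the output for a given batchId is the same on every attempt (for deterministic queries), skipping an already-committed id loses no data.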



