Github user tdas commented on a diff in the pull request:
https://github.com/apache/spark/pull/21571#discussion_r195882373
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala ---
@@ -322,6 +338,45 @@ final class DataStreamWriter[T] private[sql](ds: Dataset[T]) {
     this
   }
+  /**
+   * :: Experimental ::
+   *
+   * (Scala-specific) Sets the output of the streaming query to be processed using the provided
+   * function. This is supported only in the micro-batch execution modes (that is, when the
+   * trigger is not continuous). In every micro-batch, the provided function will be called with
+   * (i) the output rows as a Dataset and (ii) the batch identifier. The batchId can be used to
+   * deduplicate and transactionally write the output (that is, the provided Dataset) to external
+   * systems. The output Dataset is guaranteed to be exactly the same for the same batchId
+   * (assuming all operations are deterministic in the query).
+   *
+   * @since 2.4.0
+   */
+  @InterfaceStability.Evolving
+  def foreachBatch(function: (Dataset[T], Long) => Unit): DataStreamWriter[T] = {
--- End diff ---
Good point.
---
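For anyone following the thread, here is a minimal sketch of how the new API reads from the caller's side (assuming Spark 2.4 with the built-in rate source; the output path and the overwrite-per-batchId scheme are illustrative, not part of this PR):

import org.apache.spark.sql.{Dataset, Row, SparkSession}

object ForeachBatchSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("foreachBatch-sketch")
      .master("local[*]")
      .getOrCreate()

    // Streaming Dataset from the built-in rate source (columns: timestamp, value).
    val stream: Dataset[Row] = spark.readStream.format("rate").load()

    // The function is invoked once per micro-batch with the batch's rows and its id.
    // Because a deterministic query produces the same Dataset for the same batchId,
    // the id can key idempotent or transactional writes to an external system.
    val query = stream.writeStream
      .foreachBatch { (batchDF: Dataset[Row], batchId: Long) =>
        // Overwriting a batchId-keyed path (illustrative only) makes replays
        // of the same batch idempotent.
        batchDF.write.mode("overwrite").parquet(s"/tmp/rate-output/batch-$batchId")
      }
      .start()

    query.awaitTermination()
  }
}

Keying the write location by batchId is one way to exploit the deduplication guarantee described in the scaladoc above.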