Github user zsxwing commented on a diff in the pull request:
https://github.com/apache/spark/pull/20631#discussion_r169438035
--- Diff: docs/structured-streaming-programming-guide.md ---
@@ -1979,6 +2006,172 @@ which has methods that get called whenever there is
a sequence of rows generated
- Whenever `open` is called, `close` will also be called (unless the JVM
exits due to some error). This is true even if `open` returns false. If there
is any error in processing and writing the data, `close` will be called with
the error. It is your responsibility to clean up state (e.g. connections,
transactions, etc.) that have been created in `open` such that there are no
resource leaks.
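The open/close contract above can be sketched with a small self-contained driver loop. This is a plain-Scala illustration of the guarantee, not Spark's actual `ForeachWriter` class or engine code; the `ConnectionLikeWriter` and `runPartition` names are hypothetical and only stand in for the real sink and the engine's per-partition loop:

```scala
import scala.util.control.NonFatal

// A writer mirroring the ForeachWriter contract: open acquires a
// resource, process may fail, and close must release the resource.
class ConnectionLikeWriter {
  var opened = false
  var closed = false
  var closeError: Throwable = null

  def open(partitionId: Long, epochId: Long): Boolean = {
    opened = true // e.g. acquire a connection or start a transaction here
    true
  }

  def process(value: String): Unit = {
    if (value == "bad") throw new RuntimeException("write failed")
  }

  def close(errorOrNull: Throwable): Unit = {
    closeError = errorOrNull
    closed = true // release whatever open acquired, even on error
  }
}

// Mirrors the engine's guarantee: whatever happens in process,
// close is always called, with the error or with null on success.
def runPartition(writer: ConnectionLikeWriter, rows: Seq[String]): Unit = {
  var error: Throwable = null
  if (writer.open(0L, 0L)) {
    try rows.foreach(writer.process)
    catch { case NonFatal(e) => error = e }
  }
  writer.close(error)
}

val w = new ConnectionLikeWriter
runPartition(w, Seq("ok", "bad", "never-reached"))
```

Even though processing the second row throws, `close` still runs and receives the error, which is why connection cleanup belongs in `close` rather than at the end of `process`.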
+#### Triggers
+The trigger settings of a streaming query define the timing of streaming data processing: whether
+the query is going to be executed as a micro-batch query with a fixed batch interval or as a continuous processing query.
+Here are the different kinds of triggers that are supported.
+
+<table class="table">
+ <tr>
+ <th>Trigger Type</th>
+ <th>Description</th>
+ </tr>
+ <tr>
+ <td><i>unspecified (default)</i></td>
+ <td>
+ If no trigger setting is explicitly specified, then by default,
the query will be
+ executed in micro-batch mode, where micro-batches will be
generated as soon as
+ the previous micro-batch has completed processing.
+ </td>
+ </tr>
+ <tr>
+ <td><b>Fixed interval micro-batches</b></td>
+ <td>
+      The query will be executed in micro-batch mode, where micro-batches will be kicked off
+      at the user-specified intervals.
+ <ul>
+ <li>If the previous micro-batch completes within the interval,
then the engine will wait until
+ the interval is over before kicking off the next
micro-batch.</li>
+
+ <li>If the previous micro-batch takes longer than the interval
to complete (i.e. if an
+ interval boundary is missed), then the next micro-batch will
start as soon as the
+ previous one completes (i.e., it will not wait for the next
interval boundary).</li>
+
+ <li>If no new data is available, then no micro-batch will be
kicked off.</li>
+ </ul>
+ </td>
+ </tr>
+ <tr>
+ <td><b>One-time micro-batch</b></td>
+ <td>
+      The query will execute <i>only one</i> micro-batch to process all the available data and then
+      stop on its own. This is useful in scenarios where you want to periodically spin up a cluster,
+      process everything that is available since the last period, and then shut down the
+      cluster. In some cases, this may lead to significant cost savings.
+ </td>
+ </tr>
+ <tr>
+ <td><b>Continuous with fixed checkpoint
interval</b><br/><i>(experimental)</i></td>
+ <td>
+ The query will be executed in the new low-latency, continuous
processing mode. Read more
+ about this in the <a
href="#continuous-processing-experimental">Continuous Processing section</a>
below.
+ </td>
+ </tr>
+</table>
+
+Here are a few code examples.
+
+<div class="codetabs">
+<div data-lang="scala" markdown="1">
+
+{% highlight scala %}
+import org.apache.spark.sql.streaming.Trigger
+
+// Default trigger (runs micro-batch as soon as it can)
+df.writeStream
+ .format("console")
+ .start()
+
+// ProcessingTime trigger with two-second micro-batch interval
+df.writeStream
+ .format("console")
+ .trigger(Trigger.ProcessingTime("2 seconds"))
+ .start()
+
+// One-time trigger
+df.writeStream
+ .format("console")
+ .trigger(Trigger.Once())
+ .start()
+
+// Continuous trigger with one-second checkpointing interval
+df.writeStream
+ .format("console")
+  .trigger(Trigger.Continuous("1 second"))
+ .start()
+
+{% endhighlight %}
+
+
+</div>
+<div data-lang="java" markdown="1">
+
+{% highlight java %}
+import org.apache.spark.sql.streaming.Trigger;
+
+// Default trigger (runs micro-batch as soon as it can)
+df.writeStream
+ .format("console")
+ .start();
+
+// ProcessingTime trigger with two-second micro-batch interval
+df.writeStream
+ .format("console")
+ .trigger(Trigger.ProcessingTime("2 seconds"))
+ .start();
+
+// One-time trigger
+df.writeStream
+ .format("console")
+ .trigger(Trigger.Once())
+ .start();
+
+// Continuous trigger with one-second checkpointing interval
+df.writeStream
+ .format("console")
+  .trigger(Trigger.Continuous("1 second"))
--- End diff --
ditto