Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20631#discussion_r169437575

--- Diff: docs/structured-streaming-programming-guide.md ---
@@ -1979,6 +2006,172 @@ which has methods that get called whenever there is a sequence of rows generated
 - Whenever `open` is called, `close` will also be called (unless the JVM exits due to some error). This is true even if `open` returns false. If there is any error in processing and writing the data, `close` will be called with the error. It is your responsibility to clean up state (e.g. connections, transactions, etc.) that have been created in `open` such that there are no resource leaks.
 
+#### Triggers
+The trigger settings of a streaming query define the timing of streaming data processing, whether
+the query is going to be executed as a micro-batch query with a fixed batch interval or as a continuous processing query.
+Here are the different kinds of triggers that are supported.
+
+<table class="table">
+  <tr>
+    <th>Trigger Type</th>
+    <th>Description</th>
+  </tr>
+  <tr>
+    <td><i>unspecified (default)</i></td>
+    <td>
+        If no trigger setting is explicitly specified, then by default, the query will be
+        executed in micro-batch mode, where micro-batches will be generated as soon as
+        the previous micro-batch has completed processing.
+    </td>
+  </tr>
+  <tr>
+    <td><b>Fixed interval micro-batches</b></td>
+    <td>
+        The query will be executed in micro-batch mode, where micro-batches will be kicked off
+        at the user-specified intervals.
+        <ul>
+          <li>If the previous micro-batch completes within the interval, then the engine will wait until
+          the interval is over before kicking off the next micro-batch.</li>
+
+          <li>If the previous micro-batch takes longer than the interval to complete (i.e. if an
+          interval boundary is missed), then the next micro-batch will start as soon as the
+          previous one completes (i.e., it will not wait for the next interval boundary).</li>
+
+          <li>If no new data is available, then no micro-batch will be kicked off.</li>
+        </ul>
+    </td>
+  </tr>
+  <tr>
+    <td><b>One-time micro-batch</b></td>
+    <td>
+        The query will execute <i>only one</i> micro-batch to process all the available data and then
+        stop on its own. This is useful in scenarios where you want to periodically spin up a cluster,
+        process everything that is available since the last period, and then shut down the
+        cluster. In some cases, this may lead to significant cost savings.
+    </td>
+  </tr>
+  <tr>
+    <td><b>Continuous with fixed checkpoint interval</b><br/><i>(experimental)</i></td>
+    <td>
+        The query will be executed in the new low-latency, continuous processing mode. Read more
+        about this in the <a href="#continuous-processing-experimental">Continuous Processing section</a> below.
+    </td>
+  </tr>
+</table>
+
+Here are a few code examples.
+
+<div class="codetabs">
+<div data-lang="scala" markdown="1">
+
+{% highlight scala %}
+import org.apache.spark.sql.streaming.Trigger
+
+// Default trigger (runs micro-batch as soon as it can)
+df.writeStream
+  .format("console")
+  .start()
+
+// ProcessingTime trigger with two-second micro-batch interval
--- End diff --

nit: two-second`s`
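(The diff above is cut off at the line under review, so the remaining trigger examples are not visible. For context, here is a minimal sketch of how the other trigger types from the table are set through the same `Trigger` API, assuming the same streaming DataFrame `df` as in the diff. `Trigger.ProcessingTime`, `Trigger.Once`, and `Trigger.Continuous` are the factory methods this section of the guide documents.)

```scala
import org.apache.spark.sql.streaming.Trigger

// Fixed-interval micro-batches: kick off a micro-batch every two seconds
// (the example the nit above refers to)
df.writeStream
  .format("console")
  .trigger(Trigger.ProcessingTime("2 seconds"))
  .start()

// One-time micro-batch: process all available data, then stop on its own
df.writeStream
  .format("console")
  .trigger(Trigger.Once())
  .start()

// Continuous processing with a one-second checkpoint interval (experimental)
df.writeStream
  .format("console")
  .trigger(Trigger.Continuous("1 second"))
  .start()
```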