Github user zsxwing commented on a diff in the pull request:
https://github.com/apache/spark/pull/20631#discussion_r169438035
--- Diff: docs/structured-streaming-programming-guide.md ---
@@ -1979,6 +2006,172 @@ which has methods that get called whenever there is
a sequence of rows generated
- Whenever `open` is called, `close` will also be called (unless the JVM
exits due to some error). This is true even if `open` returns false. If there
is any error in processing and writing the data, `close` will be called with
the error. It is your responsibility to clean up state (e.g. connections,
transactions, etc.) that have been created in `open` such that there are no
resource leaks.
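The open/close contract above can be sketched with a small self-contained driver loop. This is a plain-Scala illustration of the guarantee, not Spark's actual `ForeachWriter` class or engine code; the `ConnectionLikeWriter` and `runPartition` names are hypothetical and only stand in for the real sink and the engine's per-partition loop:

```scala
import scala.util.control.NonFatal

// A writer mirroring the ForeachWriter contract: open acquires a
// resource, process may fail, and close must release the resource.
class ConnectionLikeWriter {
  var opened = false
  var closed = false
  var closeError: Throwable = null

  def open(partitionId: Long, epochId: Long): Boolean = {
    opened = true // e.g. acquire a connection or start a transaction here
    true
  }

  def process(value: String): Unit = {
    if (value == "bad") throw new RuntimeException("write failed")
  }

  def close(errorOrNull: Throwable): Unit = {
    closeError = errorOrNull
    closed = true // release whatever open acquired, even on error
  }
}

// Mirrors the engine's guarantee: whatever happens in process,
// close is always called, with the error or with null on success.
def runPartition(writer: ConnectionLikeWriter, rows: Seq[String]): Unit = {
  var error: Throwable = null
  if (writer.open(0L, 0L)) {
    try rows.foreach(writer.process)
    catch { case NonFatal(e) => error = e }
  }
  writer.close(error)
}

val w = new ConnectionLikeWriter
runPartition(w, Seq("ok", "bad", "never-reached"))
```

Even though processing the second row throws, `close` still runs and receives the error, which is why connection cleanup belongs in `close` rather than at the end of `process`.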
+#### Triggers
+The trigger settings of a streaming query define the timing of streaming data processing: whether
+the query is going to be executed as a micro-batch query with a fixed batch interval or as a continuous processing query.
+Here are the different kinds of triggers that are supported.
+
+<table class="table">
+ <tr>
+ <th>Trigger Type</th>
+ <th>Description</th>
+ </tr>
+ <tr>
+ <td><i>unspecified (default)</i></td>
+ <td>
+ If no trigger setting is explicitly specified, then by default,
the query will be
+ executed in micro-batch mode, where micro-batches will be
generated as soon as
+ the previous micro-batch has completed processing.
+ </td>
+ </tr>
+ <tr>
+ <td><b>Fixed interval micro-batches</b></td>
+ <td>
+      The query will be executed in micro-batch mode, where micro-batches will be kicked off
+      at the user-specified intervals.
+ <ul>
+ <li>If the previous micro-batch completes within the interval,
then the engine will wait until
+ the interval is over before kicking off the next
micro-batch.</li>
+
+ <li>If the previous micro-batch takes longer than the interval
to complete (i.e. if an
+ interval boundary is missed), then the next micro-batch will
start as soon as the
+ previous one completes (i.e., it will not wait for the next
interval boundary).</li>
+
+ <li>If no new data is available, then no micro-batch will be
kicked off.</li>
+ </ul>
+ </td>
+ </tr>
+ <tr>
+ <td><b>One-time micro-batch</b></td>
+ <td>
+      The query will execute <i>only one</i> micro-batch to process all the available data and then
+      stop on its own. This is useful in scenarios where you want to periodically spin up a cluster,
+      process everything that is available since the last period, and then shut down the
+      cluster. In some cases, this may lead to significant cost savings.
+ </td>
+ </tr>
+ <tr>
+ <td><b>Continuous with fixed checkpoint
interval</b><br/><i>(experimental)</i></td>
+ <td>
+ The query will be executed in the new low-latency, continuous
processing mode. Read more
+ about this in the <a
href="#continuous-processing-experimental">Continuous Processing section</a>
below.
+ </td>
+ </tr>
+</table>
+
+Here are a few code examples.
+
+<div class="codetabs">
+<div data-lang="scala" markdown="1">
+
+{% highlight scala %}
+import org.apache.spark.sql.streaming.Trigger
+
+// Default trigger (runs micro-batch as soon as it can)
+df.writeStream
+ .format("console")
+ .start()
+
+// ProcessingTime trigger with two-second micro-batch interval
+df.writeStream
+ .format("console")
+ .trigger(Trigger.ProcessingTime("2 seconds"))
+ .start()
+
+// One-time trigger
+df.writeStream
+ .format("console")
+ .trigger(Trigger.Once())
+ .start()
+
+// Continuous trigger with one-second checkpointing interval
+df.writeStream
+ .format("console")
+  .trigger(Trigger.Continuous("1 second"))
+ .start()
+
+{% endhighlight %}
+
+
+</div>
+<div data-lang="java" markdown="1">
+
+{% highlight java %}
+import org.apache.spark.sql.streaming.Trigger;
+
+// Default trigger (runs micro-batch as soon as it can)
+df.writeStream
+ .format("console")
+ .start();
+
+// ProcessingTime trigger with two-second micro-batch interval
+df.writeStream
+ .format("console")
+ .trigger(Trigger.ProcessingTime("2 seconds"))
+ .start();
+
+// One-time trigger
+df.writeStream
+ .format("console")
+ .trigger(Trigger.Once())
+ .start();
+
+// Continuous trigger with one-second checkpointing interval
+df.writeStream
+ .format("console")
+  .trigger(Trigger.Continuous("1 second"))
--- End diff --
ditto