GitHub user tcondie opened a pull request:

    https://github.com/apache/spark/pull/17219

    [SPARK-19876][SS][WIP] OneTime Trigger Executor

    ## What changes were proposed in this pull request?
    
    This patch adds a OneTime trigger and a matching trigger executor that 
execute a single trigger only. One can use this OneTime trigger to have more 
control over the scheduling of triggers.
    
    In addition, this patch requires an optimization to StreamExecution that 
logs a commit record once a batch has been processed successfully. After a 
restart, this new commit log is used to determine the next batch (offsets) to 
process, instead of the offset log alone. Relying only on the offset log would 
always re-process the previously logged batch, which would not permit a OneTime 
trigger feature.
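
    The restart decision described above can be sketched roughly as follows. 
This is a toy Python model, not the code in this patch: `resume_point`, the 
dict-based offset log, and the set-based commit log are hypothetical stand-ins 
chosen for illustration.

```python
from typing import Dict, Set, Tuple


def resume_point(offset_log: Dict[int, str], commit_log: Set[int]) -> Tuple[int, bool]:
    """Return (batch_id, rerun) for a restarted query.

    offset_log maps batch id -> planned offsets (hypothetical representation).
    commit_log holds the ids of batches that finished successfully.
    """
    if not offset_log:
        # Nothing was ever planned: start from the first batch.
        return 0, False

    latest = max(offset_log)
    if latest in commit_log:
        # The last planned batch completed, so move on to a new batch.
        return latest + 1, False

    # No commit record for the last planned batch: fall back to the
    # offset log and run that batch again.
    return latest, True
```

With only an offset log, the third branch would be the only possible outcome, 
which is why a restart would always re-process the last logged batch.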
    
    ## How was this patch tested?
    
    A number of existing tests have been revised. These tests all assumed that 
when restarting a stream, the last batch in the offset log is to be 
re-processed. Given that we now have a commit log that will tell us if that 
last batch was processed successfully, the results/assumptions of those tests 
needed to be revised accordingly. 
    
    In addition, a OneTime trigger test was added to StreamingQuerySuite, which 
tests:
    - The semantics of OneTime trigger (i.e., on start, execute a single batch, 
then stop).
    - The case where the completion of a batch was not recorded in the commit 
log before a restart, which means that we should fall back to what's in the 
offset log.
    - A OneTime trigger execution that results in an exception being thrown.
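
    The semantics exercised by these tests can be illustrated with a small 
sketch. The class names and the `run_batch` callback below are assumptions 
made for this example and are not taken from the patch; the sketch only 
models "execute a single batch, then stop" next to a repeating executor.

```python
from typing import Callable


class OneTimeExecutor:
    """Hypothetical executor: runs the batch function exactly once."""

    def execute(self, run_batch: Callable[[], bool]) -> None:
        # A single call; the return value is ignored because there is
        # no next trigger. Any exception propagates to the caller.
        run_batch()


class RepeatingExecutor:
    """Hypothetical contrast: keeps triggering until run_batch returns False."""

    def execute(self, run_batch: Callable[[], bool]) -> None:
        while run_batch():
            pass
```

For example, `OneTimeExecutor().execute(f)` calls `f` once even if `f` returns 
`True`, whereas `RepeatingExecutor().execute(f)` keeps calling `f` until it 
returns `False`.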
    
    @marmbrus @tdas @zsxwing 
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tcondie/spark stream-commit

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17219.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17219
    
----
commit 08a36f8c5a833da1deb5db99dc620ba4e98d67a1
Author: Tyson Condie <[email protected]>
Date:   2017-03-06T19:05:58Z

    update

commit 0e21d0e829a134d050b1c881745dbcfca8986378
Author: Tyson Condie <[email protected]>
Date:   2017-03-07T00:46:30Z

    Merge branch 'master' of https://github.com/apache/spark into stream-commit

commit 9b8abb445725828e73e392ee58ffa844c2506ca3
Author: Tyson Condie <[email protected]>
Date:   2017-03-08T18:55:57Z

    update existing tests

commit 682eb1a3987c0f481de4bebf72926f14816d7607
Author: Tyson Condie <[email protected]>
Date:   2017-03-09T00:10:06Z

    add onetime trigger test

----


