GitHub user tcondie opened a pull request:
https://github.com/apache/spark/pull/17219
[SPARK-19876][SS][WIP] OneTime Trigger Executor
## What changes were proposed in this pull request?
This patch adds a new trigger, OneTime, and a matching trigger executor that
executes a single trigger and then stops. The OneTime trigger gives users more
control over the scheduling of triggers.
In addition, this patch requires a change to StreamExecution: it now logs a
commit record after each batch is processed successfully. After a restart,
this new commit log, rather than the offset log alone, determines the next
batch (offsets) to process. Relying only on the offset log would always
re-process the last logged batch, which would make a OneTime trigger
impossible.
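
The restart decision described above can be sketched roughly as follows. This is a plain-Python model, not the StreamExecution code: the two logs are modeled as dicts keyed by batch id, and the function name `next_batch_on_restart` and its return shape are assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple


def next_batch_on_restart(
    offset_log: Dict[int, str],
    commit_log: Dict[int, str],
) -> Tuple[int, bool]:
    """Return (batch_id, rerun) for the first batch after a restart.

    Hypothetical model: offset_log holds the offsets planned per batch,
    commit_log holds one entry per batch that finished successfully.
    rerun is True when a previously planned batch must be processed again.
    """
    if not offset_log:
        # Nothing was ever planned: start from the first batch.
        return 0, False

    latest_planned = max(offset_log)
    latest_committed: Optional[int] = max(commit_log) if commit_log else None

    if latest_committed is not None and latest_committed >= latest_planned:
        # The last planned batch was committed: move on to a new batch.
        return latest_planned + 1, False

    # No commit record for the last planned batch: fall back to the
    # offset log and process that batch again.
    return latest_planned, True
```

With only the offset log available, the second branch could never be taken, so every restart would re-run the last planned batch.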
## How was this patch tested?
A number of existing tests have been revised. These tests all assumed that
when restarting a stream, the last batch in the offset log is to be
re-processed. Given that we now have a commit log that will tell us if that
last batch was processed successfully, the results/assumptions of those tests
needed to be revised accordingly.
In addition, a OneTime trigger test was added to StreamingQuerySuite, which
tests:
- The semantics of OneTime trigger (i.e., on start, execute a single batch,
then stop).
- The case where the completion of a batch was not recorded in the commit
log before a restart, in which case we should fall back to what's in the
offset log.
- A OneTime trigger execution that results in an exception being thrown.
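
As a rough illustration of the semantics these tests cover, a one-shot executor can be modeled as below. This is a plain-Python sketch under assumed names (`OneTimeExecutor`, `execute`, `run_batch` are hypothetical here), not the executor added by this patch.

```python
from typing import Callable


class OneTimeExecutor:
    """Hypothetical model of a trigger executor that fires exactly once."""

    def execute(self, run_batch: Callable[[], bool]) -> None:
        # Run a single batch and return. Any exception raised by the
        # batch is not caught here, so it propagates to the caller.
        run_batch()


# Usage: count how many times the batch function is invoked.
calls = []


def counting_batch() -> bool:
    calls.append(1)
    return True


OneTimeExecutor().execute(counting_batch)
```

A periodic executor would instead call `run_batch` in a loop until it returns false; the one-shot variant simply omits the loop.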
@marmbrus @tdas @zsxwing
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/tcondie/spark stream-commit
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/17219.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #17219
----
commit 08a36f8c5a833da1deb5db99dc620ba4e98d67a1
Author: Tyson Condie <[email protected]>
Date: 2017-03-06T19:05:58Z
update
commit 0e21d0e829a134d050b1c881745dbcfca8986378
Author: Tyson Condie <[email protected]>
Date: 2017-03-07T00:46:30Z
Merge branch 'master' of https://github.com/apache/spark into stream-commit
commit 9b8abb445725828e73e392ee58ffa844c2506ca3
Author: Tyson Condie <[email protected]>
Date: 2017-03-08T18:55:57Z
update existing tests
commit 682eb1a3987c0f481de4bebf72926f14816d7607
Author: Tyson Condie <[email protected]>
Date: 2017-03-09T00:10:06Z
add onetime trigger test
----