GitHub user frreiss opened a pull request:
https://github.com/apache/spark/pull/14553
[WIP] [SPARK-16963] Initial version of changes to Source trait
## What changes were proposed in this pull request?
Initial proposed changes to the Source trait such that the scheduler can
notify data sources when it is safe to discard buffered data. Major changes so
far:
* Added a method `commit(end: Offset)` that tells the Source that it is OK to
discard all offsets up to `end`, inclusive.
* Changed the semantics of a `None` start value for the `getBatch` method to
mean "from the very beginning of the stream", as opposed to "all data present
in the Source's buffer".
* Added notes that the upper layers of the system will never call
`getBatch` with a start value less than the last value passed to `commit`.
* Added a `getMinOffset` method to allow the scheduler to query the status
of each Source on restart. This addition is not strictly necessary, but it
seemed like a good idea -- Sources will be maintaining their own persistent
state, and there may be bugs in the checkpointing code.
* Changed the name of `getOffset` to `getMaxOffset`.
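
To make the proposed contract easier to review, here is a minimal sketch of how the changes above might fit together. It is not the code in this patch. The method names `commit`, `getBatch`, `getMinOffset` and `getMaxOffset` come from the list above; everything else is an assumption made for illustration, including the `Long`-backed `Offset`, the `Seq[String]` return type (the real `getBatch` deals in Spark data, not strings), the hypothetical `BufferedSource` class with its `add` helper, and the choice to treat `start` as exclusive and `end` as inclusive.

```scala
// Hypothetical, simplified stand-in for Spark's Offset; assumed to wrap a Long.
final case class Offset(value: Long)

// Sketch of the proposed trait. Signatures are illustrative assumptions.
trait Source {
  /** Smallest offset still retained, or None if nothing is buffered. */
  def getMinOffset: Option[Offset]

  /** Largest offset currently available (the renamed getOffset). */
  def getMaxOffset: Option[Offset]

  /** Records after `start` up to and including `end`.
    * A `start` of None means "from the very beginning of the stream". */
  def getBatch(start: Option[Offset], end: Offset): Seq[String]

  /** Tells the source it may discard all offsets up to `end`, inclusive. */
  def commit(end: Offset): Unit
}

// Hypothetical in-memory source that honors the contract above.
final class BufferedSource extends Source {
  private var buffer = Vector.empty[(Long, String)]
  private var nextOffset = 0L
  private var lastCommitted: Option[Long] = None

  /** Test helper (not part of the trait): appends a record, returns its offset. */
  def add(record: String): Offset = {
    val offset = nextOffset
    buffer = buffer :+ (offset -> record)
    nextOffset += 1
    Offset(offset)
  }

  def getMinOffset: Option[Offset] = buffer.headOption.map(p => Offset(p._1))

  def getMaxOffset: Option[Offset] = buffer.lastOption.map(p => Offset(p._1))

  def getBatch(start: Option[Offset], end: Offset): Seq[String] = {
    // Callers must never ask for data at or before the last committed offset,
    // so after a commit a None start ("the very beginning") is rejected too.
    require(
      lastCommitted.forall(c => start.exists(_.value >= c)),
      "start precedes the last committed offset")
    buffer.collect {
      case (o, r) if start.forall(s => o > s.value) && o <= end.value => r
    }
  }

  def commit(end: Offset): Unit = {
    // Drop everything up to and including `end`; commits never move backwards.
    buffer = buffer.filter(_._1 > end.value)
    lastCommitted = Some(lastCommitted.fold(end.value)(c => math.max(c, end.value)))
  }
}
```

In this sketch, a scheduler that has durably processed a batch ending at offset `e` would call `commit(e)` and then request the next batch with `getBatch(Some(e), newEnd)`. On restart, `getMinOffset` lets it check that the source still holds the data the checkpoint expects.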
## How was this patch tested?
Testing is still TBD.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/frreiss/spark-fred fred-16963
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/14553.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #14553
----
commit 6c9acdefe1c791bad3f00d845a72d07e2113b214
Author: frreiss <[email protected]>
Date: 2016-08-09T02:28:32Z
Initial version of changes to Source trait
----