GitHub user marmbrus opened a pull request:
https://github.com/apache/spark/pull/11804
[SPARK-13985][SQL] Deterministic batches with ids
This PR relaxes the requirements of a `Sink` for structured streaming to
only require idempotent appending of data. Previously the `Sink` needed to be
able to transactionally append data while recording an opaque offset indicating
how far in a stream we have processed.
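As a rough illustration of what "idempotent appending" could mean in practice, here is a minimal sketch in Python. It is not the `Sink` interface from this PR: the class `IdempotentMemorySink` and its `add_batch` method are hypothetical names chosen for the example. The assumption is only that each batch arrives with a deterministic id, so a replayed batch can be recognized and skipped.

```python
# Hypothetical sketch, not Spark's actual Sink API.
class IdempotentMemorySink:
    """Stores each batch under its id; re-adding a known id is a no-op."""

    def __init__(self):
        self._batches = {}

    def add_batch(self, batch_id, rows):
        # A batch id that was already committed means this call is a replay.
        if batch_id in self._batches:
            return False
        self._batches[batch_id] = list(rows)
        return True

    def all_rows(self):
        # Return rows in batch-id order.
        return [row for bid in sorted(self._batches) for row in self._batches[bid]]


sink = IdempotentMemorySink()
sink.add_batch(0, ["a", "b"])
sink.add_batch(1, ["c"])
sink.add_batch(1, ["c"])  # replay of batch 1 after a simulated failure
```

Because the replay is ignored, the sink does not need to store the stream offset together with the data in one transaction; it only has to remember which batch ids it has already seen.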
In order to do this, a new write-ahead log has been added to stream
execution, which records the offsets that are present in each batch. The
log is created in the newly added `checkpointLocation`, which defaults to
`${spark.sql.streaming.checkpointLocation}/${queryName}` but can be overridden
by setting `checkpointLocation` in `DataFrameWriter`.
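The idea of a write-ahead log of offsets can be sketched as follows. This is an illustration only, not the implementation in this PR: the class `OffsetLog`, the `offsets` subdirectory, and the JSON encoding are all assumptions made for the example. The point it tries to show is that the offsets for a batch are recorded before the batch runs, so a restarted query recomputes exactly the same batch under the same id.

```python
import json
import os
import tempfile


# Hypothetical sketch; names and on-disk layout are assumptions.
class OffsetLog:
    """Records the offsets of each batch in one file per batch id."""

    def __init__(self, checkpoint_location):
        self._dir = os.path.join(checkpoint_location, "offsets")
        os.makedirs(self._dir, exist_ok=True)

    def _path(self, batch_id):
        return os.path.join(self._dir, str(batch_id))

    def add(self, batch_id, offsets):
        # Mode "x" fails if the file exists, so a batch id is never rewritten.
        try:
            with open(self._path(batch_id), "x") as f:
                json.dump(offsets, f)
        except FileExistsError:
            return False
        return True

    def get(self, batch_id):
        if not os.path.exists(self._path(batch_id)):
            return None
        with open(self._path(batch_id)) as f:
            return json.load(f)

    def latest(self):
        # Find the highest batch id that has been logged, if any.
        ids = [int(name) for name in os.listdir(self._dir) if name.isdigit()]
        if not ids:
            return None
        last = max(ids)
        return last, self.get(last)


checkpoint = tempfile.mkdtemp()
log = OffsetLog(checkpoint)
first = log.add(0, {"source": 10})
second = log.add(0, {"source": 99})  # rejected: batch 0 is already defined
log.add(1, {"source": 25})

# A "restarted" query opens the same location and resumes from the last entry.
recovered = OffsetLog(checkpoint).latest()
```

A real log would also need to handle a crash in the middle of writing a file; this sketch leaves that out.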
In addition to making sinks easier to write, the addition of batchIds and a
checkpoint location is done in anticipation of integration with the
`StateStore` (#11645).
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/marmbrus/spark batchIds
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/11804.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #11804
----
commit 53ba226e97cc1b216b3333239042f53a74bb13f6
Author: Michael Armbrust <[email protected]>
Date: 2016-03-17T00:14:09Z
[SPARK-13985][SQL] Deterministic batches with ids
commit 97503f1b45d45c4e4524cf43853e9faab43d0032
Author: Michael Armbrust <[email protected]>
Date: 2016-03-17T23:58:34Z
cleanup
----