GitHub user marmbrus opened a pull request:

    https://github.com/apache/spark/pull/11804

    [SPARK-13985][SQL] Deterministic batches with ids

    This PR relaxes the requirements of a `Sink` for structured streaming to 
only require idempotent appending of data.  Previously the `Sink` needed to be 
able to transactionally append data while recording an opaque offset that 
indicated how far in the stream we had processed.
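
    As a rough illustration of what "idempotent appending" could mean when 
batches carry ids, here is a minimal, language-neutral sketch in Python.  It is 
not Spark's `Sink` interface; the class `IdempotentSink` and the method 
`add_batch` are hypothetical names, and the sink simply ignores any batch id it 
has already committed.

```python
class IdempotentSink:
    """Hypothetical in-memory sink (an assumption, not Spark's Sink API)."""

    def __init__(self):
        self.rows = []
        self.last_committed_batch_id = -1

    def add_batch(self, batch_id, rows):
        # A batch that was already committed is being replayed after a
        # failure, so appending it again would duplicate data: skip it.
        if batch_id <= self.last_committed_batch_id:
            return False
        self.rows.extend(rows)
        self.last_committed_batch_id = batch_id
        return True


sink = IdempotentSink()
sink.add_batch(0, ["a", "b"])
sink.add_batch(0, ["a", "b"])  # replay of batch 0 is ignored
sink.add_batch(1, ["c"])
```

    Because a replayed batch is a no-op, the sink in this sketch does not need 
to store stream offsets itself; it only needs to remember which batch ids it 
has seen.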
    
    In order to do this, a new write-ahead log has been added to stream 
execution, which records the offsets that are present in each batch.  The 
log is created in the newly added `checkpointLocation`, which defaults to 
`${spark.sql.streaming.checkpointLocation}/${queryName}` but can be overridden 
by setting `checkpointLocation` in `DataFrameWriter`.
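
    The idea of a write-ahead log of per-batch offsets can be sketched roughly 
as follows.  This is an illustrative Python sketch under stated assumptions, 
not the code in this PR: the class `OffsetLog`, the `offsets` subdirectory, the 
JSON encoding, and the one-file-per-batch layout are all hypothetical.  The 
point is only that the offsets for a batch are fixed on disk before the batch 
runs, so a restarted query recomputes exactly the same batch.

```python
import json
import os
import tempfile


class OffsetLog:
    """Hypothetical write-ahead log: one file per batch id (an assumption)."""

    def __init__(self, checkpoint_location):
        # The "offsets" subdirectory name is illustrative only.
        self.dir = os.path.join(checkpoint_location, "offsets")
        os.makedirs(self.dir, exist_ok=True)

    def add(self, batch_id, offsets):
        """Record offsets for batch_id; never overwrite an existing entry."""
        path = os.path.join(self.dir, str(batch_id))
        if os.path.exists(path):
            return False
        tmp = path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(offsets, f)
        os.rename(tmp, path)  # publish the entry only once fully written
        return True

    def get(self, batch_id):
        path = os.path.join(self.dir, str(batch_id))
        if not os.path.exists(path):
            return None
        with open(path) as f:
            return json.load(f)

    def latest(self):
        """Return (batch_id, offsets) for the highest logged batch, if any."""
        ids = [int(name) for name in os.listdir(self.dir) if name.isdigit()]
        if not ids:
            return None
        batch_id = max(ids)
        return batch_id, self.get(batch_id)


checkpoint = tempfile.mkdtemp()
log = OffsetLog(checkpoint)
log.add(0, {"source-0": 10})
log.add(1, {"source-0": 25})

# Simulate a restart: a new log over the same checkpoint location sees
# the same offsets, so batch 1 would be recomputed deterministically.
recovered = OffsetLog(checkpoint)
latest = recovered.latest()
```

    Combined with a sink that ignores batch ids it has already seen, replaying 
the latest logged batch after a restart would not duplicate output.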
    
    In addition to making sinks easier to write, the addition of batchIds and a 
checkpoint location is done in anticipation of integration with the 
`StateStore` (#11645).

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/marmbrus/spark batchIds

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11804.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #11804
    
----
commit 53ba226e97cc1b216b3333239042f53a74bb13f6
Author: Michael Armbrust <[email protected]>
Date:   2016-03-17T00:14:09Z

    [SPARK-13985][SQL] Deterministic batches with ids

commit 97503f1b45d45c4e4524cf43853e9faab43d0032
Author: Michael Armbrust <[email protected]>
Date:   2016-03-17T23:58:34Z

    cleanup

----

