GitHub user tcondie opened a pull request:

    https://github.com/apache/spark/pull/15626

    SPARK-17829 [SQL] Stable format for offset log

    ## What changes were proposed in this pull request?
    
    Currently we use Java serialization for the write-ahead log (WAL) that
    stores the offsets contained in each batch. This has two main issues:
    
    - It can break across Spark releases (though this is not the only thing
      preventing us from upgrading a running query).
    - It is unnecessarily opaque to the user.
    
    I'd propose we require offsets to provide a user-readable serialization and
    use that instead. JSON is probably a good option.
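
A minimal sketch of the idea, not the code in this patch: each offset type exposes a readable JSON form and can be rebuilt from it, so a log entry stays inspectable and does not depend on a binary class layout. It is written in Python for brevity. The class names, the method names `to_json`/`from_json`, and the JSON field layout are all assumptions made for illustration.

```python
import json
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class LongOffset:
    """Hypothetical offset that wraps a single integer position."""
    value: int

    def to_json(self) -> str:
        # Assumed layout: one field named "offset".
        return json.dumps({"offset": self.value})

    @classmethod
    def from_json(cls, text: str) -> "LongOffset":
        return cls(int(json.loads(text)["offset"]))


@dataclass
class PartitionOffsets:
    """Hypothetical Kafka-style offset: topic -> partition -> position."""
    offsets: Dict[str, Dict[int, int]]

    def to_json(self) -> str:
        # JSON object keys must be strings, so partition ids are stringified.
        # sort_keys keeps the output stable from one write to the next.
        payload = {
            topic: {str(part): pos for part, pos in parts.items()}
            for topic, parts in self.offsets.items()
        }
        return json.dumps(payload, sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "PartitionOffsets":
        raw = json.loads(text)
        return cls({
            topic: {int(part): int(pos) for part, pos in parts.items()}
            for topic, parts in raw.items()
        })
```

A round trip such as `LongOffset.from_json(LongOffset(42).to_json())` yields an equal object, and the intermediate string can be read directly from the log file, which is the property the proposal is after.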
    
    ## How was this patch tested?
    
    Tests were added for KafkaOffset in KafkaOffsetSuite and for LongOffset in
    OffsetSuite.
    
    
    @zsxwing @marmbrus 

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tcondie/spark spark-8360

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15626.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #15626
    
----
commit 60a71fae0decaebaa4f869b78283295b6491992a
Author: Tyson Condie <[email protected]>
Date:   2016-10-21T23:53:34Z

    initial version of offset JSON serialization

commit 52431141ce4c7efbf82da5a204ba77d16c03f16d
Author: Tyson Condie <[email protected]>
Date:   2016-10-22T00:26:51Z

    remove CompositeOffsetSuite

commit 3ccdc5c00b7043570a3567560f1f9ffaeb1ba688
Author: Tyson Condie <[email protected]>
Date:   2016-10-24T22:54:53Z

    update

commit 16c6cea8f91e16dc39c2a0796295f233d33054c4
Author: Tyson Condie <[email protected]>
Date:   2016-10-24T23:00:38Z

    update test parameters to avoid test name conflict

----

