GitHub user tcondie opened a pull request:
https://github.com/apache/spark/pull/15626
SPARK-17829 [SQL] Stable format for offset log
## What changes were proposed in this pull request?
Currently we use Java serialization for the write-ahead log (WAL) that stores
the offsets contained in each batch. This has two main issues:
- It can break across Spark releases (though this is not the only thing
  preventing us from upgrading a running query).
- It is unnecessarily opaque to the user.
I'd propose we require offsets to provide a user-readable serialization and
use that instead. JSON is probably a good option.
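To illustrate the proposal above, here is a minimal Python sketch of offsets that carry their own JSON form. It is not the code in this patch: the class names `LongOffsetSketch` and `PartitionOffsetsSketch`, the `to_json`/`from_json` method names, and the JSON layout are all assumptions made for illustration only.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class LongOffsetSketch:
    """Hypothetical stand-in for an offset that wraps a single long value."""
    offset: int

    def to_json(self) -> str:
        # A bare JSON number is enough for a single counter.
        return json.dumps(self.offset)

    @staticmethod
    def from_json(text: str) -> "LongOffsetSketch":
        return LongOffsetSketch(int(json.loads(text)))


@dataclass(frozen=True)
class PartitionOffsetsSketch:
    """Hypothetical stand-in for a Kafka-style offset.

    Holds (topic, partition, offset) triples. The nested JSON layout
    used below is an assumed format, not the one defined by the patch.
    """
    offsets: tuple

    def to_json(self) -> str:
        nested = {}
        for topic, partition, offset in sorted(self.offsets):
            # JSON object keys must be strings, so partitions are stringified.
            nested.setdefault(topic, {})[str(partition)] = offset
        # sort_keys keeps the text stable, so the log stays diffable.
        return json.dumps(nested, sort_keys=True)

    @staticmethod
    def from_json(text: str) -> "PartitionOffsetsSketch":
        nested = json.loads(text)
        triples = tuple(sorted(
            (topic, int(partition), offset)
            for topic, partitions in nested.items()
            for partition, offset in partitions.items()
        ))
        return PartitionOffsetsSketch(triples)


if __name__ == "__main__":
    kafka_like = PartitionOffsetsSketch((("events", 0, 42), ("events", 1, 17)))
    print(kafka_like.to_json())
    print(PartitionOffsetsSketch.from_json(kafka_like.to_json()) == kafka_like)
```

Because each log entry is plain text, a user can read it directly, and a reader in a later release only needs to understand the JSON layout rather than the binary class layout that produced it.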
## How was this patch tested?
Tests were added for KafkaOffset in KafkaOffsetSuite and for LongOffset in
OffsetSuite.
@zsxwing @marmbrus
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/tcondie/spark spark-8360
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/15626.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #15626
----
commit 60a71fae0decaebaa4f869b78283295b6491992a
Author: Tyson Condie <[email protected]>
Date: 2016-10-21T23:53:34Z
initial version of offse json serialization
commit 52431141ce4c7efbf82da5a204ba77d16c03f16d
Author: Tyson Condie <[email protected]>
Date: 2016-10-22T00:26:51Z
remove CompositeOffsetSuite
commit 3ccdc5c00b7043570a3567560f1f9ffaeb1ba688
Author: Tyson Condie <[email protected]>
Date: 2016-10-24T22:54:53Z
update
commit 16c6cea8f91e16dc39c2a0796295f233d33054c4
Author: Tyson Condie <[email protected]>
Date: 2016-10-24T23:00:38Z
update test parameters to avoid test name conflict
----