GitHub user maropu opened a pull request:
https://github.com/apache/spark/pull/16213
[SPARK-18020][Streaming][Kinesis] Checkpoint SHARD_END to finish reading
closed shards
## What changes were proposed in this pull request?
This pr is to fix an issue occurred when resharding Kinesis streams; the
resharding makes the KCL throw an exception because Spark does not checkpoint
`SHARD_END` when finishing reading closed shards in
`KinesisRecordProcessor#shutdown`. This bug finally leads to stopping
subscribing new split (or merged) shards.
## How was this patch tested?
Added a test in `KinesisStreamSuite` to check if it works well when
splitting/merging shards.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/maropu/spark SPARK-18020
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/16213.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #16213
----
commit e556d3ca64b9c0bf74dd7d4466425b4656342460
Author: Takeshi YAMAMURO <[email protected]>
Date: 2016-12-04T11:26:31Z
Force advancing a position to SHARD_END by using the KCL internal function,
this is workaround
commit 7c51da8c1d31e2d963c8ca724c83010c278ea65a
Author: Takeshi YAMAMURO <[email protected]>
Date: 2016-12-08T02:02:29Z
Add tests for splitting and merging shards
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]