GitHub user yssharma opened a pull request:
https://github.com/apache/spark/pull/17467
Ysharma/spark kinesis retries
## What changes were proposed in this pull request?
This pull request proposes replacing the hardcoded Amazon Kinesis retry
values, MIN_RETRY_WAIT_TIME_MS and MAX_RETRIES, with configurable settings.
This change is critical for Kinesis checkpoint recovery when the
Kinesis-backed RDD is large.
The following happens in a typical Kinesis recovery:
- Kinesis throttles a large number of requests while recovering.
- Retries after throttling fail to recover because the wait period between
them is too short.
- Kinesis throttles on a per-second basis, so the wait period should be
configurable for recovery.
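The failure mode above can be illustrated with a minimal retry-with-exponential-backoff sketch, in which the initial wait and the attempt limit are parameters rather than constants. It assumes a doubling backoff and uses a hypothetical `ThrottledException` and `retryWithBackoff` helper; it is not the code from this patch.

```scala
// Hypothetical stand-in for a Kinesis throttling error (in the AWS SDK this
// would be something such as ProvisionedThroughputExceededException).
final class ThrottledException(msg: String) extends RuntimeException(msg)

object RetrySketch {
  /**
   * Hypothetical helper: runs `body`, retrying on ThrottledException.
   * The first retry waits `initialWaitMs`; each later retry doubles the wait.
   * After `maxAttempts` failed attempts the last exception is rethrown.
   * `sleep` is injectable so the sketch can be exercised without real delays.
   */
  def retryWithBackoff[T](
      maxAttempts: Int,
      initialWaitMs: Long,
      sleep: Long => Unit = ms => Thread.sleep(ms))(body: => T): T = {
    require(maxAttempts >= 1, "maxAttempts must be at least 1")
    var waitMs = initialWaitMs
    var attempt = 1
    var result: Option[T] = None
    while (result.isEmpty) {
      try {
        result = Some(body)
      } catch {
        // Only retry while attempts remain; otherwise the exception propagates.
        case _: ThrottledException if attempt < maxAttempts =>
          sleep(waitMs)
          waitMs *= 2
          attempt += 1
      }
    }
    result.get
  }
}
```

With a small fixed limit, a burst of throttled calls exhausts every attempt before Kinesis frees up capacity; raising either the initial wait or the attempt count gives the request a chance to succeed.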
The patch reads the retry settings from the following Spark configuration keys:
- spark.streaming.kinesis.retry.wait.time
- spark.streaming.kinesis.retry.max.attempts
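A minimal sketch of how the two keys could be read with fallback defaults is shown here. It uses a plain `Map[String, String]` in place of `SparkConf` to stay self-contained; the `KinesisRetrySettings` type, the default values (100 ms, 3 attempts), and the accepted value format ("500" or "500ms") are all illustrative assumptions, not taken from the patch.

```scala
// Hypothetical holder for the two retry settings named in this PR.
final case class KinesisRetrySettings(waitTimeMs: Long, maxAttempts: Int)

object KinesisRetrySettings {
  // Key names as listed in the PR description.
  val WaitTimeKey = "spark.streaming.kinesis.retry.wait.time"
  val MaxAttemptsKey = "spark.streaming.kinesis.retry.max.attempts"

  // Illustrative defaults only; not taken from the patch.
  val DefaultWaitTimeMs = 100L
  val DefaultMaxAttempts = 3

  /** Reads both keys from a plain key/value map, falling back to defaults. */
  def fromConf(conf: Map[String, String]): KinesisRetrySettings = {
    val waitMs = conf.get(WaitTimeKey).map(parseMs).getOrElse(DefaultWaitTimeMs)
    val attempts =
      conf.get(MaxAttemptsKey).map(_.trim.toInt).getOrElse(DefaultMaxAttempts)
    require(waitMs > 0, s"$WaitTimeKey must be positive")
    require(attempts >= 1, s"$MaxAttemptsKey must be at least 1")
    KinesisRetrySettings(waitMs, attempts)
  }

  // Assumed value format: a bare number of milliseconds or a number with "ms".
  private def parseMs(raw: String): Long = {
    val s = raw.trim.toLowerCase
    if (s.endsWith("ms")) s.dropRight(2).trim.toLong else s.toLong
  }
}
```

For example, `KinesisRetrySettings.fromConf(Map("spark.streaming.kinesis.retry.wait.time" -> "500ms"))` would keep the default attempt count while raising the initial wait to 500 ms.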
JIRA: https://issues.apache.org/jira/browse/SPARK-20140
## How was this patch tested?
Modified KinesisBackedBlockRDDSuite.scala to run the Kinesis tests with the
modified configurations. I was not able to test the patch against actual
throttling.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/yssharma/spark ysharma/spark-kinesis-retries
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/17467.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #17467
----
commit 67306cf76455c0ac080357d7aa7dbc4a5644896e
Author: Yash Sharma <[email protected]>
Date: 2017-03-29T09:43:56Z
Remove hardcoded retries for kinesis backed block rdd
commit 3aabde82e13de61ad5ec63b491854fb2576e97cc
Author: Yash Sharma <[email protected]>
Date: 2017-03-29T12:16:32Z
add testcase with the modified configurations
----