GitHub user yssharma opened a pull request:

    https://github.com/apache/spark/pull/17467

    Ysharma/spark kinesis retries

    ## What changes were proposed in this pull request?
    
    This pull request proposes replacing the hardcoded Amazon Kinesis retry 
values, MIN_RETRY_WAIT_TIME_MS and MAX_RETRIES, with configurable settings.
    
    This change is critical for Kinesis checkpoint recovery when the 
Kinesis-backed RDD is huge.
    The following happens in a typical Kinesis recovery:
    - Kinesis throttles a large number of requests while recovering.
    - Retries after throttling fail to recover because the wait period is too 
small.
    - Kinesis throttles per second, so the wait period should be configurable 
for recovery.
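
The retry behaviour described above can be sketched as a small loop with a caller-supplied attempt limit and wait time. This is a minimal illustration, not the code in the patch: `ThrottledException`, `RetrySketch`, and `retryOnThrottle` are hypothetical names, and doubling the wait after each throttled attempt is an assumption of this sketch.

```scala
import scala.annotation.tailrec
import scala.util.{Failure, Success, Try}

// Hypothetical stand-in for a throttling error; not a Spark or AWS SDK class.
final class ThrottledException(msg: String) extends RuntimeException(msg)

object RetrySketch {
  /**
   * Runs `body` up to `maxAttempts` times. After a throttled attempt it sleeps,
   * starting at `initialWaitMs` and doubling the wait each time (an assumption).
   * Any other failure, or a throttle on the last attempt, is rethrown.
   */
  def retryOnThrottle[T](maxAttempts: Int, initialWaitMs: Long)(body: => T): T = {
    require(maxAttempts >= 1, "maxAttempts must be at least 1")

    @tailrec
    def attempt(n: Int, waitMs: Long): T =
      Try(body) match {
        case Success(value) => value
        case Failure(_: ThrottledException) if n < maxAttempts =>
          Thread.sleep(waitMs)
          attempt(n + 1, waitMs * 2)
        case Failure(e) => throw e
      }

    attempt(1, initialWaitMs)
  }
}
```

With a larger initial wait, each retry is more likely to land after the per-second throttling window has passed, which is the motivation for making the wait configurable.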
    
    The patch reads the retry settings from these Spark configuration keys:
    - spark.streaming.kinesis.retry.wait.time
    - spark.streaming.kinesis.retry.max.attempts
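
As a rough sketch of how the two keys might be resolved, the example below reads them from a plain `Map[String, String]` that stands in for the Spark configuration. `KinesisRetrySettings` and `fromMap` are hypothetical names, the wait time is assumed to be given in plain milliseconds, and the fallback values (100 ms, 3 attempts) are assumptions for illustration; the patch itself may parse and default these differently.

```scala
// Hypothetical holder for the two settings; not a class from the patch.
final case class KinesisRetrySettings(waitTimeMs: Long, maxAttempts: Int)

object KinesisRetrySettings {
  // Key names as listed in the pull request description.
  val WaitTimeKey = "spark.streaming.kinesis.retry.wait.time"
  val MaxAttemptsKey = "spark.streaming.kinesis.retry.max.attempts"

  // Assumed fallback values, used only when a key is absent.
  val DefaultWaitTimeMs = 100L
  val DefaultMaxAttempts = 3

  /** Reads both keys from a key-value map, falling back to the defaults above. */
  def fromMap(conf: Map[String, String]): KinesisRetrySettings =
    KinesisRetrySettings(
      waitTimeMs = conf.get(WaitTimeKey).map(_.trim.toLong).getOrElse(DefaultWaitTimeMs),
      maxAttempts = conf.get(MaxAttemptsKey).map(_.trim.toInt).getOrElse(DefaultMaxAttempts)
    )
}
```

For example, `KinesisRetrySettings.fromMap(Map(KinesisRetrySettings.WaitTimeKey -> "2000"))` would raise the wait to two seconds while leaving the attempt limit at its fallback.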
    
    JIRA: https://issues.apache.org/jira/browse/SPARK-20140
    
    ## How was this patch tested?
    
    Modified KinesisBackedBlockRDDSuite.scala to run the Kinesis tests with the 
modified configurations. I was not able to test the patch against actual throttling.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yssharma/spark ysharma/spark-kinesis-retries

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17467.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17467
    
----
commit 67306cf76455c0ac080357d7aa7dbc4a5644896e
Author: Yash Sharma <[email protected]>
Date:   2017-03-29T09:43:56Z

    Remove hardcoded retries for kinesis backed block rdd

commit 3aabde82e13de61ad5ec63b491854fb2576e97cc
Author: Yash Sharma <[email protected]>
Date:   2017-03-29T12:16:32Z

    add testcase with the modified configurations

----

