GitHub user brucezhao11 opened a pull request:

    https://github.com/apache/spark/pull/21929

    [Streaming-24970][Kinesis] Create WriteAheadLogBackedBlockRDD for Kinesis 
Streaming if WAL is enabled.

    
    ## What changes were proposed in this pull request?
    
    By default, KinesisInputDStream creates KinesisBackedBlockRDD. When an 
application tries to recover from streaming checkpoint, the RDD will access 
Kinesis directly to re-read data. As all partitions in the BlockRDD accesses 
Kinesis, and AWS Kinesis only supports 5 concurrency reads per shard per 
second, it will touch ProvisionedThroughputExceededException easily. And when 
the conflicts are heavy, the recover will be failed.
    
    Mostly, when we use Spark streaming, we will enable WAL. Then we can 
recover data from WAL, instead of re-reading from Kinesis directly.
    
    This PR tries to create WriteAheadLogBackedBlockRDD for Kinesis Streaming 
if WAL is enabled. 
    
    ## How was this patch tested?
    
    In KinesisStreamSuite.scala, I will add a test case.
    
    Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/brucezhao11/spark dev

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21929.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21929
    
----
commit cbff70a803a59fce0d6121fabd0f5958d7b0436d
Author: bruce_zhao <bruce_zhao@...>
Date:   2018-07-30T14:08:08Z

    [Streaming][Kinesis] Create WriteAheadLogBackedBlockRDD for Kinesis 
Streaming if WAL is enabled. With this change, it can recover data from WAL, 
instead of re-reading from Kinesis directly when an application tries to 
recover from streaming checkpoint.

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to