[GitHub] spark pull request #21491: [SPARK-24453][SS] Fix error recovering from the f...

tdas Mon, 04 Jun 2018 12:48:59 -0700

GitHub user tdas opened a pull request:

    https://github.com/apache/spark/pull/21491


    [SPARK-24453][SS] Fix error recovering from the failure in a no-data batch

    ## What changes were proposed in this pull request?
    
    The error occurs when we are recovering from a failure in a no-data batch 
(say X) that has been planned (i.e. written to the offset log) but not executed 
(i.e. not written to the commit log). Upon recovery, the following sequence of 
events happens.
    
    1. `MicroBatchExecution.populateStartOffsets` sets `currentBatchId` to X. 
Since there was no data in the batch, `availableOffsets` is the same as 
`committedOffsets`, so `isNewDataAvailable` is `false`.
    2. When `MicroBatchExecution.constructNextBatch` is called, ideally it 
should immediately return true because the next batch has already been 
constructed. However, the check of whether the batch has been constructed was 
`if (isNewDataAvailable) return true`. Since the planned batch is a no-data 
batch, it escaped this check and proceeded to plan the same batch X *once 
again*.
    
    The correct solution is to check the offset log to determine whether 
`currentBatchId` is the latest batch in it. This PR makes that change.
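
The recovery sequence and the two checks described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Spark's actual code: offsets are modeled as plain integers, the logs as a dict and a set, and the names `recover`, `old_check`, and `new_check` are hypothetical helpers invented for this illustration.

```python
from typing import Dict, NamedTuple, Optional, Set


class State(NamedTuple):
    """Hypothetical recovered state; a stand-in, not a Spark class."""
    current_batch_id: int
    available_offsets: Optional[int]
    committed_offsets: Optional[int]


def recover(offset_log: Dict[int, int], commit_log: Set[int]) -> State:
    """Rebuild state after a restart, loosely following step 1 above."""
    latest = max(offset_log)
    if latest in commit_log:
        # The latest planned batch also ran, so the next batch is still unplanned.
        done = offset_log[latest]
        return State(latest + 1, done, done)
    # Planned (in the offset log) but not executed (absent from the commit log).
    return State(latest, offset_log[latest], offset_log.get(latest - 1))


def old_check(state: State) -> bool:
    """Pre-fix idea: treat 'new data available' as 'batch already constructed'."""
    return state.available_offsets != state.committed_offsets


def new_check(offset_log: Dict[int, int], state: State) -> bool:
    """Post-fix idea: the batch is constructed if it is the latest offset-log entry."""
    return bool(offset_log) and max(offset_log) == state.current_batch_id


# Batch 1 is a no-data batch: same offsets as batch 0, planned but never committed.
offsets = {0: 10, 1: 10}
state = recover(offsets, commit_log={0})
print(old_check(state), new_check(offsets, state))  # prints: False True
```

In this model the old check returns `False` for the recovered no-data batch, so the batch would be planned again, while the offset-log check recognizes it as already constructed.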
    
    ## How was this patch tested?
    
    Added a new unit test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tdas/spark SPARK-24453

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21491.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21491
    
----
commit e6b12e64d7cfffad7e50c7c46b9604ea39a781cb
Author: Tathagata Das <tathagata.das1565@...>
Date:   2018-06-04T18:17:48Z

    [SPARK-24156][SS] Fix error recovering from the failure in a no-data batch
    
    **This PR is not for merging, only for looking at the change. Consider only 
the changes to the file MicroBatchExecution.scala.**
    
    The error occurs when we are recovering from a failure in a no-data batch 
(say X) that has been planned (i.e. written to the offset log) but not executed 
(i.e. not written to the commit log). Upon recovery, the following sequence of 
events happens.
    
    1. `MicroBatchExecution.populateStartOffsets` sets `currentBatchId` to X. 
Since there was no data in the batch, `availableOffsets` is the same as 
`committedOffsets`, so `isNewDataAvailable` is `false`.
    2. When `MicroBatchExecution.constructNextBatch` is called, ideally it 
should immediately return true because the next batch has already been 
constructed. However, the check of whether the batch has been constructed was 
`if (isNewDataAvailable) return true`. Since the planned batch is a no-data 
batch, it escaped this check and proceeded to plan the same batch X *once 
again*.
    
    The correct solution is to check the offset log to determine whether 
`currentBatchId` is the latest batch in it. This PR makes that change.
    
    TODO
    
    Author: Tathagata Das <[email protected]>
    
    Closes #2567 from tdas/SC-11085.

----

