GitHub user tdas opened a pull request:
https://github.com/apache/spark/pull/21491
[SPARK-24453][SS] Fix error recovering from the failure in a no-data batch
## What changes were proposed in this pull request?
The error occurs when we are recovering from a failure in a no-data batch
(say X) that has been planned (i.e. written to offset log) but not executed
(i.e. not written to commit log). Upon recovery, the following sequence of
events occurs:
1. `MicroBatchExecution.populateStartOffsets` sets `currentBatchId` to X.
Since there was no data in the batch, `availableOffsets` is the same as
`committedOffsets`, so `isNewDataAvailable` is `false`.
2. When `MicroBatchExecution.constructNextBatch` is called, ideally it
should immediately return true because the next batch has already been
constructed. However, the check of whether the batch has been constructed was
`if (isNewDataAvailable) return true`. Since the planned batch is a no-data
batch, it escaped this check and proceeded to plan the same batch X *once
again*.
The correct solution is to check the offset log to see whether
`currentBatchId` is its latest entry. That is the fix in this pull request.
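The difference between the two checks can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the code in `MicroBatchExecution.scala`: `RecoveredState`, `latestOffsetLogBatchId`, `alreadyConstructedOld` and `alreadyConstructedFixed` are hypothetical names, offsets are simplified to one `Long` per source, and "latest" is read as "the newest batch id in the offset log equals `currentBatchId`".

```scala
// Toy model of the recovery check. NOT Spark's MicroBatchExecution:
// every type and method name here is hypothetical and for illustration only.
object NoDataBatchRecoverySketch {

  // Simplified stand-in for the state rebuilt on recovery.
  final case class RecoveredState(
      currentBatchId: Long,
      latestOffsetLogBatchId: Option[Long], // newest batch id in the offset log, if any
      availableOffsets: Map[String, Long],  // one offset per source (simplification)
      committedOffsets: Map[String, Long]) {

    // True if any source has an available offset beyond its committed offset.
    def isNewDataAvailable: Boolean =
      availableOffsets.exists { case (source, available) =>
        committedOffsets.get(source).forall(_ < available)
      }
  }

  // Old check: infers "batch already constructed" from data availability,
  // so a planned no-data batch looks as if it was never planned.
  def alreadyConstructedOld(state: RecoveredState): Boolean =
    state.isNewDataAvailable

  // Fixed check (assumed shape): consult the offset log directly.
  def alreadyConstructedFixed(state: RecoveredState): Boolean =
    state.latestOffsetLogBatchId.contains(state.currentBatchId)

  def main(args: Array[String]): Unit = {
    // Batch 7 was planned (in the offset log) but not committed, and has no data.
    val offsets = Map("source-0" -> 42L)
    val recovered = RecoveredState(
      currentBatchId = 7L,
      latestOffsetLogBatchId = Some(7L),
      availableOffsets = offsets,
      committedOffsets = offsets)

    println(s"old check:   ${alreadyConstructedOld(recovered)}")
    println(s"fixed check: ${alreadyConstructedFixed(recovered)}")
  }
}
```

In this model the old check reports that batch 7 has not been constructed, which would lead to planning it a second time, while the offset-log check recognizes it as already planned.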
## How was this patch tested?
New unit test.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/tdas/spark SPARK-24453
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21491.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21491
----
commit e6b12e64d7cfffad7e50c7c46b9604ea39a781cb
Author: Tathagata Das <tathagata.das1565@...>
Date: 2018-06-04T18:17:48Z
[SPARK-24156][SS] Fix error recovering from the failure in a no-data batch
**This PR is not for merging, only for looking at the change. Consider only
the changes to the file MicroBatchExecution.scala.**
TODO
Author: Tathagata Das <[email protected]>
Closes #2567 from tdas/SC-11085.
----