GitHub user gaborgsomogyi opened a pull request:
https://github.com/apache/spark/pull/21430
[SPARK-23991][DSTREAMS] Fix data loss when WAL write fails in
allocateBlocksToBatch
## What changes were proposed in this pull request?
When blocks are being allocated to a batch and the WAL write fails, the
blocks are still removed from the received block queue. This causes data
loss, because the next allocation will not find those blocks in the
queue.
In this PR, blocks are removed from the received block queue only if the WAL
write succeeds.
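
The ordering problem described above can be illustrated with a small,
language-neutral sketch. This is not the Spark code touched by this PR (that
code is Scala). The names `BlockTracker`, `allocate_unsafe`, `allocate_safe`
and `wal_write` are hypothetical and exist only for illustration. The sketch
assumes a single-threaded caller and a WAL writer that returns `True` on
success.

```python
from collections import deque


class BlockTracker:
    """Hypothetical stand-in for a received-block tracker (not Spark's API)."""

    def __init__(self, wal_write):
        self.queue = deque()         # received blocks not yet allocated
        self.batches = {}            # batch time -> blocks allocated to it
        self._wal_write = wal_write  # assumed: callable returning True on success

    def add_block(self, block):
        self.queue.append(block)

    def allocate_unsafe(self, batch_time):
        # Problematic order: drain the queue first, then write to the WAL.
        blocks = list(self.queue)
        self.queue.clear()
        if self._wal_write(batch_time, blocks):
            self.batches[batch_time] = blocks
        # On failure the blocks are in neither the queue nor a batch.

    def allocate_safe(self, batch_time):
        # Safer order: snapshot the queue, write to the WAL, and only
        # dequeue the snapshotted blocks once the write has succeeded.
        blocks = list(self.queue)
        if self._wal_write(batch_time, blocks):
            self.batches[batch_time] = blocks
            for _ in blocks:
                self.queue.popleft()
        # On failure the queue is untouched, so a later allocation can retry.
```

With a writer that always fails, `allocate_unsafe` leaves the queue empty and
no batch recorded, while `allocate_safe` leaves the blocks queued for the next
allocation attempt.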
## How was this patch tested?
Added a unit test.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/gaborgsomogyi/spark SPARK-23991
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21430.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21430
----
commit 2d35dfacd54d747e6a4167d46234d4b3ce87529b
Author: Gabor Somogyi <gabor.g.somogyi@...>
Date: 2018-05-25T12:52:36Z
[SPARK-23991][DSTREAMS] Fix data loss when WAL write fails in
allocateBlocksToBatch
----