GitHub user gaborgsomogyi opened a pull request:
https://github.com/apache/spark/pull/20620
[SPARK-23438][DSTREAMS] Fix DStreams data loss with WAL when driver crashes
## What changes were proposed in this pull request?
There is a race condition introduced in SPARK-11141 which could cause data
loss.
The problem is that ReceivedBlockTracker.insertAllocatedBatch function
assumes that all blocks from streamIdToUnallocatedBlockQueues allocated to the
batch and clears the queue.
In this PR only the allocated blocks will be removed from the queue which
will prevent data loss.
## How was this patch tested?
Additional unit test + manually.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/gaborgsomogyi/spark SPARK-23438
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/20620.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #20620
----
commit 152fec431218161e538c377a6cb82753100dc70b
Author: Gabor Somogyi <gabor.g.somogyi@...>
Date: 2018-02-09T08:30:19Z
[SPARK-23438][DSTREAMS] Fix DStreams data loss with WAL when driver crashes
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]