GitHub user jose-torres opened a pull request:
https://github.com/apache/spark/pull/20646
[SPARK-23408][SS] Synchronize successive AddDataMemory actions in StreamTest.
## What changes were proposed in this pull request?
The stream-stream join tests add data to multiple sources and expect all of it
to show up in the next batch. But there is a race condition: a new batch might
be triggered after only one of the AddData actions has run.
Fortunately, MemoryStream synchronizes batch generation on itself, and
StreamExecution won't generate empty batches. So we can resolve this race
condition by executing each run of successive AddDataMemory actions while
synchronized on every MemoryStream involved. Then we can be sure
StreamExecution won't start generating a batch before all the data is present.
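The locking idea above can be made concrete with a minimal sketch. It is written in Python rather than Spark's Scala, and plain `threading` locks stand in for JVM monitors. `FakeMemoryStream`, `snapshot`, and `add_data_atomically` are hypothetical names, not Spark APIs. The sketch assumes only what the description states: each stream guards its own data with its own lock. A run of consecutive add-data actions is applied while holding the lock of every stream involved. As a result, nothing that synchronizes on one of those streams can run in the middle of the group.

```python
import threading
from contextlib import ExitStack


class FakeMemoryStream:
    """Hypothetical stand-in for a MemoryStream: a row buffer guarded by its own lock."""

    def __init__(self, name):
        self.name = name
        # Reentrant, like a JVM monitor taken with `synchronized`.
        self.lock = threading.RLock()
        self.rows = []

    def add_data(self, *rows):
        with self.lock:
            self.rows.extend(rows)

    def snapshot(self):
        # A reader (standing in for batch generation) synchronizes on this one stream only.
        with self.lock:
            return list(self.rows)


def add_data_atomically(actions):
    """Apply a run of (stream, rows) actions while holding every involved stream's lock."""
    # Deduplicate the streams and lock them in a fixed order (by id), so two
    # groups touching the same streams cannot deadlock each other.
    streams = sorted({id(s): s for s, _ in actions}.values(), key=id)
    with ExitStack() as stack:
        for s in streams:
            stack.enter_context(s.lock)
        # Every lock is held here: no reader synchronizing on one of these
        # streams can run until all actions in the group have been applied.
        for s, rows in actions:
            s.add_data(*rows)  # re-acquires s.lock; fine because RLock is reentrant
```

Acquiring locks in a fixed order is the usual way to avoid lock-ordering deadlocks. `RLock` is used because `add_data` re-acquires a lock the caller already holds, mirroring the reentrancy of JVM `synchronized`. Consider a reader that reads `left` and then `right`, each under its own lock. Such a reader never sees more rows in `left` than in `right`, because any group it observes in `left` has already been fully applied to `right`.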
## How was this patch tested?
Existing tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jose-torres/spark flaky
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/20646.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #20646
----
commit d540be6bb051a33d2f6bd69a49fbe11afe9f0a65
Author: Jose Torres <jose@...>
Date: 2018-02-20T23:34:16Z
just use synchronization
commit d91c55f1a17b03aa2d46682e76c6eb207e71a521
Author: Jose Torres <jose@...>
Date: 2018-02-20T23:38:35Z
Merge branch 'master' of https://github.com/apache/spark into flaky
commit dce075f53c8a1418dac99c9b7b7f9b7e79ed17ff
Author: Jose Torres <jose@...>
Date: 2018-02-20T23:45:40Z
fix merge
----