GitHub user jose-torres opened a pull request:

    https://github.com/apache/spark/pull/20646

    [SPARK-23408][SS] Synchronize successive AddDataMemory actions in StreamTest.

    ## What changes were proposed in this pull request?
    
    The stream-stream join tests add data to multiple sources and expect all of it to show up in the next batch. But there is a race condition: the new batch might trigger when only one of the AddData actions has been reached.
    
    Fortunately, MemoryStream synchronizes batch generation on itself, and StreamExecution won't generate empty batches. So we can resolve this race condition by synchronizing a run of successive AddDataMemory actions on all of their MemoryStreams at once, holding every stream's lock until all the data has been added. Then we can be sure StreamExecution won't start generating a batch before all the data is present.
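
The locking idea described above can be modelled in a few lines. This is only a rough Python sketch, not the Scala code in the patch: `MemorySource`, `latest_offset`, and `add_data_together` are hypothetical names invented for illustration. The sketch assumes each source guards both writes and offset reads with a single reentrant lock, roughly in the spirit of MemoryStream synchronizing on itself.

```python
import threading
from contextlib import ExitStack


class MemorySource:
    """Hypothetical stand-in for a MemoryStream: one lock guards writes and offset reads."""

    def __init__(self, name):
        self.name = name
        self.lock = threading.RLock()
        self.rows = []

    def add_data(self, values):
        with self.lock:
            self.rows.extend(values)

    def latest_offset(self):
        # A batch planner would read offsets under the same lock,
        # so it blocks while a writer is holding it.
        with self.lock:
            return len(self.rows)


def add_data_together(additions):
    """Apply every (source, values) pair while holding all the sources' locks."""
    # Deduplicate sources in case the same one appears in several pairs.
    sources = {id(src): src for src, _ in additions}.values()
    with ExitStack() as stack:
        # Acquire locks in a fixed order so concurrent callers cannot deadlock.
        for src in sorted(sources, key=lambda s: s.name):
            stack.enter_context(src.lock)
        for src, values in additions:
            # RLock is reentrant, so the nested acquire inside add_data is fine.
            src.add_data(values)
    # All locks are released here; a reader now sees every addition at once.


left = MemorySource("left")
right = MemorySource("right")
add_data_together([(left, [1, 2, 3]), (right, [3, 4, 5])])
```

In this model, a thread that calls `latest_offset` on either source while `add_data_together` is running has to wait until both additions are complete, so it cannot observe the new rows on one source without also being able to see them on the other.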
    
    ## How was this patch tested?
    Existing tests.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jose-torres/spark flaky

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20646.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20646
    
----
commit d540be6bb051a33d2f6bd69a49fbe11afe9f0a65
Author: Jose Torres <jose@...>
Date:   2018-02-20T23:34:16Z

    just use synchronization

commit d91c55f1a17b03aa2d46682e76c6eb207e71a521
Author: Jose Torres <jose@...>
Date:   2018-02-20T23:38:35Z

    Merge branch 'master' of https://github.com/apache/spark into flaky

commit dce075f53c8a1418dac99c9b7b7f9b7e79ed17ff
Author: Jose Torres <jose@...>
Date:   2018-02-20T23:45:40Z

    fix merge

----

