GitHub user jose-torres opened a pull request: https://github.com/apache/spark/pull/20646
[SPARK-23408][SS] Synchronize successive AddDataMemory actions in StreamTest.

## What changes were proposed in this pull request?

The stream-stream join tests add data to multiple sources, and expect it all to show up in the next batch. But there's a race condition; the new batch might trigger when only one of the AddData actions has been reached.

Fortunately, MemoryStream synchronizes batch generation on itself, and StreamExecution won't generate empty batches. So we can resolve this race condition by synchronizing successive AddDataMemory actions against every MemoryStream together. Then we can be sure StreamExecution won't start generating a batch before all the data is present.

## How was this patch tested?

Existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jose-torres/spark flaky

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20646.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20646

----
commit d540be6bb051a33d2f6bd69a49fbe11afe9f0a65
Author: Jose Torres <jose@...>
Date:   2018-02-20T23:34:16Z

    just use synchronization

commit d91c55f1a17b03aa2d46682e76c6eb207e71a521
Author: Jose Torres <jose@...>
Date:   2018-02-20T23:38:35Z

    Merge branch 'master' of https://github.com/apache/spark into flaky

commit dce075f53c8a1418dac99c9b7b7f9b7e79ed17ff
Author: Jose Torres <jose@...>
Date:   2018-02-20T23:45:40Z

    fix merge

----
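
For readers who want a concrete picture of "synchronizing against every MemoryStream together", here is a minimal, self-contained sketch of the idea. It is not the code in this patch: `ToySource` and `LockAll.withAll` are hypothetical names made up for illustration, and `ToySource` only imitates the one property the description relies on, namely that adding data and producing a batch both lock on the source itself.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a memory-backed test source (NOT Spark's MemoryStream).
// Adding data and taking a batch both synchronize on the source instance.
final class ToySource(val name: String) {
  private val pending = ArrayBuffer.empty[Int]

  def addData(xs: Int*): Unit = synchronized { pending ++= xs }

  // Returns the rows added so far and clears them.
  def takeBatch(): List[Int] = synchronized {
    val out = pending.toList
    pending.clear()
    out
  }
}

object LockAll {
  // Acquires the monitor of every object in list order, runs `body` while
  // holding all of them, then releases them in reverse order.
  def withAll[T](locks: List[AnyRef])(body: => T): T = locks match {
    case Nil          => body
    case head :: tail => head.synchronized { withAll(tail)(body) }
  }
}

object Demo {
  def main(args: Array[String]): Unit = {
    val left  = new ToySource("left")
    val right = new ToySource("right")

    // While both monitors are held, no other thread can call takeBatch on
    // either source, so a batch can never observe only one of the two adds.
    LockAll.withAll(List(left, right)) {
      left.addData(1, 2)   // JVM monitors are reentrant, so this does not block
      right.addData(3)
    }

    println(left.takeBatch())
    println(right.takeBatch())
  }
}
```

One design note on the sketch: because the locks are taken in list order, every caller should pass the sources in the same order, otherwise two callers could deadlock against each other.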