[GitHub] spark pull request #20650: [SPARK-23408][SS] Synchronize successive AddData ...

tdas Wed, 21 Feb 2018 03:14:18 -0800

GitHub user tdas opened a pull request:

    https://github.com/apache/spark/pull/20650


    [SPARK-23408][SS] Synchronize successive AddData actions in 
Streaming*JoinSuite

    ## What changes were proposed in this pull request?
    
    The stream-stream join tests add data to multiple sources and expect it all 
to show up in the next batch. But there's a race condition; the new batch might 
trigger when only one of the AddData actions has been reached.
    
    A prior attempt to solve this issue by @jose-torres in #20646 tried to 
synchronize on all memory sources simultaneously whenever consecutive 
AddData actions were found. However, this carries the risk of deadlock as 
well as unintended modification of stress tests (see the above PR for a 
detailed explanation). Instead, this PR does the following.
    
    - A new action called `StreamProgressBlockedActions` that allows multiple 
actions to be executed while the streaming query is blocked from making 
progress. This allows data to be added to multiple sources and made 
visible simultaneously in the next batch.
    - An alias of `StreamProgressBlockedActions` called `MultiAddData` is 
explicitly used in the `Streaming*JoinSuites` to add data to two memory sources 
simultaneously.
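    
    The idea behind blocking progress can be sketched without Spark. The snippet below is only an illustration under stated assumptions: `ToySource`, `ToyQuery`, `withProgressBlocked`, and `MultiAddSketch` are hypothetical names, not the classes or methods touched by this PR, and a single fair lock stands in for whatever mechanism the real test harness uses. It shows that when both adds run while the batch loop is locked out, a batch sees either no new data or the data from both sources.

```scala
import java.util.concurrent.LinkedBlockingQueue
import java.util.concurrent.locks.ReentrantLock
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a memory source; not Spark's MemoryStream.
final class ToySource[A] {
  private val pending = ArrayBuffer.empty[A]

  def add(items: A*): Unit = synchronized { pending ++= items }

  // Returns everything added since the last drain and clears the buffer.
  def drain(): List[A] = synchronized {
    val out = pending.toList
    pending.clear()
    out
  }
}

// Hypothetical stand-in for a streaming query that reads two sources.
final class ToyQuery(left: ToySource[Int], right: ToySource[Int]) {
  // Fair lock so the test thread is not starved by the batch loop.
  private val progressLock = new ReentrantLock(true)

  // One "batch": reads both sources, but only when progress is not blocked.
  def runBatch(): (List[Int], List[Int]) = {
    progressLock.lock()
    try (left.drain(), right.drain())
    finally progressLock.unlock()
  }

  // Runs all actions while no batch can start, so their effects
  // become visible together in the next batch.
  def withProgressBlocked(actions: (() => Unit)*): Unit = {
    progressLock.lock()
    try actions.foreach(_.apply())
    finally progressLock.unlock()
  }
}

object MultiAddSketch {
  def main(args: Array[String]): Unit = {
    val left = new ToySource[Int]
    val right = new ToySource[Int]
    val query = new ToyQuery(left, right)
    val firstNonEmpty = new LinkedBlockingQueue[(List[Int], List[Int])]()

    // Background loop that keeps triggering batches, like a running query.
    val runner = new Thread(() => {
      var done = false
      while (!done) {
        val batch = query.runBatch()
        if (batch._1.nonEmpty || batch._2.nonEmpty) {
          firstNonEmpty.put(batch)
          done = true
        }
      }
    })
    runner.setDaemon(true)
    runner.start()

    // Both adds happen while the batch loop is locked out.
    query.withProgressBlocked(() => left.add(1, 2), () => right.add(1, 3))

    val (l, r) = firstNonEmpty.take()
    assert(l == List(1, 2) && r == List(1, 3))
    println(s"batch saw left=$l right=$r")
  }
}
```

    If the two `add` calls were issued outside `withProgressBlocked`, the background loop could run a batch between them and observe only the left-hand data, which is the race described above.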
    
    ## How was this patch tested?
    Modified test cases in `Streaming*JoinSuites` where there are consecutive 
`AddData` actions.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tdas/spark SPARK-23408

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20650.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20650
    
----
commit b4c3c55db394178f083d3eeaf537e407c026f0cd
Author: Tathagata Das <tathagata.das1565@...>
Date:   2018-02-21T10:48:15Z

    Fixed bug

----

