GitHub user squito opened a pull request:

    https://github.com/apache/spark/pull/13565

    [SPARK-15783][CORE] Fix Flakiness in BlacklistIntegrationSuite

    ## What changes were proposed in this pull request?
    
    This PR makes three changes. The first two fix the causes of the failures in
    `BlacklistIntegrationSuite`.
    
    1. The testing framework didn't include the reviveOffers thread, so the
    test that involves delay scheduling might never submit offers late enough
    for delay scheduling to kick in. The mock backend now revives offers
    periodically, just like the real scheduler.
    
    2. `assertEmptyDataStructures` would occasionally fail because there
    appeared to still be an active job. In DAGScheduler, the jobWaiter is
    notified of the job completion before the data structures are cleaned up.
    Most of the time the test code that is waiting on the jobWaiter doesn't
    become active until after the data structures are cleared, but occasionally
    the race goes the other way and the assertions fail. The fix is to clean up
    the state before notifying the jobWaiter.
    
    3. `DAGSchedulerSuite` was not stopping all of the components it set up, so
    each test leaked a number of threads. Those components are now stopped too.
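
    The ordering issue in item 2 can be illustrated with a minimal sketch. It is
    not the DAGScheduler code: `ToyScheduler`, `finish`, and
    `CleanupBeforeNotify` are hypothetical names, and a `CountDownLatch` stands
    in for the jobWaiter. The point is only that when bookkeeping is cleared
    before the waiter is woken, the woken thread cannot observe a stale active
    job.

```scala
import java.util.concurrent.CountDownLatch
import scala.collection.mutable

// Hypothetical stand-in for a scheduler that tracks active jobs.
// This is an illustration, not Spark's DAGScheduler.
class ToyScheduler {
  private val activeJobs = mutable.Set[Int]()

  // Registers a job and returns the latch that its waiter blocks on.
  def submit(jobId: Int): CountDownLatch = synchronized {
    activeJobs += jobId
    new CountDownLatch(1)
  }

  def activeJobCount: Int = synchronized { activeJobs.size }

  // Clean up the bookkeeping first, then wake the waiter.
  // Swapping these two steps reintroduces the race: the waiter could
  // wake up and still see the job in activeJobs.
  def finish(jobId: Int, waiter: CountDownLatch): Unit = {
    synchronized { activeJobs -= jobId }
    waiter.countDown()
  }
}

object CleanupBeforeNotify {
  // Returns the number of active jobs seen by the waiting thread
  // immediately after it is woken.
  def observedAfterWake(): Int = {
    val scheduler = new ToyScheduler
    val waiter = scheduler.submit(0)
    val worker = new Thread(new Runnable {
      override def run(): Unit = scheduler.finish(0, waiter)
    })
    worker.start()
    waiter.await() // plays the role of the test waiting on the jobWaiter
    val seen = scheduler.activeJobCount
    worker.join()
    seen
  }
}
```

    Because `countDown()` happens after the cleanup, everything done before it
    is visible to the thread returning from `await()`, so the assertion-style
    check on `activeJobCount` is no longer timing dependent.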
    
    ## How was this patch tested?
    
    I ran all the tests in `BlacklistIntegrationSuite` 5k times and everything
    in `DAGSchedulerSuite` 1k times on my laptop. I also ran a full Jenkins
    build with `BlacklistIntegrationSuite` 500 times and `DAGSchedulerSuite` 50
    times; see https://github.com/apache/spark/pull/13548. (I tried more
    iterations, but Jenkins timed out.)
    
    To check for more leaked threads, I added some code to dump the list of all
    threads at the end of each test in `DAGSchedulerSuite`, which is how I
    discovered that the mapOutputTracker and eventLoop were leaking threads.
    (That code was only part of the testing, so I removed it from the final PR.)
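
    The removed thread-dumping code is not shown in this PR. A rough sketch of
    the same idea, assuming only the standard JVM call
    `Thread.getAllStackTraces`, might look like the following. `ThreadLeakCheck`,
    `liveThreadNames`, and `leakedBy` are hypothetical names, not part of Spark
    or its test utilities.

```scala
// Hypothetical helper for spotting threads that outlive a test body.
// It compares thread names, so a leaked thread that reuses the name of a
// thread that already existed would go unnoticed.
object ThreadLeakCheck {
  // Names of all threads currently alive in this JVM.
  def liveThreadNames(): Set[String] =
    Thread.getAllStackTraces.keySet
      .toArray(new Array[Thread](0))
      .filter(_.isAlive)
      .map(_.getName)
      .toSet

  // Runs `body`, then returns the names of threads that are alive
  // afterwards but were not alive before it ran.
  def leakedBy(body: => Unit): Set[String] = {
    val before = liveThreadNames()
    body
    liveThreadNames() -- before
  }
}
```

    Calling `ThreadLeakCheck.leakedBy { ... }` around a test body and printing
    the result gives a list of candidate leaks. Unrelated JVM threads can show
    up in it as well, so the output is a starting point for investigation
    rather than a strict assertion.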
    
    And I'll run Jenkins on this a couple of times to do one more check.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/squito/spark blacklist_extra_tests

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13565.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #13565
    
----
commit 0ee2f1fe487e4f7defb7a4bc53ab3d69d16c9173
Author: Imran Rashid <[email protected]>
Date:   2016-06-06T13:26:34Z

    increase test timeouts

commit 270a038a20d8f1e2604636f00498fc4dcacc178a
Author: Imran Rashid <[email protected]>
Date:   2016-06-06T14:40:54Z

    for delay scheduling to work, the mock backend has to periodically revive
    all offers

commit 4dc8711993c69fd852da92597473b6852eaa2e21
Author: Imran Rashid <[email protected]>
Date:   2016-06-06T14:51:37Z

    cleanup state before notifying job waiter; stop things to clean up a bunch
    of threads

commit 7f4e9eb41e3276e4e91f8f262b4e3e25a28e8e7c
Author: Imran Rashid <[email protected]>
Date:   2016-06-07T22:14:13Z

    repeat tests a lot to check for flakiness

commit f562c6658efca4c2fc505e4ea906eb78a3901a0d
Author: Imran Rashid <[email protected]>
Date:   2016-06-07T22:23:22Z

    Merge branch 'master' into blacklist_extra_tests

commit 5bc48f23324a754e695535e036cf3759c0dfb040
Author: Imran Rashid <[email protected]>
Date:   2016-06-07T22:23:39Z

    Revert "[SPARK-15783][CORE] still some flakiness in these blacklist tests
    so ignore for now"
    
    This reverts commit 36d3dfa59a1ec0af6118e0667b80e9b7628e2cb6.

commit 41b7b79b366aa3ebbd5e7796e0d3f703250e51cf
Author: Imran Rashid <[email protected]>
Date:   2016-06-08T05:25:56Z

    tone it down a bit

commit 174d0704eb1bf01df6834ca5f437518a2131d45a
Author: Imran Rashid <[email protected]>
Date:   2016-06-08T18:48:19Z

    Go back to running tests once

----


