GitHub user squito opened a pull request:
https://github.com/apache/spark/pull/13565
[SPARK-15783][CORE] Fix Flakiness in BlacklistIntegrationSuite
## What changes were proposed in this pull request?
Three changes here. The first two fix problems that were causing failures in
BlacklistIntegrationSuite:
1. The testing framework didn't include the reviveOffers thread, so a test
that involved delay scheduling might never submit offers late enough for the
delay scheduling to kick in. So I added the periodic revive offers, just
like the real scheduler.
2. `assertEmptyDataStructures` would occasionally fail, because it appeared
there was still an active job. This is because in DAGScheduler, the jobWaiter
was notified of the job completion before the data structures were cleaned up.
Most of the time the test code that is waiting on the jobWaiter won't become
active until after the data structures are cleared, but occasionally the race
goes the other way, and the assertions fail. The fix is to clean up the state
before notifying the jobWaiter.
3. `DAGSchedulerSuite` was not stopping all the inner parts it was setting
up, so each test was leaking a number of threads. So we stop those parts too.
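
The ordering issue in item 2 can be illustrated with a small, self-contained
sketch. This is not the DAGScheduler code: `ToyScheduler`, `activeJobs`, and
`finish` are hypothetical names, and a `CountDownLatch` stands in for the
jobWaiter. It only shows the "clean up first, then notify" ordering, which
guarantees that a thread woken by the notification sees empty data structures.

```scala
import java.util.concurrent.CountDownLatch
import scala.collection.mutable

// Hypothetical stand-in for scheduler bookkeeping; not Spark's actual classes.
object CleanupBeforeNotifySketch {

  final class ToyScheduler {
    private val activeJobs = mutable.Set[Int]()

    def submit(jobId: Int): Unit = synchronized { activeJobs += jobId }

    def isEmpty: Boolean = synchronized { activeJobs.isEmpty }

    // Safe ordering: remove the job's state first, then wake up the waiter.
    // Swapping these two steps would reintroduce the race described above.
    def finish(jobId: Int, waiter: CountDownLatch): Unit = {
      synchronized { activeJobs -= jobId }
      waiter.countDown()
    }
  }

  // Runs one toy job and reports whether the state was empty at the moment
  // the waiting thread was released.
  def jobLeavesNoState(): Boolean = {
    val scheduler = new ToyScheduler
    val waiter = new CountDownLatch(1)
    scheduler.submit(0)

    val worker = new Thread(new Runnable {
      override def run(): Unit = scheduler.finish(0, waiter)
    })
    worker.start()

    waiter.await()                 // plays the role of the test waiting on the jobWaiter
    val empty = scheduler.isEmpty  // plays the role of assertEmptyDataStructures
    worker.join()
    empty
  }

  def main(args: Array[String]): Unit = {
    println(s"data structures empty after job completion: ${jobLeavesNoState()}")
  }
}
```

Because the cleanup happens before `countDown()`, the check after `await()`
cannot observe a still-active job, no matter how the threads are scheduled.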
## How was this patch tested?
I ran all the tests in `BlacklistIntegrationSuite` 5k times and everything
in `DAGSchedulerSuite` 1k times on my laptop. I also ran a full Jenkins build
with `BlacklistIntegrationSuite` 500 times and `DAGSchedulerSuite` 50 times,
see https://github.com/apache/spark/pull/13548. (I tried more times but
Jenkins timed out.)
To check for more leaked threads, I added some code to dump the list of all
threads at the end of each test in DAGSchedulerSuite, which is how I discovered
the mapOutputTracker and eventLoop were leaking threads. (I removed that code
from the final PR; it was just part of the testing.)
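
For reference, a thread check of this kind can be sketched roughly as below.
This is not the code that was used during testing (that code is not part of
the PR); `ThreadLeakSketch`, `leakedBy`, and the `toy-event-loop` thread are
hypothetical names for illustration. The sketch compares the names of live
JVM threads before and after a block runs, using only
`Thread.getAllStackTraces` from the JDK.

```scala
import java.util.concurrent.CountDownLatch

// Hypothetical helper for spotting threads left behind by a test body.
object ThreadLeakSketch {

  // Names of all threads currently alive in this JVM.
  def liveThreadNames(): Set[String] =
    Thread.getAllStackTraces.keySet
      .toArray(new Array[Thread](0))
      .map(_.getName)
      .toSet

  // Names of threads that are alive after `body` runs but were not before.
  // Comparison is by name, so a leaked thread that reuses an existing name
  // would be missed; good enough for a quick debugging aid.
  def leakedBy(body: => Unit): Set[String] = {
    val before = liveThreadNames()
    body
    liveThreadNames() -- before
  }

  // Starts a thread that is deliberately never stopped inside the body,
  // then reports what the check finds.
  def demo(): Set[String] = {
    val stop = new CountDownLatch(1)
    val leaked = leakedBy {
      val t = new Thread(new Runnable {
        override def run(): Unit = stop.await()
      }, "toy-event-loop")
      t.setDaemon(true)
      t.start()
    }
    stop.countDown() // release the thread so the demo itself does not leak
    leaked
  }

  def main(args: Array[String]): Unit = {
    println(s"threads left running: ${demo().mkString(", ")}")
  }
}
```

Calling something like `leakedBy` around each test body and printing the
result is one way to notice components that were started but never stopped.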
And I'll run Jenkins on this a couple of times to do one more check.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/squito/spark blacklist_extra_tests
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/13565.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #13565
----
commit 0ee2f1fe487e4f7defb7a4bc53ab3d69d16c9173
Author: Imran Rashid <[email protected]>
Date: 2016-06-06T13:26:34Z
increase test timeouts
commit 270a038a20d8f1e2604636f00498fc4dcacc178a
Author: Imran Rashid <[email protected]>
Date: 2016-06-06T14:40:54Z
for delay scheduling to work, the mock backend has to periodically revive
all offers
commit 4dc8711993c69fd852da92597473b6852eaa2e21
Author: Imran Rashid <[email protected]>
Date: 2016-06-06T14:51:37Z
cleanup state before notifying job waiter; stop things to clean up a bunch
of threads
commit 7f4e9eb41e3276e4e91f8f262b4e3e25a28e8e7c
Author: Imran Rashid <[email protected]>
Date: 2016-06-07T22:14:13Z
repeat tests a lot to check for flakiness
commit f562c6658efca4c2fc505e4ea906eb78a3901a0d
Author: Imran Rashid <[email protected]>
Date: 2016-06-07T22:23:22Z
Merge branch 'master' into blacklist_extra_tests
commit 5bc48f23324a754e695535e036cf3759c0dfb040
Author: Imran Rashid <[email protected]>
Date: 2016-06-07T22:23:39Z
Revert "[SPARK-15783][CORE] still some flakiness in these blacklist tests
so ignore for now"
This reverts commit 36d3dfa59a1ec0af6118e0667b80e9b7628e2cb6.
commit 41b7b79b366aa3ebbd5e7796e0d3f703250e51cf
Author: Imran Rashid <[email protected]>
Date: 2016-06-08T05:25:56Z
tone it down a bit
commit 174d0704eb1bf01df6834ca5f437518a2131d45a
Author: Imran Rashid <[email protected]>
Date: 2016-06-08T18:48:19Z
Go back to running tests once
----