[
https://issues.apache.org/jira/browse/YARN-7084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jason Lowe updated YARN-7084:
-----------------------------
Attachment: YARN-7084.001.patch
Saw this fail again, and I had a bit of time to take a deeper look. The test
is starting the monitor and then _immediately_ checking if the policy was
edited:
{code}
try {
  monitor.serviceInit(conf);
  monitor.serviceStart();
} catch (Exception e) {
  fail("SchedulingMonitor failes to start.");
}
verify(mPolicy, times(1)).editSchedule();
{code}
However, looking at how the monitor actually starts, an asynchronous thread
pool does the real work:
{code}
public void serviceStart() throws Exception {
  assert !stopped : "starting when already stopped";
  ses = Executors.newSingleThreadScheduledExecutor(new ThreadFactory() {
    public Thread newThread(Runnable r) {
      Thread t = new Thread(r);
      t.setName(getName());
      return t;
    }
  });
  handler = ses.scheduleAtFixedRate(new PreemptionChecker(),
      0, monitorInterval, TimeUnit.MILLISECONDS);
  super.serviceStart();
}
{code}
Therefore there's no guarantee that, when the start method returns, the
thread pool has had time to pick up the scheduled task and execute it before
the verify check. On the flip side, there's also no guarantee that the thread
pool hasn't already edited the schedule multiple times before the verify check
if the startup processing was particularly slow or the main thread was somehow
stalled for a while.
If the intent of the unit test is simply to verify that schedule editing
commences when the monitor is started, then I think it's better to use
verification with timeout here. However, I'm a little unclear on exactly what
semantics the test is really trying to verify. Pinging [~mshen].
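To illustrate the timeout idea independently of Mockito and of the attached
patch, here is a minimal JDK-only sketch. The class name TimeoutVerifySketch,
the helper awaitAtLeast, and the AtomicInteger standing in for the mocked
policy are all made up for illustration. It waits until the scheduled task has
run at least once or a deadline passes, instead of checking immediately after
the start call returns.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class TimeoutVerifySketch {

  // Hypothetical helper: poll until the counter reaches at least `wanted`
  // or the timeout elapses. Returns true if the count was reached in time.
  static boolean awaitAtLeast(AtomicInteger calls, int wanted, long timeoutMs) {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (calls.get() < wanted) {
      if (System.nanoTime() - deadline >= 0) {
        return false; // timed out before the task ran often enough
      }
      try {
        Thread.sleep(10);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Stand-in for the mocked policy: counts "editSchedule" invocations.
    AtomicInteger editCalls = new AtomicInteger();
    ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
    try {
      // Same shape as serviceStart(): this returns before the task is
      // guaranteed to have run, and the task may also run more than once.
      ses.scheduleAtFixedRate(editCalls::incrementAndGet, 0, 50,
          TimeUnit.MILLISECONDS);
      // Asserting editCalls.get() == 1 at this point would be racy in both
      // directions, so wait for "at least once" with a generous timeout.
      boolean commenced = awaitAtLeast(editCalls, 1, 5000);
      System.out.println("editing commenced: " + commenced);
    } finally {
      ses.shutdownNow();
    }
  }
}
```

Waiting for "at least once" rather than "exactly once" sidesteps both halves of
the race; whether that matches the intended semantics of the test is the open
question above.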
I'm attaching a patch that implements the verification-with-timeout approach.
The patch also simplifies the unit test by letting exceptions bubble up and
fail the test directly rather than catching and failing with an assert. This
has the benefit of being able to see the exception that caused the test failure
directly rather than a generic failure message that some exception was thrown
during the test.
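The difference between the two failure-reporting styles can be sketched with
plain Java, without JUnit. Everything here (BubbleUpSketch, startMonitor, and
the exception message) is a hypothetical stand-in, not code from the patch.

```java
class BubbleUpSketch {

  // Hypothetical stand-in for a start call that fails with a specific cause.
  static void startMonitor() {
    throw new IllegalStateException("monitor interval must be positive");
  }

  // Catch-and-fail style: the original exception is swallowed and replaced
  // by a generic message, so the root cause never reaches the report.
  static String catchAndFail() {
    try {
      startMonitor();
      return "started";
    } catch (Exception e) {
      return new AssertionError("monitor fails to start.").getMessage();
    }
  }

  public static void main(String[] args) {
    System.out.println("catch-and-fail reports: " + catchAndFail());
    try {
      // Bubble-up style: nothing intercepts the exception, so the caller
      // (a test runner, in practice) sees the original type and message.
      startMonitor();
    } catch (RuntimeException e) {
      System.out.println("bubble-up reports: " + e);
    }
  }
}
```

In a real test the second style just means declaring the test method with
"throws Exception" and dropping the try/catch.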
> TestSchedulingMonitor#testRMStarts fails sporadically
> -----------------------------------------------------
>
> Key: YARN-7084
> URL: https://issues.apache.org/jira/browse/YARN-7084
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Jason Lowe
> Attachments: YARN-7084.001.patch
>
>
> TestSchedulingMonitor has been failing sporadically in precommit builds.
> Failures look like this:
> {noformat}
> Running
> org.apache.hadoop.yarn.server.resourcemanager.monitor.TestSchedulingMonitor
> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.802 sec <<<
> FAILURE! - in
> org.apache.hadoop.yarn.server.resourcemanager.monitor.TestSchedulingMonitor
> testRMStarts(org.apache.hadoop.yarn.server.resourcemanager.monitor.TestSchedulingMonitor)
> Time elapsed: 1.728 sec <<< FAILURE!
> org.mockito.exceptions.verification.WantedButNotInvoked:
> Wanted but not invoked:
> schedulingEditPolicy.editSchedule();
> -> at
> org.apache.hadoop.yarn.server.resourcemanager.monitor.TestSchedulingMonitor.testRMStarts(TestSchedulingMonitor.java:58)
> However, there were other interactions with this mock:
> -> at
> org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor.<init>(SchedulingMonitor.java:50)
> -> at
> org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor.serviceInit(SchedulingMonitor.java:61)
> -> at
> org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor.serviceInit(SchedulingMonitor.java:62)
> at
> org.apache.hadoop.yarn.server.resourcemanager.monitor.TestSchedulingMonitor.testRMStarts(TestSchedulingMonitor.java:58)
> {noformat}