[jira] [Updated] (YARN-7084) TestSchedulingMonitor#testRMStarts fails sporadically

Jason Lowe (JIRA) Wed, 13 Sep 2017 07:49:41 -0700

     [ 
https://issues.apache.org/jira/browse/YARN-7084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Jason Lowe updated YARN-7084:
-----------------------------
    Attachment: YARN-7084.001.patch

Saw this fail again, and I had a bit of time to take a deeper look.  The test 
is starting the monitor and then _immediately_ checking if the policy was 
edited:
{code}
    try {
      monitor.serviceInit(conf);
      monitor.serviceStart();
    } catch (Exception e) {
      fail("SchedulingMonitor failes to start.");
    }
    verify(mPolicy, times(1)).editSchedule();
{code}

However, looking at how the monitor actually starts, an asynchronous thread 
pool does the real work:
{code}
  public void serviceStart() throws Exception {
    assert !stopped : "starting when already stopped";
    ses = Executors.newSingleThreadScheduledExecutor(new ThreadFactory() {
      public Thread newThread(Runnable r) {
        Thread t = new Thread(r);
        t.setName(getName());
        return t;
      }
    });
    handler = ses.scheduleAtFixedRate(new PreemptionChecker(),
        0, monitorInterval, TimeUnit.MILLISECONDS);
    super.serviceStart();
  }
{code}

Therefore there's no guarantee that, when the start method returns, the 
thread pool has had time to pick up the scheduled task and execute it before 
the verify check.  On the flip side, there's also no guarantee that the thread 
pool hasn't edited the schedule multiple times before the verify check, for 
example if the startup processing was particularly slow or the main thread was 
somehow stalled for a while.

If the intent of the unit test is simply to verify that schedule editing 
commences when the monitor is started, then I think it's better to use 
verification with timeout here.  However, I'm a little unclear on exactly what 
semantics the test is really trying to verify.  Pinging [~mshen].
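
To make the timing issue concrete, here is a minimal, JDK-only sketch of the 
"wait with a timeout" idea.  It is not the attached patch and does not use 
Mockito: CountingPolicy, editObservedWithin and the interval/timeout values are 
hypothetical names chosen for illustration, and a CountDownLatch stands in for 
what a mock-framework timeout verification would do.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class TimeoutVerifyDemo {

  /** Hypothetical stand-in for the mocked policy; records editSchedule() calls. */
  static class CountingPolicy {
    final AtomicInteger edits = new AtomicInteger();
    final CountDownLatch firstEdit = new CountDownLatch(1);

    void editSchedule() {
      edits.incrementAndGet();
      firstEdit.countDown();  // signals that at least one edit has happened
    }
  }

  /**
   * Schedules a fixed-rate task with zero initial delay, as serviceStart()
   * does, then waits up to timeoutMs for the first editSchedule() call.
   * Returns true if the call was observed before the timeout expired.
   */
  static boolean editObservedWithin(long intervalMs, long timeoutMs) {
    CountingPolicy policy = new CountingPolicy();
    ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
    try {
      ses.scheduleAtFixedRate(policy::editSchedule,
          0, intervalMs, TimeUnit.MILLISECONDS);
      // Reading policy.edits right here would be racy: the worker thread may
      // have run the task zero times, once, or several times by now.
      return policy.firstEdit.await(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();  // preserve the interrupt status
      return false;
    } finally {
      ses.shutdownNow();
    }
  }

  public static void main(String[] args) {
    System.out.println("observed: " + editObservedWithin(100, 10_000));
  }
}
```

The wait returns as soon as the first call happens, so a generous timeout 
costs nothing in the normal case; it only bounds how long a genuinely broken 
monitor can hang the test.  Note that this checks "at least once", which 
sidesteps the multiple-edits side of the race as well.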

I'm attaching a patch that implements the verification-with-timeout approach.  
The patch also simplifies the unit test by letting exceptions bubble up and 
fail the test directly rather than catching them and failing with an assert.  
That way the exception that caused the test failure is visible directly, 
instead of a generic failure message saying only that some exception was 
thrown during the test.


> TestSchedulingMonitor#testRMStarts fails sporadically
> -----------------------------------------------------
>
>                 Key: YARN-7084
>                 URL: https://issues.apache.org/jira/browse/YARN-7084
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Jason Lowe
>         Attachments: YARN-7084.001.patch
>
>
> TestSchedulingMonitor has been failing sporadically in precommit builds.  
> Failures look like this:
> {noformat}
> Running 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.TestSchedulingMonitor
> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.802 sec <<< 
> FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.TestSchedulingMonitor
> testRMStarts(org.apache.hadoop.yarn.server.resourcemanager.monitor.TestSchedulingMonitor)
>   Time elapsed: 1.728 sec  <<< FAILURE!
> org.mockito.exceptions.verification.WantedButNotInvoked: 
> Wanted but not invoked:
> schedulingEditPolicy.editSchedule();
> -> at 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.TestSchedulingMonitor.testRMStarts(TestSchedulingMonitor.java:58)
> However, there were other interactions with this mock:
> -> at 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor.<init>(SchedulingMonitor.java:50)
> -> at 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor.serviceInit(SchedulingMonitor.java:61)
> -> at 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor.serviceInit(SchedulingMonitor.java:62)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.TestSchedulingMonitor.testRMStarts(TestSchedulingMonitor.java:58)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

