Min Shen created YARN-5543:
------------------------------

             Summary: ResourceManager SchedulingMonitor could potentially terminate the preemption checker thread
                 Key: YARN-5543
                 URL: https://issues.apache.org/jira/browse/YARN-5543
             Project: Hadoop YARN
          Issue Type: Bug
          Components: capacityscheduler, resourcemanager
    Affects Versions: 2.6.1, 2.7.0
            Reporter: Min Shen


When the SchedulingMonitor service (SchedulingMonitor.java) starts, it launches 
a checker thread that periodically runs CapacityScheduler's preemption policy. 
However, the checker thread's loop has the following issue:
{code}
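while (!stopped && !Thread.currentThread().isInterrupted()) {
  // ....
  try {
    // Wait one monitoring interval before the next preemption check.
    Thread.sleep(monitorInterval);
  } catch (InterruptedException e) {
    // ....
    break; // any interrupt permanently ends the checker loop
  }
}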
{code}
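The loop above terminates the checker thread whenever the thread is 
interrupted: either the isInterrupted() check in the loop condition fails, or 
Thread.sleep() throws InterruptedException and the handler breaks out of the 
loop. 
We noticed in our cluster that this can unexpectedly disable 
CapacityScheduler's preemption, because a terminated checker thread no longer 
triggers preemption.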
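
The self-contained Java program below is a minimal sketch that reproduces this 
failure mode outside the ResourceManager. It assumes a hypothetical 
invokePolicy() that only counts calls and an illustrative 10 ms interval; 
neither is taken from the actual SchedulingMonitor code. After a single 
interrupt, the thread is dead and the policy is never invoked again.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Reproduces the loop quoted above. invokePolicy() is a hypothetical
// placeholder that only counts invocations; the interval is illustrative.
public class InterruptKillsChecker {
  static volatile boolean stopped = false;
  static final long monitorInterval = 10; // milliseconds
  static final AtomicInteger invocations = new AtomicInteger();

  static void invokePolicy() {
    invocations.incrementAndGet();
  }

  static String demo() throws InterruptedException {
    Thread checker = new Thread(() -> {
      while (!stopped && !Thread.currentThread().isInterrupted()) {
        invokePolicy();
        try {
          Thread.sleep(monitorInterval);
        } catch (InterruptedException e) {
          break; // same exit path as the quoted loop
        }
      }
    }, "Preemption Checker");
    checker.start();
    Thread.sleep(100);
    checker.interrupt(); // one stray interrupt...
    checker.join(1000);
    int before = invocations.get();
    Thread.sleep(100);
    // ...and the thread is gone, so the policy is never invoked again.
    return "alive=" + checker.isAlive()
        + " stalled=" + (invocations.get() == before);
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println(demo()); // prints "alive=false stalled=true"
  }
}
```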

We propose replacing this loop with a ScheduledExecutorService that runs the 
preemption check at a fixed interval, to make this part of the code more robust 
and ensure the liveness of CapacityScheduler's preemption functionality.
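
A minimal sketch of the proposal follows, assuming a single-threaded scheduled 
executor and the same hypothetical invokePolicy() placeholder; the class and 
method names are illustrative, not Hadoop's actual API. In the JDK's 
ThreadPoolExecutor, an idle worker that is interrupted while waiting for its 
next task retries instead of exiting, so the periodic check keeps running after 
the same kind of stray interrupt.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: invokePolicy() is a hypothetical placeholder for the real
// preemption policy call; it just counts invocations.
public class ScheduledPreemptionChecker {
  private final AtomicInteger invocations = new AtomicInteger();
  private volatile Thread worker; // kept only to simulate an interrupt
  private final ScheduledExecutorService ses;

  public ScheduledPreemptionChecker() {
    ses = Executors.newSingleThreadScheduledExecutor(r -> {
      Thread t = new Thread(r, "Preemption Checker");
      t.setDaemon(true);
      worker = t;
      return t;
    });
  }

  void invokePolicy() {
    invocations.incrementAndGet();
  }

  public void start(long monitorIntervalMs) {
    ses.scheduleAtFixedRate(() -> {
      try {
        invokePolicy();
      } catch (RuntimeException e) {
        // An exception escaping a periodic task suppresses all later runs,
        // so log it here and let the next scheduled run proceed.
        System.err.println("Preemption policy failed: " + e);
      }
    }, 0, monitorIntervalMs, TimeUnit.MILLISECONDS);
  }

  public void stop() throws InterruptedException {
    ses.shutdown(); // by default this also cancels the periodic task
    ses.awaitTermination(1, TimeUnit.SECONDS);
  }

  static String demo() throws InterruptedException {
    ScheduledPreemptionChecker checker = new ScheduledPreemptionChecker();
    checker.start(10);
    Thread.sleep(100);
    checker.worker.interrupt(); // the same kind of stray interrupt
    int before = checker.invocations.get();
    Thread.sleep(100);
    boolean stillRunning = checker.invocations.get() > before;
    checker.stop();
    return "stillRunning=" + stillRunning;
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println(demo()); // prints "stillRunning=true"
  }
}
```

One caveat with this approach: per the ScheduledExecutorService Javadoc, if any 
execution of a scheduleAtFixedRate task throws, subsequent executions are 
suppressed. That is why the sketch catches RuntimeException inside the task; 
otherwise a single failing policy run would disable preemption just as the 
interrupt does today.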



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)