Min Shen created YARN-5543:
------------------------------
Summary: ResourceManager SchedulingMonitor could potentially
terminate the preemption checker thread
Key: YARN-5543
URL: https://issues.apache.org/jira/browse/YARN-5543
Project: Hadoop YARN
Issue Type: Bug
Components: capacityscheduler, resourcemanager
Affects Versions: 2.6.1, 2.7.0
Reporter: Min Shen
In SchedulingMonitor.java, when the service starts, it starts a checker thread
to perform Capacity Scheduler's preemption. However, the implementation of this
checker thread has the following issue:
{code}
while (!stopped && !Thread.currentThread().isInterrupted()) {
  ....
  try {
    Thread.sleep(monitorInterval);
  } catch (InterruptedException e) {
    ....
    // Leaves the loop, so the checker thread exits.
    break;
  }
}
{code}
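The above code snippet terminates the checker thread whenever it is
interrupted: an interrupt during the sleep is caught and reaches the break,
and an interrupt at any other point makes the loop condition false.
We noticed in our cluster that this can leave CapacityScheduler's preemption
unexpectedly disabled, because the checker thread has terminated.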
We propose to use ScheduledExecutorService to improve the robustness of this
part of the code to ensure the liveness of CapacityScheduler's preemption
functionality.
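To illustrate the proposal, here is a minimal sketch, not the actual
SchedulingMonitor code. The class name PreemptionCheckerSketch, the checkOnce
callback, and the runSafely helper are hypothetical names used only for this
example. The idea is that the executor owns the wait between runs, so the task
needs no sleep loop of its own, and each run catches its own failures so that
later runs are not suppressed.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the monitor; names are assumptions for this sketch.
final class PreemptionCheckerSketch {
  // One pass of the preemption policy (assumed to be supplied by the caller).
  private final Runnable checkOnce;
  private final long monitorIntervalMs;
  private final ScheduledExecutorService executor =
      Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "preemption-checker-sketch");
        t.setDaemon(true);
        return t;
      });

  public PreemptionCheckerSketch(Runnable checkOnce, long monitorIntervalMs) {
    this.checkOnce = checkOnce;
    this.monitorIntervalMs = monitorIntervalMs;
  }

  public void start() {
    // Run immediately, then wait monitorIntervalMs after each run completes.
    executor.scheduleWithFixedDelay(
        this::runSafely, 0, monitorIntervalMs, TimeUnit.MILLISECONDS);
  }

  private void runSafely() {
    try {
      checkOnce.run();
    } catch (Throwable t) {
      // An exception escaping the task would cancel all later executions,
      // so it is reported here and the next scheduled run still happens.
      System.err.println("Preemption check failed, will retry: " + t);
    }
  }

  public void stop() {
    // Stops scheduling and interrupts a run that is in progress.
    executor.shutdownNow();
  }
}
```

scheduleWithFixedDelay is used here rather than scheduleAtFixedRate so that a
slow check does not cause runs to queue up back to back; whether that is the
right trade-off for the real monitor is a judgment call.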