Tao Yang created YARN-8958:
------------------------------
Summary: Schedulable entities leak in fair ordering policy when
recovering containers between remove app attempt and remove app
Key: YARN-8958
URL: https://issues.apache.org/jira/browse/YARN-8958
Project: Hadoop YARN
Issue Type: Bug
Components: capacityscheduler
Affects Versions: 3.2.1
Reporter: Tao Yang
Assignee: Tao Yang
We found an NPE in ClientRMService#getApplications when querying apps by
queue. The cause is an app that can no longer be found by calling
RMContextImpl#getRMApps (it has finished and been swapped out of memory) but
can still be queried from the fair ordering policy.
To reproduce the schedulable entities leak in the fair ordering policy:
(1) create app1 and launch container1 on node1
(2) restart RM
(3) remove app1 attempt; app1 is removed from the schedulable entities
(4) recover container1; the state of container1 changes to COMPLETED and
app1 is put back into entitiesToReorder after the container is released, then
app1 is added back into the schedulable entities when the scheduler calls
FairOrderingPolicy#getAssignmentIterator
(5) remove app1
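The sequence above can be replayed with a toy model. This is only a sketch: LeakDemo, its method names, and the use of plain String ids are hypothetical stand-ins, not the real YARN classes, and it assumes that a released container marks its app for reordering and that the next reorder pass removes and re-adds the app unconditionally.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of an ordering policy; not the real YARN classes. */
class LeakDemo {
    // Stand-ins for the policy's schedulableEntities and entitiesToReorder.
    final TreeSet<String> schedulableEntities = new TreeSet<>();
    final Set<String> entitiesToReorder = new LinkedHashSet<>();

    void addAttempt(String appId) {
        schedulableEntities.add(appId);
    }

    void removeAttempt(String appId) {
        entitiesToReorder.remove(appId);
        schedulableEntities.remove(appId);
    }

    // Assumed behavior: releasing a container marks the app for reordering.
    void containerReleased(String appId) {
        entitiesToReorder.add(appId);
    }

    // Models the reorder pass triggered before iteration:
    // each pending entity is removed and then re-added unconditionally.
    void reorderPendingEntities() {
        for (String appId : entitiesToReorder) {
            schedulableEntities.remove(appId);
            schedulableEntities.add(appId);
        }
        entitiesToReorder.clear();
    }

    /** Replays steps (1) to (4) and reports whether app1 leaked. */
    static boolean reproduceLeak() {
        LeakDemo policy = new LeakDemo();
        policy.addAttempt("app1");         // (1) app1 is running with container1
        // (2) the RM restart itself is not modeled here
        policy.removeAttempt("app1");      // (3) attempt removed
        policy.containerReleased("app1");  // (4) recovered container completes
        policy.reorderPendingEntities();   // scheduler asks for an iterator
        return policy.schedulableEntities.contains("app1");
    }
}
```

In this model reproduceLeak() returns true: app1 is back in the set although its attempt was already removed, and nothing in step (5) takes it out again.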
To solve this problem, we should make sure schedulableEntities can only be
changed by adding or removing an app attempt; the reordering process should
not add a new entity to schedulableEntities.
{code:java}
protected void reorderSchedulableEntity(S schedulableEntity) {
  //remove, update comparable data, and reinsert to update position in order
  schedulableEntities.remove(schedulableEntity);
  updateSchedulingResourceUsage(
      schedulableEntity.getSchedulingResourceUsage());
  schedulableEntities.add(schedulableEntity);
}
{code}
The code above can be improved as follows to make sure only an existing entity
can be re-added to schedulableEntities.
{code:java}
protected void reorderSchedulableEntity(S schedulableEntity) {
  //remove, update comparable data, and reinsert to update position in order
  boolean exists = schedulableEntities.remove(schedulableEntity);
  updateSchedulingResourceUsage(
      schedulableEntity.getSchedulingResourceUsage());
  if (exists) {
    schedulableEntities.add(schedulableEntity);
  } else {
    LOG.info("Skip reordering non-existent schedulable entity: "
        + schedulableEntity.getId());
  }
}
{code}
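The guarded reorder can also be tried in isolation. The following is a minimal sketch, assuming a hypothetical Entity type ordered by a usage value and then by id; GuardedReorderDemo and its methods are illustrative names, not part of YARN. It keeps the same order of operations as the proposal: remove first, update the comparable data, then re-add only if the entity was present.

```java
import java.util.TreeSet;

/** Standalone sketch of the guarded reorder; all names are hypothetical. */
class GuardedReorderDemo {
    /** Hypothetical entity: an id plus a mutable usage value used for ordering. */
    static final class Entity {
        final String id;
        long usage;

        Entity(String id, long usage) {
            this.id = id;
            this.usage = usage;
        }
    }

    // Ordered by usage, then by id so that distinct entities never compare equal.
    private final TreeSet<Entity> schedulableEntities = new TreeSet<>((a, b) -> {
        int byUsage = Long.compare(a.usage, b.usage);
        return byUsage != 0 ? byUsage : a.id.compareTo(b.id);
    });

    void add(Entity e) {
        schedulableEntities.add(e);
    }

    void remove(Entity e) {
        schedulableEntities.remove(e);
    }

    int size() {
        return schedulableEntities.size();
    }

    /** Id of the entity that currently sorts first. */
    String first() {
        return schedulableEntities.first().id;
    }

    /**
     * Remove, update the comparable data, and reinsert only if the entity
     * was present. Returns false when the reorder was skipped.
     */
    boolean reorder(Entity e, long newUsage) {
        boolean exists = schedulableEntities.remove(e);
        e.usage = newUsage; // updated while the entity is outside the set
        if (exists) {
            schedulableEntities.add(e);
        }
        return exists;
    }
}
```

Removing before updating matters in a sorted set: changing the ordering key of an element that is still inside the set would make later lookups and removals unreliable.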