hex108 created YARN-2612:
----------------------------
Summary: Some completed containers are not reported to NM
Key: YARN-2612
URL: https://issues.apache.org/jira/browse/YARN-2612
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.6.0
Reporter: hex108
Fix For: 2.6.0
Since YARN-1372, the NM keeps reporting completed containers to the RM until it
gets an ACK from the RM. The RM acks the NM only after the AM has acked the RM,
which the AM does by calling allocate. If the AM does not call allocate, the RM
never acks the NM. We have observed two such cases when running the MapReduce
job 'pi':
1) The RM sends the completed containers to the AM. After receiving them, the AM
considers its work done and needs no more resources, so it does not call
allocate again.
2) When the AM finishes, it cannot ack to the RM, because the AM itself has not
finished yet.
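To make case 1 concrete, here is a minimal sketch of the ack chain in Java. It
is a simplified model, not Hadoop code: the class AckChainSketch and all of its
fields and methods are hypothetical names chosen for illustration, and container
IDs are plain strings.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of the NM -> RM -> AM ack chain; not Hadoop code.
class AckChainSketch {
    // Completed containers the NM keeps re-reporting until the RM acks them.
    final Set<String> nmPending = new LinkedHashSet<>();
    // Completed containers the RM holds for the AM's next allocate call.
    final Set<String> rmJustFinished = new LinkedHashSet<>();
    // Containers the AM pulled in its last allocate call; its next call acks them.
    final Set<String> rmPulledByAm = new LinkedHashSet<>();
    // Containers the RM will ack to the NM on the next heartbeat.
    final Set<String> rmAckToNm = new LinkedHashSet<>();

    void containerCompletedOnNm(String id) {
        nmPending.add(id);
    }

    // NM heartbeat: report everything still pending, then drop what the RM acked.
    void nmHeartbeat() {
        for (String id : nmPending) {
            if (!rmPulledByAm.contains(id) && !rmAckToNm.contains(id)) {
                rmJustFinished.add(id);
            }
        }
        nmPending.removeAll(rmAckToNm);
        rmAckToNm.clear();
    }

    // AM allocate: implicitly acks the previous pull, then pulls new completions.
    List<String> amAllocate() {
        rmAckToNm.addAll(rmPulledByAm);
        rmPulledByAm.clear();
        List<String> pulled = new ArrayList<>(rmJustFinished);
        rmPulledByAm.addAll(pulled);
        rmJustFinished.clear();
        return pulled;
    }

    public static void main(String[] args) {
        AckChainSketch s = new AckChainSketch();
        s.containerCompletedOnNm("c1");
        s.nmHeartbeat();
        s.amAllocate();                  // AM learns that c1 completed
        s.nmHeartbeat();                 // AM never calls allocate again
        s.nmHeartbeat();
        System.out.println(s.nmPending); // prints [c1]: the NM never gets an ack
        s.amAllocate();                  // a further allocate call would ack c1
        s.nmHeartbeat();
        System.out.println(s.nmPending); // prints []
    }
}
```

In this model the NM's pending set drains only if the AM makes one more allocate
call after it has received the completed containers, which is exactly the call
that is missing in case 1.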
We see two possible solutions to this problem:
1) When RMAppAttempt calls FinalTransition, the AppAttempt has finished, so the
RM can then send this AppAttempt's completed containers to the NM.
2) In FairScheduler#nodeUpdate, if a completed container sent by the NM does not
have a corresponding RMContainer, the RM simply acks it to the NM.
We prefer solution 2 because it is clearer and more concise. However, the RM
might ack the same completed container to the NM many times.
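The idea behind solution 2 can be sketched as follows. This is an illustration
under stated assumptions, not the FairScheduler implementation: the class
NodeUpdateSketch, its fields, and the string container IDs are hypothetical, and
the sketch assumes the scheduler drops its record of a container once the first
completion report has been processed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a scheduler's nodeUpdate path; not FairScheduler code.
class NodeUpdateSketch {
    // Containers the scheduler still tracks (stand-in for live RMContainer objects).
    final Map<String, String> liveRmContainers = new HashMap<>();
    // Container IDs to ack to the NM in upcoming heartbeat responses.
    final List<String> ackToNm = new ArrayList<>();
    // Completions handed to the normal path, where the ack waits on the AM.
    final List<String> forwardedToApp = new ArrayList<>();

    void nodeUpdate(List<String> completedFromNm) {
        for (String id : completedFromNm) {
            if (liveRmContainers.remove(id) != null) {
                // First report: the scheduler still knows this container.
                forwardedToApp.add(id);
            } else {
                // No corresponding record: ack directly so the NM stops re-reporting.
                ackToNm.add(id);
            }
        }
    }

    public static void main(String[] args) {
        NodeUpdateSketch s = new NodeUpdateSketch();
        s.liveRmContainers.put("c1", "RUNNING");
        s.nodeUpdate(List.of("c1"));     // first report goes through the normal path
        s.nodeUpdate(List.of("c1"));     // re-report: no record left, ack directly
        s.nodeUpdate(List.of("c1"));     // re-report before the NM has seen the ack
        System.out.println(s.ackToNm);   // prints [c1, c1]
    }
}
```

The last two calls show the drawback mentioned above: if the NM re-reports a
container before it has processed the ack, the RM acks the same container again.
This is harmless as long as the NM treats an ack for an already removed
container as a no-op.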