[ https://issues.apache.org/jira/browse/YARN-2612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
hex108 updated YARN-2612:
-------------------------
    Attachment: YARN-2612.patch

> Some completed containers are not reported to NM
> ------------------------------------------------
>
>                 Key: YARN-2612
>                 URL: https://issues.apache.org/jira/browse/YARN-2612
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.6.0
>            Reporter: hex108
>             Fix For: 2.6.0
>
>         Attachments: YARN-2612.patch
>
>
> Since YARN-1372, the NM keeps reporting completed containers to the RM until
> it gets an ACK from the RM. If the AM does not call allocate, the AM does not
> ack the RM, and so the RM does not ack the NM. We have observed the following
> two cases when running the MapReduce job 'pi':
> 1) The RM sends completed containers to the AM. After receiving them, the AM
> considers its work done and needs no more resources, so it does not call
> allocate again.
> 2) When the AM finishes, it cannot ack the RM because the AM itself has not
> finished yet.
> There are two possible solutions to this problem:
> 1) When RMAppAttempt runs FinalTransition, the AppAttempt has finished, so the
> RM can send this AppAttempt's completed containers to the NM.
> 2) In FairScheduler#nodeUpdate, if a completed container reported by the NM
> has no corresponding RMContainer, the RM simply acks it to the NM.
> We prefer solution 2 because it is clearer and more concise. However, the RM
> might ack the same completed containers to the NM multiple times.
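
For readers following the thread, here is a rough sketch of the idea behind solution 2. It is not the attached patch and does not use the real YARN classes: `AckSketch`, `track`, `attemptFinished`, `pendingForAm` and this `nodeUpdate` are hypothetical stand-ins, and the map of tracked containers only approximates "the RM still has an RMContainer for this ID".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the RM side; none of these names are real YARN APIs.
class AckSketch {
    // Assumption: containerId -> attemptId for containers the RM still tracks.
    private final Map<String, String> tracked = new HashMap<>();
    // Completed containers that still wait for the AM to pull them via allocate.
    private final Set<String> pendingForAm = new LinkedHashSet<>();

    void track(String containerId, String attemptId) {
        tracked.put(containerId, attemptId);
    }

    // Models the attempt finishing: the RM forgets that attempt's containers.
    void attemptFinished(String attemptId) {
        tracked.values().removeIf(attemptId::equals);
    }

    // Models handling of one NM heartbeat. Returns the IDs to ack right away.
    List<String> nodeUpdate(List<String> completedFromNm) {
        List<String> ackToNm = new ArrayList<>();
        for (String id : completedFromNm) {
            if (tracked.containsKey(id)) {
                // Normal path: the ack is deferred until the AM pulls it.
                pendingForAm.add(id);
            } else {
                // Solution 2: nothing is tracked for this ID, so ack immediately.
                ackToNm.add(id);
            }
        }
        return ackToNm;
    }

    Set<String> pendingForAm() {
        return pendingForAm;
    }

    public static void main(String[] args) {
        AckSketch rm = new AckSketch();
        rm.track("c1", "attempt1");
        // c2 is unknown to the RM, so only c2 is acked at once.
        System.out.println(rm.nodeUpdate(Arrays.asList("c1", "c2"))); // [c2]
        rm.attemptFinished("attempt1");
        // The NM re-reports both; now neither is tracked, and c2 is acked again.
        System.out.println(rm.nodeUpdate(Arrays.asList("c1", "c2"))); // [c1, c2]
    }
}
```

The second call shows the drawback mentioned in the description: under this scheme the same container ID (here `c2`) can be acked to the NM more than once, so the NM side would need to tolerate duplicate acks.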