[ https://issues.apache.org/jira/browse/YARN-365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13588818#comment-13588818 ]
Bikas Saha commented on YARN-365:
---------------------------------

Do we need to worry about overlap between the two lists? That is, could a container in newlyLaunchedContainers also have completed by the time the slow RM handles the NM updates?

{code}
+  private synchronized void nodeUpdate(RMNode nm) {
     if (LOG.isDebugEnabled()) {
       LOG.debug("nodeUpdate: " + nm + " clusterResources: " + clusterResource);
     }
-
-    FiCaSchedulerNode node = getNode(nm.getNodeID());
+    FiCaSchedulerNode node = getNode(nm.getNodeID());
+    List<UpdatedContainerInfo> containerInfoList = nm.pullContainerUpdates();
+    List<ContainerStatus> newlyLaunchedContainers = new ArrayList<ContainerStatus>();
+    List<ContainerStatus> completedContainers = new ArrayList<ContainerStatus>();
+    for(UpdatedContainerInfo containerInfo : containerInfoList) {
+      newlyLaunchedContainers.addAll(containerInfo.getNewlyLaunchedContainers());
+      completedContainers.addAll(containerInfo.getCompletedContainers());
+    }
+
{code}

Note that this problem (if it is a problem) exists regardless of this change, because a container may start and complete within a single NM heartbeat interval. However, the chances of hitting it were low before this change: the heartbeat interval is short, so the RM almost never sees a node update in which the same container both launches and completes. After this change, with a slow RM, this can easily happen, especially because we are simply concatenating both sub-lists.
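A minimal, self-contained sketch of how the concatenation can produce that overlap. It deliberately avoids the real YARN classes: `HeartbeatUpdate` is a hypothetical stand-in for `UpdatedContainerInfo`, containers are plain `String` IDs rather than `ContainerStatus` objects, and `concatenate` only mirrors the loop in the patch. Records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ContainerUpdateMerge {

  // Hypothetical stand-in for UpdatedContainerInfo: the container changes
  // reported by one queued NM heartbeat, keyed by plain String container IDs.
  record HeartbeatUpdate(List<String> newlyLaunched, List<String> completed) {}

  // Result of flattening all queued heartbeats into two lists.
  record Merged(List<String> launched, List<String> completed) {}

  // Mirrors the patch: concatenate every queued batch into two flat lists,
  // losing the information about which heartbeat each entry came from.
  static Merged concatenate(List<HeartbeatUpdate> updates) {
    List<String> launched = new ArrayList<>();
    List<String> completed = new ArrayList<>();
    for (HeartbeatUpdate u : updates) {
      launched.addAll(u.newlyLaunched());
      completed.addAll(u.completed());
    }
    return new Merged(launched, completed);
  }

  // Containers present in both flat lists, i.e. launched and completed
  // within the window covered by a single nodeUpdate call.
  static Set<String> overlap(List<String> launched, List<String> completed) {
    Set<String> both = new LinkedHashSet<>(launched);
    both.retainAll(completed);
    return both;
  }

  public static void main(String[] args) {
    // Two heartbeats queued while the RM was slow: c1 launches in the
    // first heartbeat and completes in the second.
    List<HeartbeatUpdate> queued = List.of(
        new HeartbeatUpdate(List.of("c1", "c2"), List.of()),
        new HeartbeatUpdate(List.of("c3"), List.of("c1")));
    Merged merged = concatenate(queued);
    System.out.println("launched:  " + merged.launched());
    System.out.println("completed: " + merged.completed());
    System.out.println("overlap:   " + overlap(merged.launched(), merged.completed()));
  }
}
```

With two queued heartbeats, c1 ends up in both flattened lists even though no single heartbeat reported it as both launched and completed. The sketch only detects such containers; how the scheduler should treat them is the open question above.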
> Each NM heartbeat should not generate an event for the Scheduler
> ----------------------------------------------------------------
>
>                 Key: YARN-365
>                 URL: https://issues.apache.org/jira/browse/YARN-365
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager, scheduler
>    Affects Versions: 0.23.5
>            Reporter: Siddharth Seth
>            Assignee: Xuan Gong
>             Fix For: 2.0.4-beta
>
>         Attachments: Prototype2.txt, Prototype3.txt, YARN-365.10.patch,
> YARN-365.1.patch, YARN-365.2.patch, YARN-365.3.patch, YARN-365.4.patch,
> YARN-365.5.patch, YARN-365.6.patch, YARN-365.7.patch, YARN-365.8.patch,
> YARN-365.9.patch
>
>
> Follow up from YARN-275
> https://issues.apache.org/jira/secure/attachment/12567075/Prototype.txt