[
https://issues.apache.org/jira/browse/YARN-365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13588818#comment-13588818
]
Bikas Saha commented on YARN-365:
---------------------------------
Do we need to worry about overlap between the two lists, i.e. a
newlyLaunchedContainer that has also completed by the time the slow RM handles
the NM updates?
{code}
+  private synchronized void nodeUpdate(RMNode nm) {
     if (LOG.isDebugEnabled()) {
       LOG.debug("nodeUpdate: " + nm + " clusterResources: " + clusterResource);
     }
-
-    FiCaSchedulerNode node = getNode(nm.getNodeID());
+    FiCaSchedulerNode node = getNode(nm.getNodeID());
+    List<UpdatedContainerInfo> containerInfoList = nm.pullContainerUpdates();
+    List<ContainerStatus> newlyLaunchedContainers = new ArrayList<ContainerStatus>();
+    List<ContainerStatus> completedContainers = new ArrayList<ContainerStatus>();
+    for(UpdatedContainerInfo containerInfo : containerInfoList) {
+      newlyLaunchedContainers.addAll(containerInfo.getNewlyLaunchedContainers());
+      completedContainers.addAll(containerInfo.getCompletedContainers());
+    }
+
{code}
Note that this problem (if it is a problem) exists regardless of this change,
because a container may start and complete within a single NM heartbeat interval.
However, the chances of hitting it are low before this change: the heartbeat
interval is short, so the RM almost never sees a node update in which the same
container both launches and completes. After this change, with a slow RM, this
can easily happen, especially because we simply concatenate the sub-lists from
every pending update into the two lists.
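
To check whether the overlap actually occurs, the two concatenated lists could be intersected by container ID. This is only a minimal sketch under stated assumptions: container IDs are modelled as plain strings rather than ContainerStatus objects, and the class and method names (ContainerOverlapSketch, overlap) are hypothetical and not part of the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only; not part of YARN-365.
// Container IDs are assumed to be plain strings here.
final class ContainerOverlapSketch {

    /** Returns the IDs that appear in both lists, in launch order, without duplicates. */
    static List<String> overlap(List<String> launchedIds, List<String> completedIds) {
        Set<String> completed = new HashSet<String>(completedIds);
        Set<String> seen = new HashSet<String>();
        List<String> both = new ArrayList<String>();
        for (String id : launchedIds) {
            // seen.add returns false for an ID that was already reported.
            if (completed.contains(id) && seen.add(id)) {
                both.add(id);
            }
        }
        return both;
    }

    public static void main(String[] args) {
        // container_02 launched and completed between two scheduler passes.
        List<String> launched = Arrays.asList("container_01", "container_02");
        List<String> completed = Arrays.asList("container_02", "container_03");
        System.out.println(overlap(launched, completed)); // prints [container_02]
    }
}
```

If the result is non-empty, the scheduler would need to decide whether to process the launch before the completion or to skip the launch entirely; the sketch only detects the case and does not pick a policy.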
> Each NM heartbeat should not generate an event for the Scheduler
> ----------------------------------------------------------------
>
> Key: YARN-365
> URL: https://issues.apache.org/jira/browse/YARN-365
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager, scheduler
> Affects Versions: 0.23.5
> Reporter: Siddharth Seth
> Assignee: Xuan Gong
> Fix For: 2.0.4-beta
>
> Attachments: Prototype2.txt, Prototype3.txt, YARN-365.10.patch,
> YARN-365.1.patch, YARN-365.2.patch, YARN-365.3.patch, YARN-365.4.patch,
> YARN-365.5.patch, YARN-365.6.patch, YARN-365.7.patch, YARN-365.8.patch,
> YARN-365.9.patch
>
>
> Follow up from YARN-275
> https://issues.apache.org/jira/secure/attachment/12567075/Prototype.txt