[
https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627594#comment-14627594
]
Rohith Sharma K S commented on YARN-3535:
-----------------------------------------
{code}
for (ApplicationId appId : reconnectEvent.getRunningApplications()) {
  handleRunningAppOnNode(rmNode, rmNode.context, appId, rmNode.nodeId);
}
{code}
IIUC, this code notifies each RMApp of the node's details so that the RMApp
knows that some of its containers have run on this node. This part of the code
does not kill the existing running containers. Running containers are killed
when the NodeRemoved event is sent to the scheduler, and that event is
triggered by the RMNodeImpl#Reconnected transition if noAppsRunning.
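To make the distinction concrete, here is a minimal toy model of the two paths
described above. It is only a sketch and not the real RMNodeImpl code: the
class name, the method name, and the event strings are all hypothetical. It
also assumes that the two paths are mutually exclusive, which the comment above
does not state.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: NOT the real RMNodeImpl. All names here are hypothetical.
class ReconnectSketch {

    /** Returns the events a node reconnect would emit in this model, in order. */
    static List<String> onReconnect(String nodeId, List<String> runningApps) {
        List<String> events = new ArrayList<>();
        if (runningApps.isEmpty()) {
            // No apps reported by the node: the scheduler is told the node was
            // removed. In this model, that is the only step that leads to
            // running containers being killed.
            events.add("NODE_REMOVED:" + nodeId);
        } else {
            // Apps reported: each app is only informed that it has run on this
            // node. No removal event is emitted, so nothing is killed here.
            for (String appId : runningApps) {
                events.add("APP_RUNNING_ON_NODE:" + appId + "@" + nodeId);
            }
        }
        return events;
    }
}
```

In this model, a reconnect that reports running applications never produces a
removal event, which is the point of the explanation above: the loop alone does
not kill containers.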
> ResourceRequest should be restored back to scheduler when RMContainer is
> killed at ALLOCATED
> ---------------------------------------------------------------------------------------------
>
> Key: YARN-3535
> URL: https://issues.apache.org/jira/browse/YARN-3535
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.6.0
> Reporter: Peng Zhang
> Assignee: Peng Zhang
> Priority: Critical
> Attachments: 0003-YARN-3535.patch, YARN-3535-001.patch,
> YARN-3535-002.patch, syslog.tgz, yarn-app.log
>
>
> During a rolling update of the NMs, the AM failed to start a container on an NM,
> and the job then hung.
> AM logs are attached.