Rohith commented on YARN-3194:

bq. it's removing the old node and adding the newly connected node. RM is also 
not restarted. 
{{RMNodeImpl#ReconnectNodeTransition#.transition}} does not remove old node if 
any applications are running. In the below code, if noRunningApps is false then 
Node is not removed. Instead just handling running applications.
public void transition(RMNodeImpl rmNode, RMNodeEvent event) {
      RMNodeReconnectEvent reconnectEvent = (RMNodeReconnectEvent) event;
      RMNode newNode = reconnectEvent.getReconnectedNode();
      rmNode.nodeManagerVersion = newNode.getNodeManagerVersion();
      List<ApplicationId> runningApps = reconnectEvent.getRunningApplications();
      boolean noRunningApps = 
          (runningApps == null) || (runningApps.size() == 0);
      // No application running on the node, so send node-removal event with 
      // cleaning up old container info.
      if (noRunningApps) {
        // Remove the node from scheduler
        // Add node to the scheduler
      } else {
        rmNode.httpPort = newNode.getHttpPort();
        rmNode.httpAddress = newNode.getHttpAddress();
        rmNode.totalCapability = newNode.getTotalCapability();
        // Reset heartbeat ID since node just restarted.

      // Handles running app on this node
    // resource update to schedule code

> After NM restart,completed containers are not released which are sent during 
> NM registration
> --------------------------------------------------------------------------------------------
>                 Key: YARN-3194
>                 URL: https://issues.apache.org/jira/browse/YARN-3194
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.6.0
>         Environment: NM restart is enabled
>            Reporter: Rohith
>            Assignee: Rohith
> On NM restart ,NM sends all the outstanding NMContainerStatus to RM. But RM 
> process only ContainerState.RUNNING. If container is completed when NM was 
> down then those containers resources wont be release which result in 
> applications to hang.

This message was sent by Atlassian JIRA

Reply via email to