Prabhu Joseph created YARN-11494:
------------------------------------

             Summary: Acquired Containers are killed when the node is reconnected
                 Key: YARN-11494
                 URL: https://issues.apache.org/jira/browse/YARN-11494
             Project: Hadoop YARN
          Issue Type: Bug
          Components: resourcemanager
    Affects Versions: 3.3.3
            Reporter: Prabhu Joseph
            Assignee: Prabhu Joseph


When a nodemanager reconnects, the resourcemanager marks the acquired 
containers on that node as LOST, which leads to job failure.

{code}
2023-04-10 02:57:16,412 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService (IPC Server handler 41 on 8025): Reconnect from the node at: node1
2023-04-10 02:57:16,412 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService (IPC Server handler 41 on 8025): NodeManager from node node1(cmPort: 8041 httpPort: 8042) registered with capability: <memory:122880, vCores:16>, assigned nodeId node1:8041, node labels { CORE }
2023-04-10 02:57:16,413 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl (ResourceManager Event Processor): container_e15_1677844874019_238016_01_000002 Container Transitioned from ACQUIRED to KILLED
{code}
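
To check whether a cluster is hitting the same pattern, the resourcemanager 
log can be scanned for ACQUIRED to KILLED transitions that closely follow a 
reconnect. The following is only a rough sketch, not part of YARN: the 
function name, the 5 second window, and the assumption that each log entry 
sits on a single line are all hypothetical. The container line does not name 
the node, so the pairing is by timestamp proximity and is a heuristic only.

```python
import re
from datetime import datetime, timedelta

# Patterns modeled on the two message shapes in the log excerpt above.
TS = r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})"
RECONNECT = re.compile(r"^" + TS + r" .*Reconnect from the node at: (\S+)")
KILLED = re.compile(
    r"^" + TS + r" .*?(container_\S+) Container Transitioned from ACQUIRED to KILLED"
)


def parse_ts(text):
    """Parse a log4j-style timestamp such as '2023-04-10 02:57:16,412'."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S,%f")


def find_killed_after_reconnect(lines, window_seconds=5):
    """Return (container_id, node, timestamp) for each ACQUIRED -> KILLED
    transition logged within window_seconds after the latest reconnect.

    The window is an arbitrary assumption; tune it for the cluster at hand.
    """
    window = timedelta(seconds=window_seconds)
    last_reconnect = None  # (timestamp, node) of the most recent reconnect
    hits = []
    for line in lines:
        m = RECONNECT.match(line)
        if m:
            last_reconnect = (parse_ts(m.group(1)), m.group(2))
            continue
        m = KILLED.match(line)
        if m and last_reconnect is not None:
            ts = parse_ts(m.group(1))
            if timedelta(0) <= ts - last_reconnect[0] <= window:
                hits.append((m.group(2), last_reconnect[1], ts))
    return hits


if __name__ == "__main__":
    sample = [
        "2023-04-10 02:57:16,412 INFO org.apache.hadoop.yarn.server."
        "resourcemanager.ResourceTrackerService (IPC Server handler 41 on 8025): "
        "Reconnect from the node at: node1",
        "2023-04-10 02:57:16,413 INFO org.apache.hadoop.yarn.server."
        "resourcemanager.rmcontainer.RMContainerImpl (ResourceManager Event "
        "Processor): container_e15_1677844874019_238016_01_000002 Container "
        "Transitioned from ACQUIRED to KILLED",
    ]
    for container_id, node, ts in find_killed_after_reconnect(sample):
        print(container_id, "killed after reconnect of", node, "at", ts)
```

A match only suggests a correlation; confirming that the container was 
actually running on the reconnected node still requires the application or 
nodemanager logs.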


