[ 
https://issues.apache.org/jira/browse/YARN-10825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17367223#comment-17367223
 ] 

D M Murali Krishna Reddy commented on YARN-10825:
-------------------------------------------------

As per my analysis, when yarn.nodemanager.recovery.supervised is enabled, I 
could see that for MapReduce jobs, once the NM is shut down, the RM after some 
time considers the node lost. Then, through UPDATED_NODES_TRANSITION, the AM 
removes all the task attempts of the containers launched on the lost node and 
launches the next task attempt. When an old container sends a *status update*, 
the AM treats it as an illegal task and returns feedback with taskFound set to 
false in TaskAttemptListenerImpl. In Task.java, the container then kills 
itself.
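
To make the MapReduce behavior described here easier to follow, below is a 
minimal, self-contained sketch of the handshake. It does not use the real 
Hadoop classes: StaleAttemptSketch, Listener, Feedback, nodeLost and heartbeat 
are hypothetical names invented for illustration, and only the taskFound flag 
mirrors what is described for TaskAttemptListenerImpl and Task.java.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class StaleAttemptSketch {

    /** Hypothetical stand-in for the feedback the AM returns on a status update. */
    static final class Feedback {
        final boolean taskFound;

        Feedback(boolean taskFound) {
            this.taskFound = taskFound;
        }
    }

    /** Hypothetical stand-in for the AM-side listener (not the real TaskAttemptListenerImpl). */
    static final class Listener {
        private final Set<String> liveAttempts = new HashSet<>();

        void register(String attemptId) {
            liveAttempts.add(attemptId);
        }

        /** Models the AM dropping the attempts that ran on a node reported lost. */
        void nodeLost(Set<String> attemptsOnNode) {
            liveAttempts.removeAll(attemptsOnNode);
        }

        /** An attempt the AM no longer tracks gets taskFound == false. */
        Feedback statusUpdate(String attemptId) {
            return new Feedback(liveAttempts.contains(attemptId));
        }
    }

    /** Container side: returns true to keep running, false if the task should exit. */
    static boolean heartbeat(Listener am, String attemptId) {
        Feedback feedback = am.statusUpdate(attemptId);
        return feedback.taskFound;
    }

    public static void main(String[] args) {
        Listener am = new Listener();
        am.register("attempt_0");
        System.out.println("before node lost, keep running: " + heartbeat(am, "attempt_0"));

        // The RM reports the node lost; the AM forgets attempt_0 and starts attempt_1.
        am.nodeLost(Collections.singleton("attempt_0"));
        am.register("attempt_1");

        System.out.println("old attempt, keep running: " + heartbeat(am, "attempt_0"));
        System.out.println("new attempt, keep running: " + heartbeat(am, "attempt_1"));
    }
}
```

The point of the sketch is that the kill is driven by the container's own 
call to the AM, so it still works while the NM is down.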

But in YARN services, I couldn't find any direct communication from the 
container to the AM comparable to the *status update* in MR jobs. So I think 
the AM is not able to communicate with the container directly to get it 
killed. I think the only communication path is from the AM to the RM, then 
from the RM to the NM, and on to the container, which is not possible because 
the NM itself is down.

[~billie], [~eyang], [~prabhujoseph] Could you please take a look at this issue?

Thanks!

> Yarn Service containers not getting killed after NM shutdown
> ------------------------------------------------------------
>
>                 Key: YARN-10825
>                 URL: https://issues.apache.org/jira/browse/YARN-10825
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>    Affects Versions: 3.1.1
>            Reporter: Sushanta Sen
>            Assignee: D M Murali Krishna Reddy
>            Priority: Major
>
> When yarn.nodemanager.recovery.supervised is enabled and the NM is shut down, 
> new containers are launched after the RM sends the node lost event to the 
> AM, but the existing containers on the lost node are not killed. The 
> issue has occurred only for YARN services. For normal jobs, the behavior is 
> as expected.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
