[ 
https://issues.apache.org/jira/browse/YARN-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15406362#comment-15406362
 ] 

Jason Lowe commented on YARN-5451:
----------------------------------

Note that we can get localizers to stop today (e.g., when the corresponding 
container being localized is killed), but only if the localizer is 
well-behaved.  So even with large resources, as long as the localizer 
heartbeat thread is still heartbeating to the NM and can respond properly to 
heartbeat commands, things end up working out OK.  It becomes a problem when the 
localizer _doesn't_ behave properly and needs external intervention.  I don't 
think that's common in practice, but it does happen sometimes.
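
To illustrate what "external intervention" could look like, here is a minimal 
sketch of a heartbeat watchdog.  It is not taken from the NodeManager code: 
the class name LocalizerWatchdog, its methods, and the timeout parameter are 
all hypothetical, and the sketch only identifies overdue localizers.  Actually 
killing the process is left to the caller.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical watchdog for illustration; not part of the NodeManager. */
class LocalizerWatchdog {
    // Assumed policy: a localizer silent for longer than this is treated as hung.
    private final long timeoutMs;
    // Localizer id -> timestamp (ms) of the most recent heartbeat seen.
    private final Map<String, Long> lastHeartbeatMs = new ConcurrentHashMap<>();

    LocalizerWatchdog(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    /** Call when a localizer is launched and on every heartbeat it sends. */
    void recordHeartbeat(String localizerId, long nowMs) {
        lastHeartbeatMs.put(localizerId, nowMs);
    }

    /** Call when a localizer exits on its own. */
    void remove(String localizerId) {
        lastHeartbeatMs.remove(localizerId);
    }

    /**
     * Returns the ids of localizers whose last heartbeat is older than the
     * timeout and stops tracking them. The caller is expected to kill the
     * corresponding processes.
     */
    List<String> reapHung(long nowMs) {
        List<String> hung = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastHeartbeatMs.entrySet()) {
            Long seen = e.getValue();
            // remove(key, value) only succeeds if no newer heartbeat arrived
            // in the meantime, so a localizer that just checked in is spared.
            if (nowMs - seen > timeoutMs
                    && lastHeartbeatMs.remove(e.getKey(), seen)) {
                hung.add(e.getKey());
            }
        }
        return hung;
    }
}
```

Registering the localizer at launch time, rather than at its first heartbeat, 
matters here: a localizer that hangs during startup never heartbeats at all, 
so it would otherwise never be tracked.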

> Container localizers that hang are not cleaned up
> -------------------------------------------------
>
>                 Key: YARN-5451
>                 URL: https://issues.apache.org/jira/browse/YARN-5451
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.6.0
>            Reporter: Jason Lowe
>
> I ran across an old, rogue process on one of our nodes.  It apparently was a 
> container localizer that somehow entered an infinite loop during startup.  
> The NM never cleaned up this broken localizer, so it happily ran forever.  
> The NM needs to do a better job of tracking localizers, including killing 
> them if they appear to be hung/broken.



