[ 
https://issues.apache.org/jira/browse/YARN-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe resolved YARN-2510.
------------------------------
    Resolution: Invalid

My apologies, this is an invalid report.  I accidentally grabbed the wrong
container ID when searching the RM log; after looking again, I don't see the
RM receiving the container completion event.  The 9 missing completion events
on the AM were all from the same node, so I think this is a case of a poorly
handled node failure that led to a MapReduce app hang.

I'll file a separate JIRA to track handling that case better.  That's probably
a MapReduce fix, since the RM can't tell the container is no longer needed
unless either the NM reports it completing (which it failed to do in this case
due to a bad node) or the AM explicitly releases the container.

> RM can drop container completion events
> ---------------------------------------
>
>                 Key: YARN-2510
>                 URL: https://issues.apache.org/jira/browse/YARN-2510
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.5.0
>            Reporter: Jason Lowe
>            Priority: Critical
>
> The RM can drop container completion events and fail to report them to the 
> AM.  Details in the first comment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
