[ https://issues.apache.org/jira/browse/YARN-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jason Lowe resolved YARN-2510.
------------------------------
Resolution: Invalid
My apologies, this is an invalid report. When searching the RM log I
accidentally grabbed the wrong container ID; after looking again, I don't see
the RM receiving the container completion event at all. The 9 completion events
missing on the AM were all from the same node, so I think this is a case of a
poorly handled node failure that led to a MapReduce app hang.
I'll file a separate JIRA to track handling that case better. That's probably
a MapReduce fix, since the RM can't tell the container is no longer needed
unless either the NM reports it completing (which it failed to do in this case
due to a bad node) or the AM explicitly releases the container.
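To make the reasoning above concrete, here is a minimal toy model of the rule
it describes: the RM keeps treating a container as in use until either the NM
hosting it reports completion or the AM releases it. The class and method
names (ToyRmContainerTracker, nm_heartbeat, am_release, am_heartbeat) are
hypothetical and are not the real YARN ResourceManager or AMRMClient API; the
model also ignores node expiry and what the RM reports to the AM after a
release.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyRmContainerTracker:
    """Hypothetical toy model of RM container bookkeeping, not real YARN code."""

    # container id -> node the container was allocated on
    live: Dict[str, str] = field(default_factory=dict)
    # container ids whose completion the AM has not yet been told about
    pending_for_am: List[str] = field(default_factory=list)

    def allocate(self, container_id: str, node: str) -> None:
        self.live[container_id] = node

    def nm_heartbeat(self, node: str, completed_ids: List[str]) -> None:
        # Only the NM hosting a container can report it complete.
        # A bad node that never sends this report leaves the container "live".
        for cid in completed_ids:
            if self.live.get(cid) == node:
                del self.live[cid]
                self.pending_for_am.append(cid)

    def am_release(self, container_id: str) -> None:
        # The AM explicitly gives the container back; the RM stops tracking it.
        # (This toy model does not queue any event for the AM on release.)
        self.live.pop(container_id, None)

    def am_heartbeat(self) -> List[str]:
        # Hand the queued completion events to the AM and clear the queue.
        events, self.pending_for_am = self.pending_for_am, []
        return events
```

For example, if "bad-node" never reports its container, only an explicit
release from the AM frees it:

```python
rm = ToyRmContainerTracker()
rm.allocate("c1", "good-node")
rm.allocate("c2", "bad-node")
rm.nm_heartbeat("good-node", ["c1"])
print(rm.am_heartbeat())  # ['c1'] ; c2 is still considered in use
rm.am_release("c2")
print(rm.live)            # {}
```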
> RM can drop container completion events
> ---------------------------------------
>
> Key: YARN-2510
> URL: https://issues.apache.org/jira/browse/YARN-2510
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.5.0
> Reporter: Jason Lowe
> Priority: Critical
>
> The RM can drop container completion events and fail to report them to the
> AM. Details in the first comment.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)