[ https://issues.apache.org/jira/browse/YARN-4277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14963572#comment-14963572 ]
Junping Du commented on YARN-4277:
----------------------------------
I don't think this is a duplicate of YARN-4273, as that one is about leaked
applications (where a user kills the application as one of the steps). This
scenario should already be handled, since the NM should recover and report its
running containers to the RM whether or not the RM has failed over.
[~sandflee], have you seen this as a real problem in a cluster? If so, could you
add more details, such as exceptions from the NM/RM logs, so we can understand
what is going wrong there? Thanks!
CC to [~jlowe].
> containers would be leaked if nm crashed and rm failover
> ---------------------------------------------------------
>
> Key: YARN-4277
> URL: https://issues.apache.org/jira/browse/YARN-4277
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: sandflee
>
> NM restart and RM HA are enabled.
> 1, The NM crashes; after the timeout, the RM sends a container-complete message
> to the corresponding AM.
> 2, The RM fails over.
> 3, The NM restarts and registers with the RM, recovering the containers running
> on the NM; these containers are leaked.
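
The three steps above can be modeled with a toy sketch. Everything in it is hypothetical: `ToyRM`, its methods, and the container IDs exist only for illustration and are not YARN classes or APIs. It also assumes that the newly active RM keeps no record of containers that were completed before the failover, which the report does not confirm.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRM:
    """Hypothetical stand-in for a ResourceManager; not YARN code."""
    live: set = field(default_factory=set)       # containers the RM believes are running
    completed: set = field(default_factory=set)  # containers already reported complete to the AM

    def expire_node(self, containers):
        # Step 1: the NM timed out, so its containers are reported complete to the AM.
        for c in containers:
            self.live.discard(c)
            self.completed.add(c)

    def fail_over(self):
        # Step 2: a standby RM takes over. Assumption: it starts with no
        # record of containers completed before the failover.
        return ToyRM()

    def register_node(self, reported):
        # Step 3: the restarted NM reports its recovered containers.
        # Anything the RM does not track as live is a leak candidate.
        return sorted(c for c in reported if c not in self.live)


rm = ToyRM(live={"c1", "c2"})
rm.expire_node({"c1", "c2"})
rm = rm.fail_over()
leak_candidates = rm.register_node(["c1", "c2"])
print(leak_candidates)
```

In this model, the containers are still running on the NM while no RM-side record says so. Whether the real RM cleans up such containers on NM registration is exactly the question raised in the comment above.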
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)