[ https://issues.apache.org/jira/browse/YARN-3387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376640#comment-14376640 ]
Karthik Kambatla commented on YARN-3387:
----------------------------------------
Does this imply our work-preserving AM restart is broken on an RM failover?
> container complete message couldn't pass to am if am restarted and rm changed
> -----------------------------------------------------------------------------
>
> Key: YARN-3387
> URL: https://issues.apache.org/jira/browse/YARN-3387
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.6.0
> Reporter: sandflee
> Priority: Critical
>
> Suppose work-preserving AM restart and RM HA are both enabled.
> A container-complete message is passed to appAttempt.justFinishedContainers in
> the RM. In the normal situation, all attempts of one app share the same
> justFinishedContainers, but after the RM changes, every attempt has its own
> justFinishedContainers, so in the situation below the container-complete
> message cannot be passed to the AM:
> 1, the AM restarts
> 2, the RM changes
> 3, a container launched by the first AM completes
> The container-complete message is passed to appAttempt1, not appAttempt2, but
> the AM pulls finished containers from appAttempt2 (currentAppAttempt).
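
The sequence described above can be sketched with a toy model. This is not
YARN code: the class FinishedContainerRouting, the Attempt type, and the
method pullFromCurrentAttempt are hypothetical names used only to illustrate
the report, assuming that a completed container is recorded on the attempt
that launched it and that the AM reads only from the current attempt.

```java
import java.util.ArrayList;
import java.util.List;

public class FinishedContainerRouting {

    /** Toy stand-in for an application attempt (hypothetical, not YARN's class). */
    static final class Attempt {
        final int id;
        final List<String> justFinishedContainers;

        Attempt(int id, List<String> justFinishedContainers) {
            this.id = id;
            this.justFinishedContainers = justFinishedContainers;
        }
    }

    /**
     * Models the three steps from the report and returns what the AM sees.
     * sharedList = true models the normal case (all attempts share one list);
     * sharedList = false models the state after the RM changes (one list per attempt).
     */
    static List<String> pullFromCurrentAttempt(boolean sharedList) {
        List<String> firstList = new ArrayList<>();
        List<String> secondList = sharedList ? firstList : new ArrayList<>();

        Attempt attempt1 = new Attempt(1, firstList);
        Attempt attempt2 = new Attempt(2, secondList); // current attempt after the AM restart

        // Step 3: a container launched by the first AM completes and is
        // recorded on the attempt that launched it.
        attempt1.justFinishedContainers.add("container_from_attempt_" + attempt1.id);

        // The AM pulls finished containers from the current attempt only.
        return new ArrayList<>(attempt2.justFinishedContainers);
    }

    public static void main(String[] args) {
        System.out.println("shared list:    " + pullFromCurrentAttempt(true));
        System.out.println("separate lists: " + pullFromCurrentAttempt(false));
    }
}
```

With a shared list the AM sees the completed container; with separate lists
the pull from the current attempt comes back empty, which matches the
symptom reported here.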
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)