[jira] [Commented] (YARN-3387) container complete message couldn't pass to am if am restarted and rm changed

sandflee (JIRA) Wed, 22 Apr 2015 08:29:50 -0700

    [ 
https://issues.apache.org/jira/browse/YARN-3387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507255#comment-14507255
 ]


sandflee commented on YARN-3387:
--------------------------------

This looks like a bug in LaunchAM in MockRM.java. LaunchAM does the following:
1. Wait for the app to become ACCEPTED; the appAttempt is created after this.
2. Send a node heartbeat.
3. Wait for the appAttempt to become ALLOCATED.

If the node heartbeat is handled before the appAttempt becomes SCHEDULED, the
appAttempt will never reach ALLOCATED unless another NM heartbeat arrives. This
is exactly what happened in these failed runs:
https://builds.apache.org/job/PreCommit-YARN-Build/7410//testReport/org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager/TestAMRestart/testShouldNotCountFailureToMaxAttemptRetry/
https://builds.apache.org/job/PreCommit-YARN-Build/7410//testReport/org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager/TestAMRestart/testPreemptedAMRestartOnRMRestart/
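The race can be reproduced without a cluster. Below is a minimal, self-contained Java sketch; LaunchAmRace, ModelRm, AttemptState and launchAm are hypothetical stand-ins, not MockRM or YARN classes, and the "wait for SCHEDULED before heartbeating" variant is an assumed fix, not necessarily what the attached patches do. State transitions are queued on a simulated asynchronous dispatcher, so a single heartbeat handled before the queue is drained finds the attempt still SUBMITTED and allocates nothing.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical model of the LaunchAM race; none of these classes are Hadoop APIs. */
public class LaunchAmRace {

    enum AttemptState { SUBMITTED, SCHEDULED, ALLOCATED }

    /** Models the RM: attempt state transitions are queued and handled asynchronously. */
    static final class ModelRm {
        AttemptState attemptState = AttemptState.SUBMITTED;
        private final Queue<Runnable> dispatcher = new ArrayDeque<>();

        /** The app is ACCEPTED; the attempt's move to SCHEDULED is only queued. */
        void acceptApp() {
            dispatcher.add(() -> attemptState = AttemptState.SCHEDULED);
        }

        /** Runs all pending events, as the real dispatcher eventually would. */
        void drainDispatcher() {
            while (!dispatcher.isEmpty()) {
                dispatcher.poll().run();
            }
        }

        /** The scheduler allocates the AM container on a heartbeat only if the attempt is SCHEDULED. */
        void nodeHeartbeat() {
            if (attemptState == AttemptState.SCHEDULED) {
                attemptState = AttemptState.ALLOCATED;
            }
        }
    }

    /**
     * Runs the three LaunchAM steps with a single NM heartbeat.
     * If waitForScheduled is false, the heartbeat is handled before SCHEDULED.
     */
    static AttemptState launchAm(boolean waitForScheduled) {
        ModelRm rm = new ModelRm();
        rm.acceptApp();             // step 1: app ACCEPTED, attempt created
        if (waitForScheduled) {
            rm.drainDispatcher();   // assumed fix: wait for SCHEDULED first
        }
        rm.nodeHeartbeat();         // step 2: the only NM heartbeat
        rm.drainDispatcher();       // in the racy case SCHEDULED arrives too late
        return rm.attemptState;     // step 3 waits for ALLOCATED on this state
    }

    public static void main(String[] args) {
        System.out.println("racy:  " + launchAm(false));
        System.out.println("fixed: " + launchAm(true));
    }
}
```

With only one heartbeat, the racy ordering leaves the attempt at SCHEDULED, so step 3 waits until it times out; waiting for SCHEDULED before heartbeating lets the allocation happen.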

> container complete message couldn't pass to am if am restarted and rm changed
> -----------------------------------------------------------------------------
>
>                 Key: YARN-3387
>                 URL: https://issues.apache.org/jira/browse/YARN-3387
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.6.0
>            Reporter: sandflee
>            Priority: Critical
>              Labels: patch
>         Attachments: YARN-3387.001.patch, YARN-3387.002.patch
>
>
> Suppose AM work-preserving restart and RM HA are enabled.
> The container-complete message is passed to appAttempt.justFinishedContainers in
> the RM. Normally, all attempts of one app share the same
> justFinishedContainers, but after the active RM changes, each attempt has its own
> justFinishedContainers, so in the situation below the container-complete message
> cannot be passed to the AM:
> 1. The AM restarts.
> 2. The active RM changes.
> 3. A container launched by the first AM completes.
> The container-complete message is passed to appAttempt1, not appAttempt2, but
> the AM pulls finished containers from appAttempt2 (the currentAppAttempt).
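A minimal Java model of the scenario above, assuming the sharing behaviour described in this issue: attempts normally share one justFinishedContainers list, while after the active RM changes each recovered attempt is modelled as owning its own list. JustFinishedContainersModel, Attempt, run and the container ID string are hypothetical names, not YARN APIs.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the lost container-complete message; not YARN code. */
public class JustFinishedContainersModel {

    static final class Attempt {
        final List<String> justFinishedContainers;

        Attempt(List<String> justFinishedContainers) {
            this.justFinishedContainers = justFinishedContainers;
        }

        /** Models the AM's allocate call draining this attempt's finished containers. */
        List<String> pullFinishedContainers() {
            List<String> pulled = new ArrayList<>(justFinishedContainers);
            justFinishedContainers.clear();
            return pulled;
        }
    }

    /** Completes a container launched by attempt1 and returns what the current AM (attempt2) sees. */
    static List<String> run(boolean rmChanged) {
        List<String> shared = new ArrayList<>();
        Attempt attempt1 = new Attempt(shared);
        // Normally the new attempt reuses the same list; after the RM changes,
        // the recovered attempts each own a separate list in this model.
        Attempt attempt2 = new Attempt(rmChanged ? new ArrayList<String>() : shared);

        // The container launched by the first AM completes; the message is
        // routed to the attempt that launched it.
        attempt1.justFinishedContainers.add("container_from_attempt1");

        // The restarted AM pulls only from the current attempt.
        return attempt2.pullFinishedContainers();
    }

    public static void main(String[] args) {
        System.out.println("rm unchanged: " + run(false));
        System.out.println("rm changed:   " + run(true));
    }
}
```

Without an RM change the restarted AM receives the container from the first attempt; after the RM changes, the message stays in appAttempt1's list and the pull from appAttempt2 returns nothing.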



