[jira] [Commented] (YARN-5416) TestRMRestart#testRMRestartWaitForPreviousAMToFinish failed intermittently due to not wait SchedulerApplicationAttempt to be stopped

Jason Lowe (JIRA) Wed, 27 Jul 2016 12:59:45 -0700

    [ https://issues.apache.org/jira/browse/YARN-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15396249#comment-15396249 ]


Jason Lowe commented on YARN-5416:
----------------------------------

bq. I think we can close this as dup of that. What do you think?

I don't mind which one we close as a duplicate of the other, as long as we
don't keep both open.  Since this is the one that has a patch, I'll go ahead
and comment on the patch here, as Eric has also done.

bq. seems only necessary to wait before launch another AM immediately

I agree with Eric that it looks like another place was missed in the test.
IIUC, we launch AM1, wait for it to enter the FAILED state, then launch AM2.
This patch changes that to do a more thorough wait before trying to launch AM2.
However, later in the same test we wait for the second AM to fail and launch a
third attempt, which looks like the same case we're trying to fix -- waiting
for a previous AM to fully stop before immediately launching another attempt:
{code}
    rm2.waitForState(am2.getApplicationAttemptId(), RMAppAttemptState.FAILED);
    launchAM(rmApp, rm2, nm1);
    Assert.assertEquals(3, rmApp.getAppAttempts().size());
{code}
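
The difference between seeing an attempt reach FAILED and waiting until it has
actually finished stopping can be illustrated with a small stand-alone sketch.
Nothing here comes from the YARN-5416 patch or from Hadoop: the class
WaitForStopSketch, the helper waitFor, and the attemptStopped flag are
hypothetical names standing in for whatever condition the test would really
poll on. It only shows the general shape of polling until asynchronous cleanup
completes before starting the next launch.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

public class WaitForStopSketch {

  /**
   * Polls the condition until it holds or the timeout elapses.
   * Returns true if the condition was observed, false on timeout
   * or if the waiting thread was interrupted.
   */
  public static boolean waitFor(BooleanSupplier condition,
                                long intervalMs, long timeoutMs) {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (true) {
      if (condition.getAsBoolean()) {
        return true;
      }
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        // Restore the interrupt flag and give up waiting.
        Thread.currentThread().interrupt();
        return false;
      }
    }
  }

  public static void main(String[] args) throws InterruptedException {
    // Hypothetical stand-in for "the previous attempt has fully stopped".
    AtomicBoolean attemptStopped = new AtomicBoolean(false);

    // Simulates cleanup that finishes some time after the state change
    // that the test originally waited on.
    Thread cleanup = new Thread(() -> {
      try {
        Thread.sleep(50);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      attemptStopped.set(true);
    });
    cleanup.start();

    // Only proceed to the next launch once the cleanup has been observed.
    boolean stopped = waitFor(attemptStopped::get, 10, 2000);
    System.out.println("previous attempt stopped: " + stopped);
    cleanup.join();
  }
}
```

In the test, the same kind of wait would presumably be needed before both the
second and the third launchAM calls, not just the first one.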

> TestRMRestart#testRMRestartWaitForPreviousAMToFinish failed intermittently 
> due to not wait SchedulerApplicationAttempt to be stopped
> ------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-5416
>                 URL: https://issues.apache.org/jira/browse/YARN-5416
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: test, yarn
>            Reporter: Junping Du
>            Assignee: Junping Du
>            Priority: Minor
>         Attachments: YARN-5416.patch
>
>
> The test failure stack is:
> Running org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
> Tests run: 54, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 385.338 sec <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
> testRMRestartWaitForPreviousAMToFinish[0](org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart)  Time elapsed: 43.134 sec  <<< FAILURE!
> java.lang.AssertionError: AppAttempt state is not correct (timedout) expected:<ALLOCATED> but was:<SCHEDULED>
>       at org.junit.Assert.fail(Assert.java:88)
>       at org.junit.Assert.failNotEquals(Assert.java:743)
>       at org.junit.Assert.assertEquals(Assert.java:118)
>       at org.apache.hadoop.yarn.server.resourcemanager.MockAM.waitForState(MockAM.java:86)
>       at org.apache.hadoop.yarn.server.resourcemanager.MockRM.sendAMLaunched(MockRM.java:594)
>       at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.launchAM(TestRMRestart.java:1008)
>       at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMRestartWaitForPreviousAMToFinish(TestRMRestart.java:530)
> This is due to the same issue that was partially fixed in YARN-4968.



