[ https://issues.apache.org/jira/browse/YARN-5043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15387192#comment-15387192 ]
Jun Gong commented on YARN-5043:
--------------------------------
The whole process is as follows: the app attempt's status becomes FAILED =>
RMAppAttempt sends a 'FAILED' event to RMApp => RMApp handles this event and
sends a REMOVE event to RMStateStore if there are attempts that need to be
removed => RMStateStore handles the REMOVE event and removes the attempt. An
attempt's status becoming FAILED does not mean the attempt has already been
removed from the RM state store; we have to wait until the events are handled.
So I added {{waitForEventsProcessed}} before checking the number of attempts,
to wait for the events to be handled.
We can reproduce the test failure by adding a {{Thread.sleep}} in the event
handling code to delay event handling.
After adding {{waitForEventsProcessed}}, we can avoid unnecessary sleeps.
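
To illustrate the race, here is a minimal, self-contained sketch. It is not the actual RM code or the patch: the class {{AsyncStoreSketch}} and all of its methods (including its own {{waitForEventsProcessed}}) are hypothetical stand-ins, and a single-threaded executor plays the role of the async event dispatcher. It only shows why the attempt count can still be 3 right after the failure and becomes 2 once the chained events have been handled.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class AsyncStoreSketch {
    // Hypothetical stand-in for the state store: ids of the stored attempts.
    private final Set<Integer> storedAttempts = ConcurrentHashMap.newKeySet();
    // One thread handles events in order, standing in for an async dispatcher.
    private final ExecutorService dispatcher = Executors.newSingleThreadExecutor();
    // Events that were posted but whose handlers have not finished yet.
    private final AtomicInteger pending = new AtomicInteger();

    public void storeAttempt(int id) {
        storedAttempts.add(id);
    }

    // Post an event; it counts as pending until its handler returns.
    private void post(Runnable handler) {
        pending.incrementAndGet();
        dispatcher.execute(() -> {
            try {
                handler.run();
            } finally {
                pending.decrementAndGet();
            }
        });
    }

    // An attempt failed: the "app" stage handles the FAILED event and posts a
    // REMOVE event; the "store" stage handles REMOVE after an artificial delay.
    public void attemptFailed(int attemptToRemove, long handlerDelayMs) {
        post(() -> post(() -> {
            sleepQuietly(handlerDelayMs); // simulates slow event handling
            storedAttempts.remove(attemptToRemove);
        }));
    }

    // Block until every posted event, including chained ones, has been handled.
    public void waitForEventsProcessed() {
        while (pending.get() > 0) {
            sleepQuietly(10);
            if (Thread.currentThread().isInterrupted()) {
                return;
            }
        }
    }

    public int storedAttemptCount() {
        return storedAttempts.size();
    }

    public void stop() {
        dispatcher.shutdownNow();
    }

    private static void sleepQuietly(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        AsyncStoreSketch store = new AsyncStoreSketch();
        store.storeAttempt(1);
        store.storeAttempt(2);
        store.storeAttempt(3);
        store.attemptFailed(1, 200);
        // Checking here races with the handler and will usually still see 3.
        System.out.println("right after failure: " + store.storedAttemptCount());
        store.waitForEventsProcessed();
        // The REMOVE event has been handled by now, so the count is 2.
        System.out.println("after waiting: " + store.storedAttemptCount());
        store.stop();
    }
}
```

The inner event is posted before the outer handler finishes, so the pending counter never drops to zero between the two stages. Without that, the wait could return after the FAILED event but before the REMOVE event is handled.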
> TestAMRestart.testRMAppAttemptFailuresValidityInterval random fail
> ------------------------------------------------------------------
>
> Key: YARN-5043
> URL: https://issues.apache.org/jira/browse/YARN-5043
> Project: Hadoop YARN
> Issue Type: Test
> Reporter: sandflee
> Assignee: Jun Gong
> Attachments: TestAMRestart-output.txt, YARN-5043.01.patch
>
>
> {noformat}
> Test set: org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart
> -------------------------------------------------------------------------------
> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 31.558 sec <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart
> testRMAppAttemptFailuresValidityInterval(org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart) Time elapsed: 31.509 sec <<< FAILURE!
> java.lang.AssertionError: expected:<2> but was:<3>
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:555)
> at org.junit.Assert.assertEquals(Assert.java:542)
> at org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart.testRMAppAttemptFailuresValidityInterval(TestAMRestart.java:913)
> {noformat}