[
https://issues.apache.org/jira/browse/YARN-3790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14580209#comment-14580209
]
zhihai xu commented on YARN-3790:
---------------------------------
Hi [~rohithsharma], thanks for reporting this issue. I think this test fails
intermittently.
The following is the stack trace for the test failure:
{code}
java.lang.AssertionError: expected:<6144> but was:<8192>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at org.junit.Assert.assertEquals(Assert.java:542)
at org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.assertMetrics(TestWorkPreservingRMRestart.java:852)
at org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.checkFSQueue(TestWorkPreservingRMRestart.java:341)
at org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.testSchedulerRecovery(TestWorkPreservingRMRestart.java:240)
{code}
The test fails because {{rootMetrics}}'s available resource is not correct for
FairScheduler.
I know what causes this test failure.
For FairScheduler, {{updateRootQueueMetrics}} is used to update
{{rootMetrics}}'s available resource.
But {{updateRootQueueMetrics}} is not called in or after
{{recoverContainersOnNode}}, so we can only depend on UpdateThread to
update {{rootMetrics}}'s available resource. Currently UpdateThread is
triggered in {{addNode}}, so its timing decides whether this test
succeeds. If UpdateThread calls {{update}} after
{{recoverContainersOnNode}}, the test succeeds. If UpdateThread calls
{{update}} before {{recoverContainersOnNode}}, the test fails.
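To illustrate the ordering problem, here is a minimal sketch. It is not the real FairScheduler code: the class {{RootMetricsRaceSketch}}, its fields, and the helper {{availableMb}} are hypothetical, and the sizes (an 8192 MB node with 2048 MB of recovered containers) are assumptions chosen only because they match the numbers in the assertion. The two orderings are run deterministically instead of using a real thread.

```java
// Simplified, hypothetical model of the ordering issue; not the real FairScheduler.
final class RootMetricsRaceSketch {
    static final int NODE_MB = 8192;      // assumed node capacity
    static final int RECOVERED_MB = 2048; // assumed memory held by recovered containers

    private int clusterMb = 0;
    private int usedMb = 0;
    private int availableMbMetric = 0;    // stands in for rootMetrics' available resource

    void addNode() { clusterMb += NODE_MB; }

    // Recovery changes usage but, as in the report, does not refresh the metric.
    void recoverContainersOnNode() { usedMb += RECOVERED_MB; }

    // Stands in for the work UpdateThread does when it calls update().
    void updateRootQueueMetrics() { availableMbMetric = clusterMb - usedMb; }

    // Returns the metric value the test would observe for a given ordering.
    static int availableMb(boolean updateAfterRecovery) {
        RootMetricsRaceSketch s = new RootMetricsRaceSketch();
        s.addNode();
        if (updateAfterRecovery) {
            s.recoverContainersOnNode();
            s.updateRootQueueMetrics(); // metric reflects the recovered containers
        } else {
            s.updateRootQueueMetrics(); // metric computed too early
            s.recoverContainersOnNode(); // metric is now stale
        }
        return s.availableMbMetric;
    }

    public static void main(String[] args) {
        System.out.println(availableMb(true));  // prints 6144
        System.out.println(availableMb(false)); // prints 8192
    }
}
```

In this model, refreshing the metric after recovery would remove the dependence on the ordering; whether that is the right fix for the real scheduler is a separate question.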
> TestWorkPreservingRMRestart#testSchedulerRecovery fails in trunk for FS
> scheduler
> ---------------------------------------------------------------------------------
>
> Key: YARN-3790
> URL: https://issues.apache.org/jira/browse/YARN-3790
> Project: Hadoop YARN
> Issue Type: Bug
> Components: test
> Reporter: Rohith
> Assignee: zhihai xu
>
> Failure trace is as follows
> {noformat}
> Tests run: 28, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 284.078 sec <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart
> testSchedulerRecovery[1](org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart) Time elapsed: 6.502 sec <<< FAILURE!
> java.lang.AssertionError: expected:<6144> but was:<8192>
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:555)
> at org.junit.Assert.assertEquals(Assert.java:542)
> at org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.assertMetrics(TestWorkPreservingRMRestart.java:853)
> at org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.checkFSQueue(TestWorkPreservingRMRestart.java:342)
> at org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.testSchedulerRecovery(TestWorkPreservingRMRestart.java:241)
> {noformat}