[ https://issues.apache.org/jira/browse/YARN-3790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14580209#comment-14580209 ]
zhihai xu commented on YARN-3790:
---------------------------------

Hi [~rohithsharma], thanks for reporting this issue. I think this test fails intermittently. The following is the stack trace for the test failure:
{code}
java.lang.AssertionError: expected:<6144> but was:<8192>
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.failNotEquals(Assert.java:743)
	at org.junit.Assert.assertEquals(Assert.java:118)
	at org.junit.Assert.assertEquals(Assert.java:555)
	at org.junit.Assert.assertEquals(Assert.java:542)
	at org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.assertMetrics(TestWorkPreservingRMRestart.java:852)
	at org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.checkFSQueue(TestWorkPreservingRMRestart.java:341)
	at org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.testSchedulerRecovery(TestWorkPreservingRMRestart.java:240)
{code}
The test fails because {{rootMetrics}}'s available resource is not correct for FairScheduler, and I know what causes it.
For FairScheduler, {{updateRootQueueMetrics}} is used to update {{rootMetrics}}'s available resource. But {{updateRootQueueMetrics}} is not called in or after {{recoverContainersOnNode}}, so in this case we can only depend on UpdateThread to update {{rootMetrics}}'s available resource.
Currently UpdateThread is triggered in {{addNode}}, so the timing of UpdateThread decides whether this test succeeds:
If UpdateThread calls {{update}} after {{recoverContainersOnNode}}, the test will succeed.
If UpdateThread calls {{update}} before {{recoverContainersOnNode}}, the test will fail.

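To make the ordering dependence concrete, here is a minimal, single-threaded sketch of the race described above. It is not FairScheduler code: {{ToyScheduler}}, its fields, and the sizes used (one 8192 MB node, 2048 MB of recovered containers) are assumptions chosen only so the numbers match the assertion message. The two orderings that UpdateThread could produce are replayed deterministically.

```java
public class RootMetricsTimingSketch {

    /** Hypothetical stand-in for a scheduler whose root metric is refreshed only by update(). */
    static final class ToyScheduler {
        private int clusterMB = 0;          // total memory of registered nodes
        private int allocatedMB = 0;        // memory held by (recovered) containers
        private int availableMBMetric = 0;  // what the root queue metric would report

        void addNode(int nodeMB) {
            clusterMB += nodeMB;
        }

        // Mirrors the reported problem: recovery changes usage but does not refresh the metric.
        void recoverContainersOnNode(int recoveredMB) {
            allocatedMB += recoveredMB;
        }

        // Stand-in for the periodic update that recomputes the available resource.
        void update() {
            availableMBMetric = clusterMB - allocatedMB;
        }

        int availableMB() {
            return availableMBMetric;
        }
    }

    /** Replays one of the two possible orderings and returns the reported metric. */
    public static int availableMB(boolean updateAfterRecovery) {
        ToyScheduler scheduler = new ToyScheduler();
        scheduler.addNode(8192);                  // assumed node size
        if (!updateAfterRecovery) {
            scheduler.update();                   // update wins the race: metric computed too early
        }
        scheduler.recoverContainersOnNode(2048);  // assumed recovered container memory
        if (updateAfterRecovery) {
            scheduler.update();                   // update runs last: metric reflects recovery
        }
        return scheduler.availableMB();
    }

    public static void main(String[] args) {
        System.out.println("update after recovery:  " + availableMB(true));   // 6144
        System.out.println("update before recovery: " + availableMB(false));  // 8192
    }
}
```

In this toy model, refreshing the metric at the end of {{recoverContainersOnNode}} would remove the dependence on ordering; whether that is the right fix for the real scheduler is for the patch to decide.
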
> TestWorkPreservingRMRestart#testSchedulerRecovery fails in trunk for FS scheduler
> ---------------------------------------------------------------------------------
>
>                 Key: YARN-3790
>                 URL: https://issues.apache.org/jira/browse/YARN-3790
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: test
>            Reporter: Rohith
>            Assignee: zhihai xu
>
> Failure trace is as follows
> {noformat}
> Tests run: 28, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 284.078 sec <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart
> testSchedulerRecovery[1](org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart)  Time elapsed: 6.502 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<6144> but was:<8192>
> 	at org.junit.Assert.fail(Assert.java:88)
> 	at org.junit.Assert.failNotEquals(Assert.java:743)
> 	at org.junit.Assert.assertEquals(Assert.java:118)
> 	at org.junit.Assert.assertEquals(Assert.java:555)
> 	at org.junit.Assert.assertEquals(Assert.java:542)
> 	at org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.assertMetrics(TestWorkPreservingRMRestart.java:853)
> 	at org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.checkFSQueue(TestWorkPreservingRMRestart.java:342)
> 	at org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.testSchedulerRecovery(TestWorkPreservingRMRestart.java:241)
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)