[ 
https://issues.apache.org/jira/browse/YARN-3790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14580209#comment-14580209
 ] 

zhihai xu commented on YARN-3790:
---------------------------------

Hi [~rohithsharma], thanks for reporting this issue. I think this test fails 
intermittently.
The following is the stack trace for the test failure:
{code}
java.lang.AssertionError: expected:<6144> but was:<8192>
        at org.junit.Assert.fail(Assert.java:88)
        at org.junit.Assert.failNotEquals(Assert.java:743)
        at org.junit.Assert.assertEquals(Assert.java:118)
        at org.junit.Assert.assertEquals(Assert.java:555)
        at org.junit.Assert.assertEquals(Assert.java:542)
        at org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.assertMetrics(TestWorkPreservingRMRestart.java:852)
        at org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.checkFSQueue(TestWorkPreservingRMRestart.java:341)
        at org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.testSchedulerRecovery(TestWorkPreservingRMRestart.java:240)
{code}
The failure is that {{rootMetrics}}'s available resource is incorrect for 
FairScheduler.
I know what causes this test failure.
For FairScheduler, {{updateRootQueueMetrics}} is used to update 
{{rootMetrics}}'s available resource.
But {{updateRootQueueMetrics}} is not called in or after 
{{recoverContainersOnNode}}, so in this case we can only depend on 
UpdateThread to update {{rootMetrics}}'s available resource. Currently 
UpdateThread is triggered in {{addNode}}, so the timing of UpdateThread 
decides whether this test succeeds. If UpdateThread calls {{update}} after 
{{recoverContainersOnNode}}, the test succeeds. If UpdateThread calls 
{{update}} before {{recoverContainersOnNode}}, the test fails, because the 
metric still holds the value computed before the recovered containers were 
counted (8192 rather than the expected 6144 in the trace above).
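
To make the two orderings concrete, here is a minimal sketch. It is not the 
real FairScheduler code: {{ToyScheduler}} and its fields are hypothetical 
stand-ins, the asynchronous UpdateThread is replaced by an explicit call 
order so the result is deterministic, and the 2048 MB used by recovered 
containers is only inferred from the difference between 8192 and 6144 in the 
trace.

```java
// Toy model of the race; names and numbers are illustrative assumptions.
final class RootMetricsRace {

    /** Hypothetical stand-in for the scheduler, not the real FairScheduler. */
    static final class ToyScheduler {
        private int clusterMB;          // total memory of registered nodes
        private int allocatedMB;        // memory held by recovered containers
        private int availableMetricMB;  // cached metric, refreshed only by update()

        void addNode(int nodeMB) {
            clusterMB += nodeMB;
        }

        // Like the behaviour described above, recovery does not refresh the metric.
        void recoverContainersOnNode(int usedMB) {
            allocatedMB += usedMB;
        }

        // Stands in for the update() call made by UpdateThread.
        void update() {
            availableMetricMB = clusterMB - allocatedMB;
        }

        int availableMetricMB() {
            return availableMetricMB;
        }
    }

    /** Runs one ordering and returns the available-memory metric the test would read. */
    static int run(boolean updateAfterRecovery) {
        ToyScheduler scheduler = new ToyScheduler();
        scheduler.addNode(8192);
        if (!updateAfterRecovery) {
            scheduler.update();   // update ran too early
        }
        scheduler.recoverContainersOnNode(2048);
        if (updateAfterRecovery) {
            scheduler.update();   // update ran late enough to see the recovery
        }
        return scheduler.availableMetricMB();
    }

    public static void main(String[] args) {
        System.out.println("update after recovery:  " + run(true));   // 6144
        System.out.println("update before recovery: " + run(false));  // 8192
    }
}
```

In the sketch, only the ordering in which {{update}} runs last reports 6144. 
The other ordering leaves the stale 8192, which matches the assertion error.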

> TestWorkPreservingRMRestart#testSchedulerRecovery fails in trunk for FS 
> scheduler
> ---------------------------------------------------------------------------------
>
>                 Key: YARN-3790
>                 URL: https://issues.apache.org/jira/browse/YARN-3790
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: test
>            Reporter: Rohith
>            Assignee: zhihai xu
>
> Failure trace is as follows
> {noformat}
> Tests run: 28, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 284.078 sec <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart
> testSchedulerRecovery[1](org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart)  Time elapsed: 6.502 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<6144> but was:<8192>
>       at org.junit.Assert.fail(Assert.java:88)
>       at org.junit.Assert.failNotEquals(Assert.java:743)
>       at org.junit.Assert.assertEquals(Assert.java:118)
>       at org.junit.Assert.assertEquals(Assert.java:555)
>       at org.junit.Assert.assertEquals(Assert.java:542)
>       at org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.assertMetrics(TestWorkPreservingRMRestart.java:853)
>       at org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.checkFSQueue(TestWorkPreservingRMRestart.java:342)
>       at org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.testSchedulerRecovery(TestWorkPreservingRMRestart.java:241)
> {noformat}



