[ https://issues.apache.org/jira/browse/YARN-6272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15974876#comment-15974876 ]
Jason Lowe commented on YARN-6272:
----------------------------------

I've also seen this stacktrace on 2.8:
{noformat}
java.lang.AssertionError: expected:<1> but was:<2>
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.failNotEquals(Assert.java:743)
	at org.junit.Assert.assertEquals(Assert.java:118)
	at org.junit.Assert.assertEquals(Assert.java:555)
	at org.junit.Assert.assertEquals(Assert.java:542)
	at org.apache.hadoop.yarn.client.api.impl.TestAMRMClient.doContainerResourceChange(TestAMRMClient.java:920)
	at org.apache.hadoop.yarn.client.api.impl.TestAMRMClient.testAMRMClientWithContainerResourceChange(TestAMRMClient.java:813)
{noformat}

In the above case, it looks like the nodemanager happened to be heartbeating just as the app made the allocate call that carried the increase request. As a result, it was able to process both the increase and the decrease in the same heartbeat, which the test explicitly does not expect.

The test itself is very fragile. It launches a full minicluster and uses hardcoded sleeps sprinkled in various places, hoping that asynchronous events have been processed in the interim. That not only leads directly to flaky tests but also slows down the unit test unnecessarily.

Either the test needs to be made more tolerant of all the asynchronous activity going on, or it should ditch the minicluster and explicitly manage the cluster heartbeating. The former can be done by having the test poll via app alloc heartbeats until it gets all the responses it needs, rather than assuming which heartbeats will get which responses. The latter can be done by using MockRM, MockNM, and drain dispatchers so the test knows exactly which heartbeats have been completely processed and thus knows which app alloc calls will get the appropriate responses. This latter approach would also eliminate the need for any arbitrary polling/sleeping intervals and speed up the test significantly.
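To make the first option concrete, here is a rough, self-contained sketch of the "poll via allocate heartbeats until all responses arrive" idea. It does not use the real YARN classes: `Response`, `pollUntilChanged`, and the `Supplier` standing in for the allocate call are hypothetical names invented for illustration. In the actual test the supplier would presumably wrap something like `amClient.allocate(0.1f)` and count the increased/decreased containers reported in each response, but that mapping is an assumption, not something taken from the test source.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Supplier;

// Hypothetical sketch: none of these types are part of Hadoop.
final class AllocatePollingSketch {

  /** Stand-in for the parts of an allocate response the test cares about. */
  static final class Response {
    final int increased;
    final int decreased;

    Response(int increased, int decreased) {
      this.increased = increased;
      this.decreased = decreased;
    }
  }

  /**
   * Heartbeats until the accumulated counts reach the expected totals or
   * maxHeartbeats calls have been made. Returns {increased, decreased}.
   * The caller asserts on the totals, not on any single heartbeat.
   */
  static int[] pollUntilChanged(Supplier<Response> allocate,
      int expectedIncreased, int expectedDecreased,
      int maxHeartbeats, long sleepMillis) {
    int increased = 0;
    int decreased = 0;
    for (int i = 0; i < maxHeartbeats; i++) {
      Response r = allocate.get();
      increased += r.increased;
      decreased += r.decreased;
      if (increased >= expectedIncreased && decreased >= expectedDecreased) {
        break; // everything we were waiting for has arrived
      }
      try {
        Thread.sleep(sleepMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt and stop
        break;
      }
    }
    return new int[] {increased, decreased};
  }

  public static void main(String[] args) {
    // Simulated cluster: the increase and decrease land on different heartbeats.
    Iterator<Response> heartbeats = Arrays.asList(
        new Response(0, 0), new Response(1, 0), new Response(0, 1)).iterator();
    int[] totals = pollUntilChanged(heartbeats::next, 1, 1, 10, 0L);
    System.out.println("increased=" + totals[0] + " decreased=" + totals[1]);
  }
}
```

Because the helper only checks accumulated totals, it passes both when the increase and decrease arrive on separate heartbeats and when they arrive together, which is the case behind the `expected:<1> but was:<2>` failure above. It still relies on a polling interval, so it does not give the determinism or the speedup of the MockRM/MockNM approach.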

> TestAMRMClient#testAMRMClientWithContainerResourceChange fails intermittently
> -----------------------------------------------------------------------------
>
>                 Key: YARN-6272
>                 URL: https://issues.apache.org/jira/browse/YARN-6272
>             Project: Hadoop YARN
>          Issue Type: Test
>          Components: yarn
>    Affects Versions: 3.0.0-alpha3
>            Reporter: Ray Chiang
>
> I'm seeing this unit test fail fairly often in trunk:
> testAMRMClientWithContainerResourceChange(org.apache.hadoop.yarn.client.api.impl.TestAMRMClient)  Time elapsed: 5.113 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<1> but was:<0>
> 	at org.junit.Assert.fail(Assert.java:88)
> 	at org.junit.Assert.failNotEquals(Assert.java:743)
> 	at org.junit.Assert.assertEquals(Assert.java:118)
> 	at org.junit.Assert.assertEquals(Assert.java:555)
> 	at org.junit.Assert.assertEquals(Assert.java:542)
> 	at org.apache.hadoop.yarn.client.api.impl.TestAMRMClient.doContainerResourceChange(TestAMRMClient.java:1087)
> 	at org.apache.hadoop.yarn.client.api.impl.TestAMRMClient.testAMRMClientWithContainerResourceChange(TestAMRMClient.java:963)

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)