[ https://issues.apache.org/jira/browse/YARN-6272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15974876#comment-15974876 ]
Jason Lowe commented on YARN-6272:
----------------------------------
I've also seen this stacktrace on 2.8:
{noformat}
java.lang.AssertionError: expected:<1> but was:<2>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at org.junit.Assert.assertEquals(Assert.java:542)
at org.apache.hadoop.yarn.client.api.impl.TestAMRMClient.doContainerResourceChange(TestAMRMClient.java:920)
at org.apache.hadoop.yarn.client.api.impl.TestAMRMClient.testAMRMClientWithContainerResourceChange(TestAMRMClient.java:813)
{noformat}
In the above case, it looks like the nodemanager happened to be heartbeating
just as the app made the allocate call carrying the increase request. In that
case both the increase and the decrease could be processed in the same
heartbeat, which the test explicitly does not expect.

The test itself is very fragile. It launches a full minicluster and relies on
hardcoded sleeps sprinkled in various places, hoping asynchronous events have
been processed in the interim. That not only leads directly to flaky tests but
also slows down the unit test unnecessarily.

Either the test needs to be made more tolerant of all the asynchronous
activity, or it needs to ditch the minicluster and explicitly manage the
cluster heartbeating. The former can be done by having the test poll via app
alloc heartbeats until it gets all the responses it needs rather than assuming
which heartbeats will get which responses. The latter can be done by using
MockRM, MockNM, and drain dispatchers so the test knows exactly which
heartbeats have been completely processed and thus knows which app alloc calls
will get the appropriate responses. This latter approach would also eliminate
the need for any arbitrary polling/sleeping intervals and speed up the test
significantly.
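
For illustration, the polling option could look roughly like the sketch below. This is not the actual TestAMRMClient code: AllocatePollSketch and pollUntil are hypothetical names, and the Supplier is only a stand-in for whatever allocate call the test makes and the container changes it extracts from each response. The point is that the test accumulates changes across heartbeats and does not care which heartbeat delivers which change.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical helper, not part of Hadoop: accumulates results across
// repeated heartbeats instead of assuming which heartbeat returns what.
class AllocatePollSketch {

  // heartbeat: stand-in for one allocate call, returning the changes
  //            reported by that call (possibly none, possibly several).
  // Returns once at least `expected` changes have been collected, or
  // throws if that has not happened after maxAttempts heartbeats.
  static <T> List<T> pollUntil(Supplier<List<T>> heartbeat, int expected,
      int maxAttempts, long sleepMs) {
    List<T> collected = new ArrayList<>();
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      collected.addAll(heartbeat.get());
      if (collected.size() >= expected) {
        return collected;
      }
      try {
        Thread.sleep(sleepMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("interrupted while polling", e);
      }
    }
    throw new IllegalStateException("expected " + expected
        + " responses, got " + collected.size()
        + " after " + maxAttempts + " attempts");
  }

  public static void main(String[] args) {
    // Simulated heartbeats: nothing yet, then the increase, then the decrease.
    Iterator<List<String>> responses = Arrays.asList(
        Collections.<String>emptyList(),
        Arrays.asList("increase"),
        Arrays.asList("decrease")).iterator();
    List<String> changes = pollUntil(responses::next, 2, 10, 10L);
    System.out.println(changes); // prints "[increase, decrease]"
  }
}
```

With this shape the test passes whether the increase and decrease arrive in the same heartbeat or in separate ones. It still needs a polling interval, which is why the MockRM/MockNM option remains the faster and more deterministic of the two.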
> TestAMRMClient#testAMRMClientWithContainerResourceChange fails intermittently
> -----------------------------------------------------------------------------
>
> Key: YARN-6272
> URL: https://issues.apache.org/jira/browse/YARN-6272
> Project: Hadoop YARN
> Issue Type: Test
> Components: yarn
> Affects Versions: 3.0.0-alpha3
> Reporter: Ray Chiang
>
> I'm seeing this unit test fail fairly often in trunk:
> testAMRMClientWithContainerResourceChange(org.apache.hadoop.yarn.client.api.impl.TestAMRMClient) Time elapsed: 5.113 sec <<< FAILURE!
> java.lang.AssertionError: expected:<1> but was:<0>
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:555)
> at org.junit.Assert.assertEquals(Assert.java:542)
> at org.apache.hadoop.yarn.client.api.impl.TestAMRMClient.doContainerResourceChange(TestAMRMClient.java:1087)
> at org.apache.hadoop.yarn.client.api.impl.TestAMRMClient.testAMRMClientWithContainerResourceChange(TestAMRMClient.java:963)