[ https://issues.apache.org/jira/browse/YARN-6272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15974876#comment-15974876 ]

Jason Lowe commented on YARN-6272:
----------------------------------

I've also seen this stacktrace on 2.8:
{noformat}
java.lang.AssertionError: expected:<1> but was:<2>
        at org.junit.Assert.fail(Assert.java:88)
        at org.junit.Assert.failNotEquals(Assert.java:743)
        at org.junit.Assert.assertEquals(Assert.java:118)
        at org.junit.Assert.assertEquals(Assert.java:555)
        at org.junit.Assert.assertEquals(Assert.java:542)
        at org.apache.hadoop.yarn.client.api.impl.TestAMRMClient.doContainerResourceChange(TestAMRMClient.java:920)
        at org.apache.hadoop.yarn.client.api.impl.TestAMRMClient.testAMRMClientWithContainerResourceChange(TestAMRMClient.java:813)
{noformat}

In the above case, it looks like the nodemanager happened to heartbeat just as
the app made the allocate call that requested the container increase.  The
scheduler was therefore able to process both the increase and the decrease in
the same heartbeat, which the test explicitly does not expect.
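A toy model can make this timing concrete. It is not YARN code. `ToyScheduler`, `request`, `nodeHeartbeat`, `allocate` and `firstResponseSize` are hypothetical names. The model also assumes two things for illustration only:

* A pending container update is processed only when a node heartbeat arrives.
* The next allocate response returns every update processed so far.

Under those assumptions, a heartbeat that lands after the increase request handles both updates at once. The response then carries two updates instead of one.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the heartbeat race; NOT YARN code. All names are hypothetical.
// Assumption: updates are processed only by a node heartbeat, and allocate()
// returns everything processed since the previous allocate().
final class HeartbeatRace {

    static final class ToyScheduler {
        private final List<String> pending = new ArrayList<>();
        private final List<String> processed = new ArrayList<>();

        // The app asks for a container update.
        void request(String update) {
            pending.add(update);
        }

        // A node heartbeat processes every update pending at that moment.
        void nodeHeartbeat() {
            processed.addAll(pending);
            pending.clear();
        }

        // The app collects whatever has been processed so far.
        List<String> allocate() {
            List<String> response = new ArrayList<>(processed);
            processed.clear();
            return response;
        }
    }

    // Returns how many updates the first allocate response carries.
    // true:  the heartbeat runs before the increase is requested (what the test assumes).
    // false: the increase request beats the heartbeat, so one heartbeat handles both.
    static int firstResponseSize(boolean heartbeatBeforeIncrease) {
        ToyScheduler scheduler = new ToyScheduler();
        scheduler.request("decrease");
        if (!heartbeatBeforeIncrease) {
            scheduler.request("increase"); // raced ahead of the node heartbeat
        }
        scheduler.nodeHeartbeat();
        return scheduler.allocate().size();
    }

    public static void main(String[] args) {
        System.out.println(firstResponseSize(true));  // 1: the count the test asserts
        System.out.println(firstResponseSize(false)); // 2: "expected:<1> but was:<2>"
    }
}
```

Nothing in the model is random. The only variable is which event reaches the scheduler first, and that ordering is exactly what a real minicluster does not control.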

The test itself is very fragile.  It launches a full minicluster and uses
hardcoded sleeps sprinkled in various places, hoping asynchronous events have
been processed in the interim.  That not only leads directly to flaky tests but
also slows down the unit test unnecessarily.  Either the test needs to be made
more tolerant of all the asynchronous activity, or it should ditch the
minicluster and explicitly manage the cluster heartbeating.  The former can be
done by having the test poll via app allocate heartbeats until it gets all the
responses it needs, rather than assuming which heartbeats will get which
responses.  The latter can be done by using MockRM, MockNM, and drain
dispatchers, so the test knows exactly which heartbeats have been completely
processed and therefore which app allocate calls will get the appropriate
responses.  This latter approach would also eliminate the need for any
arbitrary polling/sleeping intervals and speed up the test significantly.
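Here is a minimal sketch of the first option. It assumes a hypothetical `heartbeat` supplier that performs one AM allocate call and returns the container updates in that response. In the real test this would wrap `AMRMClient#allocate`; extracting the updates from the response is left out. `PollUntilComplete`, `collectUpdates`, `maxHeartbeats` and `intervalMs` are illustrative names, not Hadoop APIs. The helper accumulates updates across heartbeats and asserts only on the total. It therefore does not matter which heartbeat delivers which update.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Sketch only: 'heartbeat' is a hypothetical stand-in for one AM allocate
// call that returns the container updates carried by that response.
final class PollUntilComplete {

    // Heartbeats until 'expected' updates have arrived in total or
    // maxHeartbeats attempts are used up, sleeping intervalMs between
    // attempts. Fails only if the total is wrong, not if updates are split
    // or coalesced differently across responses.
    static <T> List<T> collectUpdates(Supplier<List<T>> heartbeat,
                                      int expected,
                                      int maxHeartbeats,
                                      long intervalMs) {
        List<T> received = new ArrayList<>();
        for (int i = 0; i < maxHeartbeats; i++) {
            received.addAll(heartbeat.get());
            if (received.size() >= expected || i == maxHeartbeats - 1) {
                break; // done, or no attempts left; skip the final sleep
            }
            try {
                Thread.sleep(intervalMs); // give node heartbeats time to run
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                break;
            }
        }
        if (received.size() != expected) {
            throw new AssertionError("expected " + expected
                + " updates but received " + received.size());
        }
        return received;
    }
}
```

For the 2.8 failure above, a call such as `collectUpdates(heartbeat, 2, 20, 100)` would pass whether the increase and decrease arrive in one response or in two. It still depends on a fixed polling interval, which the MockRM/MockNM approach would avoid.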


> TestAMRMClient#testAMRMClientWithContainerResourceChange fails intermittently
> -----------------------------------------------------------------------------
>
>                 Key: YARN-6272
>                 URL: https://issues.apache.org/jira/browse/YARN-6272
>             Project: Hadoop YARN
>          Issue Type: Test
>          Components: yarn
>    Affects Versions: 3.0.0-alpha3
>            Reporter: Ray Chiang
>
> I'm seeing this unit test fail fairly often in trunk:
> testAMRMClientWithContainerResourceChange(org.apache.hadoop.yarn.client.api.impl.TestAMRMClient)
>   Time elapsed: 5.113 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<1> but was:<0>
>         at org.junit.Assert.fail(Assert.java:88)
>         at org.junit.Assert.failNotEquals(Assert.java:743)
>         at org.junit.Assert.assertEquals(Assert.java:118)
>         at org.junit.Assert.assertEquals(Assert.java:555)
>         at org.junit.Assert.assertEquals(Assert.java:542)
>         at org.apache.hadoop.yarn.client.api.impl.TestAMRMClient.doContainerResourceChange(TestAMRMClient.java:1087)
>         at org.apache.hadoop.yarn.client.api.impl.TestAMRMClient.testAMRMClientWithContainerResourceChange(TestAMRMClient.java:963)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
