[ 
https://issues.apache.org/jira/browse/YARN-6272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826350#comment-16826350
 ] 

Prabhu Joseph commented on YARN-6272:
-------------------------------------

Thanks [~giovanni.fumarola] for reviewing.

The test case heartbeats once with three NMs and expects the increase allocation 
to happen immediately. It will not in cases where the allocation is attempted on 
some other NM. The increase can only be allocated on the NM that hosts the 
container for which the resource increase was requested. Until that happens, the 
request is added back and is processed only on a subsequent node update.

Heartbeating with only the NM where the container was initially allocated would 
not require any sleep. But MiniYarnCluster sends node updates for all NMs, so 
the allocation attempt lands on a random one of the three NMs, and the test case 
has to wait and retry until the container is allocated on the right NM.

The fix heartbeats with only the right NM. This increases the chance of a match 
(even though MiniYarnCluster still does nodeUpdate for all NMs), and the test 
waits and retries until the increased container gets allocated on the same NM. 
I validated the fix with multiple runs of 500 iterations and did not see a test 
failure. Without the fix, the test case consistently fails within 50 iterations.
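
To illustrate the wait-and-retry idea, here is a minimal, self-contained sketch 
in plain Java. It is not the code from the patch. ToyScheduler, waitFor and the 
node names nm1/nm2/nm3 are hypothetical stand-ins, and the only behaviour 
modelled is the one described above: the increase is granted only when the node 
update comes from the NM that hosts the container.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.BooleanSupplier;

class WaitAndRetrySketch {

  /** Polls check until it returns true or maxAttempts polls have been made. */
  static boolean waitFor(BooleanSupplier check, int maxAttempts, long sleepMillis) {
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      if (check.getAsBoolean()) {
        return true;
      }
      try {
        Thread.sleep(sleepMillis);
      } catch (InterruptedException e) {
        // Restore the interrupt flag and give up.
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return false;
  }

  /** Hypothetical toy scheduler, not a YARN class. */
  static final class ToyScheduler {
    private final String containerNode;
    private boolean increased;

    ToyScheduler(String containerNode) {
      this.containerNode = containerNode;
    }

    /** Grants the pending increase only if the update is from the container's node. */
    void nodeUpdate(String node) {
      if (node.equals(containerNode)) {
        increased = true;
      }
      // Otherwise the request stays pending until a later node update.
    }

    boolean isIncreased() {
      return increased;
    }
  }

  public static void main(String[] args) {
    ToyScheduler scheduler = new ToyScheduler("nm2");
    // Node updates arrive in an order the test does not control.
    Iterator<String> updates = Arrays.asList("nm1", "nm3", "nm2").iterator();
    boolean granted = waitFor(() -> {
      if (updates.hasNext()) {
        scheduler.nodeUpdate(updates.next());
      }
      return scheduler.isIncreased();
    }, 10, 10L);
    System.out.println("increase granted: " + granted);
  }
}
```

A single check right after the first node update would fail in this sketch, 
because the first update comes from nm1. Polling with a bounded number of 
attempts lets the check pass once the update from nm2 arrives.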

The other way is to use MockRM and MockNM (as per Jason's comment above). I 
tried this, but it felt like a lot of changes. Let me know if this is not 
convincing, and I will test it with MockRM and MockNM.


> TestAMRMClient#testAMRMClientWithContainerResourceChange fails intermittently
> -----------------------------------------------------------------------------
>
>                 Key: YARN-6272
>                 URL: https://issues.apache.org/jira/browse/YARN-6272
>             Project: Hadoop YARN
>          Issue Type: Test
>          Components: yarn
>    Affects Versions: 3.0.0-alpha4
>            Reporter: Ray Chiang
>            Assignee: Prabhu Joseph
>            Priority: Major
>         Attachments: YARN-6272-001.patch
>
>
> I'm seeing this unit test fail fairly often in trunk:
> testAMRMClientWithContainerResourceChange(org.apache.hadoop.yarn.client.api.impl.TestAMRMClient)
>   Time elapsed: 5.113 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<1> but was:<0>
>         at org.junit.Assert.fail(Assert.java:88)
>         at org.junit.Assert.failNotEquals(Assert.java:743)
>         at org.junit.Assert.assertEquals(Assert.java:118)
>         at org.junit.Assert.assertEquals(Assert.java:555)
>         at org.junit.Assert.assertEquals(Assert.java:542)
>         at org.apache.hadoop.yarn.client.api.impl.TestAMRMClient.doContainerResourceChange(TestAMRMClient.java:1087)
>         at org.apache.hadoop.yarn.client.api.impl.TestAMRMClient.testAMRMClientWithContainerResourceChange(TestAMRMClient.java:963)


