[
https://issues.apache.org/jira/browse/YARN-6272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826350#comment-16826350
]
Prabhu Joseph commented on YARN-6272:
-------------------------------------
Thanks [~giovanni.fumarola] for reviewing.
The testcase heartbeats once with three NMs and expects the increase allocation
to happen immediately. It won't when the allocation is attempted on some other
NM. The increase has to be allocated on the same NM as the container for which
the resource increase was requested; until then, the request is added back and
is processed only on a subsequent node update.
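The rule above can be illustrated with a toy model. The classes below (`IncreaseRequestModel`, `ToyScheduler`, `IncreaseRequest`) are hypothetical and are not YARN's scheduler or AMRMClient API. The model assumes each node update examines the oldest pending increase request once, and an update from a different NM simply puts it back. One update from the wrong NM leaves the increase count at 0, which mirrors the `expected:<1> but was:<0>` failure quoted below.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical toy model of the scheduling rule for container increase
 * requests. Illustrative only; not YARN's scheduler or AMRMClient API.
 */
public class IncreaseRequestModel {

    /** A pending request to grow a container that runs on containerNode. */
    static final class IncreaseRequest {
        final String containerNode;

        IncreaseRequest(String containerNode) {
            this.containerNode = containerNode;
        }
    }

    static final class ToyScheduler {
        private final Deque<IncreaseRequest> pending = new ArrayDeque<>();
        private int increased = 0;

        void submit(IncreaseRequest request) {
            pending.addLast(request);
        }

        /** Handles one node update by examining the oldest pending request. */
        void nodeUpdate(String node) {
            IncreaseRequest request = pending.pollFirst();
            if (request == null) {
                return;
            }
            if (request.containerNode.equals(node)) {
                increased++;              // same NM as the container: increase allocated
            } else {
                pending.addLast(request); // other NM: added back for a later update
            }
        }

        int increased() { return increased; }

        int pendingCount() { return pending.size(); }
    }

    public static void main(String[] args) {
        ToyScheduler scheduler = new ToyScheduler();
        scheduler.submit(new IncreaseRequest("nm1"));

        // A single update from the wrong NM: nothing is increased yet.
        scheduler.nodeUpdate("nm2");
        System.out.println("after nm2 update: " + scheduler.increased()); // prints 0

        // The next update from the container's own NM satisfies the request.
        scheduler.nodeUpdate("nm1");
        System.out.println("after nm1 update: " + scheduler.increased()); // prints 1
    }
}
```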
Heartbeating only the NM where the container was initially allocated would not
require any sleep. But MiniYarnCluster sends node updates for all NMs, so the
allocation can land on any of the three NMs, and the testcase has to wait and
retry until the container is allocated on the right NM.
The fix heartbeats only the right NM. This increases the chance of allocating
on that NM (even though MiniYarnCluster still sends node updates for all NMs),
and the test waits and retries until the increased container is allocated on
the same NM. I validated the fix over multiple runs of 500 iterations and saw
no test failures. Without the fix, the testcase consistently fails within 50
iterations.
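The wait-and-retry part of the fix can be sketched as a bounded retry loop. The helper `retryUntil`, the attempt budget (50) and the sleep interval (10 ms) are hypothetical choices, not taken from YARN-6272-001.patch. The toy `main` stands in for MiniYarnCluster by picking a random NM on each attempt; it does not call any Hadoop API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

/**
 * Hypothetical sketch of a bounded wait-and-retry loop; not code from the
 * YARN-6272 patch and not a Hadoop API.
 */
public class RetryUntil {

    /**
     * Runs attempt, then checks done; repeats up to maxAttempts times,
     * sleeping sleepMs between attempts.
     *
     * @return the 1-based attempt on which done became true, or -1 if it never did
     */
    static int retryUntil(Runnable attempt, BooleanSupplier done,
                          int maxAttempts, long sleepMs) throws InterruptedException {
        for (int i = 1; i <= maxAttempts; i++) {
            attempt.run();
            if (done.getAsBoolean()) {
                return i;
            }
            if (i < maxAttempts) {
                Thread.sleep(sleepMs);
            }
        }
        return -1;
    }

    public static void main(String[] args) throws InterruptedException {
        // Toy stand-in: the container lives on nm2, and on each attempt the
        // pending increase is examined on a randomly chosen NM.
        List<String> nms = Arrays.asList("nm1", "nm2", "nm3");
        String containerNode = "nm2";
        Random random = new Random(42);
        AtomicInteger increases = new AtomicInteger();

        int attempts = retryUntil(
            () -> {
                String updatingNode = nms.get(random.nextInt(nms.size()));
                if (updatingNode.equals(containerNode)) {
                    increases.incrementAndGet(); // same NM: increase is allocated
                } // other NM: request stays pending for a later attempt
            },
            () -> increases.get() == 1,
            50, 10);
        System.out.println("increase allocated after attempt: " + attempts);
    }
}
```

Bounding the attempts keeps a genuine regression from hanging the test, while the retry absorbs the nondeterministic choice of NM.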
The other way is to use MockRM and MockNM (as per Jason's comment above). I
tried this, and it required a lot of changes. Let me know if this is not
convincing; I will test it with MockRM and MockNM.
> TestAMRMClient#testAMRMClientWithContainerResourceChange fails intermittently
> -----------------------------------------------------------------------------
>
> Key: YARN-6272
> URL: https://issues.apache.org/jira/browse/YARN-6272
> Project: Hadoop YARN
> Issue Type: Test
> Components: yarn
> Affects Versions: 3.0.0-alpha4
> Reporter: Ray Chiang
> Assignee: Prabhu Joseph
> Priority: Major
> Attachments: YARN-6272-001.patch
>
>
> I'm seeing this unit test fail fairly often in trunk:
> testAMRMClientWithContainerResourceChange(org.apache.hadoop.yarn.client.api.impl.TestAMRMClient)  Time elapsed: 5.113 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<1> but was:<0>
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:555)
> at org.junit.Assert.assertEquals(Assert.java:542)
> at org.apache.hadoop.yarn.client.api.impl.TestAMRMClient.doContainerResourceChange(TestAMRMClient.java:1087)
> at org.apache.hadoop.yarn.client.api.impl.TestAMRMClient.testAMRMClientWithContainerResourceChange(TestAMRMClient.java:963)
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)