[ https://issues.apache.org/jira/browse/YARN-4890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sunil G updated YARN-4890:
--------------------------
Attachment: 0001-YARN-4890.patch
I mostly agree with [~bibinchundatt], and thanks Bibin for the pointer. I
started analyzing this issue after hitting this failure in YARN-4934.
I was able to reproduce it with debug points, and I think the attached fix
will resolve the problem.
Analysis:
- {{waitSchedulerNodeJoined}} depends on the {{nodeTracker}} count. This count
is updated as the first step when a new node is added to CS via the {{addNode}}
call (from {{NODE_ADDED}} event handling).
- After the node is added to {{nodeTracker}}, the new node information is
passed to the LabelManager via a {{labelManager.activateNode()}} call. This
internally invokes the {{updateResourceMappings}} method, which tries to update
the scheduler with a {{NODE_LABELS_UPDATE}} event.
- In this test case, since the node is added to {{nodeTracker}} first, the test
may resume and go on to check the capacity metrics. But quite often, the labels
have not yet been updated on the {{SchedulerNode}} via {{NODE_LABELS_UPDATE}} at
that point.
- {{updateLabelsOnNode}} updates the labels on the {{FiCaSchedulerNode}}. So
ideally this test case should also check whether the intended label has been
added to the node.
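The ordering above can be reproduced outside YARN with a toy model. This is a minimal, hypothetical sketch: {{SimScheduler}}, {{addNode}}, {{hasLabel}} and {{waitFor}} are illustrative names, not the CapacityScheduler or MockRM API, and a single-thread executor plus a latch stand in for the asynchronous {{NODE_LABELS_UPDATE}} dispatch. It shows that a wait on the node count alone can return before the label arrives, while a wait that also checks the label cannot.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

public class NodeLabelRaceSketch {

    /** Toy stand-in for the scheduler; every name here is hypothetical. */
    static final class SimScheduler {
        // Plays the role of the nodeTracker count.
        final AtomicInteger nodeCount = new AtomicInteger();
        // Plays the role of the labels stored on each scheduler node.
        final Map<String, Set<String>> nodeLabels = new ConcurrentHashMap<>();
        // Single thread standing in for the asynchronous event dispatcher.
        final ExecutorService dispatcher = Executors.newSingleThreadExecutor();

        // Same ordering as the analysis: the count is bumped synchronously,
        // while the label update is queued as a separate event. The gate lets
        // the sketch hold that event back so the race is reproducible.
        void addNode(String node, String label, CountDownLatch gate) {
            nodeLabels.put(node, ConcurrentHashMap.newKeySet());
            nodeCount.incrementAndGet();
            dispatcher.submit(() -> {
                try {
                    gate.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                nodeLabels.get(node).add(label);
            });
        }

        boolean hasLabel(String node, String label) {
            Set<String> labels = nodeLabels.get(node);
            return labels != null && labels.contains(label);
        }
    }

    /** Polls a condition every 10 ms until it holds or the timeout expires. */
    static boolean waitFor(BooleanSupplier condition, long timeoutMs) throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline > 0) {
                return false;
            }
            Thread.sleep(10);
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        SimScheduler cs = new SimScheduler();
        CountDownLatch gate = new CountDownLatch(1);
        cs.addNode("h1", "x", gate);

        // Count-only wait: succeeds at once, although the label is not there yet.
        boolean joined = waitFor(() -> cs.nodeCount.get() == 1, 1000);
        System.out.println("joined=" + joined + " labelled=" + cs.hasLabel("h1", "x"));

        // Deliver the label event, then wait for the intended label as well.
        gate.countDown();
        boolean labelled = waitFor(() -> cs.hasLabel("h1", "x"), 1000);
        System.out.println("labelled after wait=" + labelled);
        cs.dispatcher.shutdown();
    }
}
```

Running it prints {{joined=true labelled=false}} followed by {{labelled after wait=true}}: the count-based wait returns while the label event is still pending, which is the window in which the capacity checks can see stale values.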
I have attached a patch with this improvement.
[~bibinchundatt]/[~leftnoteasy] Thoughts?
> Unit test intermittent failure: TestNodeLabelContainerAllocation#testQueueUsedCapacitiesUpdate
> ----------------------------------------------------------------------------------------------
>
> Key: YARN-4890
> URL: https://issues.apache.org/jira/browse/YARN-4890
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Wangda Tan
> Attachments: 0001-YARN-4890.patch
>
>
> Message:
> {code}
> Tests run: 16, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 314.062 sec <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation
> testQueueUsedCapacitiesUpdate(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation) Time elapsed: 12.426 sec <<< FAILURE!
> java.lang.AssertionError: expected:<0.3> but was:<0.6>
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:519)
> at org.junit.Assert.assertEquals(Assert.java:609)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation.checkQueueUsedCapacity(TestNodeLabelContainerAllocation.java:1163)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation.testQueueUsedCapacitiesUpdate(TestNodeLabelContainerAllocation.java:1382)
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)