[ https://issues.apache.org/jira/browse/YARN-4890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sunil G updated YARN-4890:
--------------------------
    Attachment: 0001-YARN-4890.patch

I mostly agree with [~bibinchundatt]; thanks, Bibin, for the pointer. I 
started analyzing this issue after I hit this failure in YARN-4934. 
I was able to reproduce it using debug points, and I think the attached fix 
will resolve the problem.

Analysis:
- {{waitSchedulerNodeJoined}} depends on the {{nodeTracker}} count. This count 
is updated as the first step when a new node is added to CS via the {{addNode}} 
call (from {{NODE_ADDED}} event handling).
- After the node is added to {{nodeTracker}}, the new node information is 
passed to the LabelManager with a {{labelManager.activateNode()}} call. This 
internally invokes the {{updateResourceMappings}} method, which tries to update 
the scheduler with a {{NODE_LABELS_UPDATE}} event.
- In this test case, since the node is added to {{nodeTracker}} first, the test 
can resume and go on to check the capacity metrics. Often, however, the labels 
have not yet been applied to the {{SchedulerNode}} via {{NODE_LABELS_UPDATE}} 
at that point.
- {{updateLabelsOnNode}} is what updates the labels on {{FiCaSchedulerNode}}. So 
ideally this test case should also check that the intended label has been added 
to the node.
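To illustrate the race, here is a minimal, self-contained Java sketch. It does not use the real YARN classes: {{nodeCount}} stands in for the {{nodeTracker}} count, {{nodeLabels}} stands in for the labels on the {{FiCaSchedulerNode}}, and {{waitFor}} is a hypothetical polling helper, not a Hadoop API. It only shows why waiting on the node count alone is not enough and why the test should also wait for the expected label.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

/**
 * Hypothetical model of the race described above. None of these names
 * are real YARN/Hadoop APIs; they are stand-ins for illustration only.
 */
public class LabelWaitSketch {

  /** Polls cond every intervalMs until it is true or timeoutMs elapses. */
  public static boolean waitFor(BooleanSupplier cond, long timeoutMs, long intervalMs)
      throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (System.currentTimeMillis() < deadline) {
      if (cond.getAsBoolean()) {
        return true;
      }
      Thread.sleep(intervalMs);
    }
    // One last check after the deadline passes.
    return cond.getAsBoolean();
  }

  public static void main(String[] args) throws InterruptedException {
    // Stand-in for the scheduler's nodeTracker count.
    AtomicInteger nodeCount = new AtomicInteger();
    // Stand-in for the labels held by the scheduler node.
    Set<String> nodeLabels = ConcurrentHashMap.newKeySet();

    // Simulated NODE_ADDED handling: the node is counted immediately, but
    // the label update arrives later as a separate asynchronous step.
    Thread scheduler = new Thread(() -> {
      nodeCount.incrementAndGet();
      try {
        Thread.sleep(100); // delay before the NODE_LABELS_UPDATE equivalent
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();
        return;
      }
      nodeLabels.add("x");
    });
    scheduler.start();

    // Waiting on the count alone can return before the label is applied.
    waitFor(() -> nodeCount.get() == 1, 2000, 5);
    System.out.println("count reached, label present? " + nodeLabels.contains("x"));

    // Waiting on the label as well closes the window.
    boolean labelled = waitFor(() -> nodeLabels.contains("x"), 2000, 5);
    System.out.println("label present after waiting on it: " + labelled);
    scheduler.join();
  }
}
```

The first check may print either value depending on thread timing, which is exactly the intermittency seen in the test; the second wait only returns true once the label has actually been applied.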

I have uploaded a patch with this improvement. 
[~bibinchundatt]/[~leftnoteasy], thoughts?

> Unit test intermittent failure: 
> TestNodeLabelContainerAllocation#testQueueUsedCapacitiesUpdate
> ----------------------------------------------------------------------------------------------
>
>                 Key: YARN-4890
>                 URL: https://issues.apache.org/jira/browse/YARN-4890
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Wangda Tan
>         Attachments: 0001-YARN-4890.patch
>
>
> Message:
> {code}
> Tests run: 16, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 314.062 sec <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation
> testQueueUsedCapacitiesUpdate(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation)
>   Time elapsed: 12.426 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<0.3> but was:<0.6>
>       at org.junit.Assert.fail(Assert.java:88)
>       at org.junit.Assert.failNotEquals(Assert.java:743)
>       at org.junit.Assert.assertEquals(Assert.java:519)
>       at org.junit.Assert.assertEquals(Assert.java:609)
>       at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation.checkQueueUsedCapacity(TestNodeLabelContainerAllocation.java:1163)
>       at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation.testQueueUsedCapacitiesUpdate(TestNodeLabelContainerAllocation.java:1382)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
