Wangda Tan commented on YARN-4169:

[~Naganarasimha] / [~steve_l].

Thanks for looking at this issue.
IIUC, the problem is that heartbeatMonitor has some race conditions, which 
are caused by the wait/notify implementation used for sending OOB heartbeats.

I looked at the patch. I think a small refactoring could avoid a lot of the 
complexity in your test:
- Pull the update logic out of startStatusUpdater() into a separate method, 
for example doStatusUpdate(). It should be synchronized so that only one 
thread can run it at a time. lastHeartbeatID needs to be a member variable.
- startStatusUpdater() will call doStatusUpdate().
- Call doStatusUpdate() directly in your test. Since you have a synchronized 
ResourceTrackerService implementation, you don't need waitHeartbeat, etc.
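
To make the suggestion concrete, here is a rough sketch of the shape I have in 
mind. It is not the real NodeStatusUpdaterImpl code: the class 
StatusUpdaterSketch, the Tracker interface and the int heartbeat ids are 
hypothetical stand-ins, and the real method would build a NodeHeartbeatRequest 
and handle the response. Only the structure matters: one synchronized 
doStatusUpdate(), lastHeartbeatID as a field, and a background loop that just 
calls it.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class StatusUpdaterSketch {

    /** Hypothetical stand-in for the resource tracker the node heartbeats to. */
    public interface Tracker {
        /** Takes the last response id seen by the node, returns the new one. */
        int nodeHeartbeat(int lastKnownId);
    }

    private final Tracker tracker;
    // A member variable instead of a local inside the updater loop, so that
    // doStatusUpdate() can be called from any thread, including a test.
    private int lastHeartbeatID = 0;
    private volatile boolean stopped = false;

    public StatusUpdaterSketch(Tracker tracker) {
        this.tracker = tracker;
    }

    /** One heartbeat round trip; synchronized so only one thread runs it. */
    public synchronized int doStatusUpdate() {
        lastHeartbeatID = tracker.nodeHeartbeat(lastHeartbeatID);
        return lastHeartbeatID;
    }

    /** The background loop only delegates to doStatusUpdate(). */
    public Thread startStatusUpdater(long intervalMs) {
        Thread t = new Thread(() -> {
            while (!stopped) {
                doStatusUpdate();
                try {
                    Thread.sleep(intervalMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }, "status-updater-sketch");
        t.setDaemon(true);
        t.start();
        return t;
    }

    public void stop() {
        stopped = true;
    }

    public static void main(String[] args) {
        // Test-style usage: drive heartbeats directly, no wait/notify needed.
        AtomicInteger calls = new AtomicInteger();
        StatusUpdaterSketch updater = new StatusUpdaterSketch(id -> {
            calls.incrementAndGet();
            return id + 1;
        });
        int first = updater.doStatusUpdate();
        int second = updater.doStatusUpdate();
        System.out.println(first + " " + second + " " + calls.get());
    }
}
```

With this shape the test decides exactly when each heartbeat happens, so it 
can assert on the tracker's state right after each doStatusUpdate() call 
returns instead of waiting for the background thread.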


> jenkins trunk+java build failed in TestNodeStatusUpdaterForLabels
> -----------------------------------------------------------------
>                 Key: YARN-4169
>                 URL: https://issues.apache.org/jira/browse/YARN-4169
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 3.0.0
>         Environment: Jenkins
>            Reporter: Steve Loughran
>            Assignee: Naganarasimha G R
>            Priority: Critical
>         Attachments: YARN-4169.v1.001.patch, YARN-4169.v1.002.patch, 
> YARN-4169.v1.003.patch
> Test failing in [Jenkins build 
> 402|https://builds.apache.org/view/H-L/view/Hadoop/job/Hadoop-Yarn-trunk-Java8/402/testReport/junit/org.apache.hadoop.yarn.server.nodemanager/TestNodeStatusUpdaterForLabels/testNodeStatusUpdaterForNodeLabels/]
> {code}
> java.lang.NullPointerException: null
>       at java.util.HashSet.<init>(HashSet.java:118)
>       at org.apache.hadoop.yarn.nodelabels.NodeLabelTestBase.assertNLCollectionEquals(NodeLabelTestBase.java:103)
>       at org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdaterForLabels.testNodeStatusUpdaterForNodeLabels(TestNodeStatusUpdaterForLabels.java:268)
> {code}

This message was sent by Atlassian JIRA
