[
https://issues.apache.org/jira/browse/YARN-4169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14956071#comment-14956071
]
Wangda Tan commented on YARN-4169:
----------------------------------
[~Naganarasimha] / [~steve_l].
Thanks for looking at this issue.
IIUC, the problem is that heartbeatMonitor has race conditions, which are
caused by the wait/notify implementation used to send OOB heartbeats.
I looked at the patch, and I think a small refactoring could avoid a lot of
the complexity in your test:
- Pull the update logic out of startStatusUpdater() into a separate method,
for example doStatusUpdate(). It should be synchronized so that only one
thread can run it at a time. lastHeartbeatID needs to be a member variable.
- startStatusUpdater() will call doStatusUpdate().
- Use doStatusUpdate() in your test. Since you have a synchronized
ResourceTrackerService implementation, you don't need waitHeartbeat, etc.
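Roughly, the refactoring above could look like the sketch below. This is only
an illustration, not the actual NodeStatusUpdaterImpl code: Tracker,
FakeTracker, StatusUpdaterSketch and the heartbeat ID arithmetic are made-up
stand-ins, and only doStatusUpdate, startStatusUpdater and lastHeartbeatID
correspond to the names in the suggestion.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the resource tracker the updater heartbeats to.
interface Tracker {
    // Returns the response ID for the heartbeat that was just processed.
    int nodeHeartbeat(int lastHeartbeatID);
}

// Hypothetical, simplified status updater (not the real NodeStatusUpdaterImpl).
class StatusUpdaterSketch {
    private final Tracker tracker;
    private final AtomicBoolean stopped = new AtomicBoolean(false);

    // A member variable instead of a local inside the loop, so that
    // doStatusUpdate() can be called from outside the updater thread.
    private int lastHeartbeatID = 0;

    StatusUpdaterSketch(Tracker tracker) {
        this.tracker = tracker;
    }

    // One heartbeat round trip. Synchronized so that only one thread
    // runs it at a time.
    synchronized void doStatusUpdate() {
        lastHeartbeatID = tracker.nodeHeartbeat(lastHeartbeatID);
    }

    synchronized int getLastHeartbeatID() {
        return lastHeartbeatID;
    }

    // The background loop just reuses doStatusUpdate().
    Thread startStatusUpdater(long intervalMs) {
        Thread t = new Thread(() -> {
            while (!stopped.get()) {
                doStatusUpdate();
                try {
                    Thread.sleep(intervalMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }, "status-updater");
        t.setDaemon(true);
        t.start();
        return t;
    }

    void stop() {
        stopped.set(true);
    }
}

// Hypothetical synchronized tracker for tests: it counts heartbeats and
// hands back the next ID.
class FakeTracker implements Tracker {
    private int heartbeats = 0;

    @Override
    public synchronized int nodeHeartbeat(int lastHeartbeatID) {
        heartbeats++;
        return lastHeartbeatID + 1;
    }

    synchronized int getHeartbeats() {
        return heartbeats;
    }
}
```

With this shape, a test can call doStatusUpdate() directly on its own thread
and assert on the tracker state right after the call returns, without starting
the background thread or waiting on a monitor.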
Thoughts?
> jenkins trunk+java build failed in TestNodeStatusUpdaterForLabels
> -----------------------------------------------------------------
>
> Key: YARN-4169
> URL: https://issues.apache.org/jira/browse/YARN-4169
> Project: Hadoop YARN
> Issue Type: Bug
> Components: test
> Affects Versions: 3.0.0
> Environment: Jenkins
> Reporter: Steve Loughran
> Assignee: Naganarasimha G R
> Priority: Critical
> Attachments: YARN-4169.v1.001.patch, YARN-4169.v1.002.patch,
> YARN-4169.v1.003.patch
>
>
> Test failing in [Jenkins build
> 402|https://builds.apache.org/view/H-L/view/Hadoop/job/Hadoop-Yarn-trunk-Java8/402/testReport/junit/org.apache.hadoop.yarn.server.nodemanager/TestNodeStatusUpdaterForLabels/testNodeStatusUpdaterForNodeLabels/]
> {code}
> java.lang.NullPointerException: null
> at java.util.HashSet.<init>(HashSet.java:118)
> at org.apache.hadoop.yarn.nodelabels.NodeLabelTestBase.assertNLCollectionEquals(NodeLabelTestBase.java:103)
> at org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdaterForLabels.testNodeStatusUpdaterForNodeLabels(TestNodeStatusUpdaterForLabels.java:268)
> {code}