[
https://issues.apache.org/jira/browse/YARN-4756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Eric Badger updated YARN-4756:
------------------------------
Attachment: YARN-4756.001.patch
The optimization that notifies the Node Status Updater thread to stop waiting
for a heartbeat exposes a race condition in the test
TestNodeManagerResync#testContainerResourceIncreaseIsSynchronizedWithRMResync.
The test checks the current resources of the NM, then checks them again
because a different thread changes them in the meantime. However, nothing
synchronizes these two threads, and the test only passed because of the
excessive wait time during the reboot. The patch adds a barrier to synchronize
the two threads.
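
For readers unfamiliar with the technique, here is a minimal sketch of using a
barrier so that one thread reads a value only after another thread has changed
it. It is not the code from the attached patch: the class name, the
resourceMb field, and the values 1024 and 2048 are all hypothetical and chosen
only for illustration.

```java
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: not taken from YARN-4756.001.patch.
class BarrierSketch {
    // Hypothetical stand-in for the NM's current resource value.
    static final AtomicInteger resourceMb = new AtomicInteger(1024);

    // Returns the value seen before and after the updater thread ran.
    static int[] runOnce() throws Exception {
        resourceMb.set(1024);
        // Two parties: the checking thread and the updating thread.
        CyclicBarrier updated = new CyclicBarrier(2);

        Thread updater = new Thread(() -> {
            try {
                resourceMb.set(2048);   // change the resource value
                updated.await();        // signal that the change is done
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        });

        int before = resourceMb.get();  // first check, updater not started yet
        updater.start();
        updated.await();                // block until the updater has written
        int after = resourceMb.get();   // second check, ordered after the write
        updater.join();
        return new int[] {before, after};
    }

    public static void main(String[] args) throws Exception {
        int[] seen = runOnce();
        System.out.println(seen[0] + " -> " + seen[1]);
    }
}
```

Without the await() calls, the second read could run before the updater thread
writes, which is the kind of ordering a long sleep can hide.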
> Unnecessary wait in Node Status Updater during reboot
> -----------------------------------------------------
>
> Key: YARN-4756
> URL: https://issues.apache.org/jira/browse/YARN-4756
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: Eric Badger
> Assignee: Eric Badger
> Attachments: YARN-4756.001.patch
>
>
> The Node Status Updater thread exits once the isStopped variable is set to
> true, but it only checks the variable after waiting for the next heartbeat.
> During a reboot, the next heartbeat never comes, so the thread waits for the
> full timeout. Instead, we should notify the thread to continue so that it can
> check the isStopped variable and exit without having to wait for a timeout.
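
The idea in the issue description can be sketched with plain Java monitors as
follows. This is an assumption-laden illustration, not the NodeManager code:
apart from isStopped, every name here (the class, heartbeatMonitor,
heartbeatIntervalMs, timeShutdownMs) is hypothetical.

```java
// Illustrative only: a loop that waits between heartbeats and can be
// woken early when a stop is requested.
class StoppableUpdaterSketch {
    private final Object heartbeatMonitor = new Object();
    private volatile boolean isStopped = false;
    private final long heartbeatIntervalMs;

    StoppableUpdaterSketch(long heartbeatIntervalMs) {
        this.heartbeatIntervalMs = heartbeatIntervalMs;
    }

    void runLoop() throws InterruptedException {
        while (!isStopped) {
            // A real updater would send a heartbeat here.
            synchronized (heartbeatMonitor) {
                // Re-check under the lock so a stop() that ran just
                // before this point is not missed.
                if (!isStopped) {
                    heartbeatMonitor.wait(heartbeatIntervalMs);
                }
            }
        }
    }

    void stop() {
        synchronized (heartbeatMonitor) {
            isStopped = true;
            // Wake the waiting thread so it re-checks isStopped now
            // instead of sleeping until the interval expires.
            heartbeatMonitor.notifyAll();
        }
    }

    // Starts the loop, requests a stop, and returns how long the
    // thread took to exit, in milliseconds.
    static long timeShutdownMs(long intervalMs) throws InterruptedException {
        StoppableUpdaterSketch updater = new StoppableUpdaterSketch(intervalMs);
        Thread t = new Thread(() -> {
            try {
                updater.runLoop();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.start();
        Thread.sleep(100);              // give the loop time to start waiting
        long start = System.nanoTime();
        updater.stop();
        t.join();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("exit took ms: " + timeShutdownMs(10_000));
    }
}
```

If stop() only set the flag and did not call notifyAll(), the thread would
stay in wait() until the full interval elapsed before noticing the flag.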