[ https://issues.apache.org/jira/browse/YARN-5937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Weiwei Yang updated YARN-5937:
------------------------------
Attachment: nm_shutdown.log
> stop-yarn.sh is not able to gracefully stop node managers
> ---------------------------------------------------------
>
> Key: YARN-5937
> URL: https://issues.apache.org/jira/browse/YARN-5937
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Weiwei Yang
> Assignee: Weiwei Yang
> Attachments: nm_shutdown.log
>
>
> stop-yarn.sh always gives the following output:
> {code}
> ./sbin/stop-yarn.sh
> Stopping resourcemanager
> Stopping nodemanagers
> <NM_HOST>: WARNING: nodemanager did not stop gracefully after 5 seconds:
> Trying to kill with kill -9
> oracle1.fyre.ibm.com: ERROR: Unable to kill 18097
> {code}
> This happens because the resource manager is stopped before the node managers.
> When the shutdown hook manager tries to gracefully stop the NM services, the NM
> needs to unregister with the RM, and that call times out because the NM cannot
> connect to the RM (already stopped). See the log below (stop the RM, then run
> kill <nm_pid>):
> {code}
> 16/11/28 08:26:43 ERROR nodemanager.NodeManager: RECEIVED SIGNAL 15: SIGTERM
> ...
> 16/11/28 08:26:53 WARN util.ShutdownHookManager: ShutdownHook
> 'CompositeServiceShutdownHook' timeout, java.util.concurrent.TimeoutException
> java.util.concurrent.TimeoutException
> at java.util.concurrent.FutureTask.get(FutureTask.java:205)
> at
> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:67)
> ...
> at
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.unRegisterNM(NodeStatusUpdaterImpl.java:291)
> ...
> 16/11/28 08:27:13 ERROR util.ShutdownHookManager: ShutdownHookManger shutdown
> forcefully.
> {code}
> The shutdown hook has a default timeout of 10s, so if the RM is stopped before
> the NMs, the NMs always take more than 10s to stop (in the Java code). However,
> stop-yarn.sh only waits 5s, so the NM is always killed instead of stopped
> gracefully. It would make sense for this script to stop the NMs before the RM,
> so that each NM can still unregister and shut down cleanly.
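
The timing mismatch described above can be illustrated with a minimal shell sketch. This is not the code in stop-yarn.sh; `stop_gracefully` is a hypothetical helper, and the 5s and 10s values in the usage comment are simply the script and shutdown-hook timeouts quoted in the description. It sends SIGTERM, polls until the process exits or the grace period runs out, and only then falls back to SIGKILL.

```shell
# Hypothetical helper (not part of stop-yarn.sh): ask a process to stop,
# wait up to <timeout> seconds, and force-kill it only if it is still alive.
#   $1 = pid, $2 = grace period in seconds
# Returns 0 if the process exited on its own (or was already gone),
# 1 if SIGKILL had to be used.
stop_gracefully() {
  local pid="$1" timeout="$2" waited=0

  # SIGTERM is what triggers the JVM shutdown hooks in the log above.
  kill -TERM "$pid" 2>/dev/null || return 0

  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      kill -KILL "$pid" 2>/dev/null
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# Usage sketch (assumed values): a 5s grace period is shorter than the 10s
# shutdown-hook timeout, so an NM that cannot reach the RM ends up killed.
#   stop_gracefully "$nm_pid" 5
```

With a helper like this, a grace period shorter than the time the process needs to finish its shutdown hooks always ends in the SIGKILL branch, which matches the behavior reported here. Stopping the NMs while the RM is still running avoids the unregister timeout in the first place.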
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)