[
https://issues.apache.org/jira/browse/YARN-3641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14543903#comment-14543903
]
Hudson commented on YARN-3641:
------------------------------
SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2143 (See
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2143/])
YARN-3641. NodeManager: stopRecoveryStore() shouldn't be skipped when
exceptions happen in stopping NM's sub-services. Contributed by Junping Du
(jlowe: rev 711d77cc54a64b2c3db70bdacc6bf2245c896a4b)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java
* hadoop-yarn-project/CHANGES.txt
> NodeManager: stopRecoveryStore() shouldn't be skipped when exceptions happen
> in stopping NM's sub-services.
> -----------------------------------------------------------------------------------------------------------
>
> Key: YARN-3641
> URL: https://issues.apache.org/jira/browse/YARN-3641
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager, rolling upgrade
> Affects Versions: 2.6.0
> Reporter: Junping Du
> Assignee: Junping Du
> Priority: Critical
> Fix For: 2.7.1
>
> Attachments: YARN-3641.patch
>
>
> If the NM's services are not stopped properly, the NM cannot be started
> again with work-preserving NM restart enabled. The exception is as follows:
> {noformat}
> org.apache.hadoop.service.ServiceStateException: org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: lock /var/log/hadoop-yarn/nodemanager/recovery-state/yarn-nm-state/LOCK: Resource temporarily unavailable
>     at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
>     at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
>     at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:175)
>     at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:217)
>     at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>     at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:507)
>     at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:555)
> Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: lock /var/log/hadoop-yarn/nodemanager/recovery-state/yarn-nm-state/LOCK: Resource temporarily unavailable
>     at org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
>     at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
>     at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
>     at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:930)
>     at org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:204)
>     at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>     ... 5 more
> 2015-05-12 00:34:45,262 INFO nodemanager.NodeManager (LogAdapter.java:info(45)) - SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down NodeManager at
> c6403.ambari.apache.org/192.168.64.103
> ************************************************************/
> {noformat}
> The related code in NodeManager.java is shown below:
> {code}
> @Override
> protected void serviceStop() throws Exception {
>   if (isStopping.getAndSet(true)) {
>     return;
>   }
>   super.serviceStop();
>   stopRecoveryStore();
>   DefaultMetricsSystem.shutdown();
> }
> {code}
> As the code shows, all services registered with the NM (NodeStatusUpdater,
> LogAggregationService, ResourceLocalizationService, etc.) are stopped first.
> If any of them throws an exception while stopping, stopRecoveryStore() is
> skipped, which means the LevelDB store is never closed. The next time the NM
> starts, it fails with the exception above.
> We should put stopRecoveryStore(); in a finally block.
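
The suggestion above can be illustrated with a minimal, self-contained sketch. This is not the committed patch: the class ServiceStopSketch, the events list, the subServiceFails flag, and the stand-in methods stopSubServices() and shutdownMetrics() are all hypothetical names used only so the example runs without Hadoop on the classpath. Keeping the metrics shutdown inside the try block is likewise an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

public class ServiceStopSketch {
    // Records which shutdown steps actually ran, in order (hypothetical helper).
    final List<String> events = new ArrayList<>();
    private final AtomicBoolean isStopping = new AtomicBoolean(false);
    // When true, the simulated sub-service stop throws (hypothetical switch).
    private final boolean subServiceFails;

    ServiceStopSketch(boolean subServiceFails) {
        this.subServiceFails = subServiceFails;
    }

    // Stand-in for super.serviceStop(), which stops the registered sub-services.
    void stopSubServices() throws Exception {
        events.add("stopSubServices");
        if (subServiceFails) {
            throw new Exception("sub-service failed to stop");
        }
    }

    // Stand-in for stopRecoveryStore(), which closes the state store.
    void stopRecoveryStore() {
        events.add("stopRecoveryStore");
    }

    // Stand-in for DefaultMetricsSystem.shutdown().
    void shutdownMetrics() {
        events.add("shutdownMetrics");
    }

    void serviceStop() throws Exception {
        if (isStopping.getAndSet(true)) {
            return;
        }
        try {
            stopSubServices();
            shutdownMetrics();
        } finally {
            // Runs even if a sub-service throws, so the store is always closed.
            stopRecoveryStore();
        }
    }

    public static void main(String[] args) {
        ServiceStopSketch nm = new ServiceStopSketch(true);
        try {
            nm.serviceStop();
        } catch (Exception e) {
            System.out.println("stop failed: " + e.getMessage());
        }
        System.out.println(nm.events);
    }
}
```

With subServiceFails set to true, the exception still propagates to the caller, but stopRecoveryStore is recorded after stopSubServices, which is the behaviour the issue asks for.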
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)