[ 
https://issues.apache.org/jira/browse/YARN-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14038912#comment-14038912
 ] 

Jonathan Eagles commented on YARN-2184:
---------------------------------------

Jeff, this issue has already been reported by me under YARN-2035, and there is a 
patch available. Let me know if it solves your issue and we can close this 
ticket out.

> ResourceManager may fail due to name node in safe mode
> ------------------------------------------------------
>
>                 Key: YARN-2184
>                 URL: https://issues.apache.org/jira/browse/YARN-2184
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: resourcemanager
>    Affects Versions: 2.4.0
>            Reporter: Jeff Zhang
>            Assignee: Jeff Zhang
>
> If the history service is enabled in the ResourceManager, it tries to mkdir 
> when the service is initialized. At that time the name node may still be in 
> safe mode, which can cause the history service to fail and, in turn, the 
> ResourceManager to fail. This is quite likely when the cluster is restarted, 
> because the name node can stay in safe mode for a long time.
> Here are the error logs:
> {code}
> Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): Cannot create directory /Users/jzhang/Java/lib/hadoop-2.4.0/logs/yarn/system/history/ApplicationHistoryDataRoot. Name node is in safe mode.
> The reported blocks 85 has reached the threshold 0.9990 of total blocks 85. The number of live datanodes 1 has reached the minimum number 0. In safe mode extension. Safe mode will be turned off automatically in 19 seconds.
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1195)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:3564)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3540)
>     at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:754)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:558)
>     at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1410)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1363)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>     at com.sun.proxy.$Proxy14.mkdirs(Unknown Source)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
>     at com.sun.proxy.$Proxy14.mkdirs(Unknown Source)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:500)
>     at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2553)
>     at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:2524)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$16.doCall(DistributedFileSystem.java:827)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$16.doCall(DistributedFileSystem.java:823)
>     at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:823)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:816)
>     at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1815)
>     at org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.serviceInit(FileSystemApplicationHistoryStore.java:120)
>     at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>     ... 10 more
> 2014-06-20 11:06:25,220 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down ResourceManager at jzhangMBPr.local/192.168.100.152
> {code}
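
The failure described above comes from a single mkdir attempt made at service init while the name node is still in safe mode. One possible way to tolerate this is to retry the call for a bounded number of attempts. The sketch below only illustrates that idea; it is not the YARN-2035 patch. It does not use the Hadoop API: SafeModeRetry, SafeModeLikeException and retryWhileInSafeMode are hypothetical names, and the exception is a stand-in for the SafeModeException that the trace shows wrapped in a RemoteException.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

public class SafeModeRetry {

    /** Hypothetical stand-in for the safe-mode error seen in the log above. */
    public static class SafeModeLikeException extends IOException {
        public SafeModeLikeException(String message) {
            super(message);
        }
    }

    /**
     * Runs op and retries only when it fails with SafeModeLikeException.
     * Any other exception propagates immediately. After maxAttempts
     * failed attempts the last safe-mode exception is rethrown.
     */
    public static <T> T retryWhileInSafeMode(Callable<T> op, int maxAttempts,
                                             long sleepMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        SafeModeLikeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (SafeModeLikeException e) {
                last = e;
                if (attempt < maxAttempts) {
                    // Wait before retrying; safe mode may end on its own.
                    Thread.sleep(sleepMillis);
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        // Simulated mkdir: fails twice as if in safe mode, then succeeds.
        Boolean created = retryWhileInSafeMode(() -> {
            if (++calls[0] < 3) {
                throw new SafeModeLikeException("Name node is in safe mode.");
            }
            return Boolean.TRUE;
        }, 5, 10L);
        System.out.println("created=" + created + " after " + calls[0] + " attempts");
    }
}
```

In a real service the retried operation would be the mkdirs call made during init, and the attempt count and sleep interval would need to be chosen against how long safe mode can last after a cluster restart.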



--
This message was sent by Atlassian JIRA
(v6.2#6252)
