[ https://issues.apache.org/jira/browse/YARN-1220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arpit Gupta updated YARN-1220:
------------------------------
Summary: Yarn App recovers when it should not as delete failed from rm fs
store (was: Yarn RM fs state store should handle safemode exceptions)
> Yarn App recovers when it should not as delete failed from rm fs store
> ----------------------------------------------------------------------
>
> Key: YARN-1220
> URL: https://issues.apache.org/jira/browse/YARN-1220
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.1.0-beta
> Reporter: Arpit Gupta
> Assignee: Vinod Kumar Vavilapalli
>
> {code}
> ons: 0
> 2013-09-18 05:41:13,542 ERROR recovery.RMStateStore (RMStateStore.java:handleStoreEvent(490)) - Error removing app: application_1379482521108_0003
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): Cannot delete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1379482521108_0003. Name node is in safe mode.
> The reported blocks 1018 has reached the threshold 1.0000 of total blocks 1018. The number of live datanodes 5 has reached the minimum number 0. Safe mode will be turned off automatically in 20 seconds.
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:3124)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:3083)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:3067)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:697)
> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:491)
> 	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$Clien
> {code}
> The issue here is that if the NameNode is in safe mode while we are
> interacting with the FS state store, we won't be able to update the store:
> the delete above fails with a SafeModeException. In this particular case the
> app was never removed from the store, so upon RM restart the app was
> recovered when it should not have been.
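Since safe mode is transient (the log above says it will be turned off automatically in 20 seconds), one possible direction, matching the earlier summary "Yarn RM fs state store should handle safemode exceptions", is to retry store operations that fail for that reason instead of only logging the error, as happened in handleStoreEvent above. The Java below is a minimal, self-contained sketch under stated assumptions, not the actual YARN fix: SafeModeRetry, runWithRetry, maxAttempts, backoffMillis and SafeModeLikeException are all hypothetical names, and SafeModeLikeException merely stands in for org.apache.hadoop.hdfs.server.namenode.SafeModeException so the example runs without Hadoop on the classpath.

```java
import java.util.concurrent.Callable;

/**
 * Hypothetical sketch: retry an RM state-store operation while the
 * underlying file system reports that it is in safe mode.
 */
public class SafeModeRetry {

    /** Hypothetical stand-in for HDFS's SafeModeException (no Hadoop dependency). */
    public static class SafeModeLikeException extends Exception {
        public SafeModeLikeException(String msg) { super(msg); }
    }

    /**
     * Runs op, retrying up to maxAttempts times when it throws
     * SafeModeLikeException and sleeping backoffMillis between attempts.
     * Any other exception is propagated immediately; if every attempt hits
     * safe mode, the last SafeModeLikeException is rethrown so the caller
     * cannot mistake a failed delete for a successful one.
     */
    public static <T> T runWithRetry(Callable<T> op, int maxAttempts, long backoffMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        SafeModeLikeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (SafeModeLikeException e) {
                last = e; // safe mode is expected to clear: wait, then try again
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMillis);
                }
            }
        }
        throw last; // still in safe mode after all attempts: surface the failure
    }

    public static void main(String[] args) throws Exception {
        // Simulated "remove app" store operation: fails twice with a
        // safe-mode error, then succeeds on the third call.
        int[] calls = {0};
        Callable<Boolean> removeApp = () -> {
            calls[0]++;
            if (calls[0] <= 2) {
                throw new SafeModeLikeException("Name node is in safe mode.");
            }
            return true;
        };
        boolean removed = runWithRetry(removeApp, 5, 10);
        // prints "removed=true after 3 attempts"
        System.out.println("removed=" + removed + " after " + calls[0] + " attempts");
    }
}
```

The fixed delay keeps the sketch short; a real store would more likely bound the total wait time and decide separately what the RM should do if the store stays unwritable.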
--
This message is automatically generated by JIRA.