[ 
https://issues.apache.org/jira/browse/YARN-1220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Gupta updated YARN-1220:
------------------------------

    Summary: Yarn App recovers when it should not as delete failed from rm fs 
store  (was: Yarn RM fs state store should handle safemode exceptions)
    
> Yarn App recovers when it should not as delete failed from rm fs store
> ----------------------------------------------------------------------
>
>                 Key: YARN-1220
>                 URL: https://issues.apache.org/jira/browse/YARN-1220
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.1.0-beta
>            Reporter: Arpit Gupta
>            Assignee: Vinod Kumar Vavilapalli
>
> {code}
> ons: 0
> 2013-09-18 05:41:13,542 ERROR recovery.RMStateStore 
> (RMStateStore.java:handleStoreEvent(490)) - Error removing app: 
> application_1379482521108_0003
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException):
>  Cannot delete 
> /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1379482521108_0003.
> Name node is in safe mode.
> The reported blocks 1018 has reached the threshold 1.0000 of total blocks 
> 1018. The number of live datanodes 5 has reached the minimum number 0. Safe 
> mode will be turned off automatically in 20 seconds.
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:3124)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:3083)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:3067)
>         at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:697)
>         at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:491)
>         at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$Clien
> {code}
> The issue here is that if the namenode is in safe mode while we are 
> interacting with the fs state store, we won't be able to update the status. 
> In this particular case the app was never removed from the store, and upon 
> RM restart the app was recovered when it did not need to be.
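
One possible way to handle the failure described above is to retry the delete while the store reports safe mode, so the app entry does not stay behind. The sketch below is only an illustration of that idea under stated assumptions, not the fix adopted for YARN-1220 and not the real RMStateStore API: `SafeModeLikeException`, `StoreDelete`, `FlakyStore`, and `removeWithRetry` are hypothetical stand-ins, and no Hadoop classes are used.

```java
import java.io.IOException;

public class SafeModeRetrySketch {

    /** Hypothetical stand-in for the SafeModeException seen in the log. */
    static class SafeModeLikeException extends IOException {
        SafeModeLikeException(String msg) { super(msg); }
    }

    /** Hypothetical abstraction of "remove this app's state from the store". */
    interface StoreDelete {
        void delete(String appId) throws IOException;
    }

    /**
     * Retries the delete while the store reports safe mode.
     * Returns true if the delete succeeded, false if all attempts were used up.
     * Any other IOException is propagated to the caller unchanged.
     */
    static boolean removeWithRetry(StoreDelete store, String appId,
                                   int maxAttempts, long sleepMillis)
            throws IOException, InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                store.delete(appId);
                return true;
            } catch (SafeModeLikeException e) {
                // Safe mode is expected to end on its own; wait before retrying.
                if (attempt < maxAttempts) {
                    Thread.sleep(sleepMillis);
                }
            }
        }
        return false;
    }

    /** Fake store that rejects the first N deletes, simulating safe mode. */
    static class FlakyStore implements StoreDelete {
        int remainingFailures;
        int calls = 0;
        boolean deleted = false;

        FlakyStore(int failures) { this.remainingFailures = failures; }

        @Override
        public void delete(String appId) throws IOException {
            calls++;
            if (remainingFailures > 0) {
                remainingFailures--;
                throw new SafeModeLikeException("Name node is in safe mode.");
            }
            deleted = true;
        }
    }

    public static void main(String[] args) throws Exception {
        FlakyStore store = new FlakyStore(2);
        boolean ok = removeWithRetry(store, "application_1379482521108_0003", 5, 10L);
        // prints "deleted=true after 3 calls"
        System.out.println("deleted=" + ok + " after " + store.calls + " calls");
    }
}
```

If the attempts run out, the caller still has to decide what to do with the leftover entry (for example, record the removal as pending and redo it before recovery), since otherwise the app would be recovered on restart exactly as reported here.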

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
