[ 
https://issues.apache.org/jira/browse/YARN-4881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15212903#comment-15212903
 ] 

Rohith Sharma K S commented on YARN-4881:
-----------------------------------------

RM HA is enabled in our cluster. Both RMs are up and running, but neither 
becomes active; they keep switching continuously. So whether a single RM is 
used or HA is enabled, the RM will not come up in either case.

> RM continuously switch if HDFS is too busy when NodeLabel is configured
> -----------------------------------------------------------------------
>
>                 Key: YARN-4881
>                 URL: https://issues.apache.org/jira/browse/YARN-4881
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Rohith Sharma K S
>            Priority: Critical
>
> It is observed in the production cluster that the RM fails to become active 
> and keeps switching continuously if HDFS is too busy and node labels are 
> configured. This causes very high RM downtime. 
> Exception from RM logs:
> {noformat}
> Caused by: org.apache.hadoop.service.ServiceStateException: 
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File 
> /user/mapred/node-labels/nodelabel.mirror.writing could only be replicated to 
> 0 nodes instead of minReplication (=1). There are 7 datanode(s) running and 
> no node(s) are excluded in this operation.
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
