Naganarasimha G R commented on YARN-3152:

Hi [~xgong],
bq. If the file does not exist, both of them will throw out the exception. No ?
Yes you are right, intention is to not throw the exception, but may be can log 
WARN message saying Configured file doesn't exist (with the path info). 

bq. So, we. throw out such exception when active RM start
I see currently there is different behavior in HA mode(starts of properly) and 
non HA mode(starts of but continuous logs saying file not found) which i feel 
should not be the behavior
Also in other places where we configure the file path we do check whether file 
exists (ex. nodeHealthScripts), so i was of the opinion that its better to add 
a file exists check here too or atleast the behavior for HA and Non HA mode 
should be same.

> Missing hadoop exclude file fails RMs in HA
> -------------------------------------------
>                 Key: YARN-3152
>                 URL: https://issues.apache.org/jira/browse/YARN-3152
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.6.0
>         Environment: Debian 7
>            Reporter: Neill Lima
>            Assignee: Naganarasimha G R
> NI have two NNs in HA, they do not fail when the exclude file is not present 
> (hadoop-2.6.0/etc/hadoop/exclude). I had one RM and I wanted to make two in 
> HA. I didn't create the exclude file at this point as well. I applied the HA 
> RM settings properly and when I started both RMs I started getting this 
> exception:
> 2015-02-06 12:25:25,326 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=root   
> OPERATION=transitionToActive    TARGET=RMHAProtocolService      
> RESULT=FAILURE  DESCRIPTION=Exception transitioning to active   
> PERMISSIONS=All users are allowed
> 2015-02-06 12:25:25,326 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
> Exception handling the winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:128)
>       at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:805)
>       at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:416)
>       at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
>       at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when 
> transitioning to Active mode
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:304)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126)
>       ... 4 more
> Caused by: org.apache.hadoop.ha.ServiceFailedException: 
> java.io.FileNotFoundException: /hadoop-2.6.0/etc/hadoop/exclude (No such file 
> or directory)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:626)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:297)
>       ... 5 more
> 2015-02-06 12:25:25,327 INFO org.apache.hadoop.ha.ActiveStandbyElector: 
> Trying to re-establish ZK session
> 2015-02-06 12:25:25,339 INFO org.apache.zookeeper.ZooKeeper: Session: 
> 0x44af32566180094 closed
> 2015-02-06 12:25:26,340 INFO org.apache.zookeeper.ZooKeeper: Initiating 
> client connection, connectString=x.x.x.x:2181,x.x.x.x:2181 
> sessionTimeout=10000 
> watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@307587c
> 2015-02-06 12:25:26,341 INFO org.apache.zookeeper.ClientCnxn: Opening socket 
> connection to server x.x.x.x/x.x.x.x:2181. Will not attempt to authenticate 
> using SASL (unknown error)
> 2015-02-06 12:25:26,341 INFO org.apache.zookeeper.ClientCnxn: Socket 
> connection established to x.x.x.x/x.x.x.x:2181, initiating session
> The issue is descriptive enough to resolve the problem - and it has been 
> fixed by creating the exclude file. 
> I just think as of a improvement: 
> - Should RMs ignore the missing file as the NNs did?
> - Should single RM fail even when the file is not present?
> Just suggesting this improvement to keep the behavior consistent when working 
> with in HA (both NNs and RMs). 

This message was sent by Atlassian JIRA

Reply via email to