[ 
https://issues.apache.org/jira/browse/YARN-3152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-3152:
------------------------------------
    Description: 
I have two NNs in HA; they do not fail when the exclude file is not present 
(hadoop-2.6.0/etc/hadoop/exclude). I had one RM and wanted to run two in HA. 
I had not created the exclude file at this point either. I applied the RM HA 
settings properly, and when I started both RMs I started getting this exception:

2015-02-06 12:25:25,326 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=root     OPERATION=transitionToActive    TARGET=RMHAProtocolService      RESULT=FAILURE  DESCRIPTION=Exception transitioning to active   PERMISSIONS=All users are allowed
2015-02-06 12:25:25,326 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
        at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:128)
        at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:805)
        at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:416)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
        at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:304)
        at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126)
        ... 4 more
Caused by: org.apache.hadoop.ha.ServiceFailedException: java.io.FileNotFoundException: /hadoop-2.6.0/etc/hadoop/exclude (No such file or directory)
        at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:626)
        at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:297)
        ... 5 more
2015-02-06 12:25:25,327 INFO org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish ZK session
2015-02-06 12:25:25,339 INFO org.apache.zookeeper.ZooKeeper: Session: 0x44af32566180094 closed
2015-02-06 12:25:26,340 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=x.x.x.x:2181,x.x.x.x:2181 sessionTimeout=10000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@307587c
2015-02-06 12:25:26,341 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server x.x.x.x/x.x.x.x:2181. Will not attempt to authenticate using SASL (unknown error)
2015-02-06 12:25:26,341 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to x.x.x.x/x.x.x.x:2181, initiating session

The exception is descriptive enough to resolve the problem, and I fixed it by 
creating the exclude file. 
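
For anyone hitting the same exception, the workaround above can be scripted. This is only a minimal sketch, not part of Hadoop: the class name `EnsureExcludeFile` and the method `ensureExists` are hypothetical, and the path to pass in is whatever exclude path your own configuration points to.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper (not a Hadoop class): makes sure an exclude file exists. */
public final class EnsureExcludeFile {

    private EnsureExcludeFile() {}

    /**
     * Creates an empty file at the given path unless something already exists there.
     * Returns true if the file was created, false if it was already present.
     */
    public static boolean ensureExists(Path excludeFile) throws IOException {
        Path parent = excludeFile.toAbsolutePath().getParent();
        if (parent != null) {
            // No-op when the directory is already there.
            Files.createDirectories(parent);
        }
        try {
            Files.createFile(excludeFile);
            return true;
        } catch (FileAlreadyExistsException e) {
            // An existing file is left untouched so its entries are preserved.
            return false;
        }
    }
}
```

Running this before starting the RMs, for example with `Paths.get("/hadoop-2.6.0/etc/hadoop/exclude")`, has the same effect as creating the empty file by hand.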

I am raising this only as a possible improvement: 

- Should RMs ignore the missing file, as the NNs do?
- Should a single RM also fail when the file is not present?

I am suggesting this to keep the behavior consistent when working in HA 
(both NNs and RMs). 
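
To make the first question concrete, here is a rough sketch of what "ignore the missing file" could look like. It is an illustration under assumptions, not the actual RM or NN code: the class `LenientExcludeReader`, the method `readExcludedHosts`, and the `#` comment syntax are all hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical helper (not a Hadoop class): reads an exclude file leniently. */
public final class LenientExcludeReader {

    private LenientExcludeReader() {}

    /**
     * Returns the host names listed in the file, separated by whitespace or newlines.
     * A missing file yields an empty set; other I/O errors still propagate.
     */
    public static Set<String> readExcludedHosts(Path excludeFile) throws IOException {
        Set<String> hosts = new LinkedHashSet<>();
        try {
            for (String line : Files.readAllLines(excludeFile, StandardCharsets.UTF_8)) {
                // Drop anything after '#' (assumed comment syntax) and trim whitespace.
                int hash = line.indexOf('#');
                String content = (hash >= 0 ? line.substring(0, hash) : line).trim();
                if (content.isEmpty()) {
                    continue;
                }
                for (String host : content.split("\\s+")) {
                    hosts.add(host);
                }
            }
        } catch (NoSuchFileException e) {
            // Missing file: treat it as "no hosts excluded" instead of failing.
            return Collections.emptySet();
        }
        return Collections.unmodifiableSet(hosts);
    }
}
```

With a reader like this, a missing exclude file would behave the same as an empty one, so the transition to active would not fail for that reason. Whether that is the right behavior for the RM is exactly the question raised here.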

  was:
I have two NNs in HA, they do not fail when the exclude file is not present 
(hadoop-2.6.0/etc/hadoop/exclude). I had one RM and I wanted to make two in HA. 
I didn't create the exclude file at this point as well. I applied the HA RM 
settings properly and when I started both RMs I started getting this exception:

2015-02-06 12:25:25,326 WARN 
org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=root     
OPERATION=transitionToActive    TARGET=RMHAProtocolService      RESULT=FAILURE  
DESCRIPTION=Exception transitioning to active   PERMISSIONS=All users are 
allowed
2015-02-06 12:25:25,326 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
Exception handling the winning of election
org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
        at 
org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:128)
        at 
org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:805)
        at 
org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:416)
        at 
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when 
transitioning to Active mode
        at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:304)
        at 
org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126)
        ... 4 more
Caused by: org.apache.hadoop.ha.ServiceFailedException: 
java.io.FileNotFoundException: /hadoop-2.6.0/etc/hadoop/exclude (No such file 
or directory)
        at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:626)
        at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:297)
        ... 5 more
2015-02-06 12:25:25,327 INFO org.apache.hadoop.ha.ActiveStandbyElector: Trying 
to re-establish ZK session
2015-02-06 12:25:25,339 INFO org.apache.zookeeper.ZooKeeper: Session: 
0x44af32566180094 closed
2015-02-06 12:25:26,340 INFO org.apache.zookeeper.ZooKeeper: Initiating client 
connection, connectString=x.x.x.x:2181,x.x.x.x:2181 sessionTimeout=10000 
watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@307587c
2015-02-06 12:25:26,341 INFO org.apache.zookeeper.ClientCnxn: Opening socket 
connection to server x.x.x.x/x.x.x.x:2181. Will not attempt to authenticate 
using SASL (unknown error)
2015-02-06 12:25:26,341 INFO org.apache.zookeeper.ClientCnxn: Socket connection 
established to x.x.x.x/x.x.x.x:2181, initiating session

The issue is descriptive enough to resolve the problem - and it has been fixed 
by creating the exclude file. 

I just think as of a improvement: 

- Should RMs ignore the missing file as the NNs did?
- Should single RM fail even when the file is not present?

Just suggesting this improvement to keep the behavior consistent when working 
with in HA (both NNs and RMs). 


> Missing hadoop exclude file fails RMs in HA
> -------------------------------------------
>
>                 Key: YARN-3152
>                 URL: https://issues.apache.org/jira/browse/YARN-3152
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.6.0
>         Environment: Debian 7
>            Reporter: Neill Lima
>            Assignee: Naganarasimha G R
>
> I have two NNs in HA; they do not fail when the exclude file is not present 
> (hadoop-2.6.0/etc/hadoop/exclude). I had one RM and wanted to run two in HA. 
> I had not created the exclude file at this point either. I applied the RM HA 
> settings properly, and when I started both RMs I started getting this 
> exception:
> 2015-02-06 12:25:25,326 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=root   OPERATION=transitionToActive    TARGET=RMHAProtocolService      RESULT=FAILURE  DESCRIPTION=Exception transitioning to active   PERMISSIONS=All users are allowed
> 2015-02-06 12:25:25,326 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
>       at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:128)
>       at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:805)
>       at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:416)
>       at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
>       at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
>       at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:304)
>       at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126)
>       ... 4 more
> Caused by: org.apache.hadoop.ha.ServiceFailedException: java.io.FileNotFoundException: /hadoop-2.6.0/etc/hadoop/exclude (No such file or directory)
>       at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:626)
>       at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:297)
>       ... 5 more
> 2015-02-06 12:25:25,327 INFO org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish ZK session
> 2015-02-06 12:25:25,339 INFO org.apache.zookeeper.ZooKeeper: Session: 0x44af32566180094 closed
> 2015-02-06 12:25:26,340 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=x.x.x.x:2181,x.x.x.x:2181 sessionTimeout=10000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@307587c
> 2015-02-06 12:25:26,341 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server x.x.x.x/x.x.x.x:2181. Will not attempt to authenticate using SASL (unknown error)
> 2015-02-06 12:25:26,341 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to x.x.x.x/x.x.x.x:2181, initiating session
> The exception is descriptive enough to resolve the problem, and I fixed it by 
> creating the exclude file. 
> I am raising this only as a possible improvement: 
> - Should RMs ignore the missing file, as the NNs do?
> - Should a single RM also fail when the file is not present?
> I am suggesting this to keep the behavior consistent when working in HA 
> (both NNs and RMs). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
