[ 
https://issues.apache.org/jira/browse/YARN-3152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14626448#comment-14626448
 ] 

Neill Lima commented on YARN-3152:
----------------------------------

Hello [~Naganarasimha], I am revisiting this topic since it has been a while; 
sorry about the delay.

| When you mention "RMs didn't start", do you mean you were not able to access 
the web UI, or that the process was down?

The RM didn't bootstrap, so I couldn't even see the web UI or connect to the 
server. If the exclude file is not that relevant, its absence should not 
block the RM from coming up (which has a much higher priority). A [WARN] could 
still be added to the logs, just like the NNs do.
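
To make the suggestion concrete, here is a minimal sketch of the behavior I 
have in mind. It is not the actual AdminService code: the class name 
LenientExcludeReader and the method readExcludedHosts are hypothetical and 
use only the JDK. A missing file produces a warning and an empty exclude 
list, while any other I/O error still propagates.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.logging.Logger;

// Hypothetical helper, for illustration only; not part of Hadoop.
public class LenientExcludeReader {
    private static final Logger LOG =
        Logger.getLogger(LenientExcludeReader.class.getName());

    /**
     * Reads one host name per line, skipping blank lines and lines
     * starting with '#'. A missing file is treated as "no hosts
     * excluded" and logged as a warning instead of failing the caller.
     */
    public static Set<String> readExcludedHosts(String path) throws IOException {
        if (path == null || path.trim().isEmpty()) {
            // No exclude file configured: nothing to exclude.
            return Collections.emptySet();
        }
        Set<String> hosts = new LinkedHashSet<>();
        try {
            for (String line : Files.readAllLines(Paths.get(path))) {
                String host = line.trim();
                if (!host.isEmpty() && !host.startsWith("#")) {
                    hosts.add(host);
                }
            }
        } catch (NoSuchFileException e) {
            // The file is absent: warn and carry on rather than abort startup.
            LOG.warning("Exclude file " + path
                + " not found; continuing with no excluded hosts");
            return Collections.emptySet();
        }
        return hosts;
    }

    public static void main(String[] args) throws IOException {
        // The path comes from the command line; nothing is assumed about its location.
        String path = args.length > 0 ? args[0] : "";
        System.out.println("Excluded hosts: " + readExcludedHosts(path));
    }
}
```

With something along these lines, a transition to active would only log the 
missing file, and an unreadable or otherwise broken file would still fail 
loudly.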

> Missing hadoop exclude file fails RMs in HA
> -------------------------------------------
>
>                 Key: YARN-3152
>                 URL: https://issues.apache.org/jira/browse/YARN-3152
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.6.0
>         Environment: Debian 7
>            Reporter: Neill Lima
>            Assignee: Naganarasimha G R
>
> I have two NNs in HA, they do not fail when the exclude file is not present 
> (hadoop-2.6.0/etc/hadoop/exclude). I had one RM and I wanted to make two in 
> HA. I didn't create the exclude file at this point as well. I applied the HA 
> RM settings properly and when I started both RMs I started getting this 
> exception:
> 2015-02-06 12:25:25,326 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=root   
> OPERATION=transitionToActive    TARGET=RMHAProtocolService      
> RESULT=FAILURE  DESCRIPTION=Exception transitioning to active   
> PERMISSIONS=All users are allowed
> 2015-02-06 12:25:25,326 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
> Exception handling the winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:128)
>       at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:805)
>       at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:416)
>       at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
>       at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when 
> transitioning to Active mode
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:304)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126)
>       ... 4 more
> Caused by: org.apache.hadoop.ha.ServiceFailedException: 
> java.io.FileNotFoundException: /hadoop-2.6.0/etc/hadoop/exclude (No such file 
> or directory)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:626)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:297)
>       ... 5 more
> 2015-02-06 12:25:25,327 INFO org.apache.hadoop.ha.ActiveStandbyElector: 
> Trying to re-establish ZK session
> 2015-02-06 12:25:25,339 INFO org.apache.zookeeper.ZooKeeper: Session: 
> 0x44af32566180094 closed
> 2015-02-06 12:25:26,340 INFO org.apache.zookeeper.ZooKeeper: Initiating 
> client connection, connectString=x.x.x.x:2181,x.x.x.x:2181 
> sessionTimeout=10000 
> watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@307587c
> 2015-02-06 12:25:26,341 INFO org.apache.zookeeper.ClientCnxn: Opening socket 
> connection to server x.x.x.x/x.x.x.x:2181. Will not attempt to authenticate 
> using SASL (unknown error)
> 2015-02-06 12:25:26,341 INFO org.apache.zookeeper.ClientCnxn: Socket 
> connection established to x.x.x.x/x.x.x.x:2181, initiating session
> The issue is descriptive enough to resolve the problem, and it has been 
> fixed by creating the exclude file. 
> I am just suggesting this as an improvement: 
> - Should RMs ignore the missing file as the NNs do?
> - Should a single RM fail even when the file is not present?
> The goal is to keep the behavior consistent when working in HA (both NNs 
> and RMs). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
