If you are using the hadoop user and your ssh configuration is correct, then the commands below should work without prompting for a password.

Execute from NN2 & NN1:
  # ssh hadoop@NN1_host
Execute from NN2 & NN1:
  # ssh hadoop@NN2_host

Regards
Jitendra
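[Editor's note] For reference, a minimal sketch of how that passwordless setup is usually created for the hadoop user. The key path is the OpenSSH default, and NN1_host/NN2_host are the placeholders from the commands above:

  # As the hadoop user on each NameNode host:
  ssh-keygen -t rsa              # accept the default ~/.ssh/id_rsa, empty passphrase
  ssh-copy-id hadoop@NN1_host    # appends the public key to authorized_keys on the target
  ssh-copy-id hadoop@NN2_host

  # sshd silently ignores authorized_keys if permissions are too open:
  chmod 700 ~/.ssh && chmod 600 ~/.ssh/authorized_keys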
On Mon, Dec 2, 2013 at 6:10 PM, YouPeng Yang <[email protected]> wrote:

Hi Jitendra

Yes. My doubt is whether I need to run ssh-agent bash and ssh-add before the NNs can ssh to each other. Is that a problem?

Regards


2013/12/2 Jitendra Yadav <[email protected]>

Are you able to connect to both NN hosts using SSH without a password? Make sure you have the correct ssh keys in the authorized keys file.

Regards
Jitendra


On Mon, Dec 2, 2013 at 5:50 PM, YouPeng Yang <[email protected]> wrote:

Hi Pavan

I'm using sshfence.

------ core-site.xml ------

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://lklcluster</value>
    <final>true</final>
  </property>

  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/tmp2</value>
  </property>
</configuration>

------ hdfs-site.xml ------

<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hadoop/namedir2</value>
  </property>

  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hadoop/datadir2</value>
  </property>

  <property>
    <name>dfs.nameservices</name>
    <value>lklcluster</value>
  </property>

  <property>
    <name>dfs.ha.namenodes.lklcluster</name>
    <value>nn1,nn2</value>
  </property>

  <property>
    <name>dfs.namenode.rpc-address.lklcluster.nn1</name>
    <value>hadoop2:8020</value>
  </property>

  <property>
    <name>dfs.namenode.rpc-address.lklcluster.nn2</name>
    <value>hadoop3:8020</value>
  </property>

  <property>
    <name>dfs.namenode.http-address.lklcluster.nn1</name>
    <value>hadoop2:50070</value>
  </property>

  <property>
    <name>dfs.namenode.http-address.lklcluster.nn2</name>
    <value>hadoop3:50070</value>
  </property>

  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://hadoop1:8485;hadoop2:8485;hadoop3:8485/lklcluster</value>
  </property>

  <property>
    <name>dfs.client.failover.proxy.provider.lklcluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>

  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>

  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/hadoop/.ssh/id_rsa</value>
  </property>

  <property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>5000</value>
  </property>

  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/home/hadoop/journal/data</value>
  </property>

  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>

  <property>
    <name>ha.zookeeper.quorum</name>
    <value>hadoop1:2181,hadoop2:2181,hadoop3:2181</value>
  </property>
</configuration>
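[Editor's note] Worth noting: sshfence (org.apache.hadoop.ha.SshFenceByTcpPort) connects via JSch using the key file(s) listed in dfs.ha.fencing.ssh.private-key-files; it does not consult an ssh-agent, so running ssh-add beforehand changes nothing. A quick way to reproduce what the fencer attempts is to force that exact identity (host names and key path taken from the configuration above):

  # Run from each NameNode host as the hadoop user; -i forces the configured key,
  # and IdentitiesOnly=yes keeps ssh-agent keys from masking a broken key file.
  ssh -i /home/hadoop/.ssh/id_rsa -o IdentitiesOnly=yes hadoop@hadoop2 hostname
  ssh -i /home/hadoop/.ssh/id_rsa -o IdentitiesOnly=yes hadoop@hadoop3 hostname

If either command prompts for a password or fails with "Permission denied (publickey)", the matching public key is missing from ~hadoop/.ssh/authorized_keys on the target host, which is consistent with the "Auth fail" in the zkfc log below.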
2013/12/2 Pavan Kumar Polineni <[email protected]>

Post your config files, and say which method you are using for automatic failover.


On Mon, Dec 2, 2013 at 5:34 PM, YouPeng Yang <[email protected]> wrote:

Hi

I'm testing HA auto-failover with hadoop-2.2.0.

The cluster can be failed over manually; however, automatic failover fails. I set up HA according to
http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithQJM.html

When I test automatic failover, I kill the active NN with kill -9 <Pid-nn>, but the standby namenode does not change to the active state. The log from my DFSZKFailoverController is shown in [1].

Please help me; any suggestion will be appreciated.

Regards


zkfc log[1] ----------------------------------------------------------------

2013-12-02 19:49:28,588 INFO org.apache.hadoop.ha.NodeFencer: ====== Beginning Service Fencing Process... ======
2013-12-02 19:49:28,588 INFO org.apache.hadoop.ha.NodeFencer: Trying method 1/1: org.apache.hadoop.ha.SshFenceByTcpPort(null)
2013-12-02 19:49:28,590 INFO org.apache.hadoop.ha.SshFenceByTcpPort: Connecting to hadoop3...
2013-12-02 19:49:28,590 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Connecting to hadoop3 port 22
2013-12-02 19:49:28,592 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Connection established
2013-12-02 19:49:28,603 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Remote version string: SSH-2.0-OpenSSH_5.3
2013-12-02 19:49:28,603 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Local version string: SSH-2.0-JSCH-0.1.42
2013-12-02 19:49:28,603 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: CheckCiphers: aes256-ctr,aes192-ctr,aes128-ctr,aes256-cbc,aes192-cbc,aes128-cbc,3des-ctr,arcfour,arcfour128,arcfour256
2013-12-02 19:49:28,608 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes256-ctr is not available.
2013-12-02 19:49:28,608 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes192-ctr is not available.
2013-12-02 19:49:28,608 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes256-cbc is not available.
2013-12-02 19:49:28,608 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes192-cbc is not available.
2013-12-02 19:49:28,609 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: arcfour256 is not available.
2013-12-02 19:49:28,610 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXINIT sent
2013-12-02 19:49:28,610 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXINIT received
2013-12-02 19:49:28,610 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: server->client aes128-ctr hmac-md5 none
2013-12-02 19:49:28,610 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: client->server aes128-ctr hmac-md5 none
2013-12-02 19:49:28,617 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXDH_INIT sent
2013-12-02 19:49:28,617 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: expecting SSH_MSG_KEXDH_REPLY
2013-12-02 19:49:28,634 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: ssh_rsa_verify: signature true
2013-12-02 19:49:28,635 WARN org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Permanently added 'hadoop3' (RSA) to the list of known hosts.
2013-12-02 19:49:28,635 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_NEWKEYS sent
2013-12-02 19:49:28,635 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_NEWKEYS received
2013-12-02 19:49:28,636 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_SERVICE_REQUEST sent
2013-12-02 19:49:28,637 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_SERVICE_ACCEPT received
2013-12-02 19:49:28,638 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Authentications that can continue: gssapi-with-mic,publickey,keyboard-interactive,password
2013-12-02 19:49:28,639 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Next authentication method: gssapi-with-mic
2013-12-02 19:49:28,642 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Authentications that can continue: publickey,keyboard-interactive,password
2013-12-02 19:49:28,642 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Next authentication method: publickey
2013-12-02 19:49:28,644 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Disconnecting from hadoop3 port 22
2013-12-02 19:49:28,644 WARN org.apache.hadoop.ha.SshFenceByTcpPort: Unable to connect to hadoop3 as user hadoop
com.jcraft.jsch.JSchException: Auth fail
        at com.jcraft.jsch.Session.connect(Session.java:452)
        at org.apache.hadoop.ha.SshFenceByTcpPort.tryFence(SshFenceByTcpPort.java:100)
        at org.apache.hadoop.ha.NodeFencer.fence(NodeFencer.java:97)
        at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:521)
        at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:494)
        at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:59)
        at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:837)
        at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:900)
        at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:799)
        at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:596)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
2013-12-02 19:49:28,644 WARN org.apache.hadoop.ha.NodeFencer: Fencing method org.apache.hadoop.ha.SshFenceByTcpPort(null) was unsuccessful.
2013-12-02 19:49:28,645 ERROR org.apache.hadoop.ha.NodeFencer: Unable to fence service by any configured method.
2013-12-02 19:49:28,645 INFO org.apache.hadoop.ha.ZKFailoverController: Local service NameNode at hadoop2/10.7.23.125:8020 entered state: SERVICE_NOT_RESPONDING
2013-12-02 19:49:28,646 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
java.lang.RuntimeException: Unable to fence NameNode at hadoop3/10.7.23.124:8020
        at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:522)
        at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:494)
        at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:59)
        at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:837)
        at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:900)
        at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:799)
        at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:596)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
2013-12-02 19:49:28,646 INFO org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish ZK session
2013-12-02 19:49:28,669 INFO org.apache.zookeeper.ZooKeeper: Session: 0x2429313c808025b closed
2013-12-02 19:49:29,672 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=hadoop1:2181,hadoop2:2181,hadoop3:2181 sessionTimeout=5000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@3545fe3b
2013-12-02 19:49:29,675 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server hadoop3/10.7.23.124:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
2013-12-02 19:49:29,675 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to hadoop3/10.7.23.124:2181, initiating session
2013-12-02 19:49:29,699 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server hadoop3/10.7.23.124:2181, sessionid = 0x3429312ba330262, negotiated timeout = 5000
2013-12-02 19:49:29,702 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2013-12-02 19:49:29,706 INFO org.apache.hadoop.ha.ActiveStandbyElector: Session connected.
2013-12-02 19:49:29,706 INFO org.apache.hadoop.ha.ZKFailoverController: Quitting master election for NameNode at hadoop2/10.7.23.125:8020 and marking that fencing is necessary
2013-12-02 19:49:29,706 INFO org.apache.hadoop.ha.ActiveStandbyElector: Yielding from election
2013-12-02 19:49:29,727 INFO org.apache.zookeeper.ZooKeeper: Session: 0x3429312ba330262 closed
2013-12-02 19:49:29,728 WARN org.apache.hadoop.ha.ActiveStandbyElector: Ignoring stale result from old client with sessionId 0x3429312ba330262
2013-12-02 19:49:29,728 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
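[Editor's note] As a follow-up check, the NameNode HA states can be watched from the shell while re-running the kill test; hdfs haadmin -getServiceState is a standard subcommand, and nn1/nn2 are the service IDs from the hdfs-site.xml above:

  # Before killing the active NN:
  hdfs haadmin -getServiceState nn1    # e.g. "active"
  hdfs haadmin -getServiceState nn2    # e.g. "standby"

  # Kill the active NN as in the test above, then poll the survivor;
  # while fencing keeps failing as in log [1], it will stay "standby".
  kill -9 <Pid-nn>
  hdfs haadmin -getServiceState nn2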
