Hi Pavan,
I'm using sshfence. Here are my config files:
------core-site.xml-----------------
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://lklcluster</value>
    <final>true</final>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/tmp2</value>
  </property>
</configuration>
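This is not the cause of the failure, but it is worth noting: the HA guide's automatic-failover section puts `ha.zookeeper.quorum` in core-site.xml rather than hdfs-site.xml. The ZKFC loads both files, and the log line `connectString=hadoop1:2181,hadoop2:2181,hadoop3:2181` shows that the value was already picked up from hdfs-site.xml. Moving it, as sketched below, would only bring the layout in line with the guide.

```xml
<!-- core-site.xml: same quorum value as in the posted hdfs-site.xml -->
<property>
  <name>ha.zookeeper.quorum</name>
  <value>hadoop1:2181,hadoop2:2181,hadoop3:2181</value>
</property>
```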
-------hdfs-site.xml-------------
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hadoop/namedir2</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hadoop/datadir2</value>
  </property>
  <property>
    <name>dfs.nameservices</name>
    <value>lklcluster</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.lklcluster</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.lklcluster.nn1</name>
    <value>hadoop2:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.lklcluster.nn2</name>
    <value>hadoop3:8020</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.lklcluster.nn1</name>
    <value>hadoop2:50070</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.lklcluster.nn2</name>
    <value>hadoop3:50070</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://hadoop1:8485;hadoop2:8485;hadoop3:8485/lklcluster</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.lklcluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/hadoop/.ssh/id_rsa</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>5000</value>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/home/hadoop/journal/data</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>hadoop1:2181,hadoop2:2181,hadoop3:2181</value>
  </property>
</configuration>
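Here is one possible change to the fencing section. It is a sketch, not a confirmed fix. The HA guide describes `dfs.ha.fencing.methods` as a newline-separated list of methods tried in order until one succeeds, and `sshfence` accepts an optional `[[username][:port]]` argument. The fragment below sets the user and port explicitly to the values the log already shows (user `hadoop`, port 22). It also appends `shell(/bin/true)`, which always reports success, as a last resort. With QJM only one NameNode can write to the JournalNodes, which is why the guide recommends ending the list with a method guaranteed to succeed. This fallback only lets failover proceed. It does not fix the `Auth fail` itself, which suggests that `/home/hadoop/.ssh/id_rsa` was not accepted for user `hadoop` on hadoop3.

```xml
<!-- Sketch: explicit user/port for sshfence, then a fallback that always succeeds -->
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence(hadoop:22)
shell(/bin/true)</value>
</property>
```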
2013/12/2 Pavan Kumar Polineni <[email protected]>
> Post your config files and tell us which fencing method you are using for
> automatic failover.
>
>
> On Mon, Dec 2, 2013 at 5:34 PM, YouPeng Yang <[email protected]> wrote:
>
>> Hi,
>> I'm testing HA automatic failover in hadoop-2.2.0.
>>
>> Manual failover works, but automatic failover fails.
>> I set up HA by following:
>>
>> http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithQJM.html
>>
>> To test automatic failover, I killed the active NN with kill -9
>> <Pid-nn>, but the standby NameNode did not switch to the active state.
>> My DFSZKFailoverController produced the log shown in [1].
>>
>> Please help; any suggestions would be appreciated.
>>
>>
>> Regards.
>>
>>
>> [1] zkfc log ----------------------------------------------------------------------------------------------------
>>
>> 2013-12-02 19:49:28,588 INFO org.apache.hadoop.ha.NodeFencer: ====== Beginning Service Fencing Process... ======
>> 2013-12-02 19:49:28,588 INFO org.apache.hadoop.ha.NodeFencer: Trying method 1/1: org.apache.hadoop.ha.SshFenceByTcpPort(null)
>> 2013-12-02 19:49:28,590 INFO org.apache.hadoop.ha.SshFenceByTcpPort: Connecting to hadoop3...
>> 2013-12-02 19:49:28,590 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Connecting to hadoop3 port 22
>> 2013-12-02 19:49:28,592 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Connection established
>> 2013-12-02 19:49:28,603 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Remote version string: SSH-2.0-OpenSSH_5.3
>> 2013-12-02 19:49:28,603 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Local version string: SSH-2.0-JSCH-0.1.42
>> 2013-12-02 19:49:28,603 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: CheckCiphers: aes256-ctr,aes192-ctr,aes128-ctr,aes256-cbc,aes192-cbc,aes128-cbc,3des-ctr,arcfour,arcfour128,arcfour256
>> 2013-12-02 19:49:28,608 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes256-ctr is not available.
>> 2013-12-02 19:49:28,608 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes192-ctr is not available.
>> 2013-12-02 19:49:28,608 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes256-cbc is not available.
>> 2013-12-02 19:49:28,608 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes192-cbc is not available.
>> 2013-12-02 19:49:28,609 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: arcfour256 is not available.
>> 2013-12-02 19:49:28,610 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXINIT sent
>> 2013-12-02 19:49:28,610 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXINIT received
>> 2013-12-02 19:49:28,610 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: server->client aes128-ctr hmac-md5 none
>> 2013-12-02 19:49:28,610 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: client->server aes128-ctr hmac-md5 none
>> 2013-12-02 19:49:28,617 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXDH_INIT sent
>> 2013-12-02 19:49:28,617 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: expecting SSH_MSG_KEXDH_REPLY
>> 2013-12-02 19:49:28,634 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: ssh_rsa_verify: signature true
>> 2013-12-02 19:49:28,635 WARN org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Permanently added 'hadoop3' (RSA) to the list of known hosts.
>> 2013-12-02 19:49:28,635 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_NEWKEYS sent
>> 2013-12-02 19:49:28,635 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_NEWKEYS received
>> 2013-12-02 19:49:28,636 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_SERVICE_REQUEST sent
>> 2013-12-02 19:49:28,637 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_SERVICE_ACCEPT received
>> 2013-12-02 19:49:28,638 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Authentications that can continue: gssapi-with-mic,publickey,keyboard-interactive,password
>> 2013-12-02 19:49:28,639 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Next authentication method: gssapi-with-mic
>> 2013-12-02 19:49:28,642 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Authentications that can continue: publickey,keyboard-interactive,password
>> 2013-12-02 19:49:28,642 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Next authentication method: publickey
>> 2013-12-02 19:49:28,644 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Disconnecting from hadoop3 port 22
>> 2013-12-02 19:49:28,644 WARN org.apache.hadoop.ha.SshFenceByTcpPort: Unable to connect to hadoop3 as user hadoop
>> com.jcraft.jsch.JSchException: Auth fail
>>     at com.jcraft.jsch.Session.connect(Session.java:452)
>>     at org.apache.hadoop.ha.SshFenceByTcpPort.tryFence(SshFenceByTcpPort.java:100)
>>     at org.apache.hadoop.ha.NodeFencer.fence(NodeFencer.java:97)
>>     at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:521)
>>     at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:494)
>>     at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:59)
>>     at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:837)
>>     at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:900)
>>     at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:799)
>>     at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
>>     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:596)
>>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
>> 2013-12-02 19:49:28,644 WARN org.apache.hadoop.ha.NodeFencer: Fencing method org.apache.hadoop.ha.SshFenceByTcpPort(null) was unsuccessful.
>> 2013-12-02 19:49:28,645 ERROR org.apache.hadoop.ha.NodeFencer: Unable to fence service by any configured method.
>> 2013-12-02 19:49:28,645 INFO org.apache.hadoop.ha.ZKFailoverController: Local service NameNode at hadoop2/10.7.23.125:8020 entered state: SERVICE_NOT_RESPONDING
>> 2013-12-02 19:49:28,646 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
>> java.lang.RuntimeException: Unable to fence NameNode at hadoop3/10.7.23.124:8020
>>     at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:522)
>>     at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:494)
>>     at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:59)
>>     at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:837)
>>     at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:900)
>>     at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:799)
>>     at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
>>     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:596)
>>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
>> 2013-12-02 19:49:28,646 INFO org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish ZK session
>> 2013-12-02 19:49:28,669 INFO org.apache.zookeeper.ZooKeeper: Session: 0x2429313c808025b closed
>> 2013-12-02 19:49:29,672 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=hadoop1:2181,hadoop2:2181,hadoop3:2181 sessionTimeout=5000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@3545fe3b
>> 2013-12-02 19:49:29,675 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server hadoop3/10.7.23.124:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
>> 2013-12-02 19:49:29,675 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to hadoop3/10.7.23.124:2181, initiating session
>> 2013-12-02 19:49:29,699 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server hadoop3/10.7.23.124:2181, sessionid = 0x3429312ba330262, negotiated timeout = 5000
>> 2013-12-02 19:49:29,702 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
>> 2013-12-02 19:49:29,706 INFO org.apache.hadoop.ha.ActiveStandbyElector: Session connected.
>> 2013-12-02 19:49:29,706 INFO org.apache.hadoop.ha.ZKFailoverController: Quitting master election for NameNode at hadoop2/10.7.23.125:8020 and marking that fencing is necessary
>> 2013-12-02 19:49:29,706 INFO org.apache.hadoop.ha.ActiveStandbyElector: Yielding from election
>> 2013-12-02 19:49:29,727 INFO org.apache.zookeeper.ZooKeeper: Session: 0x3429312ba330262 closed
>> 2013-12-02 19:49:29,728 WARN org.apache.hadoop.ha.ActiveStandbyElector: Ignoring stale result from old client with sessionId 0x3429312ba330262
>> 2013-12-02 19:49:29,728 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
>>
>
>