[jira] [Commented] (HADOOP-10722) Standby NN continuing as standby when active NN machine got shutdown.

2014-06-19 Thread Beckham007 (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037201#comment-14037201
 ] 

Beckham007 commented on HADOOP-10722:
-

"Unable to connect to host-10-18-40-101 as user myuser port 22"
Check your configuration: does ssh myuser@host-10-18-40-101 work from your 
standby node?
I think this is a configuration error, not a bug.
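
For reference, a minimal sketch of the hdfs-site.xml settings the SSH fencer relies on. The key path and the timeout value are assumptions for illustration and are not taken from the reporter's cluster.

```xml
<!-- Sketch only: the key path below is hypothetical. -->
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>
<property>
  <!-- Private key the ZKFC user uses to log in to the other NameNode host -->
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/myuser/.ssh/id_rsa</value>
</property>
<property>
  <!-- SSH connect timeout in milliseconds -->
  <name>dfs.ha.fencing.ssh.connect-timeout</name>
  <value>30000</value>
</property>
```

Note that sshfence has to log in to the target host, so even a correct configuration cannot fence a machine that is powered off or otherwise unreachable.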

 Standby NN continuing as standby when active NN machine got shutdown.
 -

 Key: HADOOP-10722
 URL: https://issues.apache.org/jira/browse/HADOOP-10722
 Project: Hadoop Common
  Issue Type: Bug
  Components: auto-failover, ha
Affects Versions: 2.4.0
Reporter: surendra singh lilhore

 I have an HA cluster with 3 ZK and 3 QJM.
 My active NN machine was shut down, but my standby NN is still standby.
 It should have become active.
 ZKFC logs:
 
 {noformat}
 2014-06-19 13:39:30,810 INFO org.apache.hadoop.ha.NodeFencer: == 
 Beginning Service Fencing Process... ==
 2014-06-19 13:39:30,810 INFO org.apache.hadoop.ha.NodeFencer: Trying method 
 1/1: org.apache.hadoop.ha.SshFenceByTcpPort(null)
 2014-06-19 13:39:30,811 INFO org.apache.hadoop.ha.SshFenceByTcpPort: 
 Connecting to host-10-18-40-101...
 2014-06-19 13:39:30,811 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: 
 Connecting to host-10-18-40-101 port 22
 2014-06-19 13:39:33,814 WARN org.apache.hadoop.ha.SshFenceByTcpPort: Unable 
 to connect to host-10-18-40-101 as user myuser
 com.jcraft.jsch.JSchException: java.net.NoRouteToHostException: No route to 
 host
   at com.jcraft.jsch.Util.createSocket(Util.java:386)
   at com.jcraft.jsch.Session.connect(Session.java:182)
   at 
 org.apache.hadoop.ha.SshFenceByTcpPort.tryFence(SshFenceByTcpPort.java:100)
   at org.apache.hadoop.ha.NodeFencer.fence(NodeFencer.java:97)
   at 
 org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:521)
   at 
 org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:494)
   at 
 org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:59)
   at 
 org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:837)
   at 
 org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:901)
   at 
 org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:800)
   at 
 org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
   at 
 org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:596)
   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
 2014-06-19 13:39:33,814 WARN org.apache.hadoop.ha.NodeFencer: Fencing method 
 org.apache.hadoop.ha.SshFenceByTcpPort(null) was unsuccessful.
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10722) Standby NN continuing as standby when active NN machine got shutdown.

2014-06-19 Thread surendra singh lilhore (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037205#comment-14037205
 ] 

surendra singh lilhore commented on HADOOP-10722:
-

[~beckham007]



[jira] [Commented] (HADOOP-10722) Standby NN continuing as standby when active NN machine got shutdown.

2014-06-19 Thread surendra singh lilhore (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037209#comment-14037209
 ] 

surendra singh lilhore commented on HADOOP-10722:
-

Thanks for looking into this issue.

Yes, ssh is working on my machine.

The message "Unable to connect to host-10-18-40-101 as user myuser port 22" 
appears because the active NN (ANN) machine is not reachable.




[jira] [Commented] (HADOOP-10722) Standby NN continuing as standby when active NN machine got shutdown.

2014-06-19 Thread Vinayakumar B (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037211#comment-14037211
 ] 

Vinayakumar B commented on HADOOP-10722:


Ideally, fencing methods should be configured so that multiple writers cannot 
write to the same shared storage.

QJM supports fencing on its own, i.e. it won't allow multiple writers at a 
time, so external fencing methods need not be configured.
You can remove the SSH fencing method from the configuration on both machines 
and restart the cluster.
Failover will then happen successfully.

To skip the SSH fence, you can set the fencing methods configuration as below.
{code:xml}
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>shell(/bin/true)</value>
</property>
{code}
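
If you would rather keep SSH fencing for cases where the old active host is still reachable, the fencing methods can be given as a newline-separated list that is tried in order until one reports success. The following is a sketch of that variant. It is not what was tested in this thread, and it assumes QJM is the shared storage, so that the final always-successful method is safe.

```xml
<!-- Sketch: try sshfence first, then fall back to a method that always succeeds. -->
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence
shell(/bin/true)</value>
</property>
```

With this list, a powered-off active NN makes sshfence fail as in the ZKFC log, after which shell(/bin/true) succeeds and the standby can become active.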



[jira] [Commented] (HADOOP-10722) Standby NN continuing as standby when active NN machine got shutdown.

2014-06-19 Thread surendra singh lilhore (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037221#comment-14037221
 ] 

surendra singh lilhore commented on HADOOP-10722:
-

@vinay

Thanks

It's working fine now.

{noformat}
2014-06-19 16:29:26,083 INFO org.apache.hadoop.ha.NodeFencer: == Beginning 
Service Fencing Process... ==
2014-06-19 16:29:26,083 INFO org.apache.hadoop.ha.NodeFencer: Trying method 
1/1: org.apache.hadoop.ha.ShellCommandFencer(/bin/true)
2014-06-19 16:29:26,129 INFO org.apache.hadoop.ha.ShellCommandFencer: Launched 
fencing command '/bin/true' with pid 24316
2014-06-19 16:29:26,168 INFO org.apache.hadoop.ha.NodeFencer: == Fencing 
successful by method org.apache.hadoop.ha.ShellCommandFencer(/bin/true) ==
2014-06-19 16:29:26,168 INFO org.apache.hadoop.ha.ActiveStandbyElector: Writing 
znode /hadoop-ha/mycluster/ActiveBreadCrumb to indicate that the local node is 
the most recent active...
2014-06-19 16:29:26,206 INFO org.apache.hadoop.ha.ZKFailoverController: Trying 
to make NameNode at host-10-18-40-90/10.18.40.90:8020 active...
2014-06-19 16:29:26,862 INFO org.apache.hadoop.ha.ZKFailoverController: 
Successfully transitioned NameNode at host-10-18-40-90/10.18.40.90:8020 to 
active state
{noformat}



[jira] [Commented] (HADOOP-10722) Standby NN continuing as standby when active NN machine got shutdown.

2014-06-19 Thread Vinayakumar B (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037222#comment-14037222
 ] 

Vinayakumar B commented on HADOOP-10722:


Then can you resolve this issue?

BTW, thanks for the update.
