[jira] [Commented] (HADOOP-10722) Standby NN continuing as standby when active NN machine got shutdown.
[ https://issues.apache.org/jira/browse/HADOOP-10722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14037201#comment-14037201 ]

Beckham007 commented on HADOOP-10722:

"Unable to connect to host-10-18-40-101 as user myuser port 22"
Check your config: does ssh myuser@host-10-18-40-101 work from your standby node? I think this is not a bug but a wrong configuration.

Standby NN continuing as standby when active NN machine got shutdown.

Key: HADOOP-10722
URL: https://issues.apache.org/jira/browse/HADOOP-10722
Project: Hadoop Common
Issue Type: Bug
Components: auto-failover, ha
Affects Versions: 2.4.0
Reporter: surendra singh lilhore

I have an HA cluster with 3 ZK and 3 QJM. My active NN machine was shut down, but my standby NN is still in standby. It should have become active.

ZKFC logs:
{noformat}
2014-06-19 13:39:30,810 INFO org.apache.hadoop.ha.NodeFencer: == Beginning Service Fencing Process... ==
2014-06-19 13:39:30,810 INFO org.apache.hadoop.ha.NodeFencer: Trying method 1/1: org.apache.hadoop.ha.SshFenceByTcpPort(null)
2014-06-19 13:39:30,811 INFO org.apache.hadoop.ha.SshFenceByTcpPort: Connecting to host-10-18-40-101...
2014-06-19 13:39:30,811 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Connecting to host-10-18-40-101 port 22
2014-06-19 13:39:33,814 WARN org.apache.hadoop.ha.SshFenceByTcpPort: Unable to connect to host-10-18-40-101 as user myuser
com.jcraft.jsch.JSchException: java.net.NoRouteToHostException: No route to host
    at com.jcraft.jsch.Util.createSocket(Util.java:386)
    at com.jcraft.jsch.Session.connect(Session.java:182)
    at org.apache.hadoop.ha.SshFenceByTcpPort.tryFence(SshFenceByTcpPort.java:100)
    at org.apache.hadoop.ha.NodeFencer.fence(NodeFencer.java:97)
    at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:521)
    at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:494)
    at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:59)
    at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:837)
    at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:901)
    at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:800)
    at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:596)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
2014-06-19 13:39:33,814 WARN org.apache.hadoop.ha.NodeFencer: Fencing method org.apache.hadoop.ha.SshFenceByTcpPort(null) was unsuccessful.
{noformat}

-- This message was sent by Atlassian JIRA (v6.2#6252)
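The log above shows the fencer trying its only configured method ("Trying method 1/1"), that method failing, and the standby therefore never being promoted. The sketch below models that ordered-chain behaviour in plain Java so the consequence is easy to see: if every method fails, fencing fails. `FencingChainSketch`, `FenceMethod` and `fence` are hypothetical names for illustration only, not Hadoop's `NodeFencer` API, and the printed messages merely imitate the log wording.

```java
import java.util.List;
import java.util.function.Predicate;

/**
 * Hypothetical, simplified model of an ordered fencing chain.
 * FenceMethod and fence() are illustrative names, not Hadoop's NodeFencer API.
 */
public class FencingChainSketch {

    /** A named fencing method; attempt returns true if the target was fenced. */
    record FenceMethod(String name, Predicate<String> attempt) {}

    /**
     * Tries each method in order and stops at the first success.
     * Returns false only if every method fails; in that case the
     * caller must not promote the standby.
     */
    static boolean fence(List<FenceMethod> methods, String targetHost) {
        for (int i = 0; i < methods.size(); i++) {
            FenceMethod m = methods.get(i);
            System.out.printf("Trying method %d/%d: %s%n", i + 1, methods.size(), m.name());
            if (m.attempt().test(targetHost)) {
                System.out.println("Fencing successful by method " + m.name());
                return true;
            }
            System.out.println("Fencing method " + m.name() + " was unsuccessful.");
        }
        return false;
    }

    public static void main(String[] args) {
        // Assumption: the old active host is powered off, so an SSH-based
        // method can never connect and always reports failure.
        FenceMethod sshFence = new FenceMethod("sshfence", host -> false);

        boolean fenced = fence(List.of(sshFence), "host-10-18-40-101");
        // With no successful method, the standby stays in standby.
        System.out.println(fenced ? "promote standby" : "stay standby");
    }
}
```

Running `main` prints the two "Trying"/"unsuccessful" lines followed by `stay standby`. Appending a second entry that always succeeds, for example `new FenceMethod("shell(/bin/true)", host -> true)`, makes `fence` return true. As far as we know, the HDFS HA with QJM guide recommends exactly this shape in real deployments: keep a real fencing method first and put one that is guaranteed to succeed last, so a dead host cannot block failover.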
[jira] [Commented] (HADOOP-10722) Standby NN continuing as standby when active NN machine got shutdown.
[ https://issues.apache.org/jira/browse/HADOOP-10722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14037205#comment-14037205 ]

surendra singh lilhore commented on HADOOP-10722:

[~beckham007]
[jira] [Commented] (HADOOP-10722) Standby NN continuing as standby when active NN machine got shutdown.
[ https://issues.apache.org/jira/browse/HADOOP-10722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14037209#comment-14037209 ]

surendra singh lilhore commented on HADOOP-10722:

Thanks for looking into this issue. Yes, ssh is working on my machine. The "Unable to connect to host-10-18-40-101 as user myuser port 22" error appears because the ANN machine is not reachable.
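The two explanations in this thread (misconfigured SSH vs. an unreachable machine) can be told apart at the TCP level before SSH authentication is even attempted. The plain-JDK sketch below probes a port and classifies the failure; `SshPortProbe` and `Result` are hypothetical names, not part of Hadoop. The mapping of exceptions to causes is typical rather than guaranteed: a powered-off host on the same network usually yields `NoRouteToHostException`, which is what the ZKFC log reports.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.InetSocketAddress;
import java.net.NoRouteToHostException;
import java.net.Socket;
import java.net.SocketTimeoutException;

/**
 * Hypothetical helper: probes a TCP port (22 for SSH) and classifies the
 * failure, so "host is down" can be told apart from "SSH is misconfigured".
 * Not part of Hadoop; a plain-JDK sketch only.
 */
public class SshPortProbe {

    enum Result { REACHABLE, REFUSED, NO_ROUTE, TIMED_OUT, OTHER_ERROR }

    static Result probe(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            // connect() blocks for at most timeoutMs milliseconds.
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return Result.REACHABLE;   // TCP handshake done; SSH login is a separate question
        } catch (NoRouteToHostException e) {
            return Result.NO_ROUTE;    // typically the machine is down or unreachable
        } catch (ConnectException e) {
            return Result.REFUSED;     // usually: host is up but nothing listens on the port
        } catch (SocketTimeoutException e) {
            return Result.TIMED_OUT;   // no answer within the timeout
        } catch (IOException e) {
            return Result.OTHER_ERROR; // e.g. the host name cannot be resolved
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "host-10-18-40-101";
        System.out.println(host + ":22 -> " + probe(host, 22, 3000));
    }
}
```

A `REACHABLE` result only proves that port 22 accepts connections; whether key-based login works for the fencing user still has to be checked with `ssh myuser@host-10-18-40-101` from the standby node, as suggested earlier in the thread.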
[jira] [Commented] (HADOOP-10722) Standby NN continuing as standby when active NN machine got shutdown.
[ https://issues.apache.org/jira/browse/HADOOP-10722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14037211#comment-14037211 ]

Vinayakumar B commented on HADOOP-10722:

Ideally, fencing methods are configured to prevent multiple writers to the same shared storage. QJM provides this fencing on its own, i.e. it won't allow multiple writers at a time, so external fencing methods need not be configured.
You can remove the SSH fencing method from both machines' configuration and restart the cluster. Then failover will succeed.
To skip SSH fencing, you can simply set the fencing methods as below:
{code:xml}
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>shell(/bin/true)</value>
</property>
{code}
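The claim that QJM "won't allow multiple writers at a time" rests, as we understand the QJM design, on epoch numbers: each journal node promises itself to the highest epoch it has seen and rejects writes carrying an older one, and a writer needs a majority of nodes to accept. The sketch below models only that idea; `EpochFencingSketch`, `JournalNode`, `Writer`, `newEpoch` and `writeEdit` are hypothetical names, not the real JournalNode or QuorumJournalManager API, and it omits persistence and recovery of in-progress segments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/**
 * Hypothetical, heavily simplified model of epoch-based writer fencing.
 * Names are illustrative, not the real JournalNode/QuorumJournalManager API.
 */
public class EpochFencingSketch {

    /** One journal node: remembers the highest epoch it has promised to. */
    static final class JournalNode {
        private long lastPromisedEpoch = 0;
        private final List<String> edits = new ArrayList<>();

        /** Promise to a new writer only if its epoch is higher than any seen before. */
        synchronized boolean newEpoch(long epoch) {
            if (epoch <= lastPromisedEpoch) {
                return false;
            }
            lastPromisedEpoch = epoch;
            return true;
        }

        /** Accept an edit only from the writer holding the currently promised epoch. */
        synchronized boolean write(long epoch, String edit) {
            if (epoch != lastPromisedEpoch) {
                return false; // a stale (fenced-out) writer is rejected here
            }
            edits.add(edit);
            return true;
        }

        synchronized List<String> edits() {
            return List.copyOf(edits);
        }
    }

    /** A writer (NameNode) that needs a strict majority of journal nodes to succeed. */
    static final class Writer {
        private final long epoch;
        private final List<JournalNode> nodes;

        Writer(long epoch, List<JournalNode> nodes) {
            this.epoch = epoch;
            this.nodes = nodes;
        }

        boolean becomeWriter() {
            return quorum(n -> n.newEpoch(epoch));
        }

        boolean writeEdit(String edit) {
            return quorum(n -> n.write(epoch, edit));
        }

        /** Applies op to every node and reports whether a strict majority succeeded. */
        private boolean quorum(Predicate<JournalNode> op) {
            int acks = 0;
            for (JournalNode n : nodes) {
                if (op.test(n)) {
                    acks++;
                }
            }
            return acks > nodes.size() / 2;
        }
    }

    public static void main(String[] args) {
        List<JournalNode> jns = List.of(new JournalNode(), new JournalNode(), new JournalNode());

        Writer oldActive = new Writer(1, jns);
        System.out.println("old active claims epoch 1: " + oldActive.becomeWriter());
        System.out.println("old active writes: " + oldActive.writeEdit("edit-1"));

        // Failover: the new active claims a higher epoch without contacting the old one.
        Writer newActive = new Writer(2, jns);
        System.out.println("new active claims epoch 2: " + newActive.becomeWriter());

        System.out.println("old active writes again: " + oldActive.writeEdit("edit-2"));
        System.out.println("new active writes: " + newActive.writeEdit("edit-3"));
    }
}
```

The old active is shut out without any SSH connection, which is why a no-op fencing command does not risk corrupting the edit log. The HDFS HA guide does note one gap that epochs alone do not close: until it next tries to write, an old active may still serve possibly stale reads to clients.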
[jira] [Commented] (HADOOP-10722) Standby NN continuing as standby when active NN machine got shutdown.
[ https://issues.apache.org/jira/browse/HADOOP-10722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14037221#comment-14037221 ]

surendra singh lilhore commented on HADOOP-10722:

@vinay Thanks, it's working fine.
{noformat}
2014-06-19 16:29:26,083 INFO org.apache.hadoop.ha.NodeFencer: == Beginning Service Fencing Process... ==
2014-06-19 16:29:26,083 INFO org.apache.hadoop.ha.NodeFencer: Trying method 1/1: org.apache.hadoop.ha.ShellCommandFencer(/bin/true)
2014-06-19 16:29:26,129 INFO org.apache.hadoop.ha.ShellCommandFencer: Launched fencing command '/bin/true' with pid 24316
2014-06-19 16:29:26,168 INFO org.apache.hadoop.ha.NodeFencer: == Fencing successful by method org.apache.hadoop.ha.ShellCommandFencer(/bin/true) ==
2014-06-19 16:29:26,168 INFO org.apache.hadoop.ha.ActiveStandbyElector: Writing znode /hadoop-ha/mycluster/ActiveBreadCrumb to indicate that the local node is the most recent active...
2014-06-19 16:29:26,206 INFO org.apache.hadoop.ha.ZKFailoverController: Trying to make NameNode at host-10-18-40-90/10.18.40.90:8020 active...
2014-06-19 16:29:26,862 INFO org.apache.hadoop.ha.ZKFailoverController: Successfully transitioned NameNode at host-10-18-40-90/10.18.40.90:8020 to active state
{noformat}
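In this log the ZKFC launches `/bin/true`, immediately reports "Fencing successful", and only then writes the ActiveBreadCrumb znode and promotes the NameNode. The Hadoop HA documentation describes shell fencing as successful when the command exits with status 0, and `/bin/true` always does. The sketch below models that rule with `ProcessBuilder`; `ShellFenceSketch` and `runShellFence` are hypothetical names, it is not Hadoop's `ShellCommandFencer`, and it assumes a POSIX shell at `/bin/sh`.

```java
import java.io.IOException;

/**
 * Hypothetical sketch of shell-based fencing: run a command and treat
 * exit status 0 as "fenced". Not Hadoop's ShellCommandFencer; assumes a
 * POSIX shell at /bin/sh.
 */
public class ShellFenceSketch {

    static boolean runShellFence(String command) {
        ProcessBuilder pb = new ProcessBuilder("/bin/sh", "-c", command);
        pb.inheritIO(); // let the command's output appear on our console
        try {
            Process process = pb.start();
            int exitCode = process.waitFor();
            return exitCode == 0;     // only exit status 0 counts as success
        } catch (IOException e) {
            return false;             // the shell could not be started
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        String command = args.length > 0 ? args[0] : "/bin/true";
        System.out.println("fencing with '" + command + "' succeeded: " + runShellFence(command));
    }
}
```

Because the exit status is the only signal, `shell(/bin/true)` can never block a failover, but it also fences nothing by itself; with this setting, protection against two writers comes entirely from QJM.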
[jira] [Commented] (HADOOP-10722) Standby NN continuing as standby when active NN machine got shutdown.
[ https://issues.apache.org/jira/browse/HADOOP-10722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14037222#comment-14037222 ]

Vinayakumar B commented on HADOOP-10722:

Then can you resolve this issue? BTW, thanks for the update.