[jira] [Commented] (HDFS-14442) Disagreement between HAUtil.getAddressOfActive and RpcInvocationHandler.getConnectionId
[ https://issues.apache.org/jira/browse/HDFS-14442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16966600#comment-16966600 ] Ravuri Sushma sree commented on HDFS-14442:
---
Hi [~xkrogen], as you suggested, I have replaced DFSTestUtil.waitReplication with *DFSTestUtil.waitForReplication*, which uses GenericTestUtils.waitFor(). I have uploaded a patch; kindly review.

> Disagreement between HAUtil.getAddressOfActive and RpcInvocationHandler.getConnectionId
> ---
>
> Key: HDFS-14442
> URL: https://issues.apache.org/jira/browse/HDFS-14442
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 3.3.0
> Reporter: Erik Krogen
> Priority: Major
> Attachments: HDFS-14442.001.patch, HDFS-14442.002.patch
>
> While working on HDFS-14245, we noticed a discrepancy in some proxy-handling code.
> The description of {{RpcInvocationHandler.getConnectionId()}} states:
> {code}
> /**
>  * Returns the connection id associated with the InvocationHandler instance.
>  * @return ConnectionId
>  */
> ConnectionId getConnectionId();
> {code}
> It does not make any claims about whether this connection ID will be an active proxy or not. Yet in {{HAUtil}} we have:
> {code}
> /**
>  * Get the internet address of the currently-active NN. This should rarely be
>  * used, since callers of this method who connect directly to the NN using the
>  * resulting InetSocketAddress will not be able to connect to the active NN if
>  * a failover were to occur after this method has been called.
>  *
>  * @param fs the file system to get the active address of.
>  * @return the internet address of the currently-active NN.
>  * @throws IOException if an error occurs while resolving the active NN.
>  */
> public static InetSocketAddress getAddressOfActive(FileSystem fs)
>     throws IOException {
>   if (!(fs instanceof DistributedFileSystem)) {
>     throw new IllegalArgumentException("FileSystem " + fs + " is not a DFS.");
>   }
>   // force client address resolution.
>   fs.exists(new Path("/"));
>   DistributedFileSystem dfs = (DistributedFileSystem) fs;
>   DFSClient dfsClient = dfs.getClient();
>   return RPC.getServerAddress(dfsClient.getNamenode());
> }
> {code}
> The call {{RPC.getServerAddress()}} eventually terminates in {{RpcInvocationHandler#getConnectionId()}}, via {{RPC.getServerAddress()}} -> {{RPC.getConnectionIdForProxy()}} -> {{RpcInvocationHandler#getConnectionId()}}. {{HAUtil}} appears to be making an incorrect assumption that {{RpcInvocationHandler}} will necessarily return an _active_ connection ID. {{ObserverReadProxyProvider}} demonstrates a counter-example to this, since the current connection ID may be pointing at, for example, an Observer NameNode.

--
This message was sent by Atlassian Jira (v8.3.4#803005)

To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
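The comment above swaps a fixed-wait helper for GenericTestUtils.waitFor(), which polls a condition until it holds or a timeout expires. The following is a minimal self-contained sketch of that polling pattern; the class name `PollingWait` is an illustrative stand-in, not Hadoop's actual test utility.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Illustrative stand-in for the GenericTestUtils.waitFor polling pattern:
// re-evaluate a condition every checkEveryMillis until it returns true,
// throwing TimeoutException once waitForMillis has elapsed.
public final class PollingWait {
    public static void waitFor(Supplier<Boolean> check,
                               long checkEveryMillis,
                               long waitForMillis)
            throws TimeoutException, InterruptedException {
        long deadline = System.currentTimeMillis() + waitForMillis;
        while (!check.get()) {
            if (System.currentTimeMillis() >= deadline) {
                throw new TimeoutException(
                    "Condition not met within " + waitForMillis + " ms");
            }
            Thread.sleep(checkEveryMillis);
        }
    }
}
```

The advantage over a fixed sleep is that the test returns as soon as the condition (e.g. replication reaching the expected factor) becomes true, while still bounding the total wait.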
[jira] [Updated] (HDFS-14442) Disagreement between HAUtil.getAddressOfActive and RpcInvocationHandler.getConnectionId
[ https://issues.apache.org/jira/browse/HDFS-14442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravuri Sushma sree updated HDFS-14442:
--
Attachment: HDFS-14442.002.patch
[jira] [Commented] (HDFS-14442) Disagreement between HAUtil.getAddressOfActive and RpcInvocationHandler.getConnectionId
[ https://issues.apache.org/jira/browse/HDFS-14442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16954248#comment-16954248 ] Ravuri Sushma sree commented on HDFS-14442:
---
[~xkrogen], thank you so much for your valuable suggestions. I have uploaded a patch following the first approach. Can you please review?
[jira] [Updated] (HDFS-14442) Disagreement between HAUtil.getAddressOfActive and RpcInvocationHandler.getConnectionId
[ https://issues.apache.org/jira/browse/HDFS-14442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravuri Sushma sree updated HDFS-14442:
--
Attachment: HDFS-14442.001.patch
[jira] [Commented] (HDFS-14442) Disagreement between HAUtil.getAddressOfActive and RpcInvocationHandler.getConnectionId
[ https://issues.apache.org/jira/browse/HDFS-14442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16935972#comment-16935972 ] Ravuri Sushma sree commented on HDFS-14442:
---
In my scenario there are *1 Active NameNode*, *1 Standby NameNode*, and *1 Observer NameNode*. When the fsck command is executed, the connection is established to the Observer NameNode, whereas DFSck.java expects an Active NameNode from getCurrentNamenodeAddress() to handle fsck.

There is no issue with a normal fsck command, but when corrupted files are present in the cluster, *fsck -delete* throws {{"org.apache.hadoop.ipc.StandbyException: Operation category WRITE is not supported in state observer"}} and {{"Fsck on path '/' FAILED"}}.
[jira] [Commented] (HDFS-14442) Disagreement between HAUtil.getAddressOfActive and RpcInvocationHandler.getConnectionId
[ https://issues.apache.org/jira/browse/HDFS-14442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16935775#comment-16935775 ] Ravuri Sushma sree commented on HDFS-14442:
---
Hi [~xkrogen],

getAddressOfActive() does not necessarily return the active NameNode, as you mentioned. Getting the proxies for all NameNodes in the nameservice and returning RPC.getServerAddress() of the proxy whose HAServiceState is ACTIVE helps return the active one:
{code:java}
URI dfsUri = getUri();
String nsId = dfsUri.getHost();
List<ClientProtocol> namenodes =
    HAUtil.getProxiesForAllNameNodesInNameservice(dfsConf, nsId);
for (ClientProtocol proxy : namenodes) {
  if (proxy.getHAServiceState().equals(HAServiceState.ACTIVE)) {
    return RPC.getServerAddress(proxy);
  }
}
{code}
Can you suggest if there is any way I can do the same from DFSClient?
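The selection logic in the snippet above (iterate over the per-NameNode proxies and pick the one reporting ACTIVE) can be shown as a self-contained sketch. The types below are simplified stand-ins for Hadoop's ClientProtocol and HAServiceState, reduced to the pieces needed for the loop; they are not the real interfaces.

```java
import java.util.List;
import java.util.Optional;

// Simplified stand-ins for Hadoop's HAServiceState and a per-NameNode proxy;
// only the members needed to demonstrate the "pick the ACTIVE proxy" loop.
public final class ActiveSelector {
    public enum HAServiceState { ACTIVE, STANDBY, OBSERVER }

    public interface NamenodeProxy {
        HAServiceState getHAServiceState();
        String getAddress();
    }

    // Returns the address of the first proxy reporting ACTIVE, if any;
    // mirrors the loop over the nameservice's proxies shown above.
    public static Optional<String> addressOfActive(List<NamenodeProxy> proxies) {
        for (NamenodeProxy p : proxies) {
            if (p.getHAServiceState() == HAServiceState.ACTIVE) {
                return Optional.of(p.getAddress());
            }
        }
        return Optional.empty();
    }
}
```

Note that each getHAServiceState() call in the real code is an RPC to that NameNode, so the loop's cost grows with the number of NameNodes, and the answer can be stale immediately after a failover.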
[jira] [Updated] (HDFS-14528) Failover from Active to Standby Failed
[ https://issues.apache.org/jira/browse/HDFS-14528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravuri Sushma sree updated HDFS-14528:
--
Attachment: HDFS-14528.005.patch

> Failover from Active to Standby Failed
> ---
>
> Key: HDFS-14528
> URL: https://issues.apache.org/jira/browse/HDFS-14528
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: ha
> Reporter: Ravuri Sushma sree
> Assignee: Ravuri Sushma sree
> Priority: Major
> Attachments: HDFS-14528.003.patch, HDFS-14528.004.patch, HDFS-14528.005.patch, HDFS-14528.2.Patch, ZKFC_issue.patch
>
> *In a cluster with more than one Standby NameNode, manual failover throws an exception in some cases.*
> *When trying to execute the failover command from active to standby, _./hdfs haadmin -failover nn1 nn2_, the below exception is thrown:*
> Operation failed: Call From X-X-X-X/X-X-X-X to Y-Y-Y-Y: failed on connection exception: java.net.ConnectException: Connection refused
> This is encountered in the following cases:
> Scenario 1:
> Namenodes - NN1 (Active), NN2 (Standby), NN3 (Standby)
> When trying to manually fail over from NN1 to NN2 while NN3 is down, the exception is thrown.
> Scenario 2:
> Namenodes - NN1 (Active), NN2 (Standby), NN3 (Standby)
> ZKFCs - ZKFC1, ZKFC2, ZKFC3
> When trying to manually fail over from NN1 to NN3 while NN3's ZKFC (ZKFC3) is down, the exception is thrown.

--
This message was sent by Atlassian Jira (v8.3.2#803003)

To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-14528) Failover from Active to Standby Failed
[ https://issues.apache.org/jira/browse/HDFS-14528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925119#comment-16925119 ] Ravuri Sushma sree edited comment on HDFS-14528 at 9/9/19 4:28 AM:
---
Patch has been uploaded. Please review. The above test failures aren't related.

was (Author: sushma_28): The above test failures arent related
[jira] [Commented] (HDFS-14528) Failover from Active to Standby Failed
[ https://issues.apache.org/jira/browse/HDFS-14528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925119#comment-16925119 ] Ravuri Sushma sree commented on HDFS-14528:
---
The above test failures aren't related.
[jira] [Updated] (HDFS-14528) Failover from Active to Standby Failed
[ https://issues.apache.org/jira/browse/HDFS-14528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravuri Sushma sree updated HDFS-14528:
--
Attachment: HDFS-14528.004.patch
[jira] [Commented] (HDFS-14528) Failover from Active to Standby Failed
[ https://issues.apache.org/jira/browse/HDFS-14528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16924289#comment-16924289 ] Ravuri Sushma sree commented on HDFS-14528:
---
Thanks [~csun]. No, adding the remote host twice wasn't intended; I will upload a patch correcting it.
[jira] [Updated] (HDFS-14528) Failover from Active to Standby Failed
[ https://issues.apache.org/jira/browse/HDFS-14528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravuri Sushma sree updated HDFS-14528:
--
Description:
*In a cluster with more than one Standby NameNode, manual failover throws an exception in some cases.*
*When trying to execute the failover command from active to standby, _./hdfs haadmin -failover nn1 nn2_, the below exception is thrown:*
Operation failed: Call From X-X-X-X/X-X-X-X to Y-Y-Y-Y: failed on connection exception: java.net.ConnectException: Connection refused
This is encountered in the following cases:
Scenario 1:
Namenodes - NN1 (Active), NN2 (Standby), NN3 (Standby)
When trying to manually fail over from NN1 to NN2 while NN3 is down, the exception is thrown.
Scenario 2:
Namenodes - NN1 (Active), NN2 (Standby), NN3 (Standby)
ZKFCs - ZKFC1, ZKFC2, ZKFC3
When trying to manually fail over from NN1 to NN3 while NN3's ZKFC (ZKFC3) is down, the exception is thrown.

was:
*In a cluster with more than one Standby namenode, manual failover throws exception for some cases*
*When trying to exectue the failover command from active to standby, ._/hdfs haadmin -failover nn1 nn2, below Exception is thrown_*
Operation failed: Call From X-X-X-X/X-X-X-X to Y-Y-Y-Y: failed on connection exception: java.net.ConnectException: Connection refused
This is encountered in the following cases:
Scenario 1:
Namenodes - NN1(Active), NN2(Standby), NN3(Standby)
When trying to manually failover from NN1 TO NN2 if NN3 is down, Exception is thrown
Scenario 2:
Namenodes - NN1(Active), NN2(Standby), NN3(Standby)
ZKFC's - ZKFC1, ZKFC2, ZKFC3
When trying to manually failover using NN1 to NN3 if NN3's ZKFC (ZKFC3) is down, Exception is thrown
[jira] [Updated] (HDFS-14528) Failover from Active to Standby Failed
[ https://issues.apache.org/jira/browse/HDFS-14528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravuri Sushma sree updated HDFS-14528:
--
Description:
*In a cluster with more than one Standby NameNode, manual failover throws an exception in some cases.*
*When trying to execute the failover command from active to standby, _./hdfs haadmin -failover nn1 nn2_, the below exception is thrown:*
Operation failed: Call From X-X-X-X/X-X-X-X to Y-Y-Y-Y: failed on connection exception: java.net.ConnectException: Connection refused
This is encountered in the following cases:
Scenario 1:
Namenodes - NN1 (Active), NN2 (Standby), NN3 (Standby)
When trying to manually fail over from NN1 TO NN2 if NN3 is down, the exception is thrown.
Scenario 2:
Namenodes - NN1 (Active), NN2 (Standby), NN3 (Standby)
ZKFCs - ZKFC1, ZKFC2, ZKFC3
When trying to manually fail over from NN1 to NN3 if NN3's ZKFC (ZKFC3) is down, the exception is thrown.

was:
*Started an HA Cluster with three nodes [ _Active, Standby, Observer_ ]*
*When trying to exectue the failover command from active to standby, ._/hdfs haadmin -failover nn1 nn2, below Exception is thrown_*
Operation failed: Call From X-X-X-X/X-X-X-X to Y-Y-Y-Y: failed on connection exception: java.net.ConnectException: Connection refused
This is encountered in two cases: When any other standby namenode is down or when any other zkfc is down
[jira] [Updated] (HDFS-14528) Failover from Active to Standby Failed
[ https://issues.apache.org/jira/browse/HDFS-14528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravuri Sushma sree updated HDFS-14528: -- Summary: Failover from Active to Standby Failed (was: [SBN Read]Failover from Active to Standby Failed)
[jira] [Commented] (HDFS-14528) [SBN Read]Failover from Active to Standby Failed
[ https://issues.apache.org/jira/browse/HDFS-14528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16914054#comment-16914054 ] Ravuri Sushma sree commented on HDFS-14528: --- Thank you [~ayushtkn] for the review; I will add a LOG inside and correct the checkstyle issues.
[jira] [Commented] (HDFS-14765) fsck and dfsadmin -report give different results for UnderReplicated Blocks after decommissioning
[ https://issues.apache.org/jira/browse/HDFS-14765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16913167#comment-16913167 ] Ravuri Sushma sree commented on HDFS-14765: --- Working on a cluster with 2 NameNodes and 6 DataNodes holding 600,000 files, one of the DataNodes was decommissioned. After the node was decommissioned, fsck and dfsadmin/WebUI show the following results:

*FSCK*:
Status: HEALTHY
Number of data-nodes: 6
Number of racks: 1
Total dirs: 83
Total symlinks: 0
Replicated Blocks:
Total size: 7431010 B
Total files: 60
Total blocks (validated): 581839 (avg. block size 12 B)
Minimally replicated blocks: 581839 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
{color:#de350b}Under-replicated blocks: 0 (0.0 %){color}
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 2
Average block replication: 3.0021775
Missing blocks: 0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
DecommissionedReplicas: 255396

*Dfsadmin -report*:
Configured Capacity: 792665681920 (738.23 GB)
Present Capacity: 447796567710 (417.04 GB)
DFS Remaining: 43539168 (405.49 GB)
DFS Used: 12405456542 (11.55 GB)
DFS Used%: 2.77%
Replicated Blocks:
{color:#de350b}Under replicated blocks: 252067{color}
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
Pending deletion blocks: 0

> fsck and dfsadmin -report give different results for UnderReplicated Blocks after decommissioning
>
> Key: HDFS-14765
> URL: https://issues.apache.org/jira/browse/HDFS-14765
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 3.1.2
> Reporter: Ravuri Sushma sree
> Assignee: Ravuri Sushma sree
> Priority: Major
> Fix For: 3.3.0
>
> Fsck and dfsadmin show different results for under-replicated blocks after one of the DataNodes is decommissioned.
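One way such a divergence can arise (a hypothetical illustration of how two reporting paths can disagree, not the confirmed root cause on this issue) is if one tool still counts replicas on decommissioned DataNodes toward a block's live replica count while the other does not:

```java
import java.util.List;

public class ReplicaCountSketch {
    // Each boolean marks whether that replica sits on a decommissioned node.
    // A block is under-replicated when its "live" replica count falls short
    // of the replication factor; the two policies differ only in whether
    // replicas on decommissioned nodes are treated as live.
    static boolean underReplicated(List<Boolean> replicaOnDecommissionedNode,
                                   int replicationFactor,
                                   boolean countDecommissioned) {
        long live = replicaOnDecommissionedNode.stream()
            .filter(decommissioned -> countDecommissioned || !decommissioned)
            .count();
        return live < replicationFactor;
    }

    public static void main(String[] args) {
        // Two replicas, one of them on a decommissioned node, factor 2.
        List<Boolean> block = List.of(false, true);
        // Counting decommissioned replicas: block looks healthy.
        System.out.println(underReplicated(block, 2, true));
        // Excluding them: the same block is reported under-replicated.
        System.out.println(underReplicated(block, 2, false));
    }
}
```

Under this hypothetical model, a report that counts decommissioned replicas would show 0 under-replicated blocks while a report that excludes them would show a large number, which matches the shape of the discrepancy above.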
[jira] [Updated] (HDFS-14765) fsck and dfsadmin -report give different results for UnderReplicated Blocks after decommissioning
[ https://issues.apache.org/jira/browse/HDFS-14765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravuri Sushma sree updated HDFS-14765: -- Environment: was: Working on a cluster with 2 NameNodes and 6 DataNodes holding 600,000 files; one DataNode was decommissioned. (Full fsck and dfsadmin -report output as in the comment above.)
[jira] [Updated] (HDFS-14765) fsck and dfsadmin -report give different results for UnderReplicated Blocks after decommissioning
[ https://issues.apache.org/jira/browse/HDFS-14765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravuri Sushma sree updated HDFS-14765: -- Description: Fsck and dfsadmin show different results for under-replicated blocks after one of the DataNodes is decommissioned. was: Fsck and dfsadmin show different results for under-replicated blocks after one of the DataNodes is decommissioned.
[jira] [Created] (HDFS-14765) fsck and dfsadmin -report give different results for UnderReplicated Blocks after decommissioning
Ravuri Sushma sree created HDFS-14765: - Summary: fsck and dfsadmin -report give different results for UnderReplicated Blocks after decommissioning Key: HDFS-14765 URL: https://issues.apache.org/jira/browse/HDFS-14765 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.1.2 Environment: Working on a cluster with 2 NameNodes and 6 DataNodes holding 600,000 files; one DataNode was decommissioned. (Full fsck and dfsadmin -report output as in the comment above.) Reporter: Ravuri Sushma sree Assignee: Ravuri Sushma sree Fix For: 3.3.0

Fsck and dfsadmin show different results for under-replicated blocks after one of the DataNodes is decommissioned.
[jira] [Commented] (HDFS-14528) [SBN Read]Failover from Active to Standby Failed
[ https://issues.apache.org/jira/browse/HDFS-14528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16911070#comment-16911070 ] Ravuri Sushma sree commented on HDFS-14528: --- [~csun] [~ayushtkn] [~brahmareddy] I have added connection-exception handling and a unit test. Please review.
[jira] [Updated] (HDFS-14528) [SBN Read]Failover from Active to Standby Failed
[ https://issues.apache.org/jira/browse/HDFS-14528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravuri Sushma sree updated HDFS-14528: -- Attachment: HDFS-14528.003.patch
[jira] [Updated] (HDFS-14528) [SBN Read]Failover from Active to Standby Failed
[ https://issues.apache.org/jira/browse/HDFS-14528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravuri Sushma sree updated HDFS-14528: -- Attachment: HDFS-14528.2.Patch
[jira] [Updated] (HDFS-14528) [SBN Read]Failover from Active to Standby Failed
[ https://issues.apache.org/jira/browse/HDFS-14528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravuri Sushma sree updated HDFS-14528: -- Attachment: (was: HDFS-14528.2.patch)
[jira] [Updated] (HDFS-14528) [SBN Read]Failover from Active to Standby Failed
[ https://issues.apache.org/jira/browse/HDFS-14528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravuri Sushma sree updated HDFS-14528: -- Attachment: HDFS-14528.2.patch
[jira] [Updated] (HDFS-14528) [SBN Read]Failover from Active to Standby Failed
[ https://issues.apache.org/jira/browse/HDFS-14528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravuri Sushma sree updated HDFS-14528: -- Description: *Started an HA cluster with three nodes [ _Active, Standby, Observer_ ].* *When trying to execute the failover command from active to standby,* *_./hdfs haadmin -failover nn1 nn2_, the exception below is thrown:* Operation failed: Call From X-X-X-X/X-X-X-X to Y-Y-Y-Y: failed on connection exception: java.net.ConnectException: Connection refused. This is encountered in two cases: when any other standby NameNode is down, or when any other ZKFC is down.

was: *Started an HA cluster with three nodes [ _Active, Standby, Observer_ ].* *When trying to execute the failover command from active to standby,* *_./hdfs haadmin -failover nn1 nn2_, the exception below is thrown:*

Operation failed: Call From X-X-X-X/X-X-X-X to Y-Y-Y-Y: failed on connection exception: java.net.ConnectException: Connection refused; For more details see: [http://wiki.apache.org/hadoop/ConnectionRefused]
at sun.reflect.GeneratedConstructorAccessor7.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:755)
[jira] [Issue Comment Deleted] (HDFS-14528) [SBN Read]Failover from Active to Standby Failed
[ https://issues.apache.org/jira/browse/HDFS-14528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravuri Sushma sree updated HDFS-14528: -- Comment: was deleted (was: Thank you everyone for the discussion. Chao Sun, yes, as you say, this can be a general issue: even without an observer, if any standby NameNode's ZKFC is down we may encounter the same error. I shall change the description and add a UT. In the current scenario the observer is not to be included in failover (observer to active), hence this fix skips the observer in phase 3. If multiple standby NameNodes are present and any one of their ZKFCs is down, it can be skipped there as well (when connection refused is encountered). I will work on it and upload the patch here. I will also update the summary and description of this Jira, with another UT, accordingly.)
[jira] [Commented] (HDFS-14528) [SBN Read]Failover from Active to Standby Failed
[ https://issues.apache.org/jira/browse/HDFS-14528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16856490#comment-16856490 ] Ravuri Sushma sree commented on HDFS-14528: --- Thank you everyone for the discussion. [~csun], yes, as you say, this can be a general issue: even without an observer, if any standby NameNode's ZKFC is down we may encounter the same error. I shall change the description and add a UT. In the current scenario the observer is not to be included in failover (observer to active), hence this fix skips the observer in phase 3. If multiple standby NameNodes are present and any one of their ZKFCs is down, it can be skipped there as well (when connection refused is encountered). I will work on it and upload the patch here. I will also update the summary and description of this Jira, with another UT, accordingly.
[jira] [Commented] (HDFS-14528) [SBN Read]Failover from Active to Standby Failed
[ https://issues.apache.org/jira/browse/HDFS-14528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16856489#comment-16856489 ] Ravuri Sushma sree commented on HDFS-14528: --- Thank you everyone for the discussion. Chao Sun, yes, as you say, this can be a general issue: even without an observer, if any standby NameNode's ZKFC is down we may encounter the same error. I shall change the description and add a UT. In the current scenario the observer is not to be included in failover (observer to active), hence this fix skips the observer in phase 3. If multiple standby NameNodes are present and any one of their ZKFCs is down, it can be skipped there as well (when connection refused is encountered). I will work on it and upload the patch here. I will also update the summary and description of this Jira, with another UT, accordingly.
[jira] [Updated] (HDFS-14528) [SBN Read]Failover from Active to Standby Failed
[ https://issues.apache.org/jira/browse/HDFS-14528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravuri Sushma sree updated HDFS-14528: -- Attachment: (was: ZKFC_issue.patch)
[jira] [Updated] (HDFS-14528) [SBN Read]Failover from Active to Standby Failed
[ https://issues.apache.org/jira/browse/HDFS-14528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravuri Sushma sree updated HDFS-14528: -- Attachment: ZKFC_issue.patch Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-14528) [SBN Read]Failover from Active to Standby Failed
[ https://issues.apache.org/jira/browse/HDFS-14528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravuri Sushma sree updated HDFS-14528: -- Attachment: ZKFC_issue.patch
[jira] [Commented] (HDFS-14528) [SBN Read]Failover from Active to Standby Failed
[ https://issues.apache.org/jira/browse/HDFS-14528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16853015#comment-16853015 ] Ravuri Sushma sree commented on HDFS-14528: --- Can somebody assign this Jira to me? I have the patch ready to upload.
[jira] [Commented] (HDFS-14528) [SBN Read]Failover from Active to Standby Failed
[ https://issues.apache.org/jira/browse/HDFS-14528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16852900#comment-16852900 ]

Ravuri Sushma sree commented on HDFS-14528:
-------------------------------------------

h2. *EXCEPTION THROWN*

Operation failed: Call From X-X-X-X/X-X-X-X to Y-Y-Y-Y: failed on connection exception: java.net.ConnectException: Connection refused; For more details see: [http://wiki.apache.org/hadoop/ConnectionRefused]
 at sun.reflect.GeneratedConstructorAccessor7.newInstance(Unknown Source)
 at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
 at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
 at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831)
 at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:755)
 at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1556)
 at org.apache.hadoop.ipc.Client.call(Client.java:1498)
 at org.apache.hadoop.ipc.Client.call(Client.java:1397)
 at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:234)
 at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
 at com.sun.proxy.$Proxy10.cedeActive(Unknown Source)
 at org.apache.hadoop.ha.protocolPB.ZKFCProtocolClientSideTranslatorPB.cedeActive(ZKFCProtocolClientSideTranslatorPB.java:64)
 at org.apache.hadoop.ha.ZKFailoverController.cedeRemoteActive(ZKFailoverController.java:730)
 at org.apache.hadoop.ha.ZKFailoverController.doGracefulFailover(ZKFailoverController.java:674)
 at org.apache.hadoop.ha.ZKFailoverController.access$400(ZKFailoverController.java:62)
 at org.apache.hadoop.ha.ZKFailoverController$3.run(ZKFailoverController.java:613)
 at org.apache.hadoop.ha.ZKFailoverController$3.run(ZKFailoverController.java:610)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
 at org.apache.hadoop.ha.ZKFailoverController.gracefulFailoverToYou(ZKFailoverController.java:610)
 at org.apache.hadoop.ha.ZKFCRpcServer.gracefulFailover(ZKFCRpcServer.java:94)
 at org.apache.hadoop.ha.protocolPB.ZKFCProtocolServerSideTranslatorPB.gracefulFailover(ZKFCProtocolServerSideTranslatorPB.java:61)
 at org.apache.hadoop.ha.proto.ZKFCProtocolProtos$ZKFCProtocolService$2.callBlockingMethod(ZKFCProtocolProtos.java:1548)
 at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:530)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1036)
 at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:927)
 at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:862)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2791)
Caused by: java.net.ConnectException: Connection refused
 at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
 at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
 at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
 at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
 at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:720)
 at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:823)
 at org.apache.hadoop.ipc.Client$Connection.access$3700(Client.java:436)
 at org.apache.hadoop.ipc.Client.getConnection(Client.java:1613)
 at org.apache.hadoop.ipc.Client.call(Client.java:1444)
 ... 25 more

> [SBN Read]Failover from Active to Standby Failed
> ------------------------------------------------
>
>          Key: HDFS-14528
>          URL: https://issues.apache.org/jira/browse/HDFS-14528
>      Project: Hadoop HDFS
>   Issue Type: Bug
>   Components: ha
>     Reporter: Ravuri Sushma sree
>     Priority: Major
>
> *Started an HA cluster with three nodes [_Active, Standby, Observer_].*
> *When trying to execute the failover command from active to standby,*
> *_./hdfs haadmin -failover nn1 nn2_, the exception below is thrown:*
>
> Operation failed: Call From X-X-X-X/X-X-X-X to Y-Y-Y-Y: failed on
> connection exception: java.net.ConnectException: Connection refused; For more
> details see: [http://wiki.apache.org/hadoop/ConnectionRefused]
> at sun.reflect.GeneratedConstructorAccessor7.newInstance(Unknown Source)
> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
> at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831)
> at org.apache.hadoop.net.NetUtils.wrapExcepti
[jira] [Updated] (HDFS-14528) [SBN Read]Failover from Active to Standby Failed
[ https://issues.apache.org/jira/browse/HDFS-14528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravuri Sushma sree updated HDFS-14528:
--------------------------------------
    Description:
*Started an HA cluster with three nodes [_Active, Standby, Observer_].*
*When trying to execute the failover command from active to standby,*
*_./hdfs haadmin -failover nn1 nn2_, the exception below is thrown:*

  was:
*Started an HA cluster with three nodes [_Active, Standby, Observer_].*
*When trying to execute the failover command from active to standby,*
*_./hdfs haadmin -failover nn1 nn2_, the exception below is thrown:*

Operation failed: Call From X-X-X-X/X-X-X-X to Y-Y-Y-Y: failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
 at sun.reflect.GeneratedConstructorAccessor7.newInstance(Unknown Source)
 at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
 at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
 at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831)
 at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:755)
 at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1556)
 at org.apache.hadoop.ipc.Client.call(Client.java:1498)
 at org.apache.hadoop.ipc.Client.call(Client.java:1397)
 at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:234)
 at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
 at com.sun.proxy.$Proxy10.cedeActive(Unknown Source)
 at org.apache.hadoop.ha.protocolPB.ZKFCProtocolClientSideTranslatorPB.cedeActive(ZKFCProtocolClientSideTranslatorPB.java:64)
 at org.apache.hadoop.ha.ZKFailoverController.cedeRemoteActive(ZKFailoverController.java:730)
 at org.apache.hadoop.ha.ZKFailoverController.doGracefulFailover(ZKFailoverController.java:674)
 at org.apache.hadoop.ha.ZKFailoverController.access$400(ZKFailoverController.java:62)
 at org.apache.hadoop.ha.ZKFailoverController$3.run(ZKFailoverController.java:613)
 at org.apache.hadoop.ha.ZKFailoverController$3.run(ZKFailoverController.java:610)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
 at org.apache.hadoop.ha.ZKFailoverController.gracefulFailoverToYou(ZKFailoverController.java:610)
 at org.apache.hadoop.ha.ZKFCRpcServer.gracefulFailover(ZKFCRpcServer.java:94)
 at org.apache.hadoop.ha.protocolPB.ZKFCProtocolServerSideTranslatorPB.gracefulFailover(ZKFCProtocolServerSideTranslatorPB.java:61)
 at org.apache.hadoop.ha.proto.ZKFCProtocolProtos$ZKFCProtocolService$2.callBlockingMethod(ZKFCProtocolProtos.java:1548)
 at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:530)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1036)
 at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:927)
 at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:862)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2791)
Caused by: java.net.ConnectException: Connection refused
 at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
 at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
 at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
 at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
 at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:720)
 at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:823)
 at org.apache.hadoop.ipc.Client$Connection.access$3700(Client.java:436)
 at org.apache.hadoop.ipc.Client.getConnecti
[jira] [Updated] (HDFS-14528) [SBN Read]Failover from Active to Standby Failed
[ https://issues.apache.org/jira/browse/HDFS-14528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravuri Sushma sree updated HDFS-14528:
--------------------------------------
    Description:
*Started an HA cluster with three nodes [_Active, Standby, Observer_].*
*When trying to execute the failover command from active to standby,*
*_./hdfs haadmin -failover nn1 nn2_, the exception below is thrown:*

Operation failed: Call From X-X-X-X/X-X-X-X to Y-Y-Y-Y: failed on connection exception: java.net.ConnectException: Connection refused; For more details see: [http://wiki.apache.org/hadoop/ConnectionRefused]
 at sun.reflect.GeneratedConstructorAccessor7.newInstance(Unknown Source)
 at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
 at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
 at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831)
 at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:755)

  was:
*Started an HA cluster with three nodes [_Active, Standby, Observer_].*
*When trying to execute the failover command from active to standby,*
*_./hdfs haadmin -failover nn1 nn2_, the exception below is thrown:*

> [SBN Read]Failover from Active to Standby Failed
> ------------------------------------------------
>
>          Key: HDFS-14528
>          URL: https://issues.apache.org/jira/browse/HDFS-14528
>      Project: Hadoop HDFS
>   Issue Type: Bug
>   Components: ha
>     Reporter: Ravuri Sushma sree
>     Priority: Major
>
> *Started an HA cluster with three nodes [_Active, Standby, Observer_].*
> *When trying to execute the failover command from active to standby,*
> *_./hdfs haadmin -failover nn1 nn2_, the exception below is thrown:*
>
> Operation failed: Call From X-X-X-X/X-X-X-X to Y-Y-Y-Y: failed on
> connection exception: java.net.ConnectException: Connection refused; For more
> details see: [http://wiki.apache.org/hadoop/ConnectionRefused]
> at sun.reflect.GeneratedConstructorAccessor7.newInstance(Unknown Source)
> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
> at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831)
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:755)
[jira] [Created] (HDFS-14528) [SBN Read]Failover from Active to Standby Failed
Ravuri Sushma sree created HDFS-14528:
--------------------------------------

             Summary: [SBN Read]Failover from Active to Standby Failed
                 Key: HDFS-14528
                 URL: https://issues.apache.org/jira/browse/HDFS-14528
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: ha
            Reporter: Ravuri Sushma sree

*Started an HA cluster with three nodes [_Active, Standby, Observer_].*
*When trying to execute the failover command from active to standby,*
*_./hdfs haadmin -failover nn1 nn2_, the exception below is thrown:*

Operation failed: Call From X-X-X-X/X-X-X-X to Y-Y-Y-Y: failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
 at sun.reflect.GeneratedConstructorAccessor7.newInstance(Unknown Source)
 at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
 at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
 at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831)
 at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:755)
 at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1556)
 at org.apache.hadoop.ipc.Client.call(Client.java:1498)
 at org.apache.hadoop.ipc.Client.call(Client.java:1397)
 at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:234)
 at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
 at com.sun.proxy.$Proxy10.cedeActive(Unknown Source)
 at org.apache.hadoop.ha.protocolPB.ZKFCProtocolClientSideTranslatorPB.cedeActive(ZKFCProtocolClientSideTranslatorPB.java:64)
 at org.apache.hadoop.ha.ZKFailoverController.cedeRemoteActive(ZKFailoverController.java:730)
 at org.apache.hadoop.ha.ZKFailoverController.doGracefulFailover(ZKFailoverController.java:674)
 at org.apache.hadoop.ha.ZKFailoverController.access$400(ZKFailoverController.java:62)
 at org.apache.hadoop.ha.ZKFailoverController$3.run(ZKFailoverController.java:613)
 at org.apache.hadoop.ha.ZKFailoverController$3.run(ZKFailoverController.java:610)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
 at org.apache.hadoop.ha.ZKFailoverController.gracefulFailoverToYou(ZKFailoverController.java:610)
 at org.apache.hadoop.ha.ZKFCRpcServer.gracefulFailover(ZKFCRpcServer.java:94)
 at org.apache.hadoop.ha.protocolPB.ZKFCProtocolServerSideTranslatorPB.gracefulFailover(ZKFCProtocolServerSideTranslatorPB.java:61)
 at org.apache.hadoop.ha.proto.ZKFCProtocolProtos$ZKFCProtocolService$2.callBlockingMethod(ZKFCProtocolProtos.java:1548)
 at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:530)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1036)
 at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:927)
 at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:862)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2791)
Caused by: java.net.ConnectException: Connection refused
 at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
 at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
 at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
 at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
 at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:720)
 at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:823)
 at org.apache.hadoop.ipc.Client$Connection.access$3700(Client.java:436)
 at org.apache.hadoop.ipc.Client.getConnection(Client.java:1613)
 at org.apache.hadoop.ipc.
[jira] [Created] (HDFS-14358) Provide LiveNode and DeadNode filter in DataNode UI
Ravuri Sushma sree created HDFS-14358:
--------------------------------------

             Summary: Provide LiveNode and DeadNode filter in DataNode UI
                 Key: HDFS-14358
                 URL: https://issues.apache.org/jira/browse/HDFS-14358
             Project: Hadoop HDFS
          Issue Type: Wish
    Affects Versions: 3.1.2
            Reporter: Ravuri Sushma sree