[jira] [Updated] (HBASE-22601) Misconfigured addition of peers leads to cluster shutdown.
[ https://issues.apache.org/jira/browse/HBASE-22601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Duo Zhang updated HBASE-22601:
------------------------------
    Fix Version/s: 2.1.6
                   2.2.1
                   2.3.0
                   3.0.0

> Misconfigured addition of peers leads to cluster shutdown.
> ----------------------------------------------------------
>
>                 Key: HBASE-22601
>                 URL: https://issues.apache.org/jira/browse/HBASE-22601
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.3.2
>            Reporter: Rushabh S Shah
>            Assignee: Rushabh S Shah
>            Priority: Major
>             Fix For: 3.0.0, 2.3.0, 2.2.1, 2.1.6
>
>
> Recently we added a peer to a production cluster; the peer was in a different Kerberos realm.
>
> *Steps to reproduce:*
> 1. Add a misconfigured peer that is in a different Kerberos realm.
> 2. Remove that peer.
> 3. All region servers will start to crash.
>
> *RCA*
> We enabled trace logging on one region server for a short time.
> After adding the peer, we saw the following log lines.
> {noformat}
> 2019-06-18 22:19:20,949 INFO [main-EventThread] replication.ReplicationTrackerZKImpl - /hbase/replication/peers znode expired, triggering peerListChanged event
> 2019-06-18 22:19:20,992 INFO [main-EventThread] replication.ReplicationPeersZKImpl - Added new peer cluster=:/hbase
> 2019-06-18 22:19:21,113 INFO [main-EventThread] zookeeper.RecoverableZooKeeper - Process identifier=hconnection-0x794a56d6 connecting to ZooKeeper ensemble=
> 2019-06-18 22:20:01,280 WARN [main-EventThread] zookeeper.ZKUtil - hconnection-0x794a56d6-0x16b56265fbebb1b, quorum=, baseZNode=/hbase Unable to set watcher on znode (/hbase/hbaseid)
> org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed for /hbase/hbaseid
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:123)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>     at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1102)
>     at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:220)
>     at org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:421)
>     at org.apache.hadoop.hbase.zookeeper.ZKClusterId.readClusterIdZNode(ZKClusterId.java:65)
>     at org.apache.hadoop.hbase.client.ZooKeeperRegistry.getClusterId(ZooKeeperRegistry.java:105)
>     at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.retrieveClusterId(ConnectionManager.java:922)
>     at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.<init>(ConnectionManager.java:706)
>     at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.<init>(ConnectionManager.java:638)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>     at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:238)
>     at org.apache.hadoop.hbase.client.ConnectionManager.createConnection(ConnectionManager.java:432)
>     at org.apache.hadoop.hbase.client.ConnectionManager.createConnectionInternal(ConnectionManager.java:341)
>     at org.apache.hadoop.hbase.client.HConnectionManager.createConnection(HConnectionManager.java:144)
>     at org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint.init(HBaseInterClusterReplicationEndpoint.java:135)
>     at com.salesforce.hbase.replication.TenantReplicationEndpoint.init(TenantReplicationEndpoint.java:30)
>     at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.getReplicationSource(ReplicationSourceManager.java:517)
>     at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.addSource(ReplicationSourceManager.java:273)
>     at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.peerListChanged(ReplicationSourceManager.java:635)
>     at org.apache.hadoop.hbase.replication.ReplicationTrackerZKImpl$PeersWatcher.nodeChildrenChanged(ReplicationTrackerZKImpl.java:192)
>     at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:643)
>     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:544)
>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:519)
> 2019-06-18 22:20:42,999 WARN [Source,] zookeeper.ZKUtil - connection to
[jira] [Updated] (HBASE-22601) Misconfigured addition of peers leads to cluster shutdown.
[ https://issues.apache.org/jira/browse/HBASE-22601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rushabh S Shah updated HBASE-22601:
-----------------------------------
    Description:

Recently we added a peer to a production cluster; the peer was in a different Kerberos realm.

*Steps to reproduce:*
1. Add a misconfigured peer that is in a different Kerberos realm.
2. Remove that peer.
3. All region servers will start to crash.

*RCA*
We enabled trace logging on one region server for a short time.
After adding the peer, we saw the following log lines.
{noformat}
2019-06-18 22:19:20,949 INFO [main-EventThread] replication.ReplicationTrackerZKImpl - /hbase/replication/peers znode expired, triggering peerListChanged event
2019-06-18 22:19:20,992 INFO [main-EventThread] replication.ReplicationPeersZKImpl - Added new peer cluster=:/hbase
2019-06-18 22:19:21,113 INFO [main-EventThread] zookeeper.RecoverableZooKeeper - Process identifier=hconnection-0x794a56d6 connecting to ZooKeeper ensemble=
2019-06-18 22:20:01,280 WARN [main-EventThread] zookeeper.ZKUtil - hconnection-0x794a56d6-0x16b56265fbebb1b, quorum=, baseZNode=/hbase Unable to set watcher on znode (/hbase/hbaseid)
org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed for /hbase/hbaseid
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:123)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1102)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:220)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:421)
    at org.apache.hadoop.hbase.zookeeper.ZKClusterId.readClusterIdZNode(ZKClusterId.java:65)
    at org.apache.hadoop.hbase.client.ZooKeeperRegistry.getClusterId(ZooKeeperRegistry.java:105)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.retrieveClusterId(ConnectionManager.java:922)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.<init>(ConnectionManager.java:706)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.<init>(ConnectionManager.java:638)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:238)
    at org.apache.hadoop.hbase.client.ConnectionManager.createConnection(ConnectionManager.java:432)
    at org.apache.hadoop.hbase.client.ConnectionManager.createConnectionInternal(ConnectionManager.java:341)
    at org.apache.hadoop.hbase.client.HConnectionManager.createConnection(HConnectionManager.java:144)
    at org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint.init(HBaseInterClusterReplicationEndpoint.java:135)
    at com.salesforce.hbase.replication.TenantReplicationEndpoint.init(TenantReplicationEndpoint.java:30)
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.getReplicationSource(ReplicationSourceManager.java:517)
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.addSource(ReplicationSourceManager.java:273)
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.peerListChanged(ReplicationSourceManager.java:635)
    at org.apache.hadoop.hbase.replication.ReplicationTrackerZKImpl$PeersWatcher.nodeChildrenChanged(ReplicationTrackerZKImpl.java:192)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:643)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:544)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:519)
2019-06-18 22:20:42,999 WARN [Source,] zookeeper.ZKUtil - connection to cluster: -0x26b56265fe7b5cd, quorum=, baseZNode=/hbase Unable to set watcher on znode (/hbase/hbaseid)
org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed for /hbase/hbaseid
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:123)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1102)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:220)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:421)
    at
[jira] [Updated] (HBASE-22601) Misconfigured addition of peers leads to cluster shutdown.
[ https://issues.apache.org/jira/browse/HBASE-22601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rushabh S Shah updated HBASE-22601:
-----------------------------------
    Description:

Recently we added a peer to a production cluster; the peer was in a different Kerberos realm.

*Steps to reproduce:*
1. Add a misconfigured peer that is in a different Kerberos realm.
2. Remove that peer.
3. All region servers will start to crash.

*RCA*
We enabled trace logging on one region server for a short time.
After adding the peer, we saw the following log lines.
{noformat}
2019-06-18 22:19:20,949 INFO [main-EventThread] replication.ReplicationTrackerZKImpl - /hbase/replication/peers znode expired, triggering peerListChanged event
2019-06-18 22:19:20,992 INFO [main-EventThread] replication.ReplicationPeersZKImpl - Added new peer cluster=:/hbase
2019-06-18 22:19:21,113 INFO [main-EventThread] zookeeper.RecoverableZooKeeper - Process identifier=hconnection-0x794a56d6 connecting to ZooKeeper ensemble=
2019-06-18 22:20:01,280 WARN [main-EventThread] zookeeper.ZKUtil - hconnection-0x794a56d6-0x16b56265fbebb1b, quorum=, baseZNode=/hbase Unable to set watcher on znode (/hbase/hbaseid)
org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed for /hbase/hbaseid
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:123)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1102)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:220)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:421)
    at org.apache.hadoop.hbase.zookeeper.ZKClusterId.readClusterIdZNode(ZKClusterId.java:65)
    at org.apache.hadoop.hbase.client.ZooKeeperRegistry.getClusterId(ZooKeeperRegistry.java:105)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.retrieveClusterId(ConnectionManager.java:922)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.<init>(ConnectionManager.java:706)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.<init>(ConnectionManager.java:638)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:238)
    at org.apache.hadoop.hbase.client.ConnectionManager.createConnection(ConnectionManager.java:432)
    at org.apache.hadoop.hbase.client.ConnectionManager.createConnectionInternal(ConnectionManager.java:341)
    at org.apache.hadoop.hbase.client.HConnectionManager.createConnection(HConnectionManager.java:144)
    at org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint.init(HBaseInterClusterReplicationEndpoint.java:135)
    at com.salesforce.hbase.replication.TenantReplicationEndpoint.init(TenantReplicationEndpoint.java:30)
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.getReplicationSource(ReplicationSourceManager.java:517)
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.addSource(ReplicationSourceManager.java:273)
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.peerListChanged(ReplicationSourceManager.java:635)
    at org.apache.hadoop.hbase.replication.ReplicationTrackerZKImpl$PeersWatcher.nodeChildrenChanged(ReplicationTrackerZKImpl.java:192)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:643)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:544)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:519)
2019-06-18 22:20:43,002 TRACE [Source,] regionserver.ReplicationSource - Cannot contact the peer's zk ensemble, sleeping 1000 times 1
2019-06-18 22:20:44,008 TRACE [Source,] regionserver.ReplicationSource - Cannot contact the peer's zk ensemble, sleeping 1000 times 2
{noformat}
These retries continued until we removed the peer.

After removing the peer, we saw:
{noformat}
2019-06-18 22:21:20,731 INFO [main-EventThread] replication.ReplicationTrackerZKImpl - /hbase/replication/peers/ znode expired, triggering peerRemoved event
2019-06-18 22:21:20,731 INFO [main-EventThread] regionserver.ReplicationSourceManager - Closing the following queue , currently have 2 and another 0 that were recovered
2019-06-18 22:21:20,733 INFO [main-EventThread]