[jira] [Updated] (HBASE-22601) Misconfigured addition of peers leads to cluster shutdown.

2019-08-21 Thread Duo Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-22601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-22601:
--
Fix Version/s: 2.1.6, 2.2.1, 2.3.0, 3.0.0

> Misconfigured addition of peers leads to cluster shutdown.
> --
>
> Key: HBASE-22601
> URL: https://issues.apache.org/jira/browse/HBASE-22601
> Project: HBase
> Issue Type: Bug
> Affects Versions: 1.3.2
> Reporter: Rushabh S Shah
> Assignee: Rushabh S Shah
> Priority: Major
> Fix For: 3.0.0, 2.3.0, 2.2.1, 2.1.6
>
>
> Recently we added a peer in a different Kerberos realm to a production cluster.
> *Steps to reproduce:*
>  1. Add a misconfigured peer that is in a different Kerberos realm.
>  2. Remove that peer.
>  3. All region servers start to crash.
> *RCA*
>  We enabled trace logging on one region server for a short time.
>  After adding the peer, we saw the following log lines.
> {noformat}
> 2019-06-18 22:19:20,949 INFO  [main-EventThread] replication.ReplicationTrackerZKImpl - /hbase/replication/peers znode expired, triggering peerListChanged event
> 2019-06-18 22:19:20,992 INFO  [main-EventThread] replication.ReplicationPeersZKImpl - Added new peer cluster=:/hbase
> 2019-06-18 22:19:21,113 INFO  [main-EventThread] zookeeper.RecoverableZooKeeper - Process identifier=hconnection-0x794a56d6 connecting to ZooKeeper ensemble=
> 2019-06-18 22:20:01,280 WARN  [main-EventThread] zookeeper.ZKUtil - hconnection-0x794a56d6-0x16b56265fbebb1b, quorum=, baseZNode=/hbase Unable to set watcher on znode (/hbase/hbaseid)
> org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed for /hbase/hbaseid
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:123)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>     at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1102)
>     at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:220)
>     at org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:421)
>     at org.apache.hadoop.hbase.zookeeper.ZKClusterId.readClusterIdZNode(ZKClusterId.java:65)
>     at org.apache.hadoop.hbase.client.ZooKeeperRegistry.getClusterId(ZooKeeperRegistry.java:105)
>     at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.retrieveClusterId(ConnectionManager.java:922)
>     at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.<init>(ConnectionManager.java:706)
>     at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.<init>(ConnectionManager.java:638)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>     at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:238)
>     at org.apache.hadoop.hbase.client.ConnectionManager.createConnection(ConnectionManager.java:432)
>     at org.apache.hadoop.hbase.client.ConnectionManager.createConnectionInternal(ConnectionManager.java:341)
>     at org.apache.hadoop.hbase.client.HConnectionManager.createConnection(HConnectionManager.java:144)
>     at org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint.init(HBaseInterClusterReplicationEndpoint.java:135)
>     at com.salesforce.hbase.replication.TenantReplicationEndpoint.init(TenantReplicationEndpoint.java:30)
>     at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.getReplicationSource(ReplicationSourceManager.java:517)
>     at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.addSource(ReplicationSourceManager.java:273)
>     at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.peerListChanged(ReplicationSourceManager.java:635)
>     at org.apache.hadoop.hbase.replication.ReplicationTrackerZKImpl$PeersWatcher.nodeChildrenChanged(ReplicationTrackerZKImpl.java:192)
>     at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:643)
>     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:544)
>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:519)
> 2019-06-18 22:20:42,999 WARN  [Source,] zookeeper.ZKUtil - 
> connection to 

[jira] [Updated] (HBASE-22601) Misconfigured addition of peers leads to cluster shutdown.

2019-06-18 Thread Rushabh S Shah (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rushabh S Shah updated HBASE-22601:
---
Description: 
Recently we added a peer in a different Kerberos realm to a production cluster.

*Steps to reproduce:*
 1. Add a misconfigured peer that is in a different Kerberos realm.
 2. Remove that peer.
 3. All region servers start to crash.

*RCA*
 We enabled trace logging on one region server for a short time.
 After adding the peer, we saw the following log lines.
{noformat}
2019-06-18 22:19:20,949 INFO  [main-EventThread] replication.ReplicationTrackerZKImpl - /hbase/replication/peers znode expired, triggering peerListChanged event
2019-06-18 22:19:20,992 INFO  [main-EventThread] replication.ReplicationPeersZKImpl - Added new peer cluster=:/hbase
2019-06-18 22:19:21,113 INFO  [main-EventThread] zookeeper.RecoverableZooKeeper - Process identifier=hconnection-0x794a56d6 connecting to ZooKeeper ensemble=
2019-06-18 22:20:01,280 WARN  [main-EventThread] zookeeper.ZKUtil - hconnection-0x794a56d6-0x16b56265fbebb1b, quorum=, baseZNode=/hbase Unable to set watcher on znode (/hbase/hbaseid)
org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed for /hbase/hbaseid
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:123)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1102)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:220)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:421)
    at org.apache.hadoop.hbase.zookeeper.ZKClusterId.readClusterIdZNode(ZKClusterId.java:65)
    at org.apache.hadoop.hbase.client.ZooKeeperRegistry.getClusterId(ZooKeeperRegistry.java:105)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.retrieveClusterId(ConnectionManager.java:922)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.<init>(ConnectionManager.java:706)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.<init>(ConnectionManager.java:638)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:238)
    at org.apache.hadoop.hbase.client.ConnectionManager.createConnection(ConnectionManager.java:432)
    at org.apache.hadoop.hbase.client.ConnectionManager.createConnectionInternal(ConnectionManager.java:341)
    at org.apache.hadoop.hbase.client.HConnectionManager.createConnection(HConnectionManager.java:144)
    at org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint.init(HBaseInterClusterReplicationEndpoint.java:135)
    at com.salesforce.hbase.replication.TenantReplicationEndpoint.init(TenantReplicationEndpoint.java:30)
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.getReplicationSource(ReplicationSourceManager.java:517)
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.addSource(ReplicationSourceManager.java:273)
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.peerListChanged(ReplicationSourceManager.java:635)
    at org.apache.hadoop.hbase.replication.ReplicationTrackerZKImpl$PeersWatcher.nodeChildrenChanged(ReplicationTrackerZKImpl.java:192)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:643)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:544)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:519)


2019-06-18 22:20:42,999 WARN  [Source,] zookeeper.ZKUtil - connection to cluster: -0x26b56265fe7b5cd, quorum=, baseZNode=/hbase Unable to set watcher on znode (/hbase/hbaseid)
org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed for /hbase/hbaseid
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:123)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1102)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:220)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:421)

        at 

[jira] [Updated] (HBASE-22601) Misconfigured addition of peers leads to cluster shutdown.

2019-06-18 Thread Rushabh S Shah (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rushabh S Shah updated HBASE-22601:
---
Description: 
Recently we added a peer in a different Kerberos realm to a production cluster.

*Steps to reproduce:*
 1. Add a misconfigured peer that is in a different Kerberos realm.
 2. Remove that peer.
 3. All region servers start to crash.

*RCA*
 We enabled trace logging on one region server for a short time.
 After adding the peer, we saw the following log lines.
{noformat}
2019-06-18 22:19:20,949 INFO  [main-EventThread] replication.ReplicationTrackerZKImpl - /hbase/replication/peers znode expired, triggering peerListChanged event
2019-06-18 22:19:20,992 INFO  [main-EventThread] replication.ReplicationPeersZKImpl - Added new peer cluster=:/hbase
2019-06-18 22:19:21,113 INFO  [main-EventThread] zookeeper.RecoverableZooKeeper - Process identifier=hconnection-0x794a56d6 connecting to ZooKeeper ensemble=
2019-06-18 22:20:01,280 WARN  [main-EventThread] zookeeper.ZKUtil - hconnection-0x794a56d6-0x16b56265fbebb1b, quorum=, baseZNode=/hbase Unable to set watcher on znode (/hbase/hbaseid)
org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed for /hbase/hbaseid
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:123)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1102)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:220)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:421)
    at org.apache.hadoop.hbase.zookeeper.ZKClusterId.readClusterIdZNode(ZKClusterId.java:65)
    at org.apache.hadoop.hbase.client.ZooKeeperRegistry.getClusterId(ZooKeeperRegistry.java:105)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.retrieveClusterId(ConnectionManager.java:922)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.<init>(ConnectionManager.java:706)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.<init>(ConnectionManager.java:638)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:238)
    at org.apache.hadoop.hbase.client.ConnectionManager.createConnection(ConnectionManager.java:432)
    at org.apache.hadoop.hbase.client.ConnectionManager.createConnectionInternal(ConnectionManager.java:341)
    at org.apache.hadoop.hbase.client.HConnectionManager.createConnection(HConnectionManager.java:144)
    at org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint.init(HBaseInterClusterReplicationEndpoint.java:135)
    at com.salesforce.hbase.replication.TenantReplicationEndpoint.init(TenantReplicationEndpoint.java:30)
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.getReplicationSource(ReplicationSourceManager.java:517)
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.addSource(ReplicationSourceManager.java:273)
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.peerListChanged(ReplicationSourceManager.java:635)
    at org.apache.hadoop.hbase.replication.ReplicationTrackerZKImpl$PeersWatcher.nodeChildrenChanged(ReplicationTrackerZKImpl.java:192)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:643)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:544)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:519)

2019-06-18 22:20:43,002 TRACE [Source,] regionserver.ReplicationSource - Cannot contact the peer's zk ensemble, sleeping 1000 times 1
2019-06-18 22:20:44,008 TRACE [Source,] regionserver.ReplicationSource - Cannot contact the peer's zk ensemble, sleeping 1000 times 2

{noformat}
This repeated until we removed the peer.
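
The back-off pattern visible in the TRACE lines above can be sketched roughly as follows. This is a minimal, self-contained model and not the actual HBase code: the class `PeerRetrySketch`, its methods, and the `MAX_MULTIPLIER` cap are hypothetical names introduced for illustration. Only the message text and the 1000 ms base sleep come from the log. The assumption is that each failed attempt sleeps for the base interval times a growing multiplier, and that the loop ends only when the connection succeeds or the source is no longer active (for example, because the peer was removed).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

/** Hypothetical, simplified model of the retry loop suggested by the TRACE lines. */
class PeerRetrySketch {
    // Assumed cap on the multiplier; the real limit is not shown in the log.
    static final int MAX_MULTIPLIER = 300;

    /** Builds a message in the same shape as the TRACE lines in the log. */
    static String retryMessage(long baseSleepMs, int multiplier) {
        return "Cannot contact the peer's zk ensemble, sleeping "
            + baseSleepMs + " times " + multiplier;
    }

    /**
     * Calls connect until it returns true or stillActive returns false.
     * Returns the messages that would have been logged along the way.
     */
    static List<String> retryUntilConnected(BooleanSupplier connect,
                                            BooleanSupplier stillActive,
                                            long baseSleepMs) {
        List<String> logged = new ArrayList<>();
        int multiplier = 1;
        while (stillActive.getAsBoolean() && !connect.getAsBoolean()) {
            logged.add(retryMessage(baseSleepMs, multiplier));
            try {
                // Sleep grows linearly with the number of failed attempts.
                Thread.sleep(baseSleepMs * multiplier);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and stop retrying.
                Thread.currentThread().interrupt();
                break;
            }
            if (multiplier < MAX_MULTIPLIER) {
                multiplier++;
            }
        }
        return logged;
    }
}
```

With a 1000 ms base, the first two iterations of this sketch would produce messages ending in `sleeping 1000 times 1` and `sleeping 1000 times 2`, which is consistent with the two TRACE lines above being about one second apart. A peer whose ZooKeeper ensemble can never be authenticated against would keep such a loop running for as long as the peer stays configured.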
 After removing the peer, we saw the following log lines.
{noformat}
2019-06-18 22:21:20,731 INFO  [main-EventThread] replication.ReplicationTrackerZKImpl - /hbase/replication/peers/ znode expired, triggering peerRemoved event
2019-06-18 22:21:20,731 INFO  [main-EventThread] regionserver.ReplicationSourceManager - Closing the following queue , currently have 2 and another 0 that were recovered
2019-06-18 22:21:20,733 INFO  [main-EventThread]