[jira] [Updated] (HBASE-18111) Replication stuck when cluster connection is closed

2018-12-11 Thread Andrew Purtell (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-18111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-18111:
---
Fix Version/s: 1.3.3

> Replication stuck when cluster connection is closed
> ---
>
> Key: HBASE-18111
> URL: https://issues.apache.org/jira/browse/HBASE-18111
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.4.0, 1.3.1, 1.2.5, 0.98.24, 1.1.10, 2.0.0
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 1.4.0, 1.3.3, 2.0.0
>
> Attachments: HBASE-18111-v1.patch, HBASE-18111-v2.patch, 
> HBASE-18111.patch
>
>
> Log:
> {code}
> 2017-05-24,03:01:25,603 ERROR [regionserver13700-SendThread(hostxxx:11000)] 
> org.apache.zookeeper.ClientCnxn: SASL authentication with Zookeeper Quorum 
> member failed: javax.security.sasl.SaslException: An error: 
> (java.security.PrivilegedActionException: javax.security.sasl.SaslException: 
> GSS initiate failed [Caused by GSSException: No valid credentials provided 
> (Mechanism level: Connection reset)]) occurred when evaluating Zookeeper 
> Quorum Member's  received SASL token. Zookeeper Client will go to AUTH_FAILED 
> state.
> 2017-05-24,03:01:25,615 FATAL [regionserver13700-EventThread] 
> org.apache.hadoop.hbase.client.HConnectionImplementation: 
> hconnection-0x1148dd9b-0x35b6b4d4ca999c6, 
> quorum=10.108.37.30:11000,10.108.38.30:11000,10.108.39.30:11000,10.108.84.25:11000,10.108.84.32:11000,
>  baseZNode=/hbase/c3prc-xiaomi98 hconnection-0x1148dd9b-0x35b6b4d4ca999c6 
> received auth failed from ZooKeeper, aborting
> org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = 
> AuthFailed
> at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:425)
> at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:333)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2017-05-24,03:01:25,615 INFO [regionserver13700-EventThread] 
> org.apache.hadoop.hbase.client.HConnectionImplementation: Closing zookeeper 
> sessionid=0x35b6b4d4ca999c6
> 2017-05-24,03:01:25,623 WARN [regionserver13700.replicationSource,800] 
> org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint:
>  Replicate edites to peer cluster failed.
> java.io.IOException: Call to hostxxx/10.136.22.6:24600 failed on local 
> exception: java.io.IOException: Connection closed
> {code}
> jstack
> {code}
>  java.lang.Thread.State: TIMED_WAITING (sleeping)
> at java.lang.Thread.sleep(Native Method)
> at 
> org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint.sleepForRetries(HBaseInterClusterReplicationEndpoint.java:127)
> at 
> org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint.replicate(HBaseInterClusterReplicationEndpoint.java:199)
> at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:905)
> at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:492)
> {code}
> The cluster connection was aborted when the ZookeeperWatcher receive a 
> AuthFailed event. Then the HBaseInterClusterReplicationEndpoint's replicate() 
> method will stuck in a while loop.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-18111) Replication stuck when cluster connection is closed

2017-06-01 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-18111:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 1.4.0
   2.0.0
   Status: Resolved  (was: Patch Available)

> Replication stuck when cluster connection is closed
> ---
>
> Key: HBASE-18111
> URL: https://issues.apache.org/jira/browse/HBASE-18111
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0, 1.4.0, 1.3.1, 1.2.5, 0.98.24, 1.1.10
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-18111.patch, HBASE-18111-v1.patch, 
> HBASE-18111-v2.patch
>
>
> Log:
> {code}
> 2017-05-24,03:01:25,603 ERROR [regionserver13700-SendThread(hostxxx:11000)] 
> org.apache.zookeeper.ClientCnxn: SASL authentication with Zookeeper Quorum 
> member failed: javax.security.sasl.SaslException: An error: 
> (java.security.PrivilegedActionException: javax.security.sasl.SaslException: 
> GSS initiate failed [Caused by GSSException: No valid credentials provided 
> (Mechanism level: Connection reset)]) occurred when evaluating Zookeeper 
> Quorum Member's  received SASL token. Zookeeper Client will go to AUTH_FAILED 
> state.
> 2017-05-24,03:01:25,615 FATAL [regionserver13700-EventThread] 
> org.apache.hadoop.hbase.client.HConnectionImplementation: 
> hconnection-0x1148dd9b-0x35b6b4d4ca999c6, 
> quorum=10.108.37.30:11000,10.108.38.30:11000,10.108.39.30:11000,10.108.84.25:11000,10.108.84.32:11000,
>  baseZNode=/hbase/c3prc-xiaomi98 hconnection-0x1148dd9b-0x35b6b4d4ca999c6 
> received auth failed from ZooKeeper, aborting
> org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = 
> AuthFailed
> at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:425)
> at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:333)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2017-05-24,03:01:25,615 INFO [regionserver13700-EventThread] 
> org.apache.hadoop.hbase.client.HConnectionImplementation: Closing zookeeper 
> sessionid=0x35b6b4d4ca999c6
> 2017-05-24,03:01:25,623 WARN [regionserver13700.replicationSource,800] 
> org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint:
>  Replicate edites to peer cluster failed.
> java.io.IOException: Call to hostxxx/10.136.22.6:24600 failed on local 
> exception: java.io.IOException: Connection closed
> {code}
> jstack
> {code}
>  java.lang.Thread.State: TIMED_WAITING (sleeping)
> at java.lang.Thread.sleep(Native Method)
> at 
> org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint.sleepForRetries(HBaseInterClusterReplicationEndpoint.java:127)
> at 
> org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint.replicate(HBaseInterClusterReplicationEndpoint.java:199)
> at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:905)
> at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:492)
> {code}
> The cluster connection was aborted when the ZookeeperWatcher receive a 
> AuthFailed event. Then the HBaseInterClusterReplicationEndpoint's replicate() 
> method will stuck in a while loop.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-18111) Replication stuck when cluster connection is closed

2017-05-31 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-18111:
---
Attachment: HBASE-18111-v2.patch

bq. HBaseInterClusterReplicationEndpoint is a hbase client and write entries to 
peer cluster. We should handle the connection close case no matter what reason 
lead it? And now the replication will stuck in the while loop.

Sounds reasonable to me. Here's a v2 patch rebased on latest master for another 
round of HadoopQA

> Replication stuck when cluster connection is closed
> ---
>
> Key: HBASE-18111
> URL: https://issues.apache.org/jira/browse/HBASE-18111
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0, 1.4.0, 1.3.1, 1.2.5, 0.98.24, 1.1.10
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
> Attachments: HBASE-18111.patch, HBASE-18111-v1.patch, 
> HBASE-18111-v2.patch
>
>
> Log:
> {code}
> 2017-05-24,03:01:25,603 ERROR [regionserver13700-SendThread(hostxxx:11000)] 
> org.apache.zookeeper.ClientCnxn: SASL authentication with Zookeeper Quorum 
> member failed: javax.security.sasl.SaslException: An error: 
> (java.security.PrivilegedActionException: javax.security.sasl.SaslException: 
> GSS initiate failed [Caused by GSSException: No valid credentials provided 
> (Mechanism level: Connection reset)]) occurred when evaluating Zookeeper 
> Quorum Member's  received SASL token. Zookeeper Client will go to AUTH_FAILED 
> state.
> 2017-05-24,03:01:25,615 FATAL [regionserver13700-EventThread] 
> org.apache.hadoop.hbase.client.HConnectionImplementation: 
> hconnection-0x1148dd9b-0x35b6b4d4ca999c6, 
> quorum=10.108.37.30:11000,10.108.38.30:11000,10.108.39.30:11000,10.108.84.25:11000,10.108.84.32:11000,
>  baseZNode=/hbase/c3prc-xiaomi98 hconnection-0x1148dd9b-0x35b6b4d4ca999c6 
> received auth failed from ZooKeeper, aborting
> org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = 
> AuthFailed
> at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:425)
> at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:333)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2017-05-24,03:01:25,615 INFO [regionserver13700-EventThread] 
> org.apache.hadoop.hbase.client.HConnectionImplementation: Closing zookeeper 
> sessionid=0x35b6b4d4ca999c6
> 2017-05-24,03:01:25,623 WARN [regionserver13700.replicationSource,800] 
> org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint:
>  Replicate edites to peer cluster failed.
> java.io.IOException: Call to hostxxx/10.136.22.6:24600 failed on local 
> exception: java.io.IOException: Connection closed
> {code}
> jstack
> {code}
>  java.lang.Thread.State: TIMED_WAITING (sleeping)
> at java.lang.Thread.sleep(Native Method)
> at 
> org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint.sleepForRetries(HBaseInterClusterReplicationEndpoint.java:127)
> at 
> org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint.replicate(HBaseInterClusterReplicationEndpoint.java:199)
> at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:905)
> at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:492)
> {code}
> The cluster connection was aborted when the ZookeeperWatcher receive a 
> AuthFailed event. Then the HBaseInterClusterReplicationEndpoint's replicate() 
> method will stuck in a while loop.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-18111) Replication stuck when cluster connection is closed

2017-05-24 Thread Guanghao Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-18111:
---
Attachment: HBASE-18111-v1.patch

> Replication stuck when cluster connection is closed
> ---
>
> Key: HBASE-18111
> URL: https://issues.apache.org/jira/browse/HBASE-18111
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0, 1.4.0, 1.3.1, 1.2.5, 0.98.24, 1.1.10
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
> Attachments: HBASE-18111.patch, HBASE-18111-v1.patch
>
>
> Log:
> {code}
> 2017-05-24,03:01:25,603 ERROR [regionserver13700-SendThread(hostxxx:11000)] 
> org.apache.zookeeper.ClientCnxn: SASL authentication with Zookeeper Quorum 
> member failed: javax.security.sasl.SaslException: An error: 
> (java.security.PrivilegedActionException: javax.security.sasl.SaslException: 
> GSS initiate failed [Caused by GSSException: No valid credentials provided 
> (Mechanism level: Connection reset)]) occurred when evaluating Zookeeper 
> Quorum Member's  received SASL token. Zookeeper Client will go to AUTH_FAILED 
> state.
> 2017-05-24,03:01:25,615 FATAL [regionserver13700-EventThread] 
> org.apache.hadoop.hbase.client.HConnectionImplementation: 
> hconnection-0x1148dd9b-0x35b6b4d4ca999c6, 
> quorum=10.108.37.30:11000,10.108.38.30:11000,10.108.39.30:11000,10.108.84.25:11000,10.108.84.32:11000,
>  baseZNode=/hbase/c3prc-xiaomi98 hconnection-0x1148dd9b-0x35b6b4d4ca999c6 
> received auth failed from ZooKeeper, aborting
> org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = 
> AuthFailed
> at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:425)
> at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:333)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2017-05-24,03:01:25,615 INFO [regionserver13700-EventThread] 
> org.apache.hadoop.hbase.client.HConnectionImplementation: Closing zookeeper 
> sessionid=0x35b6b4d4ca999c6
> 2017-05-24,03:01:25,623 WARN [regionserver13700.replicationSource,800] 
> org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint:
>  Replicate edites to peer cluster failed.
> java.io.IOException: Call to hostxxx/10.136.22.6:24600 failed on local 
> exception: java.io.IOException: Connection closed
> {code}
> jstack
> {code}
>  java.lang.Thread.State: TIMED_WAITING (sleeping)
> at java.lang.Thread.sleep(Native Method)
> at 
> org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint.sleepForRetries(HBaseInterClusterReplicationEndpoint.java:127)
> at 
> org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint.replicate(HBaseInterClusterReplicationEndpoint.java:199)
> at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:905)
> at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:492)
> {code}
> The cluster connection was aborted when the ZookeeperWatcher receive a 
> AuthFailed event. Then the HBaseInterClusterReplicationEndpoint's replicate() 
> method will stuck in a while loop.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-18111) Replication stuck when cluster connection is closed

2017-05-24 Thread Guanghao Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-18111:
---
Status: Patch Available  (was: Open)

> Replication stuck when cluster connection is closed
> ---
>
> Key: HBASE-18111
> URL: https://issues.apache.org/jira/browse/HBASE-18111
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.1.10, 0.98.24, 1.2.5, 1.3.1, 2.0.0, 1.4.0
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
> Attachments: HBASE-18111.patch
>
>
> Log:
> {code}
> 2017-05-24,03:01:25,603 ERROR [regionserver13700-SendThread(hostxxx:11000)] 
> org.apache.zookeeper.ClientCnxn: SASL authentication with Zookeeper Quorum 
> member failed: javax.security.sasl.SaslException: An error: 
> (java.security.PrivilegedActionException: javax.security.sasl.SaslException: 
> GSS initiate failed [Caused by GSSException: No valid credentials provided 
> (Mechanism level: Connection reset)]) occurred when evaluating Zookeeper 
> Quorum Member's  received SASL token. Zookeeper Client will go to AUTH_FAILED 
> state.
> 2017-05-24,03:01:25,615 FATAL [regionserver13700-EventThread] 
> org.apache.hadoop.hbase.client.HConnectionImplementation: 
> hconnection-0x1148dd9b-0x35b6b4d4ca999c6, 
> quorum=10.108.37.30:11000,10.108.38.30:11000,10.108.39.30:11000,10.108.84.25:11000,10.108.84.32:11000,
>  baseZNode=/hbase/c3prc-xiaomi98 hconnection-0x1148dd9b-0x35b6b4d4ca999c6 
> received auth failed from ZooKeeper, aborting
> org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = 
> AuthFailed
> at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:425)
> at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:333)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2017-05-24,03:01:25,615 INFO [regionserver13700-EventThread] 
> org.apache.hadoop.hbase.client.HConnectionImplementation: Closing zookeeper 
> sessionid=0x35b6b4d4ca999c6
> 2017-05-24,03:01:25,623 WARN [regionserver13700.replicationSource,800] 
> org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint:
>  Replicate edites to peer cluster failed.
> java.io.IOException: Call to hostxxx/10.136.22.6:24600 failed on local 
> exception: java.io.IOException: Connection closed
> {code}
> jstack
> {code}
>  java.lang.Thread.State: TIMED_WAITING (sleeping)
> at java.lang.Thread.sleep(Native Method)
> at 
> org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint.sleepForRetries(HBaseInterClusterReplicationEndpoint.java:127)
> at 
> org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint.replicate(HBaseInterClusterReplicationEndpoint.java:199)
> at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:905)
> at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:492)
> {code}
> The cluster connection was aborted when the ZookeeperWatcher receive a 
> AuthFailed event. Then the HBaseInterClusterReplicationEndpoint's replicate() 
> method will stuck in a while loop.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-18111) Replication stuck when cluster connection is closed

2017-05-24 Thread Guanghao Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-18111:
---
Attachment: HBASE-18111.patch

> Replication stuck when cluster connection is closed
> ---
>
> Key: HBASE-18111
> URL: https://issues.apache.org/jira/browse/HBASE-18111
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0, 1.4.0, 1.3.1, 1.2.5, 0.98.24, 1.1.10
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
> Attachments: HBASE-18111.patch
>
>
> Log:
> {code}
> 2017-05-24,03:01:25,603 ERROR [regionserver13700-SendThread(hostxxx:11000)] 
> org.apache.zookeeper.ClientCnxn: SASL authentication with Zookeeper Quorum 
> member failed: javax.security.sasl.SaslException: An error: 
> (java.security.PrivilegedActionException: javax.security.sasl.SaslException: 
> GSS initiate failed [Caused by GSSException: No valid credentials provided 
> (Mechanism level: Connection reset)]) occurred when evaluating Zookeeper 
> Quorum Member's  received SASL token. Zookeeper Client will go to AUTH_FAILED 
> state.
> 2017-05-24,03:01:25,615 FATAL [regionserver13700-EventThread] 
> org.apache.hadoop.hbase.client.HConnectionImplementation: 
> hconnection-0x1148dd9b-0x35b6b4d4ca999c6, 
> quorum=10.108.37.30:11000,10.108.38.30:11000,10.108.39.30:11000,10.108.84.25:11000,10.108.84.32:11000,
>  baseZNode=/hbase/c3prc-xiaomi98 hconnection-0x1148dd9b-0x35b6b4d4ca999c6 
> received auth failed from ZooKeeper, aborting
> org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = 
> AuthFailed
> at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:425)
> at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:333)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2017-05-24,03:01:25,615 INFO [regionserver13700-EventThread] 
> org.apache.hadoop.hbase.client.HConnectionImplementation: Closing zookeeper 
> sessionid=0x35b6b4d4ca999c6
> 2017-05-24,03:01:25,623 WARN [regionserver13700.replicationSource,800] 
> org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint:
>  Replicate edites to peer cluster failed.
> java.io.IOException: Call to hostxxx/10.136.22.6:24600 failed on local 
> exception: java.io.IOException: Connection closed
> {code}
> jstack
> {code}
>  java.lang.Thread.State: TIMED_WAITING (sleeping)
> at java.lang.Thread.sleep(Native Method)
> at 
> org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint.sleepForRetries(HBaseInterClusterReplicationEndpoint.java:127)
> at 
> org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint.replicate(HBaseInterClusterReplicationEndpoint.java:199)
> at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:905)
> at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:492)
> {code}
> The cluster connection was aborted when the ZookeeperWatcher receive a 
> AuthFailed event. Then the HBaseInterClusterReplicationEndpoint's replicate() 
> method will stuck in a while loop.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)