Andrew Purtell created HBASE-20597:
--------------------------------------

             Summary: Use a lock to serialize access to a shared reference to 
ZooKeeperWatcher in HBaseReplicationEndpoint
                 Key: HBASE-20597
                 URL: https://issues.apache.org/jira/browse/HBASE-20597
             Project: HBase
          Issue Type: Bug
    Affects Versions: 1.4.4, 1.3.2
            Reporter: Andrew Purtell
            Assignee: Andrew Purtell
             Fix For: 3.0.0, 2.1.0, 1.5.0, 1.3.3, 2.0.1, 1.4.5


The code that closes a ZooKeeperWatcher (ZKW) that fails to initialize when 
attempting to connect to the remote cluster is not thread-safe and can in 
theory leak ZKW instances. Allocating a new ZKW and storing it to the shared 
reference is not atomic, so concurrent allocations are possible with only one 
store winning, which leaks the other ZKW instances. If the connection problem 
is persistent, such as loss of shared trust between the clusters, unclosed ZKW 
instances may accumulate over time, each with a ZK send thread and an event 
thread, until enough threads have leaked to cause an OOME (cannot allocate 
native thread).
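
As a rough illustration of the approach named in the summary, the sketch below 
guards every read and write of the shared reference with one lock, so that 
closing the old instance and storing the new one happen as a single step. It 
is not the HBase patch: FakeWatcher, GuardedEndpoint, LeakDemo, reloadWatcher 
and disconnect are hypothetical names, and FakeWatcher only counts open 
instances in place of a real ZooKeeperWatcher.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for ZooKeeperWatcher: counts instances that are
// still open so that a leak shows up as a count greater than one.
class FakeWatcher implements Closeable {
  static final AtomicInteger OPEN = new AtomicInteger();

  FakeWatcher() {
    OPEN.incrementAndGet();
  }

  @Override
  public void close() {
    OPEN.decrementAndGet();
  }
}

// Hypothetical endpoint: all access to the shared reference holds zkwLock.
class GuardedEndpoint {
  private final Object zkwLock = new Object();
  private FakeWatcher zkw; // guarded by zkwLock

  // Close the previous watcher (if any) and store a new one atomically.
  void reloadWatcher() {
    synchronized (zkwLock) {
      if (zkw != null) {
        zkw.close();
      }
      zkw = new FakeWatcher();
    }
  }

  // Close and clear the current watcher under the same lock.
  void disconnect() {
    synchronized (zkwLock) {
      if (zkw != null) {
        zkw.close();
        zkw = null;
      }
    }
  }
}

class LeakDemo {
  // Runs concurrent reloads, returns how many watchers are still open once
  // all workers have finished, then disconnects the endpoint.
  static int openAfterConcurrentReloads(int threads, int iterations) {
    GuardedEndpoint endpoint = new GuardedEndpoint();
    List<Thread> workers = new ArrayList<>();
    for (int i = 0; i < threads; i++) {
      Thread t = new Thread(() -> {
        for (int j = 0; j < iterations; j++) {
          endpoint.reloadWatcher();
        }
      });
      workers.add(t);
      t.start();
    }
    for (Thread t : workers) {
      try {
        t.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("interrupted while waiting for workers", e);
      }
    }
    int open = FakeWatcher.OPEN.get();
    endpoint.disconnect();
    return open;
  }
}
```

Without the synchronized blocks, two threads could each allocate a watcher and 
only the last store would remain reachable, leaving the other instance open 
with no reference left to close it.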



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
