Sandeep Pal created HBASE-25741:
-----------------------------------

             Summary: Replication Source still having the replication metrics 
for peer ID which doesn't exist.
                 Key: HBASE-25741
                 URL: https://issues.apache.org/jira/browse/HBASE-25741
             Project: HBase
          Issue Type: Bug
    Affects Versions: 1.8.0
            Reporter: Sandeep Pal
            Assignee: Sandeep Pal


We have observed that replication source metrics for a peer still exist on some 
region servers even though the peer has been removed. When ReplicationSource 
encounters a NoNodeException, it calls the `peerRemoved` workflow, which should 
terminate the source and remove it from the source manager. The problem is that 
the ReplicationSource thread terminates itself, so the removePeer action never 
completes and the source's metrics are left behind forever. The flow is as 
follows: the replication source tries to clean WALs 
[here|https://github.com/apache/hbase/blob/branch-1/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java#L801]
 and on NoNodeException it calls 
[peerRemoved|https://github.com/apache/hbase/blob/b231dd620f107b488b88599e16dc846eb856972c/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java#L244]
 and terminates the source (itself), leaving the terminated source in the 
source manager and not clearing its 
[metrics|https://github.com/apache/hbase/blob/b231dd620f107b488b88599e16dc846eb856972c/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java#L645].
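
The flow described above can be illustrated with a small toy model. This is 
only a sketch and not HBase code: the class PeerRemovalSketch, the ToyManager 
type, and the method metricsLeak are hypothetical names, and the way 
self-termination cuts the cleanup short here (an interrupt check inside the 
removal workflow) is an assumption made for illustration. It may differ from 
the actual mechanism in ReplicationSource and ReplicationSourceManager.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy model of the reported flow. All names are hypothetical, not HBase APIs.
public class PeerRemovalSketch {

    /** Stand-in for the source manager: tracks sources and metrics by peer ID. */
    static class ToyManager {
        final Map<String, Thread> sources = new ConcurrentHashMap<>();
        final Map<String, Long> metrics = new ConcurrentHashMap<>();

        /** Intended workflow: terminate the source, then drop it and its metrics. */
        void peerRemoved(String peerId) throws InterruptedException {
            Thread source = sources.get(peerId);
            if (source != null) {
                source.interrupt(); // terminate the source
            }
            // Assumption: cleanup is cut short if the caller itself was just terminated.
            if (Thread.interrupted()) {
                throw new InterruptedException("caller was terminated mid-cleanup");
            }
            sources.remove(peerId);
            metrics.remove(peerId);
        }
    }

    /** Returns true if metrics for the peer are still registered after removal. */
    public static boolean metricsLeak(boolean calledFromSourceThread) {
        ToyManager manager = new ToyManager();
        String peerId = "peer1";
        manager.metrics.put(peerId, 42L);

        Thread source = new Thread(() -> {
            try {
                if (calledFromSourceThread) {
                    manager.peerRemoved(peerId); // source reacts to the missing peer itself
                } else {
                    Thread.sleep(60_000); // idle until terminated from outside
                }
            } catch (InterruptedException e) {
                // The source thread simply exits; nobody finishes the cleanup.
            }
        });
        manager.sources.put(peerId, source);
        source.start();
        try {
            if (!calledFromSourceThread) {
                manager.peerRemoved(peerId); // removal driven by a different thread
            }
            source.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return manager.metrics.containsKey(peerId);
    }

    public static void main(String[] args) {
        System.out.println("self-triggered removal leaks metrics: " + metricsLeak(true));
        System.out.println("external removal leaks metrics: " + metricsLeak(false));
    }
}
```

In this model the metrics entry survives only when the removal workflow runs 
on the source's own thread, which matches the symptom reported here. Under the 
same assumption, one possible direction would be to make sure the cleanup 
steps do not depend on the terminating thread staying alive.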

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)