Replication Lag Issue in HBase DR Cluster after Upgrade

Valli Fri, 11 Aug 2023 07:51:01 -0700

Hello HBase Community,

We recently upgraded our HBase cluster from version 1.2.6 to 1.4.14 and
have encountered an issue with replication lag in our Disaster Recovery
(DR) cluster. We have two clusters in our setup: an active write cluster
and a DR cluster that receives replication from the active cluster. The
replication lag in the DR cluster has been building up, even though there
are no direct writes to it.


Here's a brief overview of the problem:
- We have an active write cluster with no replication lag.
- The DR cluster only receives replication from the active cluster and
doesn't have direct writes.
- Replication lag builds up in the DR cluster over time, even though there
are no active writes to it.
- When a 'put' call is made in the DR cluster, the replication lag drops
momentarily, but then starts building up again.

We experienced a similar issue with version 1.4.9 in another cluster, and
applied the following patch for it:

https://issues.apache.org/jira/browse/HBASE-22784

Version 1.4.14 already contains this patch, but we still experience the
issue.

Are there any specific configurations or adjustments we should be making
to address this problem? It's important for us to maintain a reliable DR
setup, and any guidance or insights you can provide would be greatly
appreciated.

If anyone has experienced a similar issue after upgrading HBase or has any
recommendations on how to troubleshoot and resolve replication lag in a DR
cluster, please share your thoughts.

Thank you in advance for your time and assistance. Your expertise and
insights are invaluable to us as we work to resolve this issue and maintain
the stability of our HBase setup.

Best regards,
Manimekalai K
