I have exactly this issue. I fixed it by moving a region of the replicated table to the “stalled” region server. Kim, thank you for the descriptive answer, and good luck with the fix.
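
For anyone who lands on this thread later, here is a rough Python sketch of how a stalled source could be spotted from the SOURCE line quoted below (it looks like output of the HBase shell `status 'replication'` command). The helper names and the one-hour lag threshold are assumptions made up for illustration, not anything HBase provides, and the sketch expects the SOURCE line to be unwrapped onto a single line.

```python
import re

# Hypothetical helper, not part of HBase: matches one unwrapped SOURCE line
# in the format shown in the quoted message.
SOURCE_RE = re.compile(
    r"PeerID=(?P<peer>[^,]+),\s*"
    r"AgeOfLastShippedOp=(?P<age>\d+),\s*"
    r"SizeOfLogQueue=(?P<queue>\d+),\s*"
    r"TimeStampsOfLastShippedOp=(?P<ts>.+?),\s*"
    r"Replication Lag=(?P<lag>\d+)"
)


def parse_source(line):
    """Return the SOURCE metrics as a dict, or None if the line does not match."""
    m = SOURCE_RE.search(line)
    if m is None:
        return None
    return {
        "peer": m.group("peer"),
        "age_ms": int(m.group("age")),
        "queue_size": int(m.group("queue")),
        "last_shipped": m.group("ts"),
        "lag_ms": int(m.group("lag")),
    }


def looks_stalled(src, max_lag_ms=60 * 60 * 1000):
    """Heuristic (assumed threshold): WALs are queued and the lag is very large."""
    return src["queue_size"] > 0 and src["lag_ms"] > max_lag_ms


line = (
    "SOURCE: PeerID=lp_analytics, AgeOfLastShippedOp=0, SizeOfLogQueue=1, "
    "TimeStampsOfLastShippedOp=Thu Jan 01 03:00:00 MSK 1970, "
    "Replication Lag=1573052815347"
)
src = parse_source(line)
if src is not None and looks_stalled(src):
    print("replication to peer %s looks stalled" % src["peer"])
```

In the quoted output the last-shipped timestamp is the Unix epoch, so the reported lag is roughly the current time in milliseconds; that is why a simple lag threshold is enough to flag this node.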

> On 8 Nov 2019, at 06:01, Jungdae Kim <[email protected]> wrote:
>
> Hello, Alexander
>
> HBase 1.4.x has some issues related to updating the position of WALs being
> replicated.
> One of these issues is that old WALs pile up when a region server has no
> regions of the table being replicated, or when no mutations come in for a
> while.
>
> I'm not sure from your logs whether you have the same issue.
> If you are suffering from the same issue, you can find many old WALs in the
> HDFS oldWALs directory ({hbase.rootdir}/oldWALs) and in the ZooKeeper
> replication queues ({znodeParent}/replication/rs/{rs}/{peer}/). You can also
> work around the issue by assigning a region of a table being replicated to
> the region server.
>
> The issue has already been reported
> (https://issues.apache.org/jira/browse/HBASE-22784) and resolved.
> But, unfortunately, the patch spawned other issues, such as region
> servers aborting (https://issues.apache.org/jira/browse/HBASE-23169).
>
> I'm working on these issues in
> https://issues.apache.org/jira/browse/HBASE-23205 (not merged yet).
>
> I hope this will be helpful to you.
>
> On Thu, Nov 7, 2019 at 12:30 AM Alexander Batyrshin <[email protected]>
> wrote:
>
>> Hello all,
>> Sometimes we observe that replication is not working on HBase 1.4.10.
>>
>> hbase07.prod.hbcluster:
>> SOURCE: PeerID=lp_analytics, AgeOfLastShippedOp=0,
>> SizeOfLogQueue=1, TimeStampsOfLastShippedOp=Thu Jan 01 03:00:00 MSK 1970,
>> Replication Lag=1573052815347
>> SINK : AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Mon Oct 07
>> 19:15:54 MSK 2019
>>
>> In the logs:
>>
>> 2019-11-06 18:10:54,252 INFO [hbase07:60020Replication Statistics #0]
>> regionserver.Replication: Normal source for cluster lp_analytics: Total
>> replicated edits: 0, current progress:
>> walGroup [hbase07.prod.hbcluster%2C60020%2C1570464952456]: currently
>> replicating from:
>> hdfs://prodfashion01/hbase/WALs/hbase07.prod.hbcluster,60020,1570464952456/hbase07.prod.hbcluster%2C60020%2C1570464952456.1573051524020
>> at position: -1
>> 2019-11-06 18:15:54,252 INFO [hbase07:60020Replication Statistics #0]
>> regionserver.Replication: Normal source for cluster lp_analytics: Total
>> replicated edits: 0, current progress:
>> walGroup [hbase07.prod.hbcluster%2C60020%2C1570464952456]: currently
>> replicating from:
>> hdfs://prodfashion01/hbase/WALs/hbase07.prod.hbcluster,60020,1570464952456/hbase07.prod.hbcluster%2C60020%2C1570464952456.1573051524020
>> at position: -1
>>
>> I can’t find any errors or anything else that could help me diagnose why
>> replication is not working on this node. On the other nodes replication
>> works like a charm.
>> Any ideas what is wrong?
