Hi all, I'm running HBase replication on CDH 5.9.0 and am wondering whether there are known configurations or methods for reducing replication latency. I am monitoring replication latency in two separate ways:
1) The JMX metric `replication.source.ageOfLastShippedOp` exposed by the region server. While I'm constantly writing data, the 99th percentile latency according to this metric averages 480-500ms.

2) A worker that constantly writes data to the source cluster (2,000 writes/sec) and constantly reads from the sink cluster. It tries to read the data it just wrote and reports latency as `currentTime - resultTimestamp`. The 99th percentile latency according to this metric averages 1,470-1,500ms.

As expected, (1) is a lower bound of (2).

Has anyone found ways to reduce replication latency so that the 99th percentile hovers closer to the 300-400ms range? I tried changing `hbase.replication.handler.count` on the sink cluster from 3 to 15, but did not see a significant difference.

I looked through some HBaseCon 2014 slides and saw that Flurry achieved 85ms latency between DCs (https://www.slideshare.net/HBaseCon/operations-session-2-35938496). Any thoughts on how something like this might be possible?
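
A minimal sketch of how the figure in (2) can be derived, in case the exact definition matters for comparison. It assumes the writer and the reader share a clock (otherwise clock skew leaks into the number) and uses a nearest-rank percentile. The function names and the sample values are hypothetical and are not taken from the actual worker.

```python
def replication_latency_ms(current_time_ms, result_timestamp_ms):
    """Latency of one probe: time the row was read from the sink
    minus the timestamp carried by the cell that was read."""
    return current_time_ms - result_timestamp_ms


def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct in 1..100."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, avoiding floating-point rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


if __name__ == "__main__":
    # Hypothetical (read time, cell timestamp) pairs in milliseconds.
    probes = [(1_000_450, 1_000_000), (1_001_900, 1_000_500), (1_002_300, 1_001_700)]
    latencies = [replication_latency_ms(now, ts) for now, ts in probes]
    print("p99 latency (ms):", percentile(latencies, 99))
```

With only a handful of samples the nearest-rank p99 is simply the maximum, so a measurement window needs at least a few hundred probes before the percentile says much.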
