We run two HBase clusters, and one (master) replicates to the other (standby). Last night we did some maintenance that involved bringing all of HBase down while we made changes to HDFS. After bringing things back up, ageOfLastShippedOp on a few of the master region servers jumped to around -9 petaseconds, and ageOfLastAppliedOp on one of the slave region servers jumped to a little over 800 seconds.
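As a sanity check on the -9 petaseconds figure, here is a minimal sketch of the arithmetic. It assumes the metric is a signed 64-bit millisecond count, which is my assumption and not something I have confirmed in the HBase source; the names `LONG_MIN_MS` and `ms_to_petaseconds` are made up for illustration. Under that assumption, a value near the minimum of a 64-bit long works out to roughly the number we saw, which would be consistent with an uninitialized or overflowed timestamp and not a real replication lag.

```python
# Assumption (not confirmed against HBase): the age metric is a signed
# 64-bit integer holding milliseconds.
LONG_MIN_MS = -2**63  # smallest value a signed 64-bit long can hold


def ms_to_petaseconds(ms: int) -> float:
    """Convert milliseconds to petaseconds (1 Ps = 1e15 s)."""
    seconds = ms / 1000
    return seconds / 1e15


if __name__ == "__main__":
    # Prints a value close to -9.2, matching the "around -9 petaseconds" reading.
    print(ms_to_petaseconds(LONG_MIN_MS))
```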
We fixed the -9 petaseconds issue (which I assume was a reporting problem) by restarting the handful of master region servers that showed it. For the slave region server with an ageOfLastAppliedOp of 800 seconds, I've restarted the process, but the value didn't change. I assume that no region server in the master cluster is currently replicating to this node, so the value will just stay at 800. This is a problem, since we have alerts that trigger on high values.
Is there any way to address this?
Thanks,
Patrick
We're on HBase 0.92 (CDH4.1)
