Hi, I have a two-cluster setup in a lab; each cluster has 1 Master and 3 RS. I'm inserting roughly 15GB into the master cluster, but I see a delay of 5 to 10 minutes between the master and slave clusters (ageOfLastShippedOp).
On my Graphite I see that replicateLogEntries_num_ops is increasing on only one of the three region servers of the slave cluster (IP 85, out of IPs 83, 84, 85). I ran a grep on the logs of each region server of the master cluster and saw the following Chosen peer messages:

RS ip 74: Chosen peer 83
RS ip 75: Chosen peer 85
RS ip 76: Chosen peer 85

So the first problem: why are only two slave RS (83, 85) receiving replicated log entries instead of all 3?

Second and biggest problem: I ran netstat -tnp on RS ip 74 and grepped for 83, 84, 85, and saw that it is in fact talking to RS 85, not 83 as its log says! This correlates with the Graphite graph of replicateLogEntries_num_ops, which showed that only RS 85 was receiving replicated log entries.

To me this looks like a bug. Does anyone have any ideas on how to solve these two issues?
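In case it helps anyone repeat the check, here is a rough Python sketch of how the grep results could be tallied per slave RS. It assumes each relevant log line contains the text "Chosen peer <id>"; the exact wording and address format in the real region server logs may differ. The function names and the sample input are made up for illustration and only mirror the three lines quoted above.

```python
import re
from collections import Counter

# Assumption: a relevant log line contains "Chosen peer <id>".
# Adjust the pattern to whatever the region server logs really print.
PEER_RE = re.compile(r"Chosen peer (\S+)")


def tally_chosen_peers(lines_by_master):
    """Count how many times each slave RS was chosen across all master RS logs."""
    counts = Counter()
    for lines in lines_by_master.values():
        for line in lines:
            match = PEER_RE.search(line)
            if match:
                counts[match.group(1)] += 1
    return counts


def unused_peers(counts, all_slaves):
    """Return the slave RS ids that no master RS chose."""
    return sorted(set(all_slaves) - set(counts))


# Hypothetical input mirroring the grep output described in this post.
sample = {
    "74": ["Chosen peer 83"],
    "75": ["Chosen peer 85"],
    "76": ["Chosen peer 85"],
}

counts = tally_chosen_peers(sample)
print(dict(counts))
print(unused_peers(counts, ["83", "84", "85"]))
```

This only summarizes what the logs claim. It says nothing about which connections are actually open, so the netstat output still has to be compared against it by hand.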