I have four HBase clusters (A, B, C, D) with replication between them. I've configured each cluster to replicate to all other clusters in an attempt to have a hot-hot, eventually-consistent-across-all-clusters setup. One nice property of this configuration is that any individual cluster can completely fail without any impact on the others. If the failed cluster comes back up, replication eventually catches up. No manual intervention is needed during the failure or subsequent recovery.
However, reading the docs and making observations, it seems this setup results in each replication wal entry arriving at each remote cluster 5 times! Here's how a wal entry originating on A arrives at D 5 times: A -> D A -> B -> D A -> C -> D A -> B -> C -> D A -> C -> B -> D In other words, my replication topology choice combined with HBase's record-hops-and-don't-revisit replication loop prevention strategy leads to a lot of network and compute resource waste. I understand I could use a different topology. A ring for example or maybe slightly better: A <-> B <-> C <-> D But alternate topologies seem to lead to failure scenarios that require manual intervention (ie new peer connections at failure time and sync-up jobs). Is there a way I can maintain the property where no manual intervention is needed when an HBase cluster goes away but also eliminate all these redundant replications? For example, is there a way to place a limit on the number of hops a wal entry can make? Thanks! Whitney