Hi,

We are using the delayed rebalancer to manage a Master-Slave cluster. When a large portion of the cluster disconnects from ZooKeeper (ZK), for example because of a network partition or a service crash caused by a bug, the Helix controller tries hard to move shards to the rest of the cluster. This can make things worse if rebuilding a replica is very expensive or if no live replica is left in the rest of the cluster.

What is the suggested way to handle this case? Is there a way to make the Helix controller pause when the number of live instances changes by more than a threshold?
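To make the question concrete, here is a rough sketch of the kind of check I have in mind. It is plain Python rather than Helix code: the function name `should_pause_rebalance`, the `max_offline_fraction` parameter, and its default of 0.3 are all made up for illustration and do not correspond to any Helix API or setting.

```python
def should_pause_rebalance(expected_instances, live_instances, max_offline_fraction=0.3):
    """Return True when too many expected instances are offline to rebalance safely.

    Hypothetical helper for illustration only; none of these names are Helix APIs.
    """
    expected = set(expected_instances)
    if not expected:
        # Nothing is expected to be online, so there is nothing to protect.
        return False
    # Instances that should be in the cluster but are not currently live.
    offline = expected - set(live_instances)
    # Pause only when the offline share strictly exceeds the threshold.
    return len(offline) / len(expected) > max_offline_fraction
```

With four expected instances and only one live, the offline share is 0.75, so the check would ask the controller to hold off. With three of four live, the share is 0.25 and rebalancing would proceed as usual.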
--
Best regards,
Bo
