I think this might be a corner case that occurs when partitions * replicas < TOTAL_NUMBER_OF_NODES. Can you try with more partitions and replicas and check whether the issue still exists?
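To make that condition concrete, here is a minimal Java sketch that plugs in the setup from the quoted message. It assumes the single resource has 1 partition (the message doesn't say how many), and `ReplicaSlots` is a hypothetical helper written for illustration, not part of the Helix API.

```java
// Hypothetical helper, NOT part of Helix: compares the number of replica
// slots (partitions * replicas) with the number of nodes in the cluster.
public class ReplicaSlots {

    // Total number of partition replicas the rebalancer has to place.
    static int replicaSlots(int partitions, int replicas) {
        return partitions * replicas;
    }

    // True when there are fewer slots than nodes, so at least one node
    // necessarily ends up with no replica at all (the suspected corner case).
    static boolean fewerSlotsThanNodes(int partitions, int replicas, int nodes) {
        return replicaSlots(partitions, replicas) < nodes;
    }

    public static void main(String[] args) {
        // Setup from the quoted message, ASSUMING the resource has 1 partition.
        int partitions = 1, replicas = 2, nodes = 3;
        System.out.println(replicaSlots(partitions, replicas) + " slots, " + nodes + " nodes");
        System.out.println("corner case: " + fewerSlotsThanNodes(partitions, replicas, nodes));
        // Retrying with more partitions, e.g. 4 partitions * 2 replicas = 8 slots >= 3 nodes.
        System.out.println("corner case with 4 partitions: " + fewerSlotsThanNodes(4, replicas, nodes));
    }
}
```

With 1 partition and 2 replicas there are only 2 slots for 3 nodes, so the setup falls squarely in that regime; raising the partition count to 4 moves it out, which is what the suggested retry would test.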
On Wed, Oct 19, 2016 at 11:53 AM, Michael Craig wrote:
> I've noticed that partitions/replicas assigned to disconnected instances
> are not automatically redistributed to live instances. What's the correct
> way to do this?
>
> For example, given this setup with Helix 0.6.5:
> - 1 resource
> - 2 replicas
> - LeaderStandby state model
> - FULL_AUTO rebalance mode
> - 3 nodes (N1 is Leader, N2 is Standby, N3 is just sitting)
>
> Then drop N1:
> - N2 becomes LEADER
> - Nothing happens to N3
>
> Naively, I would have expected N3 to transition from Offline to Standby,
> but that doesn't happen.
>
> I can force redistribution from GenericHelixController#onLiveInstanceChange
> by
> - dropping non-live instances from the cluster
> - calling rebalance
>
> The instance dropping seems pretty unsafe! Is there a better way?
>
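For comparison, the behavior Michael expected (N3 moving from Offline to Standby once N1 drops) corresponds to recomputing a partition's assignment over live instances only, rather than dropping the dead instances from the cluster. The sketch below is hypothetical: `LiveOnlyAssignment` and its simple preference-list logic are illustrative assumptions, not Helix's actual FULL_AUTO rebalancer, and it uses plain Java collections rather than any Helix API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration, NOT Helix's rebalancer: recompute one partition's
// LeaderStandby assignment from a fixed preference list, considering live nodes only.
public class LiveOnlyAssignment {

    static Map<String, String> assign(List<String> preferenceList, Set<String> liveNodes, int replicas) {
        Map<String, String> states = new LinkedHashMap<>();
        for (String node : preferenceList) {
            if (!liveNodes.contains(node)) {
                continue; // skip disconnected instances instead of removing them from the cluster
            }
            if (states.size() == replicas) {
                break; // all replicas placed; remaining nodes stay OFFLINE
            }
            // First live node in preference order leads, the rest stand by.
            states.put(node, states.isEmpty() ? "LEADER" : "STANDBY");
        }
        return states;
    }

    public static void main(String[] args) {
        List<String> pref = List.of("N1", "N2", "N3");
        System.out.println(assign(pref, Set.of("N1", "N2", "N3"), 2)); // {N1=LEADER, N2=STANDBY}
        System.out.println(assign(pref, Set.of("N2", "N3"), 2));       // {N2=LEADER, N3=STANDBY}
    }
}
```

The point of the sketch is only that filtering by liveness at assignment time yields the expected outcome without mutating cluster membership; whether Helix 0.6.5 behaves this way in the quoted setup is exactly what the suggested retry with more partitions would help determine.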
