I think this might be a corner case that shows up when partitions * replicas <
TOTAL_NUMBER_OF_NODES. Can you try with many partitions and replicas and
check whether the issue still exists?
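
To make the condition concrete, here is a rough sketch of the arithmetic in
Python. It is not Helix code and does not model the rebalancer. The function
names are made up for illustration, and I am assuming your single resource has
one partition, since the partition count was not given.

```python
def replica_slots(partitions: int, replicas: int) -> int:
    """Total number of replica placements the cluster has to host."""
    return partitions * replicas


def in_suspected_corner_case(partitions: int, replicas: int, total_nodes: int) -> bool:
    """True when there are fewer replica placements than nodes,
    i.e. partitions * replicas < TOTAL_NUMBER_OF_NODES."""
    return replica_slots(partitions, replicas) < total_nodes


# Setup from the report: 2 replicas, 3 nodes.
# ASSUMPTION: the resource has a single partition (not stated in the report).
print(in_suspected_corner_case(partitions=1, replicas=2, total_nodes=3))

# A hypothetical larger setup of the kind suggested above.
print(in_suspected_corner_case(partitions=16, replicas=2, total_nodes=3))
```

Under that assumption the reported setup has 2 placements for 3 nodes, so it
falls inside the suspected corner case, while the larger setup does not.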



On Wed, Oct 19, 2016 at 11:53 AM, Michael Craig <[email protected]> wrote:

> I've noticed that partitions/replicas assigned to disconnected instances
> are not automatically redistributed to live instances. What's the correct
> way to do this?
>
> For example, given this setup with Helix 0.6.5:
> - 1 resource
> - 2 replicas
> - LeaderStandby state model
> - FULL_AUTO rebalance mode
> - 3 nodes (N1 is Leader, N2 is Standby, N3 is just sitting)
>
> Then drop N1:
> - N2 becomes LEADER
> - Nothing happens to N3
>
> Naively, I would have expected N3 to transition from Offline to Standby,
> but that doesn't happen.
>
> I can force redistribution from GenericHelixController#onLiveInstanceChange
> by
> - dropping non-live instances from the cluster
> - calling rebalance
>
> The instance dropping seems pretty unsafe! Is there a better way?
>
