Thanks Phong. Can you clarify which you are looking for?
1. parallel number of state transitions for bootstrapping replicas.
2. number of replicas holding in an instance for limitation.

Best,

Junkai


On Mon, Feb 8, 2021 at 3:06 PM Phong X. Nguyen <[email protected]>
wrote:

> Hello!
>
> I'm currently on a project that uses Apache Helix 0.8.4 (with a pending
> upgrade to Helix 1.0.1) to distribute partitions across a number of hosts
> (currently 32 partitions, 16 hosts). Once a partition is allocated to a
> host a bunch of expensive initialization steps occur, and the
> system proceeds to do a bunch of computations for the partition on a
> scheduled interval. We seek to minimize initializations when possible.
>
> If a system goes down (due to either maintenance or failure), the
> partitions get reshuffled. Currently we are using
> the CrushEdRebalanceStrategy in the hopes of minimizing partition
> movements. However, we noticed that unlike the earlier AutoRebalancer
> scheme, the CrushEdRebalanceStrategy does not limit the number of
> partitions per node. In our case, this can cause severe out-of-memory
> issues, which will then cascade as node after node gets more and more
> partitions that it cannot handle. We have on rare occasion seen our entire
> cluster fail as a result, and then our production engineers must manually -
> and carefully - bring the system back online. This is undesirable.
>
> Does Helix have a rebalancing strategy that minimizes partition movement
> yet also permits enforcement of maximum partitions per node?
>
> Thanks,
> - Phong X. Nguyen
>

Reply via email to