CrushED tries its best to evenly distribute the replicas, so you
don't need identical assignments for each of the instances?
If that's the case, I would suggest you migrate to the WAGED rebalancer with
the constraints set up. For more details, you can refer to:
https://github.com/apache/helix/wiki/Weight-aware-Globally-Evenly-distributed-Rebalancer
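
In case it helps, here is a rough sketch of what the constraint setup could
look like in Java. This is a placeholder-level example, not a drop-in
snippet: the "PARTITION_COUNT" capacity key and the cluster, resource, and
instance names are made up, and the exact ConfigAccessor/ClusterConfig
method names may differ slightly between Helix versions.

import java.util.Collections;

import org.apache.helix.ConfigAccessor;
import org.apache.helix.manager.zk.ZKHelixAdmin;
import org.apache.helix.model.ClusterConfig;
import org.apache.helix.model.IdealState;
import org.apache.helix.model.InstanceConfig;

public class WagedConstraintSetup {
  public static void main(String[] args) {
    String zkAddr = "localhost:2181";    // placeholder ZK address
    String clusterName = "MyCluster";    // placeholder cluster name
    String resourceName = "MyResource";  // placeholder resource name

    ConfigAccessor configAccessor = new ConfigAccessor(zkAddr);

    // Declare one capacity dimension that simply counts replicas, so a
    // capacity of 3 means "at most 3 replicas on this instance".
    ClusterConfig clusterConfig = configAccessor.getClusterConfig(clusterName);
    clusterConfig.setInstanceCapacityKeys(
        Collections.singletonList("PARTITION_COUNT"));
    clusterConfig.setDefaultInstanceCapacityMap(
        Collections.singletonMap("PARTITION_COUNT", 3));
    clusterConfig.setDefaultPartitionWeightMap(
        Collections.singletonMap("PARTITION_COUNT", 1));
    configAccessor.setClusterConfig(clusterName, clusterConfig);

    // Optionally override the capacity for a single instance
    // (the instance name here is a placeholder).
    InstanceConfig instanceConfig =
        configAccessor.getInstanceConfig(clusterName, "server05_12345");
    instanceConfig.setInstanceCapacityMap(
        Collections.singletonMap("PARTITION_COUNT", 3));
    configAccessor.setInstanceConfig(clusterName, "server05_12345", instanceConfig);

    // Switch the (FULL_AUTO) resource over to the WAGED rebalancer.
    ZKHelixAdmin admin = new ZKHelixAdmin(zkAddr);
    IdealState idealState = admin.getResourceIdealState(clusterName, resourceName);
    idealState.setRebalancerClassName(
        "org.apache.helix.controller.rebalancer.waged.WagedRebalancer");
    admin.setResourceIdealState(clusterName, resourceName, idealState);
    admin.close();
  }
}

With something like this in place, WAGED should enforce the per-instance cap
while still trying to keep partition movement low.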

Best,

Junkai


On Mon, Feb 8, 2021 at 3:28 PM Phong X. Nguyen <[email protected]>
wrote:

> I believe it's #2, but perhaps I should explain:
>
> Here's a simplified view of mapFields:
>   "mapFields" : {
>     "partition_11" : {
>       "server05.verizonmedia.com" : "ONLINE"
>     },
>     "partition_22" : {
>       "server05.verizonmedia.com" : "ONLINE"
>     }
>   },
>
> Server 5 has partitions (replicas?) 11 and 22 assigned to it, and that's
> currently fine. We could, for example, also have partition_17 assigned,
> which would be fine, but if a fourth one were assigned then we stand a
> high likelihood of crashing.
>
> Bootstrapping replicas is also expensive, so we'd like to minimize that as
> well.
>
> On Mon, Feb 8, 2021 at 3:14 PM Xue Junkai <[email protected]> wrote:
>
>> Thanks, Phong. Can you clarify which of these you are looking for?
>> 1. The number of parallel state transitions for bootstrapping replicas.
>> 2. A limit on the number of replicas an instance can hold.
>>
>> Best,
>>
>> Junkai
>>
>>
>> On Mon, Feb 8, 2021 at 3:06 PM Phong X. Nguyen <[email protected]>
>> wrote:
>>
>>> Hello!
>>>
>>> I'm currently on a project that uses Apache Helix 0.8.4 (with a pending
>>> upgrade to Helix 1.0.1) to distribute partitions across a number of hosts
>>> (currently 32 partitions, 16 hosts). Once a partition is allocated to a
>>> host, a series of expensive initialization steps occurs, and the system
>>> then performs computations for that partition on a scheduled interval. We
>>> seek to minimize initializations whenever possible.
>>>
>>> If a system goes down (due to either maintenance or failure), the
>>> partitions get reshuffled. Currently we are using
>>> the CrushEdRebalanceStrategy in the hopes of minimizing partition
>>> movements. However, we noticed that unlike the earlier AutoRebalancer
>>> scheme, the CrushEdRebalanceStrategy does not limit the number of
>>> partitions per node. In our case, this can cause severe out-of-memory
>>> issues, which will then cascade as node after node gets more and more
>>> partitions that it cannot handle. We have on rare occasion seen our entire
>>> cluster fail as a result, and then our production engineers must manually -
>>> and carefully - bring the system back online. This is undesirable.
>>>
>>> Does Helix have a rebalancing strategy that minimizes partition movement
>>> yet also permits enforcement of maximum partitions per node?
>>>
>>> Thanks,
>>> - Phong X. Nguyen
>>>
>>
