Hi Phong,

The WAGED rebalancer respects MAX_PARTITIONS_PER_INSTANCE automatically, so
you probably don't need any specific configuration for that. However, you do
need to be on a newer Helix release to use the WAGED rebalancer; your pending
upgrade to 1.0.1 would cover that.
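
In case it helps, here is a rough, untested sketch of what switching a
resource to WAGED and setting the cap could look like with the 1.0.x Java
API (the ZK address, cluster name, and resource name are just placeholders):

    import org.apache.helix.HelixAdmin;
    import org.apache.helix.manager.zk.ZKHelixAdmin;
    import org.apache.helix.model.IdealState;

    public class EnableWagedExample {
      public static void main(String[] args) {
        // Placeholders: use your own ZK address, cluster, and resource names.
        HelixAdmin admin = new ZKHelixAdmin("localhost:2181");
        IdealState idealState =
            admin.getResourceIdealState("MyCluster", "MyResource");

        // Cap how many partitions a single instance may host.
        idealState.setMaxPartitionsPerInstance(3);

        // Switch the resource to the WAGED rebalancer in FULL_AUTO mode.
        idealState.setRebalanceMode(IdealState.RebalanceMode.FULL_AUTO);
        idealState.setRebalancerClassName(
            "org.apache.helix.controller.rebalancer.waged.WagedRebalancer");

        admin.setResourceIdealState("MyCluster", "MyResource", idealState);
        admin.close();
      }
    }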

Also, to confirm what you said: I believe the consistent-hashing-based
strategies (Crush and CrushEd) do not respect MAX_PARTITIONS_PER_INSTANCE;
as I recall, there was a design concern behind that.

Anyway, using WAGED is the current recommendation :) Could you please give
it a try and let us know if it is a good fit?
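
If you end up wanting explicit weight constraints as well (the constraints
setup Junkai mentions below), a rough, untested sketch with the 1.0.x
ConfigAccessor API might look like the following; the "PARTITION_COUNT"
capacity key, cluster name, and ZK address are just illustrative:

    import java.util.Collections;
    import org.apache.helix.ConfigAccessor;
    import org.apache.helix.model.ClusterConfig;

    public class WagedConstraintExample {
      public static void main(String[] args) {
        // Placeholder ZK address and cluster name.
        ConfigAccessor configAccessor = new ConfigAccessor("localhost:2181");
        ClusterConfig clusterConfig =
            configAccessor.getClusterConfig("MyCluster");

        // "PARTITION_COUNT" is an illustrative capacity key name.
        clusterConfig.setInstanceCapacityKeys(
            Collections.singletonList("PARTITION_COUNT"));
        // Each instance can hold at most 3 units of this capacity...
        clusterConfig.setDefaultInstanceCapacityMap(
            Collections.singletonMap("PARTITION_COUNT", 3));
        // ...and each partition replica consumes 1 unit.
        clusterConfig.setDefaultPartitionWeightMap(
            Collections.singletonMap("PARTITION_COUNT", 1));

        configAccessor.setClusterConfig("MyCluster", clusterConfig);
      }
    }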

Best Regards,
Jiajun


On Mon, Feb 8, 2021 at 3:55 PM Xue Junkai <[email protected]> wrote:

> CRUSHED tries its best to distribute the replicas evenly. So you
> don't need identical assignments for each of the instances?
> If that's the case, I would suggest migrating to the WAGED rebalancer
> with the constraints set up. For more details, you can refer to:
> https://github.com/apache/helix/wiki/Weight-aware-Globally-Evenly-distributed-Rebalancer
>
> Best,
>
> Junkai
>
>
> On Mon, Feb 8, 2021 at 3:28 PM Phong X. Nguyen <[email protected]>
> wrote:
>
>> I believe it's #2, but perhaps I should explain:
>>
>> Here's a simplified view of mapFields:
>>   "mapFields" : {
>>     "partition_11" : {
>>       "server05.verizonmedia.com" : "ONLINE"
>>     },
>>     "partition_22" : {
>>       "server05.verizonmedia.com" : "ONLINE"
>>     },
>> },
>>
>> Server 5 has partitions (replicas?) 11 and 22 assigned to it, and that's
>> currently fine. We could, for example, have partition_17 also assigned,
>> which would still be fine, but if a fourth one were assigned then we would
>> stand a high likelihood of crashing.
>>
>> Bootstrapping replicas is also expensive, so we'd like to minimize that
>> as well.
>>
>> On Mon, Feb 8, 2021 at 3:14 PM Xue Junkai <[email protected]> wrote:
>>
>>> Thanks Phong. Can you clarify which one you are looking for?
>>> 1. The number of parallel state transitions for bootstrapping replicas.
>>> 2. A limit on the number of replicas an instance can hold.
>>>
>>> Best,
>>>
>>> Junkai
>>>
>>>
>>> On Mon, Feb 8, 2021 at 3:06 PM Phong X. Nguyen <
>>> [email protected]> wrote:
>>>
>>>> Hello!
>>>>
>>>> I'm currently on a project that uses Apache Helix 0.8.4 (with a pending
>>>> upgrade to Helix 1.0.1) to distribute partitions across a number of hosts
>>>> (currently 32 partitions, 16 hosts). Once a partition is allocated to a
>>>> host, a number of expensive initialization steps occur, and the system
>>>> then performs computations for that partition on a scheduled interval. We
>>>> seek to minimize initializations whenever possible.
>>>>
>>>> If a system goes down (due to either maintenance or failure), the
>>>> partitions get reshuffled. Currently we are using
>>>> the CrushEdRebalanceStrategy in the hopes of minimizing partition
>>>> movements. However, we noticed that unlike the earlier AutoRebalancer
>>>> scheme, the CrushEdRebalanceStrategy does not limit the number of
>>>> partitions per node. In our case, this can cause severe out-of-memory
>>>> issues, which will then cascade as node after node gets more and more
>>>> partitions that it cannot handle. We have on rare occasion seen our entire
>>>> cluster fail as a result, and then our production engineers must manually -
>>>> and carefully - bring the system back online. This is undesirable.
>>>>
>>>> Does Helix have a rebalancing strategy that minimizes partition
>>>> movement yet also permits enforcement of maximum partitions per node?
>>>>
>>>> Thanks,
>>>> - Phong X. Nguyen
>>>>
>>>
