We're definitely going to give WAGED a try.

Are there any constraints on upgrading from Helix 0.8.4 to 1.0.1? We were
on 0.6 for the longest time and knew we had to upgrade to 0.8.x first.

Thanks,
- Phong X. Nguyen

On Mon, Feb 8, 2021 at 4:03 PM Wang Jiajun <[email protected]> wrote:

> Hi Phong,
>
> The WAGED rebalancer respects MAX_PARTITIONS_PER_INSTANCE automatically,
> so you probably don't need any specific configuration. However, you do
> need to be on the new version to use the WAGED rebalancer.
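>
> In case it helps, here is a minimal sketch (Java, using the ZKHelixAdmin
> API; the ZK address and the cluster/resource names below are placeholders)
> of moving a resource to WAGED while keeping a per-instance partition cap:
>
>     import org.apache.helix.HelixAdmin;
>     import org.apache.helix.manager.zk.ZKHelixAdmin;
>     import org.apache.helix.model.IdealState;
>
>     // Placeholder connection and names; adjust for your cluster.
>     HelixAdmin admin = new ZKHelixAdmin("localhost:2181");
>     IdealState is = admin.getResourceIdealState("MyCluster", "MyResource");
>
>     // Cap how many replicas a single instance may host.
>     is.setMaxPartitionsPerInstance(3);
>
>     // Point the resource at the WAGED rebalancer (FULL_AUTO mode).
>     is.setRebalanceMode(IdealState.RebalanceMode.FULL_AUTO);
>     is.setRebalancerClassName(
>         "org.apache.helix.controller.rebalancer.waged.WagedRebalancer");
>     admin.setResourceIdealState("MyCluster", "MyResource", is);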
>
> Also, to confirm what you said: I believe the consistent-hashing-based
> strategies (Crush and CrushEd) do not respect MAX_PARTITIONS_PER_INSTANCE.
> I guess there was some design concern behind that.
>
> Anyway, using WAGED is the current recommendation :) Could you please
> give it a try and let us know if it is a good fit?
>
> Best Regards,
> Jiajun
>
>
> On Mon, Feb 8, 2021 at 3:55 PM Xue Junkai <[email protected]> wrote:
>
>> CRUSHED tries its best to distribute the replicas evenly. So you
>> don't need identical assignments for each of the instances?
>> If that's the case, I would suggest you migrate to the WAGED rebalancer
>> with constraints set up. For more details, you can refer to:
>> https://github.com/apache/helix/wiki/Weight-aware-Globally-Evenly-distributed-Rebalancer
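>>
>> As a rough sketch of the constraints setup (Java; the capacity key name,
>> ZK address, and cluster name are just placeholders here), you can declare
>> a capacity dimension on the ClusterConfig and weigh each partition
>> against it:
>>
>>     import java.util.Collections;
>>     import org.apache.helix.ConfigAccessor;
>>     import org.apache.helix.model.ClusterConfig;
>>
>>     // Placeholder ZK address and cluster name.
>>     ConfigAccessor accessor = new ConfigAccessor("localhost:2181");
>>     ClusterConfig cfg = accessor.getClusterConfig("MyCluster");
>>
>>     // "PARTCOUNT" is an arbitrary dimension: each instance gets a
>>     // budget of 3 and each partition consumes 1, so WAGED places at
>>     // most 3 partitions on any one instance.
>>     cfg.setInstanceCapacityKeys(Collections.singletonList("PARTCOUNT"));
>>     cfg.setDefaultInstanceCapacityMap(
>>         Collections.singletonMap("PARTCOUNT", 3));
>>     cfg.setDefaultPartitionWeightMap(
>>         Collections.singletonMap("PARTCOUNT", 1));
>>     accessor.setClusterConfig("MyCluster", cfg);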
>>
>> Best,
>>
>> Junkai
>>
>>
>> On Mon, Feb 8, 2021 at 3:28 PM Phong X. Nguyen <[email protected]>
>> wrote:
>>
>>> I believe it's #2, but perhaps I should explain:
>>>
>>> Here's a simplified view of mapFields:
>>>   "mapFields" : {
>>>     "partition_11" : {
>>>       "server05.verizonmedia.com" : "ONLINE"
>>>     },
>>>     "partition_22" : {
>>>       "server05.verizonmedia.com" : "ONLINE"
>>>     },
>>> },
>>>
>>> Server 5 has partitions (replicas?) 11 and 22 assigned to it, and that's
>>> currently fine. We could, for example, also have partition_17 assigned,
>>> which would still be fine, but if a fourth one were assigned then we
>>> stand a high likelihood of crashing.
>>>
>>> Bootstrapping replicas is also expensive, so we'd like to minimize that
>>> as well.
>>>
>>> On Mon, Feb 8, 2021 at 3:14 PM Xue Junkai <[email protected]> wrote:
>>>
>>>> Thanks, Phong. Can you clarify which of these you are looking for?
>>>> 1. A limit on the number of parallel state transitions when
>>>> bootstrapping replicas.
>>>> 2. A limit on the number of replicas an instance can hold.
>>>>
>>>> Best,
>>>>
>>>> Junkai
>>>>
>>>>
>>>> On Mon, Feb 8, 2021 at 3:06 PM Phong X. Nguyen <
>>>> [email protected]> wrote:
>>>>
>>>>> Hello!
>>>>>
>>>>> I'm currently on a project that uses Apache Helix 0.8.4 (with a
>>>>> pending upgrade to Helix 1.0.1) to distribute partitions across a
>>>>> number of hosts (currently 32 partitions, 16 hosts). Once a partition
>>>>> is allocated to a host, a series of expensive initialization steps
>>>>> runs, and the system then performs computations for that partition on
>>>>> a scheduled interval. We want to minimize these initializations where
>>>>> possible.
>>>>>
>>>>> If a system goes down (due to either maintenance or failure), the
>>>>> partitions get reshuffled. Currently we are using the
>>>>> CrushEdRebalanceStrategy in the hope of minimizing partition movements.
>>>>> However, we noticed that unlike the earlier AutoRebalancer scheme, the
>>>>> CrushEdRebalanceStrategy does not limit the number of partitions per
>>>>> node. In our case this can cause severe out-of-memory issues, which
>>>>> then cascade as node after node receives more partitions than it can
>>>>> handle. We have on rare occasion seen our entire cluster fail as a
>>>>> result, and then our production engineers must manually and carefully
>>>>> bring the system back online. This is undesirable.
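>>>>>
>>>>> For illustration, here is roughly how such a resource gets pointed at
>>>>> CrushEd (a simplified sketch; the names and address are placeholders,
>>>>> not our real configuration):
>>>>>
>>>>>     import org.apache.helix.HelixAdmin;
>>>>>     import org.apache.helix.manager.zk.ZKHelixAdmin;
>>>>>     import org.apache.helix.model.IdealState;
>>>>>
>>>>>     // Placeholder names; illustrative only.
>>>>>     HelixAdmin admin = new ZKHelixAdmin("localhost:2181");
>>>>>     IdealState is =
>>>>>         admin.getResourceIdealState("MyCluster", "MyResource");
>>>>>     is.setRebalanceMode(IdealState.RebalanceMode.FULL_AUTO);
>>>>>     is.setRebalanceStrategy(
>>>>>         "org.apache.helix.controller.rebalancer.strategy"
>>>>>             + ".CrushEdRebalanceStrategy");
>>>>>     admin.setResourceIdealState("MyCluster", "MyResource", is);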
>>>>>
>>>>> Does Helix have a rebalancing strategy that minimizes partition
>>>>> movement yet also permits enforcement of maximum partitions per node?
>>>>>
>>>>> Thanks,
>>>>> - Phong X. Nguyen
>>>>>
>>>>
