Glad that helps!

Best,

Junkai

On Tue, Mar 30, 2021 at 11:50 AM Phong X. Nguyen <[email protected]>
wrote:

> Hi everyone,
>
> We upgraded to Helix 1.0.1 and switched to the WAGED rebalancer and it
> worked extremely well for us. Thank you again for all your assistance!
>
> On Mon, Feb 8, 2021 at 4:03 PM Wang Jiajun <[email protected]> wrote:
>
>> Hi Phong,
>>
>> The WAGED rebalancer respects MAX_PARTITIONS_PER_INSTANCE
>> automatically, so you probably don't need any strategy-specific configuration.
>> However, you do need to be on the new version to use the WAGED rebalancer.
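>>
>> In case it helps, here is a minimal sketch of setting that limit at the
>> cluster level (assuming the ClusterConfig setter available in recent Helix
>> versions; the ZooKeeper address and cluster name are placeholders):
>>
>>   import org.apache.helix.ConfigAccessor;
>>   import org.apache.helix.model.ClusterConfig;
>>
>>   // Cap each instance at 3 partitions; WAGED honors this limit automatically.
>>   ConfigAccessor configAccessor = new ConfigAccessor("zk-host:2181");
>>   ClusterConfig clusterConfig = configAccessor.getClusterConfig("MY_CLUSTER");
>>   clusterConfig.setMaxPartitionsPerInstance(3);
>>   configAccessor.setClusterConfig("MY_CLUSTER", clusterConfig);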
>>
>> Also to confirm what you said, I believe the consistent hashing based
>> strategies (Crush and CrushEd) do not respect
>> the MAX_PARTITIONS_PER_INSTANCE. I guess there was some design concern.
>>
>> Anyway, using WAGED is the current recommendation : ) Could you please
>> give it a try and let us know if it is a good fit?
>>
>> Best Regards,
>> Jiajun
>>
>>
>> On Mon, Feb 8, 2021 at 3:55 PM Xue Junkai <[email protected]> wrote:
>>
>>> CRUSHED tries its best to distribute the replicas evenly. So you
>>> don't need identical assignments for each of the instances?
>>> If that's the case, I would suggest migrating to the WAGED rebalancer
>>> with constraints set up. For more details, you can refer to:
>>> https://github.com/apache/helix/wiki/Weight-aware-Globally-Evenly-distributed-Rebalancer
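>>>
>>> A rough sketch of the wiring, in case it helps (the capacity key name,
>>> weights, and cluster/resource names below are just example placeholders,
>>> not required values):
>>>
>>>   import java.util.Collections;
>>>   import org.apache.helix.ConfigAccessor;
>>>   import org.apache.helix.manager.zk.ZKHelixAdmin;
>>>   import org.apache.helix.model.ClusterConfig;
>>>   import org.apache.helix.model.IdealState;
>>>
>>>   ZKHelixAdmin admin = new ZKHelixAdmin("zk-host:2181");
>>>   ConfigAccessor configAccessor = new ConfigAccessor("zk-host:2181");
>>>
>>>   // Declare a capacity dimension that WAGED packs partitions against.
>>>   ClusterConfig clusterConfig = configAccessor.getClusterConfig("MY_CLUSTER");
>>>   clusterConfig.setInstanceCapacityKeys(Collections.singletonList("PARTITION_COUNT"));
>>>   // Each instance can hold 3 units; each partition weighs 1 unit.
>>>   clusterConfig.setDefaultInstanceCapacityMap(Collections.singletonMap("PARTITION_COUNT", 3));
>>>   clusterConfig.setDefaultPartitionWeightMap(Collections.singletonMap("PARTITION_COUNT", 1));
>>>   configAccessor.setClusterConfig("MY_CLUSTER", clusterConfig);
>>>
>>>   // Point the resource at the WAGED rebalancer.
>>>   IdealState idealState = admin.getResourceIdealState("MY_CLUSTER", "MY_RESOURCE");
>>>   idealState.setRebalancerClassName(
>>>       "org.apache.helix.controller.rebalancer.waged.WagedRebalancer");
>>>   admin.setResourceIdealState("MY_CLUSTER", "MY_RESOURCE", idealState);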
>>>
>>> Best,
>>>
>>> Junkai
>>>
>>>
>>> On Mon, Feb 8, 2021 at 3:28 PM Phong X. Nguyen <
>>> [email protected]> wrote:
>>>
>>>> I believe it's #2, but perhaps I should explain:
>>>>
>>>> Here's a simplified view of mapFields:
>>>>   "mapFields" : {
>>>>     "partition_11" : {
>>>>       "server05.verizonmedia.com" : "ONLINE"
>>>>     },
>>>>     "partition_22" : {
>>>>       "server05.verizonmedia.com" : "ONLINE"
>>>>     }
>>>>   },
>>>>
>>>> Server 5 has partitions (replicas?) 11 and 22 assigned to it, and
>>>> that's currently fine. We could, for example, have partition_17 also
>>>> assigned, which would be fine, but if a fourth one were to be assigned
>>>> then we stand a high likelihood of crashing.
>>>>
>>>> Bootstrapping replicas is also expensive, so we'd like to minimize that
>>>> as well.
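>>>>
>>>> (As an aside, a small sketch of how the per-instance replica counts can be
>>>> read back from the ExternalView, if that is useful context; the cluster and
>>>> resource names are placeholders:)
>>>>
>>>>   import java.util.HashMap;
>>>>   import java.util.Map;
>>>>   import org.apache.helix.manager.zk.ZKHelixAdmin;
>>>>   import org.apache.helix.model.ExternalView;
>>>>
>>>>   ZKHelixAdmin admin = new ZKHelixAdmin("zk-host:2181");
>>>>   ExternalView view = admin.getResourceExternalView("MY_CLUSTER", "MY_RESOURCE");
>>>>   Map<String, Integer> replicasPerInstance = new HashMap<>();
>>>>   for (String partition : view.getPartitionSet()) {
>>>>     // Each partition's state map is keyed by the instance hosting a replica.
>>>>     for (String instance : view.getStateMap(partition).keySet()) {
>>>>       replicasPerInstance.merge(instance, 1, Integer::sum);
>>>>     }
>>>>   }
>>>>   // replicasPerInstance now maps each host to how many replicas it holds.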
>>>>
>>>> On Mon, Feb 8, 2021 at 3:14 PM Xue Junkai <[email protected]> wrote:
>>>>
>>>>> Thanks Phong. Can you clarify which one you are looking for?
>>>>> 1. The number of parallel state transitions for bootstrapping replicas.
>>>>> 2. A limit on the number of replicas an instance can hold.
>>>>>
>>>>> Best,
>>>>>
>>>>> Junkai
>>>>>
>>>>>
>>>>> On Mon, Feb 8, 2021 at 3:06 PM Phong X. Nguyen <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Hello!
>>>>>>
>>>>>> I'm currently on a project that uses Apache Helix 0.8.4 (with a
>>>>>> pending upgrade to Helix 1.0.1) to distribute partitions across a number
>>>>>> of hosts (currently 32 partitions, 16 hosts). Once a partition is
>>>>>> allocated to a host, a bunch of expensive initialization steps occur, and
>>>>>> the system proceeds to do a bunch of computations for the partition on a
>>>>>> scheduled interval. We seek to minimize initializations when possible.
>>>>>>
>>>>>> If a system goes down (due to either maintenance or failure), the
>>>>>> partitions get reshuffled. Currently we are using
>>>>>> the CrushEdRebalanceStrategy in the hopes of minimizing partition
>>>>>> movements. However, we noticed that unlike the earlier AutoRebalancer
>>>>>> scheme, the CrushEdRebalanceStrategy does not limit the number of
>>>>>> partitions per node. In our case, this can cause severe out-of-memory
>>>>>> issues, which will then cascade as node after node gets more and more
>>>>>> partitions that it cannot handle. We have on rare occasion seen our
>>>>>> entire cluster fail as a result, and then our production engineers must
>>>>>> manually - and carefully - bring the system back online. This is
>>>>>> undesirable.
>>>>>>
>>>>>> Does Helix have a rebalancing strategy that minimizes partition
>>>>>> movement yet also permits enforcement of maximum partitions per node?
>>>>>>
>>>>>> Thanks,
>>>>>> - Phong X. Nguyen
>>>>>>
>>>>>
