Correct. For the WAGED rebalancer, the "strategy" is expressed through the
constraints and partition weights, so there is no need to specify a separate
"strategy" in addition to the rebalancer class.
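
For reference, a minimal, untested sketch of what that configuration could
look like through the Java ConfigAccessor API (the "PARTCOUNT" capacity key,
the numbers, and the zkAddress/clusterName variables are all placeholders):

  import java.util.Collections;

  import org.apache.helix.ConfigAccessor;
  import org.apache.helix.model.ClusterConfig;

  // Declare the capacity dimension the WAGED rebalancer should balance on.
  ConfigAccessor accessor = new ConfigAccessor(zkAddress);
  ClusterConfig clusterConfig = accessor.getClusterConfig(clusterName);
  clusterConfig.setInstanceCapacityKeys(Collections.singletonList("PARTCOUNT"));
  // Each partition replica weighs one unit by default...
  clusterConfig.setDefaultPartitionWeightMap(
      Collections.singletonMap("PARTCOUNT", 1));
  // ...and each instance can hold at most three units, i.e. three replicas.
  clusterConfig.setDefaultInstanceCapacityMap(
      Collections.singletonMap("PARTCOUNT", 3));
  accessor.setClusterConfig(clusterName, clusterConfig);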

BTW, the wiki page contains an out-of-date class name. Please use
"org.apache.helix.controller.rebalancer.waged.WagedRebalancer". I also
updated the tutorial section a little bit so it is easier for you to find
the entry point.
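
If it helps, pointing an existing resource at the rebalancer can be done
roughly like this (an untested sketch; zkAddress, clusterName, and
resourceName are placeholders):

  import org.apache.helix.HelixAdmin;
  import org.apache.helix.manager.zk.ZKHelixAdmin;
  import org.apache.helix.model.IdealState;

  HelixAdmin admin = new ZKHelixAdmin(zkAddress);
  IdealState idealState = admin.getResourceIdealState(clusterName, resourceName);
  idealState.setRebalanceMode(IdealState.RebalanceMode.FULL_AUTO);
  // Note the ".waged." package segment in the class name.
  idealState.setRebalancerClassName(
      "org.apache.helix.controller.rebalancer.waged.WagedRebalancer");
  admin.setResourceIdealState(clusterName, resourceName, idealState);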

Best Regards,
Jiajun


On Mon, Feb 8, 2021 at 5:18 PM Phong X. Nguyen <[email protected]>
wrote:

> Currently our configuration looks like this:
>
>   "simpleFields" : {
>     "HELIX_ENABLED" : "true",
>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>     "NUM_PARTITIONS" : "32",
>     "REBALANCE_MODE" : "FULL_AUTO",
>     "REBALANCE_STRATEGY" :
> "org.apache.helix.controller.rebalancer.strategy.CrushEdRebalanceStrategy",
>     "REPLICAS" : "1",
>     "STATE_MODEL_DEF_REF" : "OnlineOffline",
>     "STATE_MODEL_FACTORY_NAME" : "DEFAULT"
>   }
>
> Looking at the documentation, it seems like this should be our target
> configuration. It looks like we're supposed to use REBALANCER_CLASS_NAME
> instead of REBALANCE_STRATEGY now?
>
>   "simpleFields" : {
>     "HELIX_ENABLED" : "true",
>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>     "MAX_PARTITIONS_PER_INSTANCE" : "3",
>     "NUM_PARTITIONS" : "32",
>     "REBALANCE_MODE" : "FULL_AUTO",
>     "REBALANCER_CLASS_NAME" :
> "org.apache.helix.controller.rebalancer.WagedRebalancer",
>     "REPLICAS" : "1",
>     "STATE_MODEL_DEF_REF" : "OnlineOffline",
>     "STATE_MODEL_FACTORY_NAME" : "DEFAULT"
>   }
>
> Thanks, everyone, for all of your help so far!
>
>
> On Mon, Feb 8, 2021 at 4:20 PM Hunter Lee <[email protected]> wrote:
>
>> You might find that some classes have been moved to a separate module.
>> Rest assured, most are backward-compatible and the only difference should
>> be a change in the package name. If you have any other specific questions
>> that you cannot resolve on your own, you can reach out to the community
>> for help. Depending on the complexity of your implementation, it shouldn't
>> take more than a day or two.
>>
>> Hunter
>>
>> On Mon, Feb 8, 2021 at 4:08 PM Phong X. Nguyen <[email protected]>
>> wrote:
>>
>>> We're definitely going to give WAGED a try.
>>>
>>> Are there any constraints for upgrading from Helix 0.8.4 to 1.0.1? We
>>> were on 0.6 for the longest time and knew we had to upgrade first to 0.8.X.
>>>
>>> Thanks,
>>> - Phong X. Nguyen
>>>
>>> On Mon, Feb 8, 2021 at 4:03 PM Wang Jiajun <[email protected]>
>>> wrote:
>>>
>>>> Hi Phong,
>>>>
>>>> The WAGED rebalancer respects MAX_PARTITIONS_PER_INSTANCE automatically,
>>>> so you probably don't need any specific configuration. However, you do
>>>> need to be on the new version to use the WAGED rebalancer.
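>>>>
>>>> If you do want to set the limit explicitly, here is a rough, untested
>>>> sketch via the IdealState API (zkAddress, clusterName, and resourceName
>>>> are placeholders):
>>>>
>>>>   import org.apache.helix.HelixAdmin;
>>>>   import org.apache.helix.manager.zk.ZKHelixAdmin;
>>>>   import org.apache.helix.model.IdealState;
>>>>
>>>>   HelixAdmin admin = new ZKHelixAdmin(zkAddress);
>>>>   IdealState idealState =
>>>>       admin.getResourceIdealState(clusterName, resourceName);
>>>>   // The limit lives on the IdealState; WAGED respects it automatically.
>>>>   idealState.setMaxPartitionsPerInstance(3);
>>>>   admin.setResourceIdealState(clusterName, resourceName, idealState);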
>>>>
>>>> Also to confirm what you said, I believe the consistent-hashing-based
>>>> strategies (Crush and CrushEd) do not respect MAX_PARTITIONS_PER_INSTANCE.
>>>> I guess there was some design concern behind that.
>>>>
>>>> Anyway, using WAGED is the current recommendation : ) Could you please
>>>> give it a try and let us know if it is a good fit?
>>>>
>>>> Best Regards,
>>>> Jiajun
>>>>
>>>>
>>>> On Mon, Feb 8, 2021 at 3:55 PM Xue Junkai <[email protected]> wrote:
>>>>
>>>>> CrushEd tries its best to distribute the replicas evenly. So you
>>>>> don't need identical assignments for each of the instances?
>>>>> If that's the case, I would suggest you migrate to the WAGED rebalancer
>>>>> with constraints set up. For more details, you can refer to:
>>>>> https://github.com/apache/helix/wiki/Weight-aware-Globally-Evenly-distributed-Rebalancer
>>>>>
>>>>> Best,
>>>>>
>>>>> Junkai
>>>>>
>>>>>
>>>>> On Mon, Feb 8, 2021 at 3:28 PM Phong X. Nguyen <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> I believe it's #2, but perhaps I should explain:
>>>>>>
>>>>>> Here's a simplified view of mapFields:
>>>>>>   "mapFields" : {
>>>>>>     "partition_11" : {
>>>>>>       "server05.verizonmedia.com" : "ONLINE"
>>>>>>     },
>>>>>>     "partition_22" : {
>>>>>>       "server05.verizonmedia.com" : "ONLINE"
>>>>>>     }
>>>>>>   },
>>>>>>
>>>>>> Server 5 has partitions (replicas?) 11 and 22 assigned to it, and
>>>>>> that's currently fine. We could, for example, have partition_17 assigned
>>>>>> as well, but if a fourth one were to be assigned then we stand a high
>>>>>> likelihood of crashing.
>>>>>>
>>>>>> Bootstrapping replicas is also expensive, so we'd like to minimize
>>>>>> that as well.
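>>>>>>
>>>>>> A rough, untested sketch of what capping parallel bootstraps might look
>>>>>> like via the cluster config, if that turns out to be what we need (the
>>>>>> per-instance limit of 1 and the zkAddress/clusterName variables are
>>>>>> placeholders):
>>>>>>
>>>>>>   import java.util.Collections;
>>>>>>
>>>>>>   import org.apache.helix.ConfigAccessor;
>>>>>>   import org.apache.helix.api.config.StateTransitionThrottleConfig;
>>>>>>   import org.apache.helix.model.ClusterConfig;
>>>>>>
>>>>>>   ConfigAccessor accessor = new ConfigAccessor(zkAddress);
>>>>>>   ClusterConfig clusterConfig = accessor.getClusterConfig(clusterName);
>>>>>>   // Allow at most one load-balance transition in flight per instance.
>>>>>>   clusterConfig.setStateTransitionThrottleConfigs(Collections.singletonList(
>>>>>>       new StateTransitionThrottleConfig(
>>>>>>           StateTransitionThrottleConfig.RebalanceType.LOAD_BALANCE,
>>>>>>           StateTransitionThrottleConfig.ThrottleScope.INSTANCE,
>>>>>>           1)));
>>>>>>   accessor.setClusterConfig(clusterName, clusterConfig);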
>>>>>>
>>>>>> On Mon, Feb 8, 2021 at 3:14 PM Xue Junkai <[email protected]> wrote:
>>>>>>
>>>>>>> Thanks Phong. Can you clarify which of these you are looking for?
>>>>>>> 1. the number of parallel state transitions for bootstrapping replicas.
>>>>>>> 2. a limit on the number of replicas an instance can hold.
>>>>>>>
>>>>>>> Best,
>>>>>>>
>>>>>>> Junkai
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Feb 8, 2021 at 3:06 PM Phong X. Nguyen <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Hello!
>>>>>>>>
>>>>>>>> I'm currently on a project that uses Apache Helix 0.8.4 (with a
>>>>>>>> pending upgrade to Helix 1.0.1) to distribute partitions across a
>>>>>>>> number of hosts (currently 32 partitions, 16 hosts). Once a partition
>>>>>>>> is allocated to a host, a series of expensive initialization steps
>>>>>>>> occurs, and the system then performs computations for the partition
>>>>>>>> on a scheduled interval. We seek to minimize initializations when
>>>>>>>> possible.
>>>>>>>>
>>>>>>>> If a system goes down (due to either maintenance or failure), the
>>>>>>>> partitions get reshuffled. Currently we are using the
>>>>>>>> CrushEdRebalanceStrategy in the hopes of minimizing partition
>>>>>>>> movements. However, we noticed that unlike the earlier AutoRebalancer
>>>>>>>> scheme, the CrushEdRebalanceStrategy does not limit the number of
>>>>>>>> partitions per node. In our case, this can cause severe out-of-memory
>>>>>>>> issues, which will then cascade as node after node gets more and more
>>>>>>>> partitions that it cannot handle. We have on rare occasion seen our
>>>>>>>> entire cluster fail as a result, and then our production engineers
>>>>>>>> must manually - and carefully - bring the system back online. This is
>>>>>>>> undesirable.
>>>>>>>>
>>>>>>>> Does Helix have a rebalancing strategy that minimizes partition
>>>>>>>> movement yet also permits enforcement of maximum partitions per node?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> - Phong X. Nguyen
>>>>>>>>
>>>>>>>
