Correct. For the WAGED rebalancer, the "strategy" is the constraints and the partition weights, so there is no need to specify a REBALANCE_STRATEGY in addition to the rebalancer class.
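In practice, that constraint setup is a handful of ClusterConfig calls. Below is a minimal, untested sketch; the capacity key "PARTCOUNT", the ZooKeeper address, and the numbers are illustrative placeholders, not values from this thread. It caps every instance at 3 replicas by giving each instance 3 units of capacity and charging each replica 1 unit:

import java.util.Arrays;
import java.util.Collections;
import org.apache.helix.ConfigAccessor;
import org.apache.helix.model.ClusterConfig;

public class WagedCapacitySetup {
  public static void main(String[] args) {
    // Placeholder ZooKeeper address and cluster name.
    ConfigAccessor configAccessor = new ConfigAccessor("localhost:2181");
    ClusterConfig clusterConfig = configAccessor.getClusterConfig("MY_CLUSTER");

    // Register the capacity dimensions that WAGED should enforce.
    clusterConfig.setInstanceCapacityKeys(Arrays.asList("PARTCOUNT"));

    // Every instance offers 3 units of PARTCOUNT by default ...
    clusterConfig.setDefaultInstanceCapacityMap(Collections.singletonMap("PARTCOUNT", 3));

    // ... and every replica consumes 1 unit, so at most 3 replicas fit per instance.
    clusterConfig.setDefaultPartitionWeightMap(Collections.singletonMap("PARTCOUNT", 1));

    configAccessor.setClusterConfig("MY_CLUSTER", clusterConfig);
  }
}

Per-resource weights can also be set through ResourceConfig if some partitions are heavier than others.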
BTW, the wiki page contains an out-of-date classpath. Please use
"org.apache.helix.controller.rebalancer.waged.WagedRebalancer". I also
updated the tutorial section a little bit so it is easier for you to find
the entrance.

Best Regards,
Jiajun

On Mon, Feb 8, 2021 at 5:18 PM Phong X. Nguyen <[email protected]> wrote:

> Currently our configuration looks like this:
>
> "simpleFields" : {
>   "HELIX_ENABLED" : "true",
>   "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>   "NUM_PARTITIONS" : "32",
>   "REBALANCE_MODE" : "FULL_AUTO",
>   "REBALANCE_STRATEGY" : "org.apache.helix.controller.rebalancer.strategy.CrushEdRebalanceStrategy",
>   "REPLICAS" : "1",
>   "STATE_MODEL_DEF_REF" : "OnlineOffline",
>   "STATE_MODEL_FACTORY_NAME" : "DEFAULT"
> }
>
> Looking at the documentation, it seems like this should be our target
> configuration. It looks like we're also supposed to use
> REBALANCER_CLASS_NAME instead of REBALANCE_STRATEGY now?
>
> "simpleFields" : {
>   "HELIX_ENABLED" : "true",
>   "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>   "MAX_PARTITIONS_PER_INSTANCE" : "3",
>   "NUM_PARTITIONS" : "32",
>   "REBALANCE_MODE" : "FULL_AUTO",
>   "REBALANCER_CLASS_NAME" : "org.apache.helix.controller.rebalancer.WagedRebalancer",
>   "REPLICAS" : "1",
>   "STATE_MODEL_DEF_REF" : "OnlineOffline",
>   "STATE_MODEL_FACTORY_NAME" : "DEFAULT"
> }
>
> Thanks, everyone, for all of your help so far!
>
> On Mon, Feb 8, 2021 at 4:20 PM Hunter Lee <[email protected]> wrote:
>
>> You might find that some classes have been moved to a separate module.
>> Rest assured, most are backward-compatible and the only difference
>> should be a change in the package name. If you have any other specific
>> questions that you cannot resolve on your own, you can reach out to the
>> community for help. Depending on the complexity of your implementation,
>> it shouldn't take more than a day or two.
>>
>> Hunter
>>
>> On Mon, Feb 8, 2021 at 4:08 PM Phong X. Nguyen <[email protected]> wrote:
>>
>>> We're definitely going to give WAGED a try.
>>>
>>> Are there any constraints for upgrading from Helix 0.8.4 to 1.0.1? We
>>> were on 0.6 for the longest time and knew we had to upgrade to 0.8.X
>>> first.
>>>
>>> Thanks,
>>> - Phong X. Nguyen
>>>
>>> On Mon, Feb 8, 2021 at 4:03 PM Wang Jiajun <[email protected]> wrote:
>>>
>>>> Hi Phong,
>>>>
>>>> The WAGED rebalancer respects MAX_PARTITIONS_PER_INSTANCE
>>>> automatically, so you probably don't need any specific configuration.
>>>> However, you do need to be on the new version to use the WAGED
>>>> rebalancer.
>>>>
>>>> Also, to confirm what you said: I believe the consistent-hashing-based
>>>> strategies (Crush and CrushEd) do not respect
>>>> MAX_PARTITIONS_PER_INSTANCE. I guess there was some design concern.
>>>>
>>>> Anyway, using WAGED is the current recommendation : ) Could you please
>>>> give it a try and let us know if it is a good fit?
>>>>
>>>> Best Regards,
>>>> Jiajun
>>>>
>>>> On Mon, Feb 8, 2021 at 3:55 PM Xue Junkai <[email protected]> wrote:
>>>>
>>>>> CRUSHED tries its best to distribute the replicas evenly. So you
>>>>> don't need identical assignments for each of the instances? If that's
>>>>> the case, I would suggest you migrate to the WAGED rebalancer with
>>>>> constraints set up. For more details, you can refer to:
>>>>> https://github.com/apache/helix/wiki/Weight-aware-Globally-Evenly-distributed-Rebalancer
>>>>>
>>>>> Best,
>>>>>
>>>>> Junkai
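For reference, the migration Junkai describes above is a small IdealState change. A rough sketch, with placeholder cluster, resource, and ZooKeeper names:

import org.apache.helix.HelixAdmin;
import org.apache.helix.manager.zk.ZKHelixAdmin;
import org.apache.helix.model.IdealState;

public class SwitchToWaged {
  public static void main(String[] args) {
    HelixAdmin admin = new ZKHelixAdmin("localhost:2181"); // placeholder ZK address
    IdealState idealState = admin.getResourceIdealState("MY_CLUSTER", "myResource");

    // FULL_AUTO plus the rebalancer class is all WAGED needs;
    // no REBALANCE_STRATEGY field is required.
    idealState.setRebalanceMode(IdealState.RebalanceMode.FULL_AUTO);
    idealState.setRebalancerClassName(
        "org.apache.helix.controller.rebalancer.waged.WagedRebalancer");

    admin.setResourceIdealState("MY_CLUSTER", "myResource", idealState);
  }
}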
>>>>> On Mon, Feb 8, 2021 at 3:28 PM Phong X. Nguyen <[email protected]> wrote:
>>>>>
>>>>>> I believe it's #2, but perhaps I should explain:
>>>>>>
>>>>>> Here's a simplified view of mapFields:
>>>>>>
>>>>>> "mapFields" : {
>>>>>>   "partition_11" : {
>>>>>>     "server05.verizonmedia.com" : "ONLINE"
>>>>>>   },
>>>>>>   "partition_22" : {
>>>>>>     "server05.verizonmedia.com" : "ONLINE"
>>>>>>   }
>>>>>> },
>>>>>>
>>>>>> Server 5 has partitions (replicas?) 11 and 22 assigned to it, and
>>>>>> that's currently fine. We could, for example, have partition_17 also
>>>>>> assigned, which would be fine, but if a fourth one were to be
>>>>>> assigned then we stand a high likelihood of crashing.
>>>>>>
>>>>>> Bootstrapping replicas is also expensive, so we'd like to minimize
>>>>>> that as well.
>>>>>>
>>>>>> On Mon, Feb 8, 2021 at 3:14 PM Xue Junkai <[email protected]> wrote:
>>>>>>
>>>>>>> Thanks, Phong. Can you clarify which one you are looking for?
>>>>>>> 1. The number of parallel state transitions for bootstrapping replicas.
>>>>>>> 2. A limit on the number of replicas an instance can hold.
>>>>>>>
>>>>>>> Best,
>>>>>>>
>>>>>>> Junkai
>>>>>>>
>>>>>>> On Mon, Feb 8, 2021 at 3:06 PM Phong X. Nguyen <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hello!
>>>>>>>>
>>>>>>>> I'm currently on a project that uses Apache Helix 0.8.4 (with a
>>>>>>>> pending upgrade to Helix 1.0.1) to distribute partitions across a
>>>>>>>> number of hosts (currently 32 partitions, 16 hosts). Once a
>>>>>>>> partition is allocated to a host, a number of expensive
>>>>>>>> initialization steps occur, and the system proceeds to do a series
>>>>>>>> of computations for the partition on a scheduled interval. We seek
>>>>>>>> to minimize initializations when possible.
>>>>>>>>
>>>>>>>> If a system goes down (due to either maintenance or failure), the
>>>>>>>> partitions get reshuffled. Currently we are using the
>>>>>>>> CrushEdRebalanceStrategy in the hope of minimizing partition
>>>>>>>> movements. However, we noticed that unlike the earlier
>>>>>>>> AutoRebalancer scheme, the CrushEdRebalanceStrategy does not limit
>>>>>>>> the number of partitions per node. In our case, this can cause
>>>>>>>> severe out-of-memory issues, which then cascade as node after node
>>>>>>>> gets more and more partitions that it cannot handle. We have on
>>>>>>>> rare occasions seen our entire cluster fail as a result, and then
>>>>>>>> our production engineers must manually - and carefully - bring the
>>>>>>>> system back online. This is undesirable.
>>>>>>>>
>>>>>>>> Does Helix have a rebalancing strategy that minimizes partition
>>>>>>>> movement yet also permits enforcement of a maximum number of
>>>>>>>> partitions per node?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> - Phong X. Nguyen
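A closing note on the per-host limit discussed in this thread: if hosts have uneven memory headroom, the cluster-wide capacity default can be overridden per instance. An untested sketch, reusing the hypothetical "PARTCOUNT" key from above with a placeholder instance name:

import java.util.Collections;
import org.apache.helix.HelixAdmin;
import org.apache.helix.manager.zk.ZKHelixAdmin;
import org.apache.helix.model.InstanceConfig;

public class PerInstanceCap {
  public static void main(String[] args) {
    HelixAdmin admin = new ZKHelixAdmin("localhost:2181"); // placeholder ZK address

    // Cap one smaller host below the cluster-wide default of 3.
    InstanceConfig instanceConfig =
        admin.getInstanceConfig("MY_CLUSTER", "server05.verizonmedia.com_12000");
    instanceConfig.setInstanceCapacityMap(Collections.singletonMap("PARTCOUNT", 2));
    admin.setInstanceConfig("MY_CLUSTER", "server05.verizonmedia.com_12000", instanceConfig);
  }
}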
