I would like to add one more thing: one major change from 0.6 -> 0.8 is that Helix moved from Java 1.7 to Java 1.8.
On Mon, Feb 8, 2021 at 4:20 PM Hunter Lee <[email protected]> wrote:

> You might find that some classes have been moved to a separate module.
> Rest assured, most are backward-compatible and the only difference should
> be a change in the package name. If you have any other specific questions
> that you cannot resolve on your own, you can reach out to the community
> for help. Depending on the complexity of your implementation, it shouldn't
> take more than a day or two.
>
> Hunter
>
> On Mon, Feb 8, 2021 at 4:08 PM Phong X. Nguyen <[email protected]> wrote:
>
>> We're definitely going to give WAGED a try.
>>
>> Are there any constraints on upgrading from Helix 0.8.4 to 1.0.1? We
>> were on 0.6 for the longest time and knew we had to upgrade to 0.8.X
>> first.
>>
>> Thanks,
>> - Phong X. Nguyen
>>
>> On Mon, Feb 8, 2021 at 4:03 PM Wang Jiajun <[email protected]> wrote:
>>
>>> Hi Phong,
>>>
>>> The WAGED rebalancer respects MAX_PARTITIONS_PER_INSTANCE
>>> automatically, so you probably don't need any specific configuration.
>>> However, you do need to be on the new version to use the WAGED
>>> rebalancer.
>>>
>>> Also, to confirm what you said: I believe the consistent-hashing-based
>>> strategies (Crush and CrushEd) do not respect
>>> MAX_PARTITIONS_PER_INSTANCE. I guess there was some design concern.
>>>
>>> Anyway, using WAGED is the current recommendation : ) Could you please
>>> give it a try and let us know if it is a good fit?
>>>
>>> Best Regards,
>>> Jiajun
>>>
>>> On Mon, Feb 8, 2021 at 3:55 PM Xue Junkai <[email protected]> wrote:
>>>
>>>> CRUSHED tries its best to evenly distribute the replicas. So you
>>>> don't need identical assignments for each of the instances?
>>>> If that's the case, I would suggest you migrate to the WAGED
>>>> rebalancer with the constraints set up. For more details, you can
>>>> refer to:
>>>> https://github.com/apache/helix/wiki/Weight-aware-Globally-Evenly-distributed-Rebalancer
>>>>
>>>> Best,
>>>>
>>>> Junkai
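Since both replies above point at WAGED with a capacity constraint, here is a minimal sketch of that setup, assuming the Helix 1.0 Java API. The capacity key "PARTITION", the cap of 3 replicas per instance, and the ZooKeeper/cluster/resource names are all placeholders, not values from this thread:

    import java.util.Collections;

    import org.apache.helix.ConfigAccessor;
    import org.apache.helix.manager.zk.ZKHelixAdmin;
    import org.apache.helix.model.ClusterConfig;
    import org.apache.helix.model.IdealState;

    public class EnableWagedSketch {
      public static void main(String[] args) {
        String zkAddr = "localhost:2181"; // placeholder
        String cluster = "MY_CLUSTER";    // placeholder
        String resource = "MY_RESOURCE";  // placeholder

        // Declare a capacity dimension on the cluster. With every partition
        // weighing 1 and every instance having a capacity of 3, WAGED will
        // never place more than 3 replicas on an instance.
        ConfigAccessor accessor = new ConfigAccessor(zkAddr);
        ClusterConfig clusterConfig = accessor.getClusterConfig(cluster);
        clusterConfig.setInstanceCapacityKeys(
            Collections.singletonList("PARTITION"));
        clusterConfig.setDefaultInstanceCapacityMap(
            Collections.singletonMap("PARTITION", 3));
        clusterConfig.setDefaultPartitionWeightMap(
            Collections.singletonMap("PARTITION", 1));
        accessor.setClusterConfig(cluster, clusterConfig);

        // Switch the resource over to the WAGED rebalancer.
        ZKHelixAdmin admin = new ZKHelixAdmin(zkAddr);
        IdealState idealState = admin.getResourceIdealState(cluster, resource);
        idealState.setRebalanceMode(IdealState.RebalanceMode.FULL_AUTO);
        idealState.setRebalancerClassName(
            "org.apache.helix.controller.rebalancer.waged.WagedRebalancer");
        admin.setResourceIdealState(cluster, resource, idealState);
        admin.close();
      }
    }

WAGED also exposes a global rebalance preference that trades evenness against partition movement; the wiki page Junkai linked covers those knobs.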
>>>> On Mon, Feb 8, 2021 at 3:28 PM Phong X. Nguyen <[email protected]> wrote:
>>>>
>>>>> I believe it's #2, but perhaps I should explain.
>>>>>
>>>>> Here's a simplified view of mapFields:
>>>>>
>>>>> "mapFields" : {
>>>>>   "partition_11" : {
>>>>>     "server05.verizonmedia.com" : "ONLINE"
>>>>>   },
>>>>>   "partition_22" : {
>>>>>     "server05.verizonmedia.com" : "ONLINE"
>>>>>   }
>>>>> },
>>>>>
>>>>> Server 5 has partitions (replicas?) 11 and 22 assigned to it, and
>>>>> that's currently fine. We could, for example, have partition_17 also
>>>>> assigned, which would be fine, but if a fourth one were to be
>>>>> assigned, then we stand a high likelihood of crashing.
>>>>>
>>>>> Bootstrapping replicas is also expensive, so we'd like to minimize
>>>>> that as well.
>>>>>
>>>>> On Mon, Feb 8, 2021 at 3:14 PM Xue Junkai <[email protected]> wrote:
>>>>>
>>>>>> Thanks, Phong. Can you clarify which you are looking for?
>>>>>> 1. The parallel number of state transitions for bootstrapping
>>>>>> replicas.
>>>>>> 2. A limit on the number of replicas an instance can hold.
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Junkai
>>>>>>
>>>>>> On Mon, Feb 8, 2021 at 3:06 PM Phong X. Nguyen <[email protected]> wrote:
>>>>>>
>>>>>>> Hello!
>>>>>>>
>>>>>>> I'm currently on a project that uses Apache Helix 0.8.4 (with a
>>>>>>> pending upgrade to Helix 1.0.1) to distribute partitions across a
>>>>>>> number of hosts (currently 32 partitions, 16 hosts). Once a
>>>>>>> partition is allocated to a host, a number of expensive
>>>>>>> initialization steps occur, and the system then performs
>>>>>>> computations for the partition on a scheduled interval. We seek to
>>>>>>> minimize initializations where possible.
>>>>>>>
>>>>>>> If a system goes down (due to either maintenance or failure), the
>>>>>>> partitions get reshuffled. Currently we are using the
>>>>>>> CrushEdRebalanceStrategy in the hope of minimizing partition
>>>>>>> movements. However, we noticed that, unlike the earlier
>>>>>>> AutoRebalancer scheme, the CrushEdRebalanceStrategy does not limit
>>>>>>> the number of partitions per node. In our case, this can cause
>>>>>>> severe out-of-memory issues, which then cascade as node after node
>>>>>>> receives more partitions than it can handle. We have on rare
>>>>>>> occasions seen our entire cluster fail as a result, after which our
>>>>>>> production engineers must manually - and carefully - bring the
>>>>>>> system back online. This is undesirable.
>>>>>>>
>>>>>>> Does Helix have a rebalancing strategy that minimizes partition
>>>>>>> movement yet also permits enforcement of a maximum number of
>>>>>>> partitions per node?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> - Phong X. Nguyen
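For contrast with the WAGED sketch above, the configuration described in this last message looks roughly like the following. Again a sketch with placeholder names; as Jiajun notes further up, the MAX_PARTITIONS_PER_INSTANCE cap set here is honored by the older AutoRebalancer but ignored by the Crush/CrushEd strategies, which is the gap that motivated this thread:

    import org.apache.helix.manager.zk.ZKHelixAdmin;
    import org.apache.helix.model.IdealState;

    public class CrushEdSetupSketch {
      public static void main(String[] args) {
        ZKHelixAdmin admin = new ZKHelixAdmin("localhost:2181"); // placeholder
        IdealState idealState =
            admin.getResourceIdealState("MY_CLUSTER", "MY_RESOURCE"); // placeholders

        // FULL_AUTO mode with the CRUSH-ed placement strategy: movement on
        // topology changes is reduced, but there is no per-instance cap.
        idealState.setRebalanceMode(IdealState.RebalanceMode.FULL_AUTO);
        idealState.setRebalanceStrategy(
            "org.apache.helix.controller.rebalancer.strategy.CrushEdRebalanceStrategy");

        // The AutoRebalancer honors this cap; the Crush/CrushEd strategies
        // do not, so with them it is effectively a no-op.
        idealState.setMaxPartitionsPerInstance(3);

        admin.setResourceIdealState("MY_CLUSTER", "MY_RESOURCE", idealState);
        admin.close();
      }
    }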
