Thanks, Grainier! This could be a bug in the code, but setting the upper bound this way is usually not the right approach.
I think when you choose the OnlineOffline model, you expect every replica to be in the same state. The OnlineOffline state model also does not require you to set an upper bound on a state, because the ONLINE state is not limited to 1 the way the LEADER state is in the LeaderFollower model.

For the CLI part, we only offer a CLI tool for admin operations. We do have the prediction code in HelixUtil in Java, but not in the CLI:
https://github.com/apache/helix/blob/master/helix-core/src/main/java/org/apache/helix/util/HelixUtil.java#L433

Best,

Junkai

On Wed, Sep 27, 2023 at 1:58 AM Grainier Perera wrote:

> Hi Helix Dev Team,
>
> I'm currently working on a project involving Apache Helix and have
> encountered a scenario that raised some questions regarding configuration.
> I'd like to seek your guidance on the following:
>
> I've created a sample application[1] using Helix, where I'm adding a
> resource named *MyResource* with *1 partition and 2 replicas* to a *4-node
> cluster*. This cluster uses the OnlineOfflineStateModel. In this sample,
> I'm setting the dynamicUpperBound of the ONLINE state to *R*.
>
> When I run the sample and trigger a rebalance, I observe that eventually,
> there are *3* ONLINE instances of the resource, even though the replica
> count is specified as *2*. However, if I set the dynamicUpperBound of the
> ONLINE state to 1, I consistently see only 1 instance of the resource
> throughout the test.
>
> My question is: why am I getting 3 ONLINE instances of the resource when
> the replica count is set to 2, and the dynamicUpperBound is set to R?
>
> Additionally, when using FULL_AUTO as the rebalance mode, I'm curious to
> know if there is a command-line interface (CLI) command or method that
> allows me to determine which partition/replica is deployed on which node.
> Is there a specific CLI command for this purpose?
>
> Your insights and assistance on these matters would be greatly
> appreciated. Thank you for your time and support.
>
> [1] https://gist.github.com/grainier/a2b38c1b22aa7db71789b1c023044da1
>
> Output:
>
> *With `builder.upperBound(OnlineOfflineSMD.States.ONLINE.name(), 1);`*
> Adding a resource MyResource: with 1 partitions and 2 replicas
> OnlineOfflineStateModelFactory.onBecomeOnlineFromOffline():localhost_12000
> transitioning from OFFLINE to ONLINE for MyResource MyResource_0
> CLUSTER STATE: After starting 4 nodes
>               localhost_12000  localhost_12001  localhost_12002  localhost_12003
> MyResource_0  *ONLINE*         -                -                -
> ###################################################################
> ADDING NEW NODE :localhost_12004. Partitions will move from old nodes to
> the new node.
> CLUSTER STATE: After adding the 5 node
>               localhost_12000  localhost_12001  localhost_12002  localhost_12003  localhost_12004
> MyResource_0  *ONLINE*         -                -                -                -
> ###################################################################
> STOPPING localhost_12004. Leadership will be transferred to the remaining
> nodes
> CLUSTER STATE: After the node 5 stops/crashes
>               localhost_12000  localhost_12001  localhost_12002  localhost_12003  localhost_12004
> MyResource_0  *ONLINE*         -                -                -                -
> ###################################################################
>
> *With `builder.dynamicUpperBound(OnlineOfflineSMD.States.ONLINE.name(), "R");`*
> Adding a resource MyResource: with 1 partitions and 2 replicas
> OnlineOfflineStateModelFactory.onBecomeOnlineFromOffline():localhost_12001
> transitioning from OFFLINE to ONLINE for MyResource MyResource_0
> OnlineOfflineStateModelFactory.onBecomeOnlineFromOffline():localhost_12000
> transitioning from OFFLINE to ONLINE for MyResource MyResource_0
> CLUSTER STATE: After starting 4 nodes
>               localhost_12000  localhost_12001  localhost_12002  localhost_12003
> MyResource_0  *ONLINE*         *ONLINE*         -                -
> ###################################################################
> ADDING NEW NODE :localhost_12004. Partitions will move from old nodes to
> the new node.
> OnlineOfflineStateModelFactory.onBecomeOnlineFromOffline():localhost_12002
> transitioning from OFFLINE to ONLINE for MyResource MyResource_0
> CLUSTER STATE: After adding the 5 node
>               localhost_12000  localhost_12001  localhost_12002  localhost_12003  localhost_12004
> MyResource_0  *ONLINE*         *ONLINE*         *ONLINE*         -                -
> ###################################################################
> STOPPING localhost_12004. Leadership will be transferred to the remaining
> nodes
> CLUSTER STATE: After the node 5 stops/crashes
>               localhost_12000  localhost_12001  localhost_12002  localhost_12003  localhost_12004
> MyResource_0  *ONLINE*         *ONLINE*         *ONLINE*         -                -
> ###################################################################
>
> Best regards,
> Grainier Perera.
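
To make the expectation behind the question concrete, here is a minimal sketch of how a dynamic upper bound such as "R" would resolve to a number, and how the observed ONLINE count compares against it. This is not Helix's rebalancer code: the class UpperBoundSketch and both method names are hypothetical, and the treatment of "N" as the live-node count is an assumption based on my understanding of the state model definition.

```java
public class UpperBoundSketch {

    // Resolve a state's upper bound to a concrete limit (simplified, not Helix's implementation).
    // "R" resolves to the resource's replica count; "N" is assumed to resolve to the
    // number of live nodes; anything else is treated as a plain integer.
    public static int resolveUpperBound(String bound, int replicas, int liveNodes) {
        switch (bound) {
            case "R":
                return replicas;
            case "N":
                return liveNodes;
            default:
                return Integer.parseInt(bound);
        }
    }

    // True when the number of replicas observed in the state exceeds the resolved limit.
    public static boolean exceedsBound(String bound, int replicas, int liveNodes, int observed) {
        return observed > resolveUpperBound(bound, replicas, liveNodes);
    }

    public static void main(String[] args) {
        // Scenario from the thread: 1 partition, 2 replicas, 5 nodes after the node is added.
        int replicas = 2;
        int liveNodes = 5;

        // dynamicUpperBound "R": 3 ONLINE replicas were observed.
        System.out.println("R bound exceeded: " + exceedsBound("R", replicas, liveNodes, 3));

        // upperBound 1: a single ONLINE replica was observed.
        System.out.println("1 bound exceeded: " + exceedsBound("1", replicas, liveNodes, 1));
    }
}
```

Under these assumptions, 3 ONLINE replicas with a bound of "R" and 2 replicas is above the expected limit, which is consistent with the behavior being a possible bug rather than intended.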
