Hi Helix Dev Team,

I'm currently working on a project involving Apache Helix and have encountered a scenario that raised some questions regarding configuration. I'd like to seek your guidance on the following:
I've created a sample application[1] using Helix, where I add a resource named *MyResource* with *1 partition and 2 replicas* to a *4-node cluster*. The cluster uses the OnlineOfflineStateModel, and in the sample I set the dynamicUpperBound of the ONLINE state to *R*. When I run the sample and trigger a rebalance, I eventually observe *3* ONLINE instances of the resource, even though the replica count is specified as *2*. However, if I set the dynamicUpperBound of the ONLINE state to 1, I consistently see only 1 ONLINE instance throughout the test. My question is: why do I get 3 ONLINE instances of the resource when the replica count is set to 2 and the dynamicUpperBound is set to R?

Additionally, when using FULL_AUTO as the rebalance mode, is there a command-line interface (CLI) command or method that lets me determine which partition/replica is deployed on which node?

Your insights and assistance on these matters would be greatly appreciated. Thank you for your time and support.

[1] https://gist.github.com/grainier/a2b38c1b22aa7db71789b1c023044da1
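For reference, the state model definition in the sample[1] is built roughly like this (a simplified sketch, not verbatim from the gist; the dynamicUpperBound call is the setting in question):

```
import org.apache.helix.model.OnlineOfflineSMD;
import org.apache.helix.model.StateModelDefinition;

public class OnlineOfflineDefSketch {
  // Builds an OnlineOffline state model definition with a dynamic
  // upper bound of "R" (replica count) on the ONLINE state.
  static StateModelDefinition build() {
    StateModelDefinition.Builder builder =
        new StateModelDefinition.Builder(OnlineOfflineSMD.name);
    builder.initialState(OnlineOfflineSMD.States.OFFLINE.name());
    builder.addState(OnlineOfflineSMD.States.ONLINE.name(), 1);
    builder.addState(OnlineOfflineSMD.States.OFFLINE.name(), 2);
    builder.addTransition(OnlineOfflineSMD.States.OFFLINE.name(),
        OnlineOfflineSMD.States.ONLINE.name());
    builder.addTransition(OnlineOfflineSMD.States.ONLINE.name(),
        OnlineOfflineSMD.States.OFFLINE.name());
    // "R" should cap the number of ONLINE replicas at the resource's
    // replica count (2 in my test), yet I end up with 3 ONLINE instances.
    builder.dynamicUpperBound(OnlineOfflineSMD.States.ONLINE.name(), "R");
    return builder.build();
  }
}
```

And to make the second question concrete: what I'm after is essentially the per-partition placement recorded in the resource's ExternalView. I believe the cluster-state tables below come down to reading that view programmatically, along these lines (again a sketch; the ZooKeeper address and cluster name are placeholders):

```
import org.apache.helix.manager.zk.ZKHelixAdmin;
import org.apache.helix.model.ExternalView;

public class PrintPlacementSketch {
  public static void main(String[] args) {
    // Placeholder ZooKeeper address and cluster name.
    ZKHelixAdmin admin = new ZKHelixAdmin("localhost:2181");
    ExternalView view = admin.getResourceExternalView("MyCluster", "MyResource");
    for (String partition : view.getPartitionSet()) {
      // e.g. MyResource_0 -> {localhost_12000=ONLINE, localhost_12001=ONLINE}
      System.out.println(partition + " -> " + view.getStateMap(partition));
    }
    admin.close();
  }
}
```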
Output:

*With `builder.upperBound(OnlineOfflineSMD.States.ONLINE.name(), 1);`*

Adding a resource MyResource: with 1 partitions and 2 replicas
OnlineOfflineStateModelFactory.onBecomeOnlineFromOffline():localhost_12000 transitioning from OFFLINE to ONLINE for MyResource MyResource_0

CLUSTER STATE: After starting 4 nodes
               localhost_12000  localhost_12001  localhost_12002  localhost_12003
MyResource_0   *ONLINE*         -                -                -

###################################################################

ADDING NEW NODE :localhost_12004. Partitions will move from old nodes to the new node.

CLUSTER STATE: After adding the 5 node
               localhost_12000  localhost_12001  localhost_12002  localhost_12003  localhost_12004
MyResource_0   *ONLINE*         -                -                -                -

###################################################################

STOPPING localhost_12004. Leadership will be transferred to the remaining nodes

CLUSTER STATE: After the node 5 stops/crashes
               localhost_12000  localhost_12001  localhost_12002  localhost_12003  localhost_12004
MyResource_0   *ONLINE*         -                -                -                -

###################################################################

*With `builder.dynamicUpperBound(OnlineOfflineSMD.States.ONLINE.name(), "R");`*

Adding a resource MyResource: with 1 partitions and 2 replicas
OnlineOfflineStateModelFactory.onBecomeOnlineFromOffline():localhost_12001 transitioning from OFFLINE to ONLINE for MyResource MyResource_0
OnlineOfflineStateModelFactory.onBecomeOnlineFromOffline():localhost_12000 transitioning from OFFLINE to ONLINE for MyResource MyResource_0

CLUSTER STATE: After starting 4 nodes
               localhost_12000  localhost_12001  localhost_12002  localhost_12003
MyResource_0   *ONLINE*         *ONLINE*         -                -

###################################################################

ADDING NEW NODE :localhost_12004. Partitions will move from old nodes to the new node.
OnlineOfflineStateModelFactory.onBecomeOnlineFromOffline():localhost_12002 transitioning from OFFLINE to ONLINE for MyResource MyResource_0

CLUSTER STATE: After adding the 5 node
               localhost_12000  localhost_12001  localhost_12002  localhost_12003  localhost_12004
MyResource_0   *ONLINE*         *ONLINE*         *ONLINE*         -                -

###################################################################

STOPPING localhost_12004. Leadership will be transferred to the remaining nodes

CLUSTER STATE: After the node 5 stops/crashes
               localhost_12000  localhost_12001  localhost_12002  localhost_12003  localhost_12004
MyResource_0   *ONLINE*         *ONLINE*         *ONLINE*         -                -

###################################################################

Best regards,
Grainier Perera.
