Hi Helix Dev Team,

I'm currently working on a project involving Apache Helix and have
encountered a scenario that raised some questions regarding configuration.
I'd like to seek your guidance on the following:

I've created a sample application[1] using Helix, where I'm adding a
resource named *MyResource* with *1 partition and 2 replicas* to a *4-node
cluster*. This cluster uses the OnlineOfflineStateModel. In this sample,
I'm setting the dynamicUpperBound of the ONLINE state to *R*.
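
For context, the sample builds a modified OnlineOffline state model
definition roughly as follows (a condensed, hand-written sketch of [1];
the definition name "MyOnlineOffline" and the admin/clusterName variables
are placeholders):

import org.apache.helix.HelixAdmin;
import org.apache.helix.HelixDefinedState;
import org.apache.helix.model.OnlineOfflineSMD;
import org.apache.helix.model.StateModelDefinition;

// Sketch: an OnlineOffline-style definition whose ONLINE upper bound is
// the dynamic value "R" (i.e. the resource's configured replica count).
static void addMyStateModelDef(HelixAdmin admin, String clusterName) {
  StateModelDefinition.Builder builder =
      new StateModelDefinition.Builder("MyOnlineOffline");
  builder.addState(OnlineOfflineSMD.States.ONLINE.name(), 1);
  builder.addState(OnlineOfflineSMD.States.OFFLINE.name());
  builder.addState(HelixDefinedState.DROPPED.name());
  builder.initialState(OnlineOfflineSMD.States.OFFLINE.name());
  builder.addTransition(OnlineOfflineSMD.States.OFFLINE.name(),
      OnlineOfflineSMD.States.ONLINE.name(), 1);
  builder.addTransition(OnlineOfflineSMD.States.ONLINE.name(),
      OnlineOfflineSMD.States.OFFLINE.name(), 2);
  builder.addTransition(OnlineOfflineSMD.States.OFFLINE.name(),
      HelixDefinedState.DROPPED.name(), 3);

  // The setting in question: at most R replicas of a partition may be ONLINE.
  builder.dynamicUpperBound(OnlineOfflineSMD.States.ONLINE.name(), "R");

  admin.addStateModelDef(clusterName, "MyOnlineOffline", builder.build());
}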

When I run the sample and trigger a rebalance, I observe that eventually
there are *3* ONLINE instances of the resource, even though the replica
count is specified as *2*. However, if I instead set the static upperBound
of the ONLINE state to 1 (see the first output below), I consistently see
only 1 ONLINE instance of the resource throughout the test.
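
For reference, the resource is added and the rebalance triggered roughly
like this (again condensed; admin and clusterName as in the sketch above):

import org.apache.helix.model.IdealState.RebalanceMode;

// Register MyResource (1 partition, FULL_AUTO rebalance) against the
// definition above, then ask Helix to compute placement for 2 replicas.
admin.addResource(clusterName, "MyResource", 1, "MyOnlineOffline",
    RebalanceMode.FULL_AUTO.toString());
admin.rebalance(clusterName, "MyResource", 2);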

My question is: why am I getting 3 ONLINE instances of the resource when
the replica count is set to 2, and the dynamicUpperBound is set to R?

Additionally, when using FULL_AUTO as the rebalance mode, I'd like to
determine which partition/replica is placed on which node. Is there a
specific CLI command (or other method) for this purpose?
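
So far, the only way I've found is to read the ExternalView
programmatically, along these lines:

import org.apache.helix.model.ExternalView;

// The ExternalView holds the current placement: for each partition, a map
// of instanceName -> state, e.g. MyResource_0 -> {localhost_12000=ONLINE}.
ExternalView ev = admin.getResourceExternalView(clusterName, "MyResource");
for (String partition : ev.getPartitionSet()) {
  System.out.println(partition + " -> " + ev.getStateMap(partition));
}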

Your insights and assistance on these matters would be greatly appreciated.
Thank you for your time and support.

[1] https://gist.github.com/grainier/a2b38c1b22aa7db71789b1c023044da1

Output:

*With `builder.upperBound(OnlineOfflineSMD.States.ONLINE.name(), 1);`*
Adding a resource MyResource:  with 1 partitions and 2 replicas
OnlineOfflineStateModelFactory.onBecomeOnlineFromOffline():localhost_12000
transitioning from OFFLINE to ONLINE for MyResource MyResource_0
CLUSTER STATE: After starting 4 nodes
localhost_12000 localhost_12001 localhost_12002 localhost_12003
MyResource_0 *ONLINE* - - -
###################################################################
ADDING NEW NODE :localhost_12004. Partitions will move from old nodes to
the new node.
CLUSTER STATE: After adding the 5 node
localhost_12000 localhost_12001 localhost_12002 localhost_12003
localhost_12004
MyResource_0 *ONLINE* - - - -
###################################################################
STOPPING localhost_12004. Leadership will be transferred to the remaining
nodes
CLUSTER STATE: After the node 5 stops/crashes
localhost_12000 localhost_12001 localhost_12002 localhost_12003
localhost_12004
MyResource_0 *ONLINE* - - - -
###################################################################



*With `builder.dynamicUpperBound(OnlineOfflineSMD.States.ONLINE.name(), "R");`*
Adding a resource MyResource:  with 1 partitions and 2 replicas
OnlineOfflineStateModelFactory.onBecomeOnlineFromOffline():localhost_12001
transitioning from OFFLINE to ONLINE for MyResource MyResource_0
OnlineOfflineStateModelFactory.onBecomeOnlineFromOffline():localhost_12000
transitioning from OFFLINE to ONLINE for MyResource MyResource_0
CLUSTER STATE: After starting 4 nodes
localhost_12000 localhost_12001 localhost_12002 localhost_12003
MyResource_0 *ONLINE* *ONLINE* - -
###################################################################
ADDING NEW NODE :localhost_12004. Partitions will move from old nodes to
the new node.
OnlineOfflineStateModelFactory.onBecomeOnlineFromOffline():localhost_12002
transitioning from OFFLINE to ONLINE for MyResource MyResource_0
CLUSTER STATE: After adding the 5 node
localhost_12000 localhost_12001 localhost_12002 localhost_12003
localhost_12004
MyResource_0 *ONLINE* *ONLINE* *ONLINE* - -
###################################################################
STOPPING localhost_12004. Leadership will be transferred to the remaining
nodes
CLUSTER STATE: After the node 5 stops/crashes
localhost_12000 localhost_12001 localhost_12002 localhost_12003
localhost_12004
MyResource_0 *ONLINE* *ONLINE* *ONLINE* - -
###################################################################

Best regards,
Grainier Perera.
