I was able to fix this by using meta interleave=true on the clones (step 2 below).

New steps:

  1.  Create the resources (same as before):

$ sudo pcs resource create test-1 ocf:heartbeat:Dummy op monitor timeout="20" 
interval="10" clone

$ sudo pcs resource create test-2 ocf:heartbeat:Dummy op monitor timeout="20" 
interval="10" clone

$ sudo pcs resource create test-3 ocf:heartbeat:Dummy op monitor timeout="20" 
interval="10" clone




  2.  Add meta interleave=true on the clones explicitly via update; adding meta interleave=true to the create commands above DOES NOT work (a quick check is shown right after these updates):

$ sudo pcs resource update test-2-clone meta interleave=true

$ sudo pcs resource update test-3-clone meta interleave=true


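To double-check that the attribute landed on the clone itself (and not on the primitive), something like the following should list "Meta Attrs: interleave=true" under the clone; on older pcs releases the subcommand is "show" rather than "config":

$ sudo pcs resource config test-2-clone

$ sudo pcs resource config test-3-clone

(A guess on my part as to why the create-time form fails: meta options given before the "clone" keyword appear to attach to the primitive rather than the clone; some pcs versions accept clone options directly after the keyword, e.g. "... clone interleave=true".)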

  3.  Then order them (same as before; a way to list the resulting constraints follows):

$ sudo pcs constraint order test-1-clone then test-2-clone

Adding test-1-clone test-2-clone (kind: Mandatory) (Options: first-action=start 
then-action=start)

$ sudo pcs constraint order test-2-clone then test-3-clone

Adding test-2-clone test-3-clone (kind: Mandatory) (Options: first-action=start 
then-action=start)


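For reference, the resulting orderings can be listed afterwards; they should appear under "Ordering Constraints:" in the output (depending on the pcs version, "pcs constraint show --full" will also print the constraint ids):

$ sudo pcs constraint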

  4.  Then when I restart test-1-clone (same as before), only the resources on the affected node restart:

$ sudo pcs resource restart test-1 node1_a

Warning: using test-1-clone... (if a resource is a clone, master/slave or 
bundle you must use the clone, master/slave or bundle name)

test-1-clone successfully restarted



  5.  Result of step 4 above:
Apr 11 17:58:39 NODE1-B pacemaker-schedulerd[103050]:  notice:  * Stop       test-1:0  ( node1_a )  due to node availability
Apr 11 17:58:39 NODE1-B pacemaker-schedulerd[103050]:  notice:  * Stop       test-2:0  ( node1_a )
Apr 11 17:58:39 NODE1-B pacemaker-schedulerd[103050]:  notice:  * Stop       test-3:0  ( node1_a )  due to unrunnable test-2:0 start
Apr 11 17:58:39 NODE1-B pacemaker-controld[103051]:  notice: Initiating stop operation test-3_stop_0 on node1_a
Apr 11 17:58:39 NODE1-B pacemaker-controld[103051]:  notice: Initiating stop operation test-2_stop_0 on node1_a
Apr 11 17:58:39 NODE1-B pacemaker-controld[103051]:  notice: Initiating stop operation test-1_stop_0 on node1_a
Apr 11 17:58:41 NODE1-B pacemaker-schedulerd[103050]:  notice:  * Start      test-1:3  ( node1_a )
Apr 11 17:58:41 NODE1-B pacemaker-schedulerd[103050]:  notice:  * Start      test-2:3  ( node1_a )
Apr 11 17:58:41 NODE1-B pacemaker-schedulerd[103050]:  notice:  * Start      test-3:3  ( node1_a )



What I would like to call out is that the documentation here does not explicitly state this behavior:
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/high_availability_add-on_reference/ch-advancedresource-haar


interleave
Changes the behavior of ordering constraints (between clones/masters) so that 
copies of the first clone can start or stop as soon as the copy on the same 
node of the second clone has started or stopped (rather than waiting until 
every instance of the second clone has started or stopped). Allowed values: 
false, true. The default value is false.

"stopped (rather than waiting until every instance of the second clone has 
started or stopped)" - this may suggest this implicitly but definitely not 
clear.

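As an aside, the update from step 2 simply ends up as a meta_attributes block on the clone element in the CIB; dumping the CIB with "pcs cluster cib" should show a fragment roughly like the following (a sketch, the generated element ids may differ):

$ sudo pcs cluster cib

      <clone id="test-2-clone">
        <meta_attributes id="test-2-clone-meta_attributes">
          <nvpair id="test-2-clone-meta_attributes-interleave" name="interleave" value="true"/>
        </meta_attributes>
        ...
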
Please let me know if I am missing something and if there is a better 
recommendation.


Thanks,
Raghav


From: ChittaNagaraj, Raghav
Sent: Monday, April 11, 2022 10:59 AM
To: Strahil Nikolov; Cluster Labs - All topics related to open-source 
clustering welcomed
Cc: Haase, David; Hicks, Richard; gandhi, rajesh; Burney, Scott; Farnsworth, 
Devin
Subject: RE: [ClusterLabs] Restarting parent of ordered clone resources on 
specific node causes restart of all resources in the ordering constraint on all 
nodes of the cluster

Hello Strahil,

Thank you for your response.

The actual problem I wanted to discuss here is restart of ordered resources on 
unaffected nodes.

From the observation in my original email:


  1.  I have 4 pacemaker nodes -

node2_a

node2_b

node1_a

node1_b



  2.  I restarted test-1 on node1_a


  3.  This restarted the test-2 and test-3 clones on node1_a. This is fine, as node1_a is the affected node.
  4.  But it also restarted test-2 and test-3 on the unaffected nodes. The log below shows test-2 restarting on the unaffected nodes node1_b, node2_b and node2_a, which I don't want:

Apr 07 20:25:03 NODE1-B pacemaker-schedulerd[95746]:  notice:  * Restart    test-2:0  ( node1_b )  due to required test-1-clone running

Apr 07 20:25:03 NODE1-B pacemaker-schedulerd[95746]:  notice:  * Restart    test-2:2  ( node2_b )  due to required test-1-clone running

Apr 07 20:25:03 NODE1-B pacemaker-schedulerd[95746]:  notice:  * Restart    test-2:3  ( node2_a )  due to required test-1-clone running

Please let me know if you have any further questions.


Thanks,
Raghav

From: Strahil Nikolov <[email protected]>
Sent: Friday, April 8, 2022 12:00 PM
To: Cluster Labs - All topics related to open-source clustering welcomed; 
ChittaNagaraj, Raghav
Cc: Haase, David; Hicks, Richard; gandhi, rajesh; Burney, Scott; Farnsworth, 
Devin; ChittaNagaraj, Raghav
Subject: Re: [ClusterLabs] Restarting parent of ordered clone resources on 
specific node causes restart of all resources in the ordering constraint on all 
nodes of the cluster


You can use 'kind' and 'symmetrical' to control order constraints. The default 
value for symmetrical is 'true', which means that in order to stop dummy1, the 
cluster has to stop dummy1 & dummy2.

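(A sketch of the syntax, using the clone names from this thread; the existing constraint would have to be removed first, e.g. with "pcs constraint remove <constraint id>", before re-adding it with the extra options:)

$ sudo pcs constraint order start test-1-clone then start test-2-clone kind=Optional symmetrical=false
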
Best Regards,
Strahil Nikolov
On Fri, Apr 8, 2022 at 15:29, ChittaNagaraj, Raghav
<[email protected]> wrote:

Hello Team,



Hope you are doing well.



I have a 4 node pacemaker cluster where I created clone dummy resources test-1, 
test-2 and test-3 below:



$ sudo pcs resource create test-1 ocf:heartbeat:Dummy op monitor timeout="20" 
interval="10" clone

$ sudo pcs resource create test-2 ocf:heartbeat:Dummy op monitor timeout="20" 
interval="10" clone

$ sudo pcs resource create test-3 ocf:heartbeat:Dummy op monitor timeout="20" 
interval="10" clone



Then I ordered them so test-2-clone starts after test-1-clone and test-3-clone 
starts after test-2-clone:

$ sudo pcs constraint order test-1-clone then test-2-clone

Adding test-1-clone test-2-clone (kind: Mandatory) (Options: first-action=start 
then-action=start)

$ sudo pcs constraint order test-2-clone then test-3-clone

Adding test-2-clone test-3-clone (kind: Mandatory) (Options: first-action=start 
then-action=start)



Here are my clone sets (snippet of "pcs status" output pasted below):

  * Clone Set: test-1-clone [test-1]:

    * Started: [ node2_a node2_b node1_a node1_b ]

  * Clone Set: test-2-clone [test-2]:

    * Started: [ node2_a node2_b node1_a node1_b ]

  * Clone Set: test-3-clone [test-3]:

    * Started: [ node2_a node2_b node1_a node1_b ]



Then I restart test-1 on just node1_a:

$ sudo pcs resource restart test-1 node1_a

Warning: using test-1-clone... (if a resource is a clone, master/slave or 
bundle you must use the clone, master/slave or bundle name)

test-1-clone successfully restarted





This causes test-2 and test-3 clones to restart on all pacemaker nodes when my 
intention is for them to restart on just node1_a.

Below is the log tracing seen on the Designated Controller NODE1-B:

Apr 07 20:25:01 NODE1-B pacemaker-schedulerd[95746]:  notice:  * Stop       test-1:1  ( node1_a )  due to node availability

Apr 07 20:25:03 NODE1-B pacemaker-schedulerd[95746]:  notice:  * Restart    test-2:0  ( node1_b )  due to required test-1-clone running

Apr 07 20:25:03 NODE1-B pacemaker-schedulerd[95746]:  notice:  * Restart    test-2:1  ( node1_a )  due to required test-1-clone running

Apr 07 20:25:03 NODE1-B pacemaker-schedulerd[95746]:  notice:  * Restart    test-2:2  ( node2_b )  due to required test-1-clone running

Apr 07 20:25:03 NODE1-B pacemaker-schedulerd[95746]:  notice:  * Restart    test-2:3  ( node2_a )  due to required test-1-clone running



Above is a representation of the observed behavior using dummy resources.

Is this the expected behavior of cloned resources?



My goal is to be able to restart test-2-clone and test-3-clone on just the node 
that experienced test-1 restart rather than all other nodes in the cluster.



Please let us know if any additional information would help you provide feedback.



Thanks for your help!



- Raghav


_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


