Hi Ken,

I managed to reproduce this on a simplified version of the cluster, on Pacemaker 1.1.15, 1.1.16, and 1.1.18-rc1.
The steps to create the cluster are:

  pcs property set stonith-enabled=false
  pcs property set placement-strategy=balanced
  pcs node utilization vm1 cpu=100
  pcs node utilization vm2 cpu=100
  pcs node utilization vm3 cpu=100
  pcs property set maintenance-mode=true
  pcs resource create sv-fencer ocf:pacemaker:Dummy
  pcs resource create sv ocf:pacemaker:Dummy clone notify=false
  pcs resource create std ocf:pacemaker:Dummy meta resource-stickiness=100
  pcs resource create partition1 ocf:pacemaker:Dummy meta resource-stickiness=100
  pcs resource create partition2 ocf:pacemaker:Dummy meta resource-stickiness=100
  pcs resource create partition3 ocf:pacemaker:Dummy meta resource-stickiness=100
  pcs resource utilization partition1 cpu=5
  pcs resource utilization partition2 cpu=5
  pcs resource utilization partition3 cpu=5
  pcs constraint colocation add std with sv-clone INFINITY
  pcs constraint colocation add partition1 with sv-clone INFINITY
  pcs constraint colocation add partition2 with sv-clone INFINITY
  pcs constraint colocation add partition3 with sv-clone INFINITY
  pcs property set maintenance-mode=false

I can then reproduce the issue as follows:

  $ pcs resource
   sv-fencer    (ocf::pacemaker:Dummy): Started vm1
   Clone Set: sv-clone [sv]
       Started: [ vm1 vm2 vm3 ]
   std          (ocf::pacemaker:Dummy): Started vm2
   partition1   (ocf::pacemaker:Dummy): Started vm3
   partition2   (ocf::pacemaker:Dummy): Started vm1
   partition3   (ocf::pacemaker:Dummy): Started vm2

  $ pcs cluster standby vm3

  # Check that all resources have moved off vm3
  $ pcs resource
   sv-fencer    (ocf::pacemaker:Dummy): Started vm1
   Clone Set: sv-clone [sv]
       Started: [ vm1 vm2 ]
       Stopped: [ vm3 ]
   std          (ocf::pacemaker:Dummy): Started vm2
   partition1   (ocf::pacemaker:Dummy): Started vm1
   partition2   (ocf::pacemaker:Dummy): Started vm1
   partition3   (ocf::pacemaker:Dummy): Started vm2

  # Wait for any outstanding actions to complete.
  $ crm_resource --wait --timeout 300
  Pending actions:
          Action 22: sv-fencer_monitor_10000 on vm2
          Action 21: sv-fencer_start_0 on vm2
          Action 20: sv-fencer_stop_0 on vm1
  Error performing operation: Timer expired

  # Check the resources again - sv-fencer is still on vm1
  $ pcs resource
   sv-fencer    (ocf::pacemaker:Dummy): Started vm1
   Clone Set: sv-clone [sv]
       Started: [ vm1 vm2 ]
       Stopped: [ vm3 ]
   std          (ocf::pacemaker:Dummy): Started vm2
   partition1   (ocf::pacemaker:Dummy): Started vm1
   partition2   (ocf::pacemaker:Dummy): Started vm1
   partition3   (ocf::pacemaker:Dummy): Started vm2

  # Perform a random update to the CIB.
  $ pcs resource update std op monitor interval=20 timeout=20

  # Check resource status again - sv-fencer has now moved to vm2
  # (the action crm_resource was waiting for)
  $ pcs resource
   sv-fencer    (ocf::pacemaker:Dummy): Started vm2   <<<============
   Clone Set: sv-clone [sv]
       Started: [ vm1 vm2 ]
       Stopped: [ vm3 ]
   std          (ocf::pacemaker:Dummy): Started vm2
   partition1   (ocf::pacemaker:Dummy): Started vm1
   partition2   (ocf::pacemaker:Dummy): Started vm1
   partition3   (ocf::pacemaker:Dummy): Started vm2

I do not get the problem if I:
1) remove the "std" resource; or
2) remove the colocation constraints; or
3) remove the utilization attributes from the partition resources.

In those cases the sv-fencer resource is happy to stay on vm1, and crm_resource --wait returns immediately.

It looks like the "pcs cluster standby" call creates/registers the actions to move the sv-fencer resource to vm2, but does not include them in the cluster transition. Only when the CIB is later updated by something else are the actions included in that new transition.

Regards,
Leon
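For what it's worth, one way to cross-check the scheduler's view (a sketch; I believe these crm_simulate flags are available on 1.1.x, adjust to taste) is to ask it what it would do with the live CIB while crm_resource --wait is hanging. If the stop/start of sv-fencer shows up here but never gets executed, that would confirm the actions are computed yet not run:

  # Show allocation scores and the actions the policy engine
  # would schedule, based on the live CIB
  $ crm_simulate -sL

  # Simulate executing that transition and show the resulting status
  $ crm_simulate -SL

In my understanding, if the bug is in what the DC actually dispatches rather than in what the scheduler computes, -sL should still list the sv-fencer move even though the cluster never performs it until the next CIB change.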
_______________________________________________
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org