I'm attempting to configure an NFS cluster, and I've observed that under some failure conditions, resources that depend on a failed resource simply stop: no migration to another node is attempted, even though a manual migration demonstrates that the other node can run all of the resources, and the resources remain on the good node even after the migration constraint is removed.

I was able to reduce the configuration to this:

node storage01
node storage02
primitive drbd_nfsexports ocf:pacemaker:Stateful
primitive fs_test ocf:pacemaker:Dummy
primitive vg_nfsexports ocf:pacemaker:Dummy
group test fs_test
ms drbd_nfsexports_ms drbd_nfsexports \
        meta master-max="1" master-node-max="1" \
        clone-max="2" clone-node-max="1" \
        notify="true" target-role="Started"
location l fs_test -inf: storage02
colocation colo_drbd_master inf: ( test ) ( vg_nfsexports ) ( drbd_nfsexports_ms:Master )
property $id="cib-bootstrap-options" \
        no-quorum-policy="ignore" \
        stonith-enabled="false" \
        dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        last-lrm-refresh="1339793579"

The location constraint "l" exists only to demonstrate the problem; I added it to simulate the NFS server being unrunnable on one node.

To see the issue I'm experiencing:

1. Put storage01 in standby to force everything onto storage02. fs_test will not be able to run (because of the location constraint "l").
2. Bring storage01 back out of standby. Even though storage01 can satisfy all the constraints, no migration takes place.
3. Put storage02 in standby, and everything migrates to storage01 and starts successfully.
4. Take storage02 out of standby, and the services remain on storage01.

This demonstrates that even though there is a clear "best" placement in which all resources can run, Pacemaker isn't finding it.
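In case it helps, the exact sequence I'm running is roughly this (crm shell syntax; node names as in the configuration above):

```sh
# Force everything onto storage02; fs_test cannot run there because of "l"
crm node standby storage01

# Bring storage01 back; I expected a migration here, but nothing moves
crm node online storage01
crm_mon -1

# Put storage02 in standby: everything migrates to storage01 and starts
crm node standby storage02

# Take storage02 out of standby: services stay on storage01
crm node online storage02
```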

So far, I've found that any one of the following changes "fixes" the problem:

- removing colo_drbd_master
- removing any one resource from colo_drbd_master
- eliminating the group "test" and referencing fs_test directly in constraints
- using a simple clone instead of a master/slave pair for drbd_nfsexports_ms

My current understanding is that if there exists a way to run all resources, Pacemaker should find it and prefer it. Is that not the case? Perhaps I need to restructure my colocation constraint somehow? This is, of course, a much-reduced version of a more complex practical configuration, so I'm trying to understand the underlying mechanisms rather than just the solution to this particular scenario.
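For example, one restructuring I've considered (untested, and the constraint names here are just placeholders) is replacing the single set-based colocation with a pairwise chain, so each dependency is stated explicitly:

```
colocation colo_vg_on_master inf: vg_nfsexports drbd_nfsexports_ms:Master
colocation colo_test_with_vg inf: test vg_nfsexports
```

I don't know whether the set-based form and the pairwise form are supposed to be equivalent here, which is part of what I'm hoping to understand.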

In particular, I'm not sure how to inspect what Pacemaker is thinking when it places resources. I've tried running crm_simulate -LRs, but I'm a little unclear on how to interpret the results. In the output, I do see this:

drbd_nfsexports:1 promotion score on storage02: 10
drbd_nfsexports:0 promotion score on storage01: 5

Those numbers seem to account for the default stickiness of 1 for master/slave resources, but they don't seem to incorporate the colocation constraints at all. Is that expected?
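For reference, the command I've been using to pull out the scores is along these lines (the grep is just my own filtering of the --show-scores output):

```sh
# Show allocation and promotion scores computed from the live CIB
crm_simulate -L -s | grep -E 'promotion score|native_color'
```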


_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
