To be more specific: I've tried following the example on page 25/26 of this document to the letter: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
And it does work as advertised. When I stop corosync, the resource moves to the other node; when I start corosync again, it stays there, as it should. However, if I simply unplug the Ethernet connection, let the resource migrate, and then plug it back in, the resource fails back to the original node. Is this the intended behavior? It seems a bad NIC could wreak havoc on such a setup.

Thanks!

Daniel

On May 16, 2011, at 5:33 PM, Daniel Bozeman wrote:

> For the life of me, I cannot prevent auto-failback from occurring in a
> master-slave setup I have in virtual machines. I have a very simple
> configuration:
>
> node $id="4fe75075-333c-4614-8a8a-87149c7c9fbb" ha2 \
>     attributes standby="off"
> node $id="70718968-41b5-4aee-ace1-431b5b65fd52" ha1 \
>     attributes standby="off"
> primitive FAILOVER-IP ocf:heartbeat:IPaddr \
>     params ip="192.168.1.79" \
>     op monitor interval="10s"
> primitive PGPOOL lsb:pgpool2 \
>     op monitor interval="10s"
> group PGPOOL-AND-IP FAILOVER-IP PGPOOL
> colocation IP-WITH-PGPOOL inf: FAILOVER-IP PGPOOL
> property $id="cib-bootstrap-options" \
>     dc-version="1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd" \
>     cluster-infrastructure="Heartbeat" \
>     stonith-enabled="false" \
>     no-quorum-policy="ignore"
> rsc_defaults $id="rsc-options" \
>     resource-stickiness="1000"
>
> No matter what I do with resource stickiness, I cannot prevent fail-back. I
> usually don't have a problem with failback when I restart the current
> master, but when I disable network connectivity to the master, everything
> fails over fine. Then I re-enable the network adapter and everything jumps
> back to the original "failed" node. I've done some "watch ptest -Ls"-ing,
> and the scores seem to indicate that failback should not occur. I'm also
> seeing resources bounce more times than necessary when a node is added
> (~3 times each), and resources seem to bounce when a node returns to the
> cluster even when it isn't necessary for them to do so. I also had an order
> directive in my configuration at one time, and often the second resource
> would start, then stop, then allow the first resource to start, then start
> itself. Quite weird. Any nudge in the right direction would be greatly
> appreciated. I've scoured Google and read the official documentation to no
> avail. I suppose I should mention I am using Heartbeat as well. My LSB
> resource implements start/stop/status properly without error.
>
> I've been testing this with a floating IP + Postgres as well, with the same
> issues. One thing I notice is that my "group" resources have no score. Is
> this normal? There doesn't seem to be any way to assign a stickiness to a
> group, and default stickiness has no effect.
>
> Thanks!
>
> Daniel Bozeman

Daniel Bozeman
American Roamer
Systems Administrator
daniel.boze...@americanroamer.com
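P.S. To make the stickiness question concrete, here is a rough sketch of what I've been doing. The first command is how I've been watching the allocation scores; the group stanza is what I was hoping would pin the group to its current node. The group-level "meta" syntax is my best guess from the crm shell documentation (entered via "crm configure edit"), and 1000 is just an arbitrary value:

    # watch where the policy engine places resources and what the scores are
    watch ptest -Ls

    # guess: give the group itself a stickiness meta attribute,
    # instead of relying on rsc_defaults
    group PGPOOL-AND-IP FAILOVER-IP PGPOOL \
        meta resource-stickiness="1000"

If that is not the right knob, or if stickiness simply cannot override whatever is pulling the resources back after the node rejoins, I'd love to know what is.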
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker