Hi,

On Wed, May 18, 2011 at 11:30 AM, Max Williams <max.willi...@betfair.com> wrote:
> Hi Daniel,
>
> You might want to set "on-fail=standby" for the resource group or
> individual resources. This will put the host into standby when a failure
> occurs, thus preventing failback:

This is not the most optimal solution.

> http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-resource-operations.html#s-resource-failure
>
> Another option is to set resource stickiness, which will stop resources
> moving back after a failure:
>
> http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/ch05s03s02.html

That is set globally in his config.

> Also note that if you are using a two-node cluster you will also need the
> property "no-quorum-policy=ignore" set.

This as well.

> Hope that helps!
>
> Cheers,
> Max
>
> *From:* Daniel Bozeman [mailto:daniel.boze...@americanroamer.com]
> *Sent:* 17 May 2011 19:09
> *To:* pacemaker@oss.clusterlabs.org
> *Subject:* Re: [Pacemaker] Preventing auto-fail-back
>
> To be more specific:
>
> I've tried following the example on pages 25/26 of this document to the
> letter: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf

Well, not really; that's why there are errors in your config.

> And it does work as advertised. When I stop corosync, the resource goes to
> the other node. I start corosync and it remains there as it should.
>
> However, if I simply unplug the ethernet connection, let the resource
> migrate, then plug it back in, it will fail back to the original node. Is
> this the intended behavior? It seems a bad NIC could wreak havoc on such a
> setup.
>
> Thanks!
>
> Daniel
>
> On May 16, 2011, at 5:33 PM, Daniel Bozeman wrote:
>
> For the life of me, I cannot prevent auto-failback from occurring in a
> master-slave setup I have in virtual machines.
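For reference, a minimal crm-shell sketch of Max's "on-fail=standby" suggestion, adapted to Daniel's FAILOVER-IP primitive (the on-fail placement on the monitor op is the only change; treat it as an illustration, not a tested config):

```shell
# Sketch: fail the whole node into standby when the monitor op fails,
# so resources stay where they migrated to and cannot fail back.
primitive FAILOVER-IP ocf:heartbeat:IPaddr \
        params ip="192.168.1.79" \
        op monitor interval="10s" on-fail="standby"
```

The trade-off, as noted above, is that a single resource failure takes the entire node out of service until someone brings it back online.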
> I have a very simple configuration:
>
> node $id="4fe75075-333c-4614-8a8a-87149c7c9fbb" ha2 \
>         attributes standby="off"
> node $id="70718968-41b5-4aee-ace1-431b5b65fd52" ha1 \
>         attributes standby="off"
> primitive FAILOVER-IP ocf:heartbeat:IPaddr \
>         params ip="192.168.1.79" \
>         op monitor interval="10s"
> primitive PGPOOL lsb:pgpool2 \
>         op monitor interval="10s"
> group PGPOOL-AND-IP FAILOVER-IP PGPOOL
> colocation IP-WITH-PGPOOL inf: FAILOVER-IP PGPOOL
> property $id="cib-bootstrap-options" \
>         dc-version="1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd" \
>         cluster-infrastructure="Heartbeat" \

Change this to cluster-infrastructure="openais".

>         stonith-enabled="false" \
>         no-quorum-policy="ignore"

You're missing expected-quorum-votes here; it should be expected-quorum-votes="2". It's usually added automatically when the nodes are added to/seen by the cluster, so I assume its absence is related to cluster-infrastructure="Heartbeat".

Regards,
Dan

> rsc_defaults $id="rsc-options" \
>         resource-stickiness="1000"
>
> No matter what I do with resource stickiness, I cannot prevent fail-back. I
> usually don't have a problem with failback when I restart the current
> master, but when I disable network connectivity to the master, everything
> fails over fine. Then I enable the network adapter and everything jumps back
> to the original "failed" node. I've done some "watch ptest -Ls"-ing, and the
> scores seem to signify that failback should not occur. I'm also seeing
> resources bounce more times than necessary when a node is added (~3 times
> each), and resources seem to bounce when a node returns to the cluster even
> if it isn't necessary for them to do so. I also had an order directive in my
> configuration at one time, and often the second resource would start, then
> stop, then allow the first resource to start, then start itself. Quite
> weird. Any nods in the right direction would be greatly appreciated.
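Putting the two corrections together, the property section of the config would look something like the sketch below (the dc-version string is copied from Daniel's dump; the rest follows Dan's comments):

```shell
# Sketch of the corrected cluster options: openais as the cluster
# infrastructure, and expected-quorum-votes set explicitly for two nodes.
property $id="cib-bootstrap-options" \
        dc-version="1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        no-quorum-policy="ignore"
```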
> I've scoured Google and read the official documentation to no avail. I
> suppose I should mention I am using heartbeat as well. My LSB resource
> implements start/stop/status properly without error.
>
> I've been testing this with a floating IP + Postgres as well, with the same
> issues. One thing I notice is that my "group" resources have no score. Is
> this normal? There doesn't seem to be any way to assign a stickiness to a
> group, and default stickiness has no effect.
>
> Thanks!
>
> Daniel Bozeman
>
> Daniel Bozeman
> American Roamer
> Systems Administrator
> daniel.boze...@americanroamer.com

--
Dan Frincu
CCNA, RHCE
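On the group-stickiness question: the crm shell does let you attach meta attributes to a group, so a stickiness can be set on the group itself rather than only through rsc_defaults. A sketch using Daniel's group name (untested against his cluster):

```shell
# Sketch: set stickiness directly on the group via a meta attribute,
# instead of relying only on the rsc_defaults value.
group PGPOOL-AND-IP FAILOVER-IP PGPOOL \
        meta resource-stickiness="1000"
```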
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker