I was originally using Heartbeat with the config from my first post, but for troubleshooting I have since set up a config identical to the one in the documentation.

Why is "on-fail=standby" not optimal? I have tried it in the past, but it did not help. As far as I can tell, Pacemaker does not consider a loss of network connectivity a failure of the server itself or of any of its resources. As I've said, everything works fine if I kill a process, kill corosync, etc.

I think this may be what I am looking for:
http://www.clusterlabs.org/wiki/Example_configurations#Set_up_pingd

But I am still having issues. How can I reset the scores once the node has recovered? Is there some sort of "score reset" command? Once the node is set to -INF as this example shows, nothing is going to return to it.

Thank you all for your help.

On May 18, 2011, at 4:02 AM, Dan Frincu wrote:

> Hi,
>
> On Wed, May 18, 2011 at 11:30 AM, Max Williams <max.willi...@betfair.com>
> wrote:
> Hi Daniel,
>
> You might want to set “on-fail=standby” for the resource group or individual
> resources. This will put the host into standby when a failure occurs, thus
> preventing failback:
>
> This is not the most optimal solution.
>
> http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-resource-operations.html#s-resource-failure
>
> Another option is to set resource stickiness which will stop resources moving
> back after a failure:
>
> http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/ch05s03s02.html
>
> That is set globally in his config.
>
> Also note if you are using a two node cluster you will also need the property
> “no-quorum-policy=ignore” set.
>
> This as well.
>
> Hope that helps!
>
> Cheers,
>
> Max
>
> From: Daniel Bozeman [mailto:daniel.boze...@americanroamer.com]
> Sent: 17 May 2011 19:09
> To: pacemaker@oss.clusterlabs.org
> Subject: Re: [Pacemaker] Preventing auto-fail-back
>
> To be more specific:
>
> I've tried following the example on page 25/26 of this document to the letter:
> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>
> Well, not really, that's why there are errors in your config.
>
> And it does work as advertised. When I stop corosync, the resource goes to
> the other node. I start corosync and it remains there as it should.
>
> However, if I simply unplug the ethernet connection, let the resource
> migrate, then plug it back in, it will fail back to the original node. Is
> this the intended behavior? It seems a bad NIC could wreak havoc on such a
> setup.
>
> Thanks!
>
> Daniel
>
> On May 16, 2011, at 5:33 PM, Daniel Bozeman wrote:
>
> For the life of me, I cannot prevent auto-failback from occurring in a
> master-slave setup I have in virtual machines.
> I have a very simple configuration:
>
> node $id="4fe75075-333c-4614-8a8a-87149c7c9fbb" ha2 \
>     attributes standby="off"
> node $id="70718968-41b5-4aee-ace1-431b5b65fd52" ha1 \
>     attributes standby="off"
> primitive FAILOVER-IP ocf:heartbeat:IPaddr \
>     params ip="192.168.1.79" \
>     op monitor interval="10s"
> primitive PGPOOL lsb:pgpool2 \
>     op monitor interval="10s"
> group PGPOOL-AND-IP FAILOVER-IP PGPOOL
> colocation IP-WITH-PGPOOL inf: FAILOVER-IP PGPOOL
> property $id="cib-bootstrap-options" \
>     dc-version="1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd" \
>
> Change to cluster-infrastructure="openais"
>
>     cluster-infrastructure="Heartbeat" \
>     stonith-enabled="false" \
>     no-quorum-policy="ignore"
>
> You're missing expected-quorum-votes here, it should be
> expected-quorum-votes="2" and it's usually added automatically when the nodes
> are added/seen to/by the cluster, I assume it's related to the
> cluster-infrastructure="Heartbeat".
>
> Regards,
> Dan
>
> rsc_defaults $id="rsc-options" \
>     resource-stickiness="1000"
>
> No matter what I do with resource stickiness, I cannot prevent fail-back. I
> usually don't have a problem with failback when I restart the current master,
> but when I disable network connectivity to the master, everything fails over
> fine. Then I enable the network adapter and everything jumps back to the
> original "failed" node. I've done some "watch ptest -Ls"ing, and the scores
> seem to signify that failback should not occur. I'm also seeing resources
> bounce more times than necessary when a node is added (~3 times each) and
> resources seem to bounce when a node returns to the cluster even if it isn't
> necessary for them to do so. I also had an order directive in my
> configuration at one time, and often the second resource would start, then
> stop, then allow the first resource to start, then start itself. Quite weird.
> Any nods in the right direction would be greatly appreciated.
> I've scoured Google and read the official documentation to no avail. I
> suppose I should mention I am using heartbeat as well. My LSB resource
> implements start/stop/status properly without error.
>
> I've been testing this with a floating IP + Postgres as well with the same
> issues. One thing I notice is that my "group" resources have no score. Is
> this normal? There doesn't seem to be any way to assign a stickiness to a
> group, and default stickiness has no effect.
>
> Thanks!
>
> Daniel Bozeman
>
> Daniel Bozeman
> American Roamer
> Systems Administrator
> daniel.boze...@americanroamer.com
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs:
> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>
> --
> Dan Frincu
> CCNA, RHCE

Daniel Bozeman
American Roamer
Systems Administrator
daniel.boze...@americanroamer.com
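
For reference, here is a minimal sketch of the pingd approach from the wiki page linked above, adapted to the resource names in the quoted configuration. The ping target 192.168.1.1 and the names PINGD, PINGD-CLONE and PGPOOL-ON-CONNECTED-NODE are assumptions for illustration, and the operation timings are placeholders. None of this comes from a tested setup.

```
# Assumption: 192.168.1.1 is a reliably reachable address, e.g. the gateway.
primitive PINGD ocf:pacemaker:pingd \
    params host_list="192.168.1.1" multiplier="100" \
    op monitor interval="15s" timeout="5s"
# Run the connectivity check on every node.
clone PINGD-CLONE PINGD \
    meta globally-unique="false"
# Keep the group off any node whose "pingd" attribute is missing or zero.
location PGPOOL-ON-CONNECTED-NODE PGPOOL-AND-IP \
    rule -inf: not_defined pingd or pingd lte 0
```

As far as I understand it, the -inf in this rule is not a permanent mark on the node. It is computed from the "pingd" node attribute, so it should lift on its own once the node can reach the ping target again, and resource-stickiness then decides whether anything moves back. Fail counts are tracked separately and can be cleared with "crm resource cleanup PGPOOL-AND-IP".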