Sorry for the double post. I find that pinging the network gateway (192.168.1.1) actually works better. Otherwise the nodes end up with equal pingd scores, since the pingd resource is cloned (each node can still reach its own address, so a one-sided network failure looks the same from both sides).
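Concretely, the only change is the host_list parameter on the pingd primitive from the configuration quoted below, assuming 192.168.1.1 is a gateway reachable from both nodes:

primitive pingd ocf:pacemaker:pingd \
        params host_list="192.168.1.1" multiplier="100" \
        op monitor interval="15s" timeout="5s"

Because pingd is cloned, every node then pings the same external target, and a node's score drops only when that node itself loses connectivity.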
On May 18, 2011, at 2:45 PM, Daniel Bozeman wrote:

> Here is my solution for others to reference. It may not be ideal or possible for everyone, and I am open to suggestions.
>
> I've got two machines connected via crossover cable (there will be two crossovers for redundancy in production) with static IPs; Corosync communicates over this network. Each machine is also connected to the main network (.1.77 and .1.78).
>
> This way, the machines can continue to communicate with one another despite a network failure affecting one of them, and can react appropriately.
>
> Using Postgres as a test resource, I have the following (desired) results: the primary node loses network connectivity and Postgres is fired up on the other node. When the former primary regains connectivity, the process neither fails back nor restarts.
>
> Please see my configuration below:
>
> node postmaster
> node postslave
> primitive pingd ocf:pacemaker:pingd \
>         params host_list="192.168.1.77 192.168.1.78" multiplier="100" \
>         op monitor interval="15s" timeout="5s"
> primitive postgres lsb:postgresql \
>         op monitor interval="20s"
> clone pingdclone pingd \
>         meta globally-unique="false"
> location postgres_location postgres \
>         rule $id="postgres_location-rule" pingd: defined pingd
> property $id="cib-bootstrap-options" \
>         dc-version="1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd" \
>         cluster-infrastructure="openais" \
>         expected-quorum-votes="2" \
>         stonith-enabled="false" \
>         no-quorum-policy="ignore" \
>         last-lrm-refresh="1305736421"
>
> Naturally, this is a very simple configuration that only tests failover on network failure and prevention of failback.
>
> Are there any downsides to my method? I'd love to hear feedback. Thank you all for your help. "on-fail=standby" did absolutely nothing for me, by the way.
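A note on the location constraint above: "rule pingd: defined pingd" only expresses a positive preference for connected nodes (the pingd attribute value, i.e. multiplier x reachable hosts); it never forbids a node outright. The stricter variant from the Pacemaker documentation, sketched here with an illustrative constraint name, keeps the resource off any node whose pingd attribute is missing or zero:

location postgres-connected postgres \
        rule -inf: not_defined pingd or pingd lte 0

Since pingd updates the attribute automatically once connectivity returns, the -INF ban lifts on its own; no manual "score reset" is needed, and stickiness then decides whether anything moves back.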
> On May 18, 2011, at 9:16 AM, Daniel Bozeman wrote:
>
>> I was originally using Heartbeat with my original config, as I mentioned in my first post, but for troubleshooting I moved on to a config identical to the one in the documentation.
>>
>> Why is "on-fail=standby" not optimal? I have tried it in the past, but it did not help. As far as I can tell, Pacemaker does not consider a loss of network connectivity to be a failure of the server itself or of any of its resources. As I've said, everything works fine if I kill a process, kill Corosync, etc.
>>
>> I think this may be what I am looking for:
>>
>> http://www.clusterlabs.org/wiki/Example_configurations#Set_up_pingd
>>
>> But I am still having issues. How can I reset the scores once the node has recovered? Is there some sort of "score reset" command? Once the node is set to -INF, as that example shows, nothing is going to return to it.
>>
>> Thank you all for your help.
>>
>> On May 18, 2011, at 4:02 AM, Dan Frincu wrote:
>>
>>> Hi,
>>>
>>> On Wed, May 18, 2011 at 11:30 AM, Max Williams <max.willi...@betfair.com> wrote:
>>>
>>> Hi Daniel,
>>>
>>> You might want to set "on-fail=standby" for the resource group or for individual resources. This will put the host into standby when a failure occurs, thus preventing failback:
>>>
>>> This is not the most optimal solution.
>>>
>>> http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-resource-operations.html#s-resource-failure
>>>
>>> Another option is to set resource stickiness, which will stop resources from moving back after a failure:
>>>
>>> http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/ch05s03s02.html
>>>
>>> That is set globally in his config.
>>>
>>> Also note that if you are using a two-node cluster, you will also need the property "no-quorum-policy=ignore" set.
>>>
>>> This as well.
>>>
>>> Hope that helps!
>>>
>>> Cheers,
>>> Max
>>>
>>> From: Daniel Bozeman [mailto:daniel.boze...@americanroamer.com]
>>> Sent: 17 May 2011 19:09
>>> To: pacemaker@oss.clusterlabs.org
>>> Subject: Re: [Pacemaker] Preventing auto-fail-back
>>>
>>> To be more specific:
>>>
>>> I've tried following the example on pages 25-26 of this document to the letter: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>
>>> Well, not really, that's why there are errors in your config.
>>>
>>> And it does work as advertised. When I stop Corosync, the resource moves to the other node; when I start Corosync again, it remains there, as it should.
>>>
>>> However, if I simply unplug the Ethernet connection, let the resource migrate, then plug it back in, it fails back to the original node. Is this the intended behavior? It seems a bad NIC could wreak havoc on such a setup.
>>>
>>> Thanks!
>>>
>>> Daniel
>>>
>>> On May 16, 2011, at 5:33 PM, Daniel Bozeman wrote:
>>>
>>> For the life of me, I cannot prevent auto-failback from occurring in a master/slave setup I have in virtual machines. I have a very simple configuration:
>>>
>>> node $id="4fe75075-333c-4614-8a8a-87149c7c9fbb" ha2 \
>>>         attributes standby="off"
>>> node $id="70718968-41b5-4aee-ace1-431b5b65fd52" ha1 \
>>>         attributes standby="off"
>>> primitive FAILOVER-IP ocf:heartbeat:IPaddr \
>>>         params ip="192.168.1.79" \
>>>         op monitor interval="10s"
>>> primitive PGPOOL lsb:pgpool2 \
>>>         op monitor interval="10s"
>>> group PGPOOL-AND-IP FAILOVER-IP PGPOOL
>>> colocation IP-WITH-PGPOOL inf: FAILOVER-IP PGPOOL
>>> property $id="cib-bootstrap-options" \
>>>         dc-version="1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd" \
>>>
>>> Change to cluster-infrastructure="openais".
>>>
>>>         cluster-infrastructure="Heartbeat" \
>>>         stonith-enabled="false" \
>>>         no-quorum-policy="ignore"
>>>
>>> You're missing expected-quorum-votes here; it should be expected-quorum-votes="2". It is usually added automatically when the nodes are added to (and seen by) the cluster, so I assume its absence is related to cluster-infrastructure="Heartbeat".
>>>
>>> Regards,
>>> Dan
>>>
>>> rsc_defaults $id="rsc-options" \
>>>         resource-stickiness="1000"
>>>
>>> No matter what I do with resource stickiness, I cannot prevent failback. I usually don't have a problem with failback when I restart the current master; but when I disable network connectivity to the master, everything fails over fine, and then when I re-enable the network adapter everything jumps back to the original "failed" node. I've done some "watch ptest -Ls"-ing, and the scores seem to indicate that failback should not occur. I'm also seeing resources bounce more times than necessary when a node is added (~3 times each), and resources seem to bounce when a node returns to the cluster even when it isn't necessary for them to do so.
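Combining Dan's two corrections, the property section of that original config would presumably end up looking like the sketch below (expected-quorum-votes is normally maintained by the cluster itself, and "openais" applies once the stack runs on Corosync/openais, as in the newer configuration above):

property $id="cib-bootstrap-options" \
        dc-version="1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        no-quorum-policy="ignore"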
>>> I also had an order directive in my configuration at one time, and often the second resource would start, then stop, then allow the first resource to start, then start itself. Quite weird. Any nudge in the right direction would be greatly appreciated; I've scoured Google and read the official documentation to no avail. I should also mention that I am using Heartbeat, and that my LSB resource implements start/stop/status properly, without error.
>>>
>>> I've been testing this with a floating IP + Postgres as well, with the same issues. One thing I notice is that my "group" resources have no score. Is this normal? There doesn't seem to be any way to assign a stickiness to a group, and default stickiness has no effect.
>>>
>>> Thanks!
>>>
>>> Daniel Bozeman
>>>
>>> --
>>> Dan Frincu
>>> CCNA, RHCE
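As a footnote to the group-stickiness question quoted above: the crm shell does accept meta attributes on a group definition, so one candidate (which I have not verified in this exact setup) would be:

group PGPOOL-AND-IP FAILOVER-IP PGPOOL \
        meta resource-stickiness="1000"

Whether that changes the group's placement relative to the rsc_defaults stickiness is something "ptest -Ls" should show.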
Daniel Bozeman
American Roamer
Systems Administrator
daniel.boze...@americanroamer.com

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker