I was originally using heartbeat and my original config as I mentioned in my 
first post, but I moved on to set up a config identical to that in the 
documentation for troubleshooting.

Why is "on-fail=standby" not optimal? I have tried it in the past, but it did 
not help. As far as I can tell, pacemaker does not consider a loss of network 
connectivity a failure of the server itself or of any of its resources. As 
I've said, everything works fine if I kill a process, kill corosync, etc.
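
For reference, what I tried was roughly this (a sketch based on the config 
quoted below; "on-fail" is set on the monitor operation):

       primitive FAILOVER-IP ocf:heartbeat:IPaddr \
              params ip="192.168.1.79" \
              op monitor interval="10s" on-fail="standby"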

I think this may be what I am looking for:

http://www.clusterlabs.org/wiki/Example_configurations#Set_up_pingd

But I am still having issues. How can I reset the scores once the node has 
recovered? Is there some sort of "score reset" command? Once the node is set 
to -INF, as this example shows, nothing is going to return to it.
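
For reference, the wiki example boils down to something like this (a sketch; I 
am assuming the ocf:pacemaker:ping agent, whose attribute name defaults to 
"pingd", and the host_list IP is just a placeholder):

       primitive ping ocf:pacemaker:ping \
              params host_list="192.168.1.1" multiplier="100" \
              op monitor interval="10s"
       clone ping-clone ping
       location IP-ON-CONNECTED-NODE PGPOOL-AND-IP \
              rule -inf: not_defined pingd or pingd lte 0

As far as I can tell the rule is re-evaluated whenever the pingd attribute is 
updated by the monitor op, so maybe no manual score reset is needed once 
connectivity returns, but I haven't confirmed this.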

Thank you all for your help

On May 18, 2011, at 4:02 AM, Dan Frincu wrote:

> Hi,
> 
> On Wed, May 18, 2011 at 11:30 AM, Max Williams <max.willi...@betfair.com> 
> wrote:
> Hi Daniel,
> 
> You might want to set “on-fail=standby” for the resource group or for 
> individual resources. This will put the host into standby when a failure 
> occurs, thus preventing failback:
> 
> 
> This is not an optimal solution.
>  
> http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-resource-operations.html#s-resource-failure
> 
>  
> Another option is to set resource stickiness which will stop resources moving 
> back after a failure:
> 
> http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/ch05s03s02.html
> 
> 
> That is set globally in his config.
>  
>  
> Also note that if you are using a two-node cluster you will need the property 
> “no-quorum-policy=ignore” set.
> 
> 
> This as well.
>  
>  
> Hope that helps!
> 
> Cheers,
> 
> Max
> 
>  
> From: Daniel Bozeman [mailto:daniel.boze...@americanroamer.com] 
> Sent: 17 May 2011 19:09
> To: pacemaker@oss.clusterlabs.org
> Subject: Re: [Pacemaker] Preventing auto-fail-back
> 
>  
> To be more specific:
> 
>  
> I've tried following the example on pages 25-26 of this document to the 
> letter: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> 
> 
> Well, not really, that's why there are errors in your config.
>  
>  
> And it does work as advertised. When I stop corosync, the resource goes to 
> the other node. I start corosync and it remains there as it should.
> 
>  
> However, if I simply unplug the ethernet connection, let the resource 
> migrate, then plug it back in, it will fail back to the original node. Is 
> this the intended behavior? It seems a bad NIC could wreak havoc on such a 
> setup.
> 
>  
> Thanks!
> 
>  
> Daniel
> 
>  
> On May 16, 2011, at 5:33 PM, Daniel Bozeman wrote:
> 
> For the life of me, I cannot prevent auto-failback from occurring in a 
> master-slave setup I have in virtual machines. I have a very simple 
> configuration:
> 
> node $id="4fe75075-333c-4614-8a8a-87149c7c9fbb" ha2 \
>        attributes standby="off"
> node $id="70718968-41b5-4aee-ace1-431b5b65fd52" ha1 \
>        attributes standby="off"
> primitive FAILOVER-IP ocf:heartbeat:IPaddr \
>        params ip="192.168.1.79" \
>        op monitor interval="10s"
> primitive PGPOOL lsb:pgpool2 \
>        op monitor interval="10s"
> group PGPOOL-AND-IP FAILOVER-IP PGPOOL
> colocation IP-WITH-PGPOOL inf: FAILOVER-IP PGPOOL
> property $id="cib-bootstrap-options" \
>        dc-version="1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd" \
> 
> Change to cluster-infrastructure="openais"
>  
>        cluster-infrastructure="Heartbeat" \
>        stonith-enabled="false" \
>        no-quorum-policy="ignore"
> 
> You're missing expected-quorum-votes here; it should be 
> expected-quorum-votes="2". It is usually added automatically when the nodes 
> are seen by the cluster, so I assume its absence is related to 
> cluster-infrastructure="Heartbeat".
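> 
> Putting both changes together, the property section would look roughly like 
> this (a sketch, keeping your other values):
> 
>        property $id="cib-bootstrap-options" \
>               dc-version="1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd" \
>               cluster-infrastructure="openais" \
>               expected-quorum-votes="2" \
>               stonith-enabled="false" \
>               no-quorum-policy="ignore"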
> 
> Regards,
> Dan
>  
> rsc_defaults $id="rsc-options" \
>        resource-stickiness="1000"
> 
> No matter what I do with resource stickiness, I cannot prevent fail-back. I 
> usually don't have a problem with failback when I restart the current master, 
> but when I disable network connectivity to the master, everything fails over 
> fine, and then when I re-enable the network adapter everything jumps back to 
> the original "failed" node. I've been watching the scores with "watch ptest 
> -Ls", and they seem to signify that failback should not occur.
> 
> I'm also seeing resources bounce more times than necessary when a node is 
> added (~3 times each), and resources seem to bounce when a node returns to 
> the cluster even when it isn't necessary for them to do so. I also had an 
> order directive in my configuration at one time, and often the second 
> resource would start, then stop, then allow the first resource to start, then 
> start itself. Quite weird.
> 
> Any nods in the right direction would be greatly appreciated. I've scoured 
> Google and read the official documentation to no avail. I suppose I should 
> mention I am using heartbeat as well. My LSB resource implements 
> start/stop/status properly without error.
> 
> I've been testing this with a floating IP + Postgres as well with the same 
> issues. One thing I notice is that my "group" resources have no score. Is 
> this normal? There doesn't seem to be any way to assign a stickiness to a 
> group, and default stickiness has no effect.
> 
> Thanks!
> 
> Daniel Bozeman
> 
>  
> Daniel Bozeman
> American Roamer
> Systems Administrator
> daniel.boze...@americanroamer.com
> 
>  
> 
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: 
> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
> 
> 
> 
> 
> -- 
> Dan Frincu
> CCNA, RHCE
> 

Daniel Bozeman
American Roamer
Systems Administrator
daniel.boze...@americanroamer.com
