Hi list,

We have two entry servers (running Apache on Debian Squeeze with Pacemaker 1.0.9 on Heartbeat), both of which are active at the same time. Users may use either of the two servers at any time.

Now, if one of them fails, users should all be redirected to the other server, as transparently as possible, using two virtual IP addresses.

I absolutely don't want Pacemaker interfering with Apache itself; all I want it to do is monitor Apache and move the IP addresses if it goes down.


Thus, I set up this configuration (simplified, IPv6 removed):

node $id="101b0c74-2fd5-46a5-bb65-702cb3188c11" entry1
node $id="6ec6b85c-c44c-406d-97aa-1a8da56dc041" entry2
primitive apache ocf:heartbeat:apache \
        params statusurl="http://localhost/server-status" \
        op monitor interval="30s" \
        meta is-managed="false"
primitive siteIp4A ocf:heartbeat:IPaddr \
        params ip="188.92.145.78" cidr_netmask="255.255.255.192" nic="eth0" \
        op monitor interval="15s"
primitive siteIp4B ocf:heartbeat:IPaddr \
        params ip="188.92.145.79" cidr_netmask="255.255.255.192" nic="eth0" \
        op monitor interval="15s"
clone apacheClone apache
colocation coloDistribute -100: siteIp4A siteIp4B
colocation coloSiteA inf: siteIp4A apacheClone
colocation coloSiteB inf: siteIp4B apacheClone
property $id="cib-bootstrap-options" \
        dc-version="1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b" \
        cluster-infrastructure="Heartbeat" \
        stonith-enabled="false" \
        last-lrm-refresh="1329758239"


Yes, I know the usual disclaimer about stonith, but we don't care: the worst thing that could happen is that both nodes take both IP addresses, which is a risk we can totally live with. Even if that happens, Pacemaker recovers from it as soon as the two nodes see each other again.

So far, so good; failover appears to work (e.g. if I simulate a monitor failure by using iptables to cut off the monitor), but:

1. After the failed Apache comes back up, Pacemaker doesn't notice unless I do a manual resource cleanup. I think this is because the monitor operation is stopped after the failure. I have played with

monitor on-fail="ignore" and "restart"

and

failure-timeout="60s"

on the "apache" primitive, but no luck: the cluster doesn't notice that Apache is back up. I need this to happen automatically, because monitor failures can happen from time to time, and I don't want to use migration-threshold because I really want a quick failover. Yes, I know I could run a cron job that does a cleanup every minute, but that can't be the way to go, right? Especially since it might have side effects (IPs being stopped during the cleanup, or the like).

2. When I reconfigure things or restart Heartbeat (and Pacemaker with it), the apache primitive can end up in the "orphaned" state, which means that Pacemaker will stop it. While this may be reasonable for the IP primitives, it looks like a bug for a resource with is-managed="false" (I mean, which part of "do not start or stop this resource" does Pacemaker not understand?). Unfortunately, I couldn't find any way to disable this behaviour except for the global "stop-orphan-actions" option, which is probably not what I want. Am I missing something here?
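The closest knob I found is the global cluster property below (a sketch only; stop-orphan-resources is the resource-level sibling of stop-orphan-actions, and I have not verified that it helps, since it is cluster-wide and would also affect the IP primitives):

```
property $id="cib-bootstrap-options" \
        stop-orphan-resources="false"
```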

I have spent hours trying to figure out how this is supposed to work, but no dice :(

Any help would be greatly appreciated. Thanks!

Best regards,

David

--
David Gubler
Senior Software & Operations Engineer
MeetMe: http://doodle.com/david
E-Mail: d...@doodle.com

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
