Hi list,

We have two entry servers (running Apache on Debian Squeeze with Pacemaker 1.0.9 on Heartbeat), both of which are active at the same time. Users may use either of the two servers at any time.

Now, if one of them fails, users should all be redirected to the other server, as transparently as possible, using two virtual IP addresses.

I absolutely don't want Pacemaker interfering with Apache itself; all I want it to do is monitor Apache and move the IP addresses if it goes down.


Thus, I set up this configuration (simplified, IPv6 removed):

node $id="101b0c74-2fd5-46a5-bb65-702cb3188c11" entry1
node $id="6ec6b85c-c44c-406d-97aa-1a8da56dc041" entry2
primitive apache ocf:heartbeat:apache \
        params statusurl="http://localhost/server-status" \
        op monitor interval="30s" \
        meta is-managed="false"
primitive siteIp4A ocf:heartbeat:IPaddr \
        params ip="188.92.145.78" cidr_netmask="255.255.255.192" nic="eth0" \
        op monitor interval="15s"
primitive siteIp4B ocf:heartbeat:IPaddr \
        params ip="188.92.145.79" cidr_netmask="255.255.255.192" nic="eth0" \
        op monitor interval="15s"
clone apacheClone apache
colocation coloDistribute -100: siteIp4A siteIp4B
colocation coloSiteA inf: siteIp4A apacheClone
colocation coloSiteB inf: siteIp4B apacheClone
property $id="cib-bootstrap-options" \
        dc-version="1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b" \
        cluster-infrastructure="Heartbeat" \
        stonith-enabled="false" \
        last-lrm-refresh="1329758239"


Yes, I know the usual disclaimer about stonith, but we don't care: the worst thing that could happen is that both nodes take both IP addresses, which is a risk we can totally live with. Even if that happens, Pacemaker recovers from it as soon as the two nodes see each other again.

So far, so good; failover appears to work (e.g. if I simulate a monitor failure by using iptables to cut off the monitor), but:

1. After the failed Apache comes back up, Pacemaker doesn't notice unless I do a manual resource cleanup. I think this is because the monitor operation is stopped after the failure. I have played with

monitor on-fail="ignore" and "restart"

and

failure-timeout="60s"

on the "apache" primitive, but no luck: the cluster doesn't notice that Apache is back up. I need this to happen automatically, because monitor failures can happen from time to time, and I don't want to use migration-threshold because I really want a quick failover. Yes, I know I could run a cron job that does a cleanup every minute, but that can't be the way to go, right? Especially since it might have side effects (IPs being stopped during the cleanup, or the like).

2. When I reconfigure things or restart Heartbeat (and Pacemaker with it), the apache primitive can end up in the "orphaned" state, which means that Pacemaker will stop it. While this may be reasonable for the IP primitives, it looks like a bug for a resource with is-managed="false" (I mean, which part of "do not start or stop this resource" does Pacemaker not understand?). Unfortunately, I couldn't find any way to disable this behaviour except for the global "stop-orphan-actions" option, which is probably not what I want. Am I missing something here?
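The closest knob I found is the global cluster property below (a sketch only; stop-orphan-resources is the resource-level sibling of stop-orphan-actions, and I have not verified that it helps, since it is cluster-wide and would also affect the IP primitives):

```
property $id="cib-bootstrap-options" \
        stop-orphan-resources="false"
```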

I have spent hours trying to figure out how this is supposed to work, but no dice :(

Any help would be greatly appreciated. Thanks!

Best regards,

David

--
David Gubler
Senior Software & Operations Engineer
MeetMe: http://doodle.com/david
E-Mail: d...@doodle.com

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
