It is not necessarily the case that the outside world can't reach the
cluster. Ours is a multi-homed device connected to multiple WANs and
LANs, and we want the device with the best connectivity to be the active
one. To avoid failovers when, for example, a ping node reboots, I have
written an fping OCF RA that uses different dampening delays depending
on whether it is running on the active or the idle device. I have also
patched pacemaker's attrd.c so that it no longer sends an immediate
update when it receives a flush message from the other node; that
immediate update caused it to ignore any running delay timer.
Here is the patch:
--- tools/attrd.orig.c 2011-09-13 08:29:46.946820348 -0500
+++ tools/attrd.c 2011-09-14 13:33:59.606894754 -0500
@@ -348,10 +348,14 @@
         attrd_local_callback(xml);
     } else if(ignore == NULL || safe_str_neq(from, attrd_uname)) {
+        const char *attr = crm_element_value(xml, F_ATTRD_ATTRIBUTE);
+        /* Don't send update for score if msg is from other node */
+        if(safe_str_eq(from, attrd_uname) || safe_str_neq(attr, "pingd")) {
         crm_info("%s message from %s", op, from);
         hash_entry = find_hash_entry(xml);
         stop_attrd_timer(hash_entry);
         attrd_perform_update(hash_entry);
+        }
     }
     free_xml(xml);
 }
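
As a minimal sketch of the different-dampening idea described above (this is not the fping RA itself): the score changes that could pull the resource away from the active node are a drop on the active node and a rise on the idle node, so those get the longer delay and the peer's matching update lands first. The function names, the role and direction arguments, and the 30s/5s values are all made up for illustration; the only real command is attrd_updater, assumed here to accept -n, -v and -d the way the stock ping RA calls it.

```shell
#!/bin/sh
# Hypothetical delays; tune for the site. Not taken from any shipped RA.
LONG_DAMPEN="30s"
SHORT_DAMPEN="5s"

# choose_dampen ROLE DIRECTION
#   ROLE:      "active" if the resource runs on this node, else "idle"
#   DIRECTION: "down" if the new score is lower than the last one, else "up"
# Prints the dampening delay to use for this update.
choose_dampen() {
    case "$1:$2" in
        # These two changes favour the other node, so hold them back.
        active:down|idle:up) echo "$LONG_DAMPEN" ;;
        # Everything else keeps the resource where it is; send it quickly.
        *)                   echo "$SHORT_DAMPEN" ;;
    esac
}

# push_score ROLE DIRECTION SCORE
# Hands the score to attrd with the role-dependent delay.
push_score() {
    attrd_updater -n pingd -v "$3" -d "$(choose_dampen "$1" "$2")"
}
```

With a scheme like this, when a shared ping node goes away the idle node's score drops first and the active node's follows, so the scores never briefly favour the idle node. This only helps if attrd actually honours the running delay timer, which is what the patch above is for.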
On 09/19/2011 10:51 PM, Andrew Beekhof wrote:
> On Sun, Sep 11, 2011 at 2:30 AM, Vadym Chepkov <vchep...@gmail.com> wrote:
>> On Sep 8, 2011, at 3:40 PM, Florian Haas wrote:
>>> On 09/08/11 20:59, Brad Johnson wrote:
>>>> We have a 2 node cluster with a single resource. The resource must run
>>>> on only a single node at one time. Using the pacemaker:ocf:ping RA we
>>>> are pinging a WAN gateway and a LAN host on each node so the resource
>>>> runs on the node with the greatest connectivity. The problem is when a
>>>> ping host goes down (so both nodes lose connectivity to it), the
>>>> resource moves to the other node due to timing differences in how fast
>>>> they update the score attribute. The dampening value has no effect,
>>>> since it delays both nodes by the same amount. These unnecessary
>>>> fail-overs aren't acceptable since they are disruptive to the network
>>>> for no reason.
>>>> Is there a way to dampen the ping update by different amounts on the
>>>> active and passive nodes? Or some other way to configure the cluster to
>>>> try to keep the resource where it is during these tie score scenarios?
>>>>
>>>> location pingd-constraint group_1 \
>>>>     rule $id="pingd-constraint-rule" pingd: defined pingd
>>>
>>> May I suggest that you simply change this constraint to
>>>
>>> location pingd-constraint group_1 \
>>>     rule $id="pingd-constraint-rule" \
>>>     -inf: not_defined pingd or pingd lte 0
>>>
>>> That way, only a host that definitely has _no_ connectivity carries a
>>> -INF score for that resource group. And I believe that is what you
>>> really want, rather than take the actual ping score as a placement
>>> weight (your "best connectivity" approach).
>>> Just my 2 cents, though.
>>
>> Even though this approach was recommended many times, there is a problem with
>> it.
>> What if all nodes for some reason are not able to ping?
>> This rule would cause a resource to be brought down completely, whereas if you use
>> "best connectivity" approach it will stay up where it was before network failed.
>
> If the outside[1] world can't reach the cluster, is there much benefit
> in having it running?
>
> [1] Substitute "outside" for wherever your users are, hopefully you
> picked a ping node from the same area.
>
>> Vadym
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker