On 09/13/2011 10:36 PM, Brad Johnson wrote:
> Yes, the suggested approach has the problem when both nodes drop to a
> score of zero the resource can not run anywhere. I have gone back to my
> original "best connectivity" approach, but now using my own ping RA
> which uses different dampening delay on the active vs. standby node. On
> the active node when the score is rising, and on the standby node when
> the score is falling, a delay of zero is used. The other cases use the
> configured delay. This works much better at keeping our resource from
> failing over when ping hosts are brought down and back up. But the
> problem still happens some of the time.
> There are 2 problems I see:
> 1) the dampening delay is over-ridden when we receive a flush message
> from the other node - instead we immediately send an update with the
> current value.
> 2) the dampen value should be as large as the product of the OCF RA
> attempts * timeout values, since the nodes are asynchronously pinging
> and may be off as much as an entire interval. BUT pacemaker seems to not
> work properly when the dampen value is larger than the resource interval.

There have been some fixes in Pacemaker 1.0.11 to make this work
properly ... dampen value is a multiple of monitor interval

Regards,
Andreas

>
> Any suggestions please would be appreciated.
>
> ...Brad
>
> On 09/10/2011 11:30 AM, Vadym Chepkov wrote:
>> On Sep 8, 2011, at 3:40 PM, Florian Haas wrote:
>>
>>>>> On 09/08/11 20:59, Brad Johnson wrote:
>>>>>> We have a 2 node cluster with a single resource. The resource must
>>>>>> run on only a single node at one time. Using the pacemaker:ocf:ping
>>>>>> RA we are pinging a WAN gateway and a LAN host on each node so the
>>>>>> resource runs on the node with the greatest connectivity. The
>>>>>> problem is when a ping host goes down (so both nodes lose
>>>>>> connectivity to it), the resource moves to the other node due to
>>>>>> timing differences in how fast they update the score attribute. The
>>>>>> dampening value has no effect, since it delays both nodes by the
>>>>>> same amount. These unnecessary fail-overs aren't acceptable since
>>>>>> they are disruptive to the network for no reason.
>>>>>> Is there a way to dampen the ping update by different amounts on the
>>>>>> active and passive nodes? Or some other way to configure the cluster
>>>>>> to try to keep the resource where it is during these tie score
>>>>>> scenarios?
>>> location pingd-constraint group_1 \
>>>     rule $id="pingd-constraint-rule" pingd: defined pingd
>>>
>>> May I suggest that you simply change this constraint to
>>>
>>> location pingd-constraint group_1 \
>>>     rule $id="pingd-constraint-rule" \
>>>     -inf: not_defined pingd or pingd lte 0
>>>
>>> That way, only a host that definitely has _no_ connectivity carries a
>>> -INF score for that resource group. And I believe that is what you
>>> really want, rather than take the actual ping score as a placement
>>> weight (your "best connectivity" approach).
>>>
>>> Just my 2 cents, though.
>>>
>> Even though this approach was recommended many times, there is a
>> problem with it.
>> What if all nodes for some reason are not able to ping?
>> This rule would cause a resource to be brought down completely,
>> whereas if you use "best connectivity" approach it will stay up where
>> it was before network failed.
>>
>> Vadym

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
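
For readers following the thread, the asymmetric dampening rule Brad describes for his custom ping RA can be sketched roughly as below. This is only an illustration of the decision logic as stated in his message, not code from his RA or from Pacemaker; the function names `dampen_delay` and `min_dampen` and all parameter names are hypothetical. The second helper just spells out the "attempts * timeout" product from his point 2.

```python
def dampen_delay(is_active: bool, old_score: int, new_score: int,
                 configured_delay: float) -> float:
    """Pick the dampening delay (seconds) for a ping score update.

    Hypothetical helper mirroring the rule described in the thread:
    zero delay on the active node when the score rises, zero delay on
    the standby node when the score falls, configured delay otherwise.
    """
    rising = new_score > old_score
    falling = new_score < old_score
    if (is_active and rising) or (not is_active and falling):
        return 0.0
    return configured_delay


def min_dampen(attempts: int, timeout: float) -> float:
    """Lower bound on dampen suggested in the thread: attempts * timeout."""
    return attempts * timeout


if __name__ == "__main__":
    # Active node regains connectivity: publish at once.
    print(dampen_delay(True, 0, 100, 5.0))
    # Active node loses connectivity: wait out the configured delay.
    print(dampen_delay(True, 100, 0, 5.0))
    # Hypothetical RA settings: 3 attempts with a 2 second timeout each.
    print(min_dampen(3, 2.0))
```

Under this rule the active node's score rises immediately and falls late, while the standby's falls immediately and rises late, which is presumably why it biases tie situations toward leaving the resource where it is.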