On Fri, Sep 23, 2011 at 9:53 PM, Brad Johnson <bjohn...@ecessa.com> wrote:
> Yes, but the patch only affects the pingd attribute.
Use of the name 'pingd' isn't mandatory though.

> And we do not want the other node to be able to challenge us to an
> immediate score comparison. That is the whole idea behind the fping OCF
> resource agent we are using, to give the timing advantage to the node
> currently running the resource by delaying rising scores on the idle, and
> falling scores on the active node.

Why not just set dampen=0?

>
> On 09/22/2011 09:10 PM, Andrew Beekhof wrote:
>>
>> On Tue, Sep 20, 2011 at 10:34 PM, Brad Johnson <bjohn...@ecessa.com> wrote:
>>>
>>> It is not necessarily the case that the outside world can't reach the
>>> cluster. Ours is a multi-homed device connecting to multiple WANs and
>>> LANs. We want the device with the best connectivity to be the active
>>> device. To get around the problem of failovers occurring when a ping
>>> node reboots for example, I have written an fping OCF RA that uses
>>> different dampening delays based on if it is running on the active or
>>> idle device. I have also patched pacemaker attrd.c to fix it so it
>>> doesn't send an immediate update when it receives a flush message from
>>> the other node. This was causing it to ignore any running delay timer.
>>
>> That's the point of the flush message though. So that all nodes write
>> their current value at the same time.
>>
>>> Here is that patch:
>>>
>>> --- tools/attrd.orig.c 2011-09-13 08:29:46.946820348 -0500
>>> +++ tools/attrd.c 2011-09-14 13:33:59.606894754 -0500
>>> @@ -348,10 +348,14 @@
>>>          attrd_local_callback(xml);
>>>
>>>      } else if(ignore == NULL || safe_str_neq(from, attrd_uname)) {
>>> +        const char *attr = crm_element_value(xml, F_ATTRD_ATTRIBUTE);
>>> +        /* Don't send update for score if msg is from other node */
>>> +        if(safe_str_eq(from, attrd_uname) || safe_str_neq(attr, "pingd")) {
>>>          crm_info("%s message from %s", op, from);
>>>          hash_entry = find_hash_entry(xml);
>>>          stop_attrd_timer(hash_entry);
>>>          attrd_perform_update(hash_entry);
>>> +        }
>>>      }
>>>      free_xml(xml);
>>>  }
>>>
>>>
>>> On 09/19/2011 10:51 PM, Andrew Beekhof wrote:
>>>>
>>>> On Sun, Sep 11, 2011 at 2:30 AM, Vadym Chepkov <vchep...@gmail.com> wrote:
>>>>>
>>>>> On Sep 8, 2011, at 3:40 PM, Florian Haas wrote:
>>>>>
>>>>>>>> On 09/08/11 20:59, Brad Johnson wrote:
>>>>>>>>>
>>>>>>>>> We have a 2 node cluster with a single resource. The resource must
>>>>>>>>> run on only a single node at one time. Using the pacemaker:ocf:ping
>>>>>>>>> RA we are pinging a WAN gateway and a LAN host on each node so the
>>>>>>>>> resource runs on the node with the greatest connectivity. The
>>>>>>>>> problem is when a ping host goes down (so both nodes lose
>>>>>>>>> connectivity to it), the resource moves to the other node due to
>>>>>>>>> timing differences in how fast they update the score attribute.
>>>>>>>>> The dampening value has no effect, since it delays both nodes by
>>>>>>>>> the same amount. These unnecessary fail-overs aren't acceptable
>>>>>>>>> since they are disruptive to the network for no reason.
>>>>>>>>> Is there a way to dampen the ping update by different amounts on
>>>>>>>>> the active and passive nodes?
>>>>>>>>> Or some other way to configure the cluster to try to keep the
>>>>>>>>> resource where it is during these tie score scenarios?
>>>>>>
>>>>>> location pingd-constraint group_1 \
>>>>>>     rule $id="pingd-constraint-rule" pingd: defined pingd
>>>>>>
>>>>>> May I suggest that you simply change this constraint to
>>>>>>
>>>>>> location pingd-constraint group_1 \
>>>>>>     rule $id="pingd-constraint-rule" \
>>>>>>     -inf: not_defined pingd or pingd lte 0
>>>>>>
>>>>>> That way, only a host that definitely has _no_ connectivity carries a
>>>>>> -INF score for that resource group. And I believe that is what you
>>>>>> really want, rather than take the actual ping score as a placement
>>>>>> weight (your "best connectivity" approach).
>>>>>>
>>>>>> Just my 2 cents, though.
>>>>>>
>>>>> Even though this approach was recommended many times, there is a
>>>>> problem with it.
>>>>> What if all nodes for some reason are not able to ping?
>>>>> This rule would cause a resource to be brought down completely, whereas
>>>>> if you use "best connectivity" approach it will stay up where it was
>>>>> before network failed.
>>>>
>>>> If the outside[1] world can't reach the cluster, is there much benefit
>>>> in having it running?
>>>>
>>>> [1] Substitute "outside" for wherever your users are, hopefully you
>>>> picked a ping node from the same area.
>>>>
>>>>> Vadym
>>>>>
>>>>> _______________________________________________
>>>>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>>
>>>>> Project Home: http://www.clusterlabs.org
>>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker