On Thu, Sep 2, 2010 at 4:05 PM, Lars Ellenberg <lars.ellenb...@linbit.com> wrote:
> On Thu, Sep 02, 2010 at 11:00:12AM +0200, Bernd Schubert wrote:
>> On Thursday, September 02, 2010, Andrew Beekhof wrote:
>> > On Wed, Sep 1, 2010 at 11:59 AM, Bernd Schubert
>> > > My proposal is to rip all the network code out of pingd and to add
>> > > slightly modified files from 'iputils'.
>> >
>> > Close, but that's not portable.
>> > Instead use ocf:pacemaker:ping, which goes a step further and ditches
>> > the daemon piece altogether.
>>
>> Hmm, we are already using that for now, temporarily. But I don't think
>> the ping RA is suitable for larger clusters. The ping RA pings everything
>> serially, and only at the intervals at which lrmd calls it. Now let's
>> assume we have a 20-node cluster:
>>
>> nodes = 20
>> timeout = 2
>> attempts = 2
>>
>> That makes 80 s (20 * 2 * 2) for a single run, with defaults that are
>> already rather small timeouts, which is IMHO a bit long. And with a shell
>> script I don't see a way to improve that. While we could send the pings
>> in parallel, I have no idea how to lock the variable counting the active
>> nodes (active=`expr $active + 1`). Plain sh and even bash have no
>> semaphore or mutex primitive. So IMHO we need a language that supports
>> that; rewriting the pingd RA is one choice, rewriting the ping RA in
>> Python is another.
>
> How about an fping RA?
>
>   active=$(fping -a -i 5 -t 250 -B1 -r1 $host_list 2>/dev/null | wc -l)
>
> It terminates in about 3 seconds for a host list of 100 (on the LAN, 29
> of which are alive).

Happy to add it if someone writes it :-)
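For the record, here is a minimal sketch of what the update step of such an
fping RA could look like. It is only a sketch under assumptions: the
attrd_updater call and the OCF_RESKEY_* parameter names are borrowed from
the existing ocf:pacemaker:ping RA, and the defaults are guesses, not a
finished agent.

    # Sketch of the update step of a hypothetical fping-based RA.
    # Parameter names and the attrd_updater call mirror ocf:pacemaker:ping.
    fping_update() {
        # fping probes all hosts in parallel; -a prints only the hosts
        # that answered, so counting output lines replaces any shared
        # counter; no locking needed.
        active=$(fping -a -i 5 -t 250 -B1 -r1 $OCF_RESKEY_host_list \
                 2>/dev/null | wc -l)

        # Push the (dampened) score into the cluster, as the ping RA does.
        score=$(expr $active \* ${OCF_RESKEY_multiplier:-1})
        attrd_updater -n "${OCF_RESKEY_name:-pingd}" -v "$score" \
                      -d "${OCF_RESKEY_dampen:-5s}"
    }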
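And for completeness: even without fping, plain sh can run the pings in
parallel without the locking problem Bernd describes. If every background
probe appends one line per reachable host, counting lines with wc -l
replaces the shared counter entirely. A rough, untested sketch:

    # Parallel pings in plain sh; no shared counter, hence no lock needed.
    tmp=$(mktemp) || exit 1
    for host in $host_list; do
        # Each probe runs in the background and appends a single line on
        # success; O_APPEND keeps the one-line writes intact.
        ( ping -n -q -c 2 -w 2 "$host" >/dev/null 2>&1 \
          && echo "$host" >> "$tmp" ) &
    done
    wait                          # reap all background probes
    active=$(wc -l < "$tmp")
    rm -f "$tmp"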
>> So in fact my first proposal was also only the first step: first add
>> better network code, and then make it multi-threaded, so that each ping
>> host gets its own thread.
>
> A working pingd daemon has the additional advantage that it can ask its
> peers for their ping node count before actually updating the attribute,
> which should help with the "dampen race".

That happens at the attrd level in both cases. pingd adds nothing here.

>> Another reason why I don't like the shell RA too much is that the shell
>> takes a considerable amount of CPU time. For a subset of systems, where
>> we need ping as a replacement for the quorum policy (*), CPU time is
>> precious.
>>
>> Thanks,
>> Bernd
>>
>> PS: (*) As you insist ;) on quorum with n/2 + 1 nodes, we use ping as a
>> replacement. We simply cannot fulfill n/2 + 1, as a controller failure
>> takes down 50% of the systems (virtual machines), and the systems (VMs)
>> of the 2nd controller are then supposed to take over the failed services.
>> I see that n/2 + 1 is optimal and also required for a few nodes. But if
>> you have a larger set of systems (e.g. a minimum of 6 with the VM systems
>> I have in mind), n/2 + 1 is sufficient, IMHO.
>
> You meant to say you consider == n/2 sufficient, instead of > n/2?
>
>> Therefore I asked before to make the quorum policy configurable. Now with
>> Lustre's multiple-mount protection and the additional stopping of
>> resources due to ping, I'm willing to set the quorum policy to ignore.
>
> --
> : Lars Ellenberg
> : LINBIT | Your Way to High Availability
> : DRBD/HA support and consulting http://www.linbit.com
>
> DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
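For anyone setting up something similar: the arrangement Bernd describes,
ping-based connectivity scoring plus no-quorum-policy=ignore, would look
roughly like this in the crm shell. The resource names, host list, and
scores below are placeholders, not a tested configuration:

    # Placeholder sketch: ping connectivity as a quorum substitute.
    property no-quorum-policy="ignore"

    # Clone the ping RA on all nodes; it maintains the "pingd" attribute.
    primitive p-ping ocf:pacemaker:ping \
            params host_list="10.0.0.1 10.0.0.2" multiplier="1000" dampen="5s" \
            op monitor interval="15s" timeout="60s"
    clone cl-ping p-ping

    # Keep my-resource off nodes that cannot reach any ping host.
    location loc-connected my-resource \
            rule -inf: not_defined pingd or pingd lte 0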