Re: [Linux-ha-dev] interface monitoring
The Linux-HA project has been obsolete since around 2007. You should switch to the Pacemaker project. It's capable of doing what you want - and it's not dead ;-)

On Wed, Aug 1, 2018, at 9:02 AM, Chanandler Bong wrote:
> Hello, I have 2 clusters. The master cluster gives up resources when the given ping IP address is unreachable, but what I want to do is release resources when a specific network interface is down. How can I do it? Thank you.

-- Alan Robertson al...@unix.sh
___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
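[Editor's note: in Pacemaker, interface-down handling is usually done with the ocf:heartbeat:ethmonitor resource agent, which maintains a node attribute that location rules can test. A minimal sketch, assuming the pcs shell, an interface named eth0, and a hypothetical resource group "my-services" - adapt names to your cluster:]

```shell
# Clone an ethmonitor resource so every node watches eth0; the agent
# maintains the node attribute "ethmonitor-eth0" (1 = link up, 0 = down).
pcs resource create eth0-monitor ocf:heartbeat:ethmonitor interface=eth0 clone

# Keep the (hypothetical) group "my-services" off any node whose eth0
# link is down.
pcs constraint location my-services rule score=-INFINITY ethmonitor-eth0 ne 1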
[Linux-ha-dev] Looking for a few good queries ;-)
Hi, If anyone would like to try their hand at it, there are a few Cypher queries I'd like to have written... They relate to dependencies...

1. Compute the set of all services which depend directly or indirectly upon the given starting service
2. Same as #1 - but with a server as a starting place
3. Compute the set of all services upon which the given starting service depends (inverse of #1)
4. Same as #3 - but with a server as a starting place

And it would be nice if we could have another 4 queries which delivered servers as output instead of services :-D These are likely to be pretty complex Cypher queries... One could even imagine versions of these 4 (or 8) queries taking an input that said to follow no more than "n" levels of indirection - or you could just write them that way in the first place... By the way, I suspect the outputs of these queries probably ought to be paths... At least that's what comes to my mind... The reason I think this is valuable is the GUI. This is exactly what we need for displaying a group of related servers/services in the GUI...
-- Alan Robertson al...@unix.sh
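[Editor's note: a sketch of query #1 in Cypher, purely as a starting point. The :Service label, the :dependsOn relationship type, and the $serviceName parameter are assumptions - the actual graph schema isn't shown in the post. It returns paths, as the post suggests, with a bounded depth standing in for the "no more than n levels" input:]

```cypher
// All services that depend, directly or indirectly (up to 5 hops here),
// on the named starting service; each match is returned as a path.
MATCH p = (start:Service {name: $serviceName})<-[:dependsOn*1..5]-(s:Service)
RETURN DISTINCT p
```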
[Linux-ha-dev] Encryption progress...
Hi all, I've drawn a few diagrams to represent my current ideas about how to manage encryption. I'll be posting them later this week, along with text explaining how I see this working, for another round of criticism. I have a talk to give tomorrow, and I'll likely include them in the slide deck. It's a technical talk on distributed computing. Seems appropriate ;-).

A short summary:

* Every nanoprobe has its own keypair
* We will use Trust On First Use (TOFU) for nanoprobes
* CMA public keys will be distributed with the software
* We are able to deal with having more than one CMA public key, making it easier to eventually deal with compromised CMA keys

The low-level code to support this is written, in the repository, and it works(!). None of the high-level policy stuff is there yet. The code that works is basically a ping testing program that deliberately loses packets, to exercise the protocol's recovery from lost packets. This code turned out to be simpler than I thought it would be. That's a rarity, for sure!
-- Alan Robertson al...@unix.sh
Re: [Linux-ha-dev] RFC: pidfile handling; current worst case: stop failure and node level fencing
On 10/24/2014 03:32 AM, Lars Marowsky-Bree wrote:
> On 2014-10-23T20:36:38, Lars Ellenberg wrote:
>> If we want to require presence of start-stop-daemon, we could make all this somebody else's problem. I need to find some time to browse through the code to see if it can be improved further. But in any case, using (a tool like) start-stop-daemon consistently throughout all RAs would improve the situation already. Do we want to do that? Dejan? David? Anyone?
> I'm showing my age, but Linux FailSafe had such a tool as well. ;-) So that might make sense.
>
> Though in Linux nowadays, I wonder if one might not directly want to add container support to the LRM, or directly use systemd. With a container, all processes that the RA started would be easily tracked.

Process groups do that nicely. The LRM (at least used to) put everything in a process group.
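[Editor's note: the process-group approach mentioned above can be sketched in shell - start the service as the leader of its own session/process group, then signal the whole group with a negative pid. A generic illustration, not the LRM's actual code; the two sleeps stand in for a daemon and a helper it forked:]

```shell
# Launch a stand-in daemon (which forks a helper) in its own session /
# process group; let the intermediate shell exit so the daemon is
# reparented and this shell never holds a zombie.
pgid=$(sh -c 'setsid sh -c "sleep 300 & exec sleep 300" & echo $!')
sleep 1                      # let setsid establish the new group

kill -0 -- "-$pgid"          # the whole group is addressable as -pgid
kill -TERM -- "-$pgid"       # one signal reaches daemon and helper alike
```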
Re: [Linux-ha-dev] RFC: pidfile handling; current worst case: stop failure and node level fencing
On 10/22/2014 07:11 AM, Tim Small wrote:
> On 22/10/14 13:50, Alan Robertson wrote:
>> Does anyone know which OSes have either or both of those /proc names?
> Once again, can I recommend taking a look at the start-stop-daemon source (see earlier posting), which does this stuff, and includes checks for Linux/Hurd/Sun/OpenBSD/FreeBSD/NetBSD/DragonFly, and whilst I've only ever used it on Linux, at the very least the BSD side seems to be maintained:
> http://anonscm.debian.org/cgit/dpkg/dpkg.git/tree/utils/start-stop-daemon.c

According to how you described it earlier, it doesn't seem to solve the problems described in this thread. At best it does pretty much exactly what my previously-implemented solution does. This discussion has been a bit esoteric: although my method (and start-stop-daemon as well) is highly unlikely to err, it can make mistakes in some circumstances.
-- Alan Robertson al...@unix.sh
Re: [Linux-ha-dev] RFC: pidfile handling; current worst case: stop failure and node level fencing
On 10/22/2014 07:09 AM, Dejan Muhamedagic wrote:
> On Wed, Oct 22, 2014 at 06:50:37AM -0600, Alan Robertson wrote:
>> On 10/22/2014 03:33 AM, Dejan Muhamedagic wrote:
>>> Hi Alan,
>>> On Mon, Oct 20, 2014 at 02:52:13PM -0600, Alan Robertson wrote:
>>>> For the Assimilation code I use the full pathname of the binary from /proc to tell if it's "one of mine". That's not perfect if you're using an interpreted language. It works quite well for compiled languages.
>>> Yes, though not perfect, that may be good enough. I supposed that the probability that the very same program gets the same recycled pid is rather low. (Or is it?)
>> From my 'C' code I could touch the lock file to match the timestamp of the /proc/pid/stat (or /proc/pid/exe) symlink -- and verify that they match. If there is no /proc/pid/stat, then you won't get that extra safeguard. But as you suggest, it decreases the probability by orders of magnitude even without the extra safeguard.
>> The /proc/pid/exe symlink appears to have the same timestamp as /proc/pid/stat
> Hmm, not here:
>
> $ sudo ls -lt /proc/1
> ...
> lrwxrwxrwx 1 root root 0 Aug 27 13:51 exe -> /sbin/init
> dr-x------ 2 root root 0 Aug 27 13:51 fd
> -r--r--r-- 1 root root 0 Aug 27 13:20 cmdline
> -r--r--r-- 1 root root 0 Aug 27 13:18 stat
>
> And the process (init) has been running since July:
>
> $ ps auxw | grep -w [i]nit
> root 1 0.0 0.0 10540 780 ? Ss Jul07 1:03 init [3]
>
> Interesting. And a little worrisome for these strategies...
Here is what I see for timestamps that look to be about the time of system boot:

-r-------- 1 root root 0 Oct 21 15:42 environ
lrwxrwxrwx 1 root root 0 Oct 21 15:42 root -> /
-r--r--r-- 1 root root 0 Oct 21 15:42 limits
dr-x------ 2 root root 0 Oct 21 15:42 fd
lrwxrwxrwx 1 root root 0 Oct 21 15:42 exe -> /sbin/init
-r--r--r-- 1 root root 0 Oct 21 15:42 stat
-r--r--r-- 1 root root 0 Oct 21 15:42 cgroup
-r--r--r-- 1 root root 0 Oct 21 15:42 cmdline
servidor:/proc/1 $ ls -l /var/log/boot.log
-rw-r--r-- 1 root root 5746 Oct 21 15:42 /var/log/boot.log
servidor:/proc/1 $ ls -ld .
dr-xr-xr-x 9 root root 0 Oct 21 15:42 .

So, you can open file descriptors (fd), change your environment and cmdline and (soft) limits. You can't change your exe, or root. Cgroup is new, and I suspect you can't change it. I suspect that the directory timestamp (/proc/<pid>/) won't change either. I wonder if it will change on BSD or Solaris or AIX. /proc info for AIX: http://www-01.ibm.com/support/knowledgecenter/ssw_aix_61/com.ibm.aix.files/proc.htm It doesn't say anything about file timestamps. Solaris info is here: http://docs.oracle.com/cd/E23824_01/html/821-1473/proc-4.html#scrolltoc It also doesn't mention timestamps. FreeBSD is here: http://www.unix.com/man-page/freebsd/5/procfs/
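[Editor's note: the touch-the-lockfile idea in this exchange can be sketched as below (Linux-specific; the pidfile path is up to the caller). Copy /proc/<pid>'s timestamp onto the pidfile at start, and later require both that the pid exists and that the timestamps still agree. As the thread itself shows, /proc timestamps are not guaranteed stable over long periods, so this is a probabilistic safeguard, not a proof of identity:]

```shell
# Record which incarnation of a pid the pidfile refers to by copying the
# /proc/<pid> directory timestamp onto the pidfile itself.
record_incarnation() {    # $1 = pid, $2 = pidfile
    echo "$1" > "$2"
    touch -r "/proc/$1" "$2"
}

# Succeed only if the pid still exists AND its /proc timestamp matches
# the one saved on the pidfile (probably the same process, not a
# recycled pid).
still_same_process() {    # $1 = pid, $2 = pidfile
    [ -d "/proc/$1" ] &&
    [ "$(stat -c %Y "/proc/$1")" = "$(stat -c %Y "$2")" ]
}
```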
Re: [Linux-ha-dev] RFC: pidfile handling; current worst case: stop failure and node level fencing
On 10/22/2014 03:33 AM, Dejan Muhamedagic wrote:
> Hi Alan,
>
> On Mon, Oct 20, 2014 at 02:52:13PM -0600, Alan Robertson wrote:
>> For the Assimilation code I use the full pathname of the binary from /proc to tell if it's "one of mine". That's not perfect if you're using an interpreted language. It works quite well for compiled languages.
> Yes, though not perfect, that may be good enough. I supposed that the probability that the very same program gets the same recycled pid is rather low. (Or is it?)

From my 'C' code I could touch the lock file to match the timestamp of the /proc/pid/stat (or /proc/pid/exe) symlink -- and verify that they match. If there is no /proc/pid/stat, then you won't get that extra safeguard. But as you suggest, it decreases the probability by orders of magnitude even without the extra safeguard.

The /proc/pid/exe symlink appears to have the same timestamp as /proc/pid/stat. Does anyone know which OSes have either or both of those /proc names?
-- Alan Robertson al...@unix.sh
Re: [Linux-ha-dev] RFC: pidfile handling; current worst case: stop failure and node level fencing
On 10/21/2014 2:29 AM, Lars Ellenberg wrote: On Mon, Oct 20, 2014 at 11:21:36PM +0200, Lars Ellenberg wrote: On Mon, Oct 20, 2014 at 03:04:31PM -0600, Alan Robertson wrote: On 10/20/2014 02:52 PM, Alan Robertson wrote: For the Assimilation code I use the full pathname of the binary from /proc to tell if it's "one of mine". That's not perfect if you're using an interpreted language. It works quite well for compiled languages. It works just as well (or as bad) from interpreted languages: readlink /proc/$pid/exe (very old linux has a fsid:inode encoding there, but I digress) But that does solve a different subset of problems, has race conditions in itself, and breaks if you have updated the binary since start of that service (which does happen). Sorry, I lost the original. Alan then wrote: It only breaks if you change the *name* of the binary. Updating the binary contents has no effect. Changing the name of the binary is pretty unusual - or so it seems to me. Did I miss something? And if you do, you should stop the service with the old binary and start it with the new one. Very few methods are going to deal well with radical changes in the service without stopping it with the old script, updating, and starting with the new script. Well, the "pid starttime" method does... I don't believe I see the race condition. Does not matter. It won't loop, and it's not fooled by pid wraparound. What else are you looking for? [Guess I missed something else here] pid + exe is certainly better than the pid alone. It may even be "good enough". But it still has shortcomings. /proc/pid/exe is not stable (changes to "deleted" if the binary is deleted); that could be accounted for. /proc/pid/exe links to the interpreter (python, bash, java, whatever). Even if it is a "real" binary, (pid, /proc/pid/exe) is still NOT unique for pid re-use after wrap around: think different instances of mysql or whatever. (yes, it gets increasingly unlikely...)
For most cases, a persistent daemon is written in a compiled language. Of course not all, but all the ones I personally care about ;-) However, (pid, starttime) *is* unique (for the lifetime of the pidfile, as long as that is stored on tmpfs, or cleared after reboot). (unless you tell me you can eat through pid_max, or at least the currently unused pids, within the granularity of starttime...) So that's why I propose to use the (pid, starttime) tuple. If you see problems with (pid, starttime), please speak up. If you have something *better*, please speak up. If you just have something "different", feel free to tell us anyways :-) The contents of the pidfile are specified by the LSB (or at least they were at some time in the past). That's why I use just the pid. The current version specifies that the first line of a pidfile consists of one or more numbers, and any subsequent lines should be ignored. If you go the way you do, I'd suggest other data be put on separate lines. You might compare what you're doing to http://refspecs.linuxbase.org/LSB_3.1.1/LSB-Core-generic/LSB-Core-generic/iniscrptfunc.html Instead of storing the start time explicitly, you could touch the pid file's creation time to match that of the process ;-) That's harder to do in the shell, unfortunately...
-- Alan
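[Editor's note: the (pid, starttime) check under discussion condenses into a pair of shell helpers (Linux-specific; field numbering as in the thread - after stripping everything through the ")" that ends the comm field, starttime is the 20th remaining field, i.e. field 22 of the full /proc/<pid>/stat line):]

```shell
# Print the starttime of a pid (empty if the process is gone). comm may
# contain spaces or ")", so strip through the last ")" before counting.
get_starttime() {
    sed -e 's/^.*) //' "/proc/$1/stat" 2>/dev/null | cut -d' ' -f 20
}

# A pidfile whose first line is "pid starttime" is valid only while a
# process with that pid AND that starttime exists.
pidfile_still_valid() {
    read -r pid starttime < "$1" || return 1
    [ -n "$starttime" ] && [ "$(get_starttime "$pid")" = "$starttime" ]
}
```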
Re: [Linux-ha-dev] RFC: pidfile handling; current worst case: stop failure and node level fencing
On 10/20/2014 03:21 PM, Lars Ellenberg wrote:
> On Mon, Oct 20, 2014 at 03:04:31PM -0600, Alan Robertson wrote:
>> On 10/20/2014 02:52 PM, Alan Robertson wrote:
>>> For the Assimilation code I use the full pathname of the binary from /proc to tell if it's "one of mine". That's not perfect if you're using an interpreted language. It works quite well for compiled languages.
> It works just as well (or as bad) from interpreted languages: readlink /proc/$pid/exe (very old linux has a fsid:inode encoding there, but I digress)
>
> But that does solve a different subset of problems, has race conditions in itself, and breaks if you have updated the binary since start of that service (which does happen). It does not fully address what I am talking about.

It only breaks if you change the *name* of the binary. Updating the binary contents has no effect. Changing the name of the binary is pretty unusual - or so it seems to me. Did I miss something? And if you do, you should stop the service with the old binary and start it with the new one. Very few methods are going to deal well with radical changes in the service without stopping it with the old script, updating, and starting with the new script. I don't believe I see the race condition. It won't loop, and it's not fooled by pid wraparound. What else are you looking for? [Guess I missed something else here]
Re: [Linux-ha-dev] RFC: pidfile handling; current worst case: stop failure and node level fencing
For the Assimilation code I use the full pathname of the binary from /proc to tell if it's "one of mine". That's not perfect if you're using an interpreted language. It works quite well for compiled languages.

On 10/20/2014 01:17 PM, Lars Ellenberg wrote:
> Recent discussions with Dejan made me again more prominently aware of a few issues we probably all know about, but usually dismiss as having not much relevance in the real world.
>
> The facts:
>
> * a pidfile typically only stores a pid
> * a pidfile may go "stale", not properly cleaned up when the pid it references died.
> * pids are recycled
>
> This is more an issue if kernel.pid_max is small wrt the number of processes created per unit time, for example on some embedded systems, or on some very busy systems. But it may be an issue on any system, even a mostly idle one, given "bad luck^W timing", see below.
>
> A common idiom in resource agents is:
>
> kill_that_pid_and_wait_until_dead()
> {
>     local pid=$1
>     is_alive $pid || return 0
>     kill -TERM $pid
>     while is_alive $pid ; do sleep 1; done
>     return 0
> }
>
> The naïve implementation of is_alive() is
>     is_alive() { kill -0 $1 ; }
>
> This is the main issue:
> ----------------------
>
> If the last-used-pid is just a bit smaller than $pid, during the sleep 1, $pid may die, and the OS may already have created a new process with that exact pid. Using the above "is_alive", kill_that_pid() will not notice that the to-be-killed pid has actually terminated while that new process runs -- which may be a very long time if that is some other long-running daemon. This may result in stop failure and resulting node-level fencing.
>
> The question is: which better way do we have to detect if some pid died after we killed it? Or, related, and even better: how to detect if the process currently running with some pid is in fact still the process referenced by the pidfile?
>
> I have two suggestions.
>
> (I am trying to avoid bashisms in here. But maybe I overlook some.
> Also, the code is typed, not sourced from some working script, so there may be logic bugs and typos. My intent should be obvious enough, though.)
>
> using "cd /proc/$pid; stat ."
> ----------------------------
>
> # this is most likely linux specific
> kill_that_pid_and_wait_until_dead()
> {
>     local pid=$1
>     (
>     cd /proc/$pid || return 0
>     kill -TERM $pid
>     while stat . ; do sleep 1; done
>     )
>     return 0
> }
>
> Once pid dies, /proc/$pid will become stale (but not completely go away, because it is our cwd), and stat . will return "No such process".
>
> Variants:
>
> using test -ef
> --------------
>
> exec 7< /proc/$pid
> kill -TERM $pid
> while :; do
>     exec 8< /proc/$pid || break
>     test /proc/self/fd/7 -ef /proc/self/fd/8 || break
>     sleep 1
> done
> exec 7<&- 8<&-
>
> using stat -c %Y /proc/$pid
> ---------------------------
>
> ctime0=$(stat -c %Y /proc/$pid)
> kill -TERM $pid
> while ctime=$(stat -c %Y /proc/$pid) && [ $ctime = $ctime0 ] ; do sleep 1; done
>
> Why not use the inode number, I hear you say. Because it is not stable. Sorry. Don't believe me? Don't want to read kernel source? Try it yourself:
>
> sleep 120 & k=$!
> stat /proc/$k
> echo 3 > /proc/sys/vm/drop_caches
> stat /proc/$k
>
> But that leads me to another proposal: store the starttime together with the pid in a pidfile. For linux that would be:
>
> (see proc(5) for /proc/pid/stat field meanings. note that (comm) may contain both whitespace and ")", which is the reason for my sed | cut below)
>
> spawn_create_exclusive_pid_starttime()
> {
>     local pidfile=$1
>     shift
>     local reset
>     case $- in *C*) reset=":";; *) set -C; reset="set +C";; esac
>     if !
> exec 3>$pidfile ; then
>         $reset
>         return 1
>     fi
>
>     $reset
>     setsid sh -c '
>         read pid _ < /proc/self/stat
>         starttime=$(sed -e "s/^.*) //" /proc/$pid/stat | cut -d" " -f 20)
>         >&3 echo $pid $starttime
>         3>&- exec "$@"
>     ' -- "$@" &
>     return 0
> }
>
> It does not seem possible to cycle through all available pids within fractions of time smaller than the granularity of starttime, so "pid starttime" should be a unique tuple (until the next reboot -- at least on linux, starttime is measured as strictly monotonic "uptime").
>
> If we have "pid starttime" in the pidfile, we can:
>
> get_proc_pid_starttime()
> {
>     proc_pid_starttime=$(sed -e 's/^.*) //' /proc/$pid/stat) || return 1
>     proc_pid_starttime=$(echo "$proc_pid_starttime" | cut -d' ' -f 20)
> }
>
> kill_using_pidfile()
> {
>     local pidfile=$1
>     local pid starttime proc_pid_starttime
>
>     test -e $pidfile ||
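[Editor's note: a runnable version of the "cd /proc/$pid; stat ." suggestion from the message above, Linux-specific. Once the process dies and is reaped, its /proc directory goes stale and "stat ." fails, even if the pid has already been recycled by an unrelated new process:]

```shell
# Kill $1 and wait until that exact process is gone. Holding /proc/<pid>
# as our cwd keeps the stale directory visible; stat fails once the
# process it belonged to no longer exists.
kill_that_pid_and_wait_until_dead() {
    pid=$1
    (
        cd "/proc/$pid" 2>/dev/null || return 0   # already gone
        kill -TERM "$pid"
        while stat . >/dev/null 2>&1; do sleep 1; done
    )
    return 0
}
```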
[Linux-ha-dev] Prototype REST code
Hi folks, I just added the first bit of prototype REST code to the system. Its purpose is to provide an interface for JavaScript UI code yet to be written ;-). It uses the Flask library to provide the http support. There is currently only one interesting URL supported by the code: doquery/ So, you perform a query by sending an http request to the doquery/ URL. Parameters to the query are supplied in the usual ?name=value format. Here are a few examples of queries:

http://localhost:5000/doquery/GetAllQueries - retrieve information (metadata) about all valid queries
http://localhost:5000/doquery/GetAQuery?queryname=:* - get detailed information about a particular query (metadata)
http://localhost:5000/doquery/ListDrones - lists all known machines and their status (in detail)
http://localhost:5000/doquery/DownDrones - lists all known machines which are currently down for any reason
http://localhost:5000/doquery/CrashedDrones - lists all known machines which are currently down because they crashed

Also, there is a hack in the code, where it reads some data from a location that needs to be installed -- but isn't yet. So it's hardwired to a pathname on my machine. That'll get fixed tomorrow. But the cool thing is that it works, and that it spits out what looks to me like good JSON. Lots to be tested, but it's a reasonable start (a minor milestone). The main other thing that comes to mind is to allow a client to subscribe to changes to things. But before that it would be cool to have any kind of a primitive piece of code that used this interface.
-- Alan Robertson - @OSSAlanR
"Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce
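[Editor's note: a primitive client of the kind the post asks for can be as small as a curl call against the URLs listed above; this assumes the Flask prototype is running on its default port 5000, and jq is merely one convenient way to consume the JSON output:]

```shell
# Fetch the list of known machines and their status as JSON.
curl -s http://localhost:5000/doquery/ListDrones

# Same idea, pretty-printed (assumes jq is installed).
curl -s http://localhost:5000/doquery/DownDrones | jq .
```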
Re: [Linux-ha-dev] [Linux-HA] Announcing release 0.1.0 of the Assimilation Monitoring Project!
I ran across it while doing some research. As you said, it's a different focus - but related. Thanks for the heads-up!

On 04/24/2013 04:09 AM, Lars Ellenberg wrote:
> On Fri, Apr 19, 2013 at 03:01:51PM -0600, Alan Robertson wrote:
>> Hi all,
> Hi Alan!
>
> Good to see progress on this Project. Did you know about "NeDi" www.nedi.ch ? I put Remo Rickli on Cc; apparently NeDi prefers "forum" over "mailing list". NeDi has a few years' head start, and a different focus maybe. But as both projects seem to have a lot in common at least for the discovery part, you still should be able to find some potential synergies, or at least productively cooperate.
>
> Cheers,
> Lars
>
>> This announcement is likely to be of interest to people like you who are concerned about availability. I founded the Linux-HA project in 1998 and led it for nearly 10 years. Back in about November 2010, I announced the beginnings of what would become the Assimilation Monitoring Project on this mailing list. The Assimilation Monitoring project [http://assimmon.org] is a new open source monitoring project with a revolutionary architecture. It provides highly scalable [~O(1)] monitoring driven by integrated continuous Stealth Discovery(TM). This first release is intended as a proof of concept, to demonstrate the architecture, get feedback, add early adopters, and grow the community.
>>
>> The project has basically two thrusts:
>>
>> * It provides extremely scalable exception monitoring (100K servers -- no problem)
>> * It discovers all the details of your infrastructure (servers, services, dependencies, switches, switch port connections, etc.), builds a Neo4j graph database of all the gory details and updates it as things change - without setting off network security alarms.
>> * The two functions are integrated in a way that will permit much easier configuration than traditional systems, and support the creation of simple audits to see if everything is being monitored.
>>
>> Release description: http://linux-ha.org/source-doc/assimilation/html/_release_descriptions.html
>> Technology video: http://bit.ly/OD6bY6
>> TechTarget Interview: http://bit.ly/17M6DK2
>> Join the mailing list: http://lists.community.tummy.com/cgi-bin/mailman/listinfo/assimilation
>>
>> Join the mailing list, download the code, try it out, and send your comments and questions to the list!
>>
>> Thanks and have a great weekend!

-- Alan Robertson - @OSSAlanR
"Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce
[Linux-ha-dev] Announcing release 0.1.0 of the Assimilation Monitoring Project!
Hi all, This announcement is likely to be of interest to people like you who are concerned about availability. I founded the Linux-HA project in 1998 and led it for nearly 10 years. Back in about November 2010, I announced the beginnings of what would become the Assimilation Monitoring Project on this mailing list.

The Assimilation Monitoring project [http://assimmon.org] is a new open source monitoring project with a revolutionary architecture. It provides highly scalable [~O(1)] monitoring driven by integrated continuous Stealth Discovery(TM). This first release is intended as a proof of concept, to demonstrate the architecture, get feedback, add early adopters, and grow the community.

The project has basically two thrusts:

* It provides extremely scalable exception monitoring (100K servers -- no problem)
* It discovers all the details of your infrastructure (servers, services, dependencies, switches, switch port connections, etc.), builds a Neo4j graph database of all the gory details and updates it as things change - without setting off network security alarms.
* The two functions are integrated in a way that will permit much easier configuration than traditional systems, and support the creation of simple audits to see if everything is being monitored.

Release description: http://linux-ha.org/source-doc/assimilation/html/_release_descriptions.html
Technology video: http://bit.ly/OD6bY6
TechTarget Interview: http://bit.ly/17M6DK2
Join the mailing list: http://lists.community.tummy.com/cgi-bin/mailman/listinfo/assimilation

Join the mailing list, download the code, try it out, and send your comments and questions to the list!

Thanks and have a great weekend!
-- Alan Robertson - @OSSAlanR
"Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce
Re: [Linux-ha-dev] Slight bending of OCF specs: Re: Issues found in Apache resource agent
On 09/12/2012 09:11 AM, Lars Marowsky-Bree wrote:
> On 2012-09-12T09:01:05, Alan Robertson wrote:
>>> The status from these agents may feed into operations on other resources that are fully managed.
>> Understood. I believe it will care about those other agents - not these. It shouldn't know about these, AFAIK.
> I guess then you're talking about a different effort from what Dejan, Yan, and I are investigating. (Since we need that status so that Pacemaker can restart the VM, if needed, for example.)
>
> (Our goal is also to reuse existing probes from other monitoring frameworks, not rewrite them.)

Well... most monitors use software from somewhere else, but I didn't know about your effort - so no, I wasn't talking about that effort, although there is some similarity. What I've heard from other folks using the other monitoring frameworks is that one of the biggest issues with Nagios, for example, is that the monitoring agents aren't very reliable. In spite of that, I've certainly given some thought to writing a Nagios plugin for the LRM for my purposes.
-- Alan Robertson - @OSSAlanR
"Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce
Re: [Linux-ha-dev] Slight bending of OCF specs: Re: Issues found in Apache resource agent
On 09/12/2012 05:14 AM, Lars Marowsky-Bree wrote:
> On 2012-09-11T15:04:55, Alan Robertson wrote:
>>> Depends. Pacemaker may still care about the status of these agents.
>> If it can't start or stop them, what can it do with them?
> The status from these agents may feed into operations on other resources that are fully managed.

Understood. I believe it will care about those other agents - not these. It shouldn't know about these, AFAIK. The fact that the other agents might call these is an implementation detail - not something it should care about directly. Just as the resource agents should only rely on things that the OCF RA spec says are provided, consumers of those agents (like pacemaker) shouldn't go past the spec in terms of expectations from, or observations of, resource agents. Or at least that's how it seems to me.

It's still my intent to have the exit codes, argument passing, etc. be fully compliant with the OCF RA specification. The only exception I plan on is no start or stop (or reload, etc.) actions. They will implement the meta-data, monitor, and validate-all actions. I'm not sure whether validate-all makes sense for them or not(?). I'll think about that...
-- Alan Robertson - @OSSAlanR
"Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce
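[Editor's note: the monitor-only agent described above could dispatch actions roughly as follows. This is a hypothetical sketch, written as a function so it can be exercised inline; a real agent would be a standalone script exiting with these codes, and would emit full metadata XML. Exit-code values follow the OCF RA conventions (0 = success, 3 = unimplemented, 7 = not running); check_service is a stand-in for the actual probe:]

```shell
OCF_SUCCESS=0; OCF_ERR_UNIMPLEMENTED=3; OCF_NOT_RUNNING=7

check_service() {
    # Stand-in probe: all inputs would come from explicit OCF_RESKEY_*
    # parameters, never from the service's own config files.
    true
}

monitor_only_ra() {
    case $1 in
    meta-data)
        echo '<resource-agent name="monitor-only"/>'   # real agents emit full metadata XML
        return $OCF_SUCCESS ;;
    monitor)
        check_service && return $OCF_SUCCESS
        return $OCF_NOT_RUNNING ;;
    validate-all)
        return $OCF_SUCCESS ;;
    *)
        # start, stop, reload, etc. are deliberately not implemented.
        return $OCF_ERR_UNIMPLEMENTED ;;
    esac
}
```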
Re: [Linux-ha-dev] Slight bending of OCF specs: Re: Issues found in Apache resource agent
On 09/08/2012 02:53 PM, Lars Marowsky-Bree wrote:
> On 2012-09-07T13:46:27, Alan Robertson wrote:
>> Well, I presume that one would not tell pacemaker about such agents, as they would not be useful to pacemaker. From the point of view of the crm command, you wouldn't consider them as "valid" resource agents to put in a configuration for pacemaker.
> Depends. Pacemaker may still care about the status of these agents.

If it can't start or stop them, what can it do with them? And presuming it can't do anything with them, then it doesn't make sense to include them in a configuration. Am I missing something here?
-- Alan Robertson - @OSSAlanR
"Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce
Re: [Linux-ha-dev] Slight bending of OCF specs: Re: Issues found in Apache resource agent
On 09/05/2012 03:32 AM, Dejan Muhamedagic wrote:
>> This would be for my new monitoring project, of course ;-). But it could then be called by all the HTTP resource agents - or used directly - for example by the Assimilation project. This would be a slight but useful bending of OCF resource agent APIs. We could create some new metadata to document it, and also not put start and stop into the actions in the operations section. Or just the latter. What do you think?
> Right now, there's a bunch of resource agents faking the state (e.g. ping), that is, pretending to be able to start and stop. If we could somehow do without it, that would obviously be beneficial. Not sure if/how the pacemaker could deal with such agents.

Well, I presume that one would not tell pacemaker about such agents, as they would not be useful to pacemaker. From the point of view of the crm command, you wouldn't consider them as "valid" resource agents to put in a configuration for pacemaker. People would instead use the nginx or apache agents that _do_ know how to start and stop things.
-- Alan Robertson - @OSSAlanR
"Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce
[Linux-ha-dev] Slight bending of OCF specs: Re: Issues found in Apache resource agent
Hi Dejan, If the resource agent is not running correctly it needs to be restarted. My memory says that OCF_ERR_GENERIC will not cause that behavior. I believe the spec says you should exit with not running if it is not functioning correctly. (but I didn't check it, and my memory isn't that clear in this case). I will likely write a monitor-only resource agent for web servers. What would you think about calling it from the other web resource agents? This resource agent will not look at any config files, and will require everything explicitly in parameters, and will not know how to start or stop anything. This would be for my new monitoring project, of course ;-). But it could then be called by all the HTTP resource agents - or used directly - for example by the Assimilation project. This would be a slight but useful bending of OCF resource agent APIs. We could create some new metadata to document it, and also not put start and stop into the actions in the operations section. Or just the latter. What do you think? On 08/29/2012 05:31 AM, Dejan Muhamedagic wrote: > Hi Alan, > > On Mon, Aug 27, 2012 at 10:51:15AM -0600, Alan Robertson wrote: >> Hi, >> >> I was recently using the Apache resource agent, and discovered a few >> problems: >> >> The exit code from grep was used directly as an OCF exit code. >> It is NOT an OCF exit code, and should not be directly used >> in this way. > I guess you mean the greps in monitor_apache_extended and > monitor_apache_basic? These lines: > > 267 $whattorun "$test_url" | grep -Ei "$test_regex" > /dev/null > 277 ${ourhttpclient}_func "$STATUSURL" | grep -Ei "$TESTREGEX" > /dev/null > >> This caused a "not running" error to become a generic error. > These lines are invoked _only_ in case it was previously > established that the apache server is running. So, they should > return OCF_ERR_GENERIC if the test fails. grep exits with code 1 > which matches OCF_ERR_GENERIC. But indeed the OCF error code > should be returned explicitely. 
> >> Pacemaker reacts very differently to the two kinds of errors. >> >> This code occurred in two places. >> >> The resource agent used OCF_CHECK_LEVEL improperly. >> >> The specification says that if you receive an OCF_CHECK_LEVEL which you >> do not support, you are required to interpret it as the next lower >> supported value for OCF_CHECK_LEVEL. >> >> In effect, there are no invalid OCF_CHECK_LEVEL values. The Apache >> agent declared all values but one to be errors. This is not the correct >> behavior. > OK. That somehow slipped while I had been reading the OCF standard. > > BTW, it'd be great if nginx shared some code with apache. The > latter has already been split into three scripts. > > Cheers, > > Dejan > >> -- >> Alan Robertson - @OSSAlanR >> >> "Openness is the foundation and preservative of friendship... Let me claim >> from you at all times your undisguised opinions." - William Wilberforce >> ___ >> Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org >> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev >> Home Page: http://linux-ha.org/ > ___ > Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org > http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev > Home Page: http://linux-ha.org/ -- Alan Robertson - @OSSAlanR "Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
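The fix Dejan describes above - returning the OCF code explicitly instead of letting grep's exit status pass through - can be sketched roughly like this (an illustration only; the function name and structure are mine, not the actual apache RA code):

```shell
# OCF exit code names/values per the OCF resource agent API spec.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

# Translate a raw grep result into an explicit OCF code instead of
# passing grep's own exit status through untouched.
check_status_page() {
    # $1: page body, $2: regex expected in a healthy response
    if echo "$1" | grep -Ei "$2" >/dev/null; then
        return $OCF_SUCCESS
    else
        # The regex did not match: report a generic monitor failure
        # explicitly rather than reusing grep's exit code by accident.
        return $OCF_ERR_GENERIC
    fi
}
```

The point is that the mapping from "grep failed" to an OCF code is a deliberate decision in one place, visible to a reviewer, rather than whatever grep happened to exit with.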
[Linux-ha-dev] Issues found in Apache resource agent
Hi,

I was recently using the Apache resource agent, and discovered a few problems:

1. The exit code from grep was used directly as an OCF exit code. It is NOT an OCF exit code, and should not be directly used in this way. This caused a "not running" error to become a generic error. Pacemaker reacts very differently to the two kinds of errors. This code occurred in two places.

2. The resource agent used OCF_CHECK_LEVEL improperly. The specification says that if you receive an OCF_CHECK_LEVEL which you do not support, you are required to interpret it as the next lower supported value for OCF_CHECK_LEVEL. In effect, there are no invalid OCF_CHECK_LEVEL values. The Apache agent declared all values but one to be errors. This is not the correct behavior.

-- Alan Robertson - @OSSAlanR "Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
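The required OCF_CHECK_LEVEL behavior - fall back to the next lower supported value - can be sketched like this (a minimal sketch; the helper name and the supported levels 0 and 10 are assumptions for illustration, not the Apache agent's actual set):

```shell
# Clamp OCF_CHECK_LEVEL down to the nearest supported level instead of
# rejecting unknown values, as the OCF spec requires.  The supported
# levels (10 and 0) are illustrative.
clamp_check_level() {
    level="${1:-0}"
    for supported in 10 0; do          # highest supported level first
        if [ "$level" -ge "$supported" ]; then
            echo "$supported"
            return 0
        fi
    done
    echo 0                             # 0 is always supported
}
```

With this shape there is no "invalid" value: a level of 20 silently runs the level-10 checks, and a level of 5 runs the level-0 checks.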
Re: [Linux-ha-dev] LRM bug
On 08/07/2012 08:18 AM, Dejan Muhamedagic wrote: > Hi Alan, > > On Mon, Jul 30, 2012 at 10:14:27AM -0600, Alan Robertson wrote: >> The LRM treats operation timeouts as ERROR:s - not just failed >> operations that give warnings. This violates the meaning of ERROR: >> messages in the code. >> >> We reserved ERROR: messages for things that the software did not expect >> - and therefore possibly could not be properly recovered from. In this >> case, the behavior is perfectly expected and the condition will be >> properly recovered from. It just means the operation in question failed. >> >> An sample message: >> ERROR: process_lrm_event: LRM operation agent-da:3_monitor_5000 >> (47) Timed Out (timeout=6ms) >> >> Because of this one message, you can't tell customers "If you ever have >> an ERROR: message, the HA software has failed". >> >> This ought to just be a warning, like any other failed action... > I guess that ERROR is used because resource agents use the same > severity when reporting failures they cannot recover from. In > this case, the RA won't log anything, so the lrmd does that on > its behalf. That seems OK to me. The other option would be to > remove the ERROR severity log messages in all RA, because a > resource problem should normally always be recoverable. The exceptions that print ERROR: should be relegated to things like "The CRM gave me a command I didn't understand, or referenced a resource that I don't know about" -- and similar things that really shouldn't happen. Or that's how it seems to me anyway... -- Alan Robertson - @OSSAlanR "Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] Probable "sub-optimal" behavior in ucast
On 07/31/2012 06:08 AM, Alan Robertson wrote: > > I wasn't sure, actually - because of the troubles mentioned above. I'll > check back in and let you know... Only two of the read processes are accumulating any CPU - it's the last one on each interface. You hit it spot on. Thanks Lars! -- Alan Robertson - @OSSAlanR "Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] Probable "sub-optimal" behavior in ucast
On 07/31/2012 01:58 AM, Lars Ellenberg wrote: > Besides that a ten node cluster is likely to break the 64k message size > limit, even after compression... The CIB is about 20K before compression... So I think we're not in as bad a shape as I would have guessed. > > You probably should re-organize the code so that you only have > one receiving ucast socket per nic/ip/port. That would be a big change or so it seems to me. Right now, the parent code doesn't look at the parameters given to its children... > > But I think that a single UDP packet will be delivered to > a single socket, even though you have 18 receiving sockets > bound to the same port (possible because of SO_REUSEPORT, only). I was having various troubles with the system and wasn't sure debugging was actually taking effect. But your explanation may be the right one. I will get some more time on one of the systems in the next few days and verify that. > If we, as I think we do, receive on just one of them, where which one is > determined by the kernel, not us, your suggested ingress filter on > "expected" source IP would break communications. Good point. > > Do you have evidence for the assumption that you receive incoming > packets on all sockets, and not on just one of them? I wasn't sure, actually - because of the troubles mentioned above. I'll check back in and let you know... I saw the IPC (!) having troubles on one of the systems - and the CIB was trying to send packets that were getting lost - and eventually the CIB lost its connection to Heartbeat. I could not imagine what could cause that - so this was my theory. We had a resource that we were trying to restart but because of some disk problem it wouldn't actually restart. About this time on a different machine (the DC) we saw this IPC issue. If you have an idea what could cause IPC to behave this way I'd be happy to know what it was... 
-- Alan Robertson al...@unix.sh ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
[Linux-ha-dev] Probable "sub-optimal" behavior in ucast
Hi, I have a 10-node system with each system having 2 interfaces, and therefore each ha.cf file has 18 ucast lines in it. If I read the code correctly, I think each heartbeat packet is then being received 18 times and sent to the master control process - where each is then uncompressed and 17 of them are thrown away... Could someone else offer their thoughts on this? It looks to be a 2 or 3 line fix in ucast.c to throw away ucast packets that aren't from the address we expect - which would cut us down to only one of them being sent from each of the interfaces - a 9 to 1 reduction in work on the master control process. And I don't have to uncompress them to throw them away - I can just look at the source IP address... What do you think? -- Alan Robertson - @OSSAlanR "Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
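The proposed filter can be sketched in a few lines of C (a hedged sketch - the names and structure below are illustrative, not taken from ucast.c):

```c
/* Sketch of the proposed ucast fix: drop datagrams whose source
 * address is not the peer this link expects, before any decompression
 * work is done.  The names (expected_peer, should_drop) are
 * illustrative; ucast.c organizes its receive path differently. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

static int should_drop(const struct sockaddr_in *expected_peer,
                       const struct sockaddr_in *src)
{
    /* Compare only the source IP, as suggested above: anything from
     * an unexpected peer is discarded so that only one copy per
     * interface reaches the master control process. */
    return src->sin_addr.s_addr != expected_peer->sin_addr.s_addr;
}
```

The caller would apply this right after recvfrom(), using the address the ucast line was configured with as expected_peer. As Lars points out later in the thread, this only works if each socket actually receives packets from its own peer, which is what the follow-up debugging set out to verify.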
[Linux-ha-dev] LRM bug
The LRM treats operation timeouts as ERROR:s - not just failed operations that give warnings. This violates the meaning of ERROR: messages in the code. We reserved ERROR: messages for things that the software did not expect - and therefore possibly could not be properly recovered from. In this case, the behavior is perfectly expected and the condition will be properly recovered from. It just means the operation in question failed. A sample message:

    ERROR: process_lrm_event: LRM operation agent-da:3_monitor_5000 (47) Timed Out (timeout=6ms)

Because of this one message, you can't tell customers "If you ever have an ERROR: message, the HA software has failed". This ought to just be a warning, like any other failed action... -- Alan Robertson - @OSSAlanR "Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] STONITH agent for SoftLayer API
Red Hat invented their own API, then disabled the working API in their version of the code. Of course, they don't have as many agents, and they're not as well tested and the API is a bit odd. But what do I know, since I invented the Linux-HA API. On 6/7/2012 3:50 PM, Brad Jones wrote: > Can someone help me understand the correct spec to build a STONITH > plugin against? Through a bunch of trial and error today, I've > discovered that there may be a few different methods of passing > configuration options to a plugin, or perhaps this differs > significantly across distributions and I'm not getting the nuance. > > The plugin in question is at > https://github.com/bradjones1/stonith-softlayer/blob/master/softlayer.php > - you'll notice I now have options to pull from STDIN, command-line > and environment variables (in a global, since this is php.) > > I originally built this according to > https://fedorahosted.org/cluster/wiki/FenceAgentAPI but this seems to > be very different from the plugins I am finding at > http://hg.linux-ha.org/glue/file/c69dc6ace936/lib/plugins/stonith/external, > for instance. > > FWIW I am using cluster-glue 1.0.8. > > I'd be happy to help write some documentation once I figure out > exactly why this is/was so confusing. Thanks... > -- > Brad Jones > b...@jones.name > Mobile: 303-219-0795 > ___ > Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org > http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev > Home Page: http://linux-ha.org/ ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
[Linux-ha-dev] USENIX Configuration Management Summit
Hi, For those of you interested in managing servers beyond high-availability (which is likely many of you), you might be interested in next week's USENIX Configuration Management Summit. It will be held in Boston, and will feature a series of interesting talks, including one by me on the Assimilation Monitoring Project - http://assimmon.org/. https://www.usenix.org/conference/ucms12 Look forward to seeing you there! -- Alan Robertson - @OSSAlanR "Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
[Linux-ha-dev] The role of dependencies in managing computer systems
Hi, I recently wrote a blog post on the importance of dependencies in managing computer systems. Of course, a small number of dependencies are modelled very nicely by things like Pacemaker, but the picture is much bigger than those dependencies. Read more here: http://techthoughts.typepad.com/managing_computers/2012/06/dependency-information-in-computer-systems.html -- Alan Robertson - @OSSAlanR "Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [RFC] IPaddr2: Proposal patch to support the dual stack of IPv4 and IPv6.
On 06/04/2012 12:32 AM, Keisuke MORI wrote: > Hi Alan, > > Thank you for your comments. > It's an interesting idea, but I don't think we need to care about IPv4 > link-local addresses > because users can configure using the same manner as a "regular" IP address. > (and it's used very rarely) > > In the case of IPv6 link-local addresses it is almost always a wrong > configuration if nic is missing > (the socket API mandate it) so we want to check it. > >> However, for addresses which are not yet up (which is unfortunately what >> you're concerned with), ipv6 link-local addresses take the form >>fe80:: -- followed by 64-bits of MAC addresses (48 bit >> MACs are padded out) >> >> http://en.wikipedia.org/wiki/Link-local_address >> >> MAC addresses never begin with 4 bytes of zeros, so the regular expression >> to match this is pretty straightforward. This isn't a bad approximation >> (but could easily be made better): > Yes, you are right. Matching to 'fe80::' should be pretty easy and good > enough. > Why I could not think of such a simple idea :) I'm delighted to have been of service. I'm best at simple things ;-). -- Alan Robertson - @OSSAlanR "Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [RFC] IPaddr2: Proposal patch to support the dual stack of IPv4 and IPv6.
It's straightforward to determine if an IP address is link-local or not - for an already configured address:

    3: eth1: mtu 1500 qdisc pfifo_fast state UP qlen 1000
        link/ether 94:db:c9:3f:7c:20 brd ff:ff:ff:ff:ff:ff
        inet 10.10.10.30/24 brd 10.10.10.255 *scope global* eth1
        inet6 fe80::96db:c9ff:fe3f:7c20/64 *scope link*
           valid_lft forever preferred_lft forever

This works uniformly for both ipv4 and ipv6 addresses (quite nice!) However, for addresses which are not yet up (which is unfortunately what you're concerned with), ipv6 link-local addresses take the form fe80:: -- followed by 64-bits of MAC addresses (48 bit MACs are padded out)

    http://en.wikipedia.org/wiki/Link-local_address

MAC addresses never begin with 4 bytes of zeros, so the regular expression to match this is pretty straightforward. This isn't a bad approximation (but could easily be made better):

    islinklocal() {
        if echo $1 | grep -i '^fe80::[^:]*:[^:]*:[^:]*:[^:]*$' >/dev/null
        then
            echo "$1 is link-local"
        else
            echo "$1 is NOT link-local"
        fi
    }

On 05/31/2012 12:29 AM, Keisuke MORI wrote: > I would like to propose an enhancement of IPaddr2 to support IPv6 as > well as IPv4. > > I've submitted this as a pull request #97 but also posting to the ML > for a wider audience. > > I would appreciate your comments and suggestions for merging this into > the upstream. > > > [RFC] IPaddr2: Proposal patch to support the dual stack of IPv4 and IPv6. > https://github.com/ClusterLabs/resource-agents/pull/97 > > > ## Benefits: > > * Unify the usage, behavior and the code maintenance between IPv4 and > IPv6 on Linux. > > The usage of IPaddr2 and IPv6addr are similar but they have > different parameters and different behaviors. > In particular, they may choose a different interface depending > on your configuration even if you provided similar parameters > in the past. > > IPv6addr is written in C and rather hard to make improvements. 
> As /bin/ip already supports both IPv4 and IPv6, we can share > the most of the code of IPaddr2 written in bash. > > * usable for LVS on IPv6. > > IPv6addr does not support lvs_support=true and unfortunately > there is no possible way to use LVS on IPv6 right now. > > IPaddr2(/bin/ip) works for LVS configurations without > enabling lvs_support both for IPv4 and IPv6. > > (You don't have to remove an address on the loopback interface > if the virtual address is assigned by using /bin/ip.) > > See also: > http://www.gossamer-threads.com/lists/linuxha/dev/76429#76429 > > * retire the old 'findif' binary. > > 'findif' binary is replaced by a shell script version of > findif, originally developed by lge. > See findif could be rewritten in shell : > https://github.com/ClusterLabs/resource-agents/issues/53 > > * easier support for other pending issues > > These pending issues can be fix based on this new IPaddr2. > * Allow ipv6addr to mark new address as deprecated > https://github.com/ClusterLabs/resource-agents/issues/68 > * New RA that controls IPv6 address in loopback interface > https://github.com/ClusterLabs/resource-agents/pull/77 > > > ## Notes / Changes: > > * findif semantics changes > > There are some incompatibility in deciding which interface to > be used when your configuration is ambiguous. But in reality > it should not be a problem as long as it's configured properly. > > The changes mostly came from fixing a bug in the findif binary > (returns a wrong broadcast) or merging the difference between > (old)IPaddr2 and IPv6addr. > See the ofct test cases for details. > (case No.6, No.9, No.10, No.12, No.15 in IPaddr2v4 test cases) > > Other notable changes are described below. > > * "broadcast" parameter for IPv4 > > "broadcast" parameter may be required along with "cidr_netmask" > when you want use a different subnet mask from the static IP address. > It's because doing such calculation is difficult in the shell > script version of findif. 
> > See the ofct test cases for details. > (case No.11, No.14, No.16, No.17 in IPaddr2v4 test cases) > > This limitation may be eliminated if we would remove > brd options from the /bin/ip command line. > > * loopback(lo) now requires cidr_netmask or broadcast. > > See the ofct test case in the IPaddr2 ocft script. > The reason is similar to the previous one. > > * loose error check for "nic" for a IPv6 link-local address. > > IPv6addr was able to check this, but in the shell script it is > hard to determine a link-local address (requires bitmask calculation). > I do not think it's worth to implement it in shell. > > * send_ua: a new binary > > We need one new binary as a replacement of send_arp for IPv6 support. > IPv6addr.c is reused to make this command. > > > Note that IPv6addr RA is still there and you can continue to use > it for the backward compatibility. > > > ## Acknowledgement
Re: [Linux-ha-dev] [Patch] The patch which revises memory leak.
7 @@ >>>> >>>>node->track.last_rexmit_req = time_longclock(); >>>> >>>> -if (!g_hash_table_remove(rexmit_hash_table, ri)){ >>>> -cl_log(LOG_ERR, "%s: entry not found in rexmit_hash_table" >>>> - "for seq/node(%ld %s)", >>>> - __FUNCTION__, ri->seq, ri->node->nodename); >>>> -return FALSE; >>>> +value = g_hash_table_lookup(rexmit_hash_table, ri); >>>> +if ( value != NULL) { >>>> +sourceid = (unsigned long) value; >>>> +Gmain_timeout_remove(sourceid); >>>> + >>>> +if (!g_hash_table_remove(rexmit_hash_table, ri)){ >>>> +cl_log(LOG_ERR, "%s: entry not found in rexmit_hash_table" >>>> + "for seq/node(%ld %s)", >>>> + __FUNCTION__, ri->seq, ri->node->nodename); >>>> +return FALSE; >>>> +} >>>>} >>>> >>>>schedule_rexmit_request(node, seq, max_rexmit_delay); >>> >>> -- >>> : Lars Ellenberg >>> : LINBIT | Your Way to High Availability >>> : DRBD/HA support and consulting http://www.linbit.com >>> >>> DRBD® and LINBIT® are registered trademarks of LINBIT, Austria. >>> ___ >>> Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org >>> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev >>> Home Page: http://linux-ha.org/ >>> >> ___ >> Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org >> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev >> Home Page: http://linux-ha.org/ >> > ___ > Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org > http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev > Home Page: http://linux-ha.org/ -- Alan Robertson - @OSSAlanR "Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [Patch] The patch which revises memory leak.
;> (start) >>>> 31928 ?SLs0:00 0 182 53989 7128 0.0 heartbeat: master >>>> control process >>>> (One hour later) >>>> 31928 ?SLs0:02 0 182 54481 7620 0.0 heartbeat: master >>>> control process >>>> (Two hour later) >>>> 31928 ?SLs0:08 0 182 55353 8492 0.0 heartbeat: master >>>> control process >>>> (Four hours later) >>>> 31928 ?SLs0:23 0 182 56689 9828 0.0 heartbeat: master >>>> control process >>>> >>>> >>>> The state of the memory leak seems to vary according to a node with the >>>> quantity of the retransmission. >>>> >>>> The increase of this memory disappears by applying my patch. >>>> >>>> And the similar correspondence seems to be necessary in >>>> send_reqnodes_msg(), but this is like little leak. >>>> >>>> Best Regards, >>>> Hideo Yamauchi. >>>> >>>> >>>> --- On Sat, 2012/4/28, >>>> renayama19661...@ybb.ne.jp wrote: >>>> >>>>> Hi Lars, >>>>> >>>>> Thank you for comments. >>>>> >>>>>> Have you actually been able to measure that memory leak you observed, >>>>>> and you can confirm this patch will fix it? >>>>>> >>>>>> Because I don't think this patch has any effect. >>>>> Yes. >>>>> I really measured leak. >>>>> I can show a result next week. >>>>> #Japan is a holiday until Tuesday. >>>>> >>>>>> send_rexmit_request() is only used as paramter to >>>>>> Gmain_timeout_add_full, and it returns FALSE always, >>>>>> which should cause the respective sourceid to be auto-removed. >>>>> It seems to be necessary to release gsource somehow or other. >>>>> The similar liberation seems to be carried out in lrmd. >>>>> >>>>> Best Regards, >>>>> Hideo Yamauchi. >>>>> >>>>> >>>>> --- On Fri, 2012/4/27, Lars Ellenberg wrote: >>>>> >>>>>> On Thu, Apr 26, 2012 at 10:56:30AM +0900, renayama19661...@ybb.ne.jp >>>>>> wrote: >>>>>>> Hi All, >>>>>>> >>>>>>> We gave test that assumed remote cluster environment. >>>>>>> And we tested packet lost. >>>>>>> >>>>>>> The retransmission timer of Heartbeat causes memory leak. >>>>>>> >>>>>>> I donate a patch. 
>>>>>>> Please confirm the contents of the patch. >>>>>>> And please reflect a patch in a repository of Heartbeat. >>>>>> Have you actually been able to measure that memory leak you observed, >>>>>> and you can confirm this patch will fix it? >>>>>> >>>>>> Because I don't think this patch has any effect. >>>>>> >>>>>> send_rexmit_request() is only used as paramter to >>>>>> Gmain_timeout_add_full, and it returns FALSE always, >>>>>> which should cause the respective sourceid to be auto-removed. >>>>>> >>>>>> >>>>>>> diff -r 106ca984041b heartbeat/hb_rexmit.c >>>>>>> --- a/heartbeat/hb_rexmit.cThu Apr 26 19:28:26 2012 +0900 >>>>>>> +++ b/heartbeat/hb_rexmit.c Thu Apr 26 19:31:44 2012 +0900 >>>>>>> @@ -164,6 +164,8 @@ >>>>>>>seqno_t seq = (seqno_t) ri->seq; >>>>>>>struct node_info* node = ri->node; >>>>>>>struct ha_msg*hmsg; >>>>>>> +unsigned long sourceid; >>>>>>> +gpointer value; >>>>>>> >>>>>>>if (STRNCMP_CONST(node->status, UPSTATUS) != 0&& >>>>>>>STRNCMP_CONST(node->status, ACTIVESTATUS) !=0) { >>>>>>> @@ -196,11 +198,17 @@ >>>>>>> >>>>>>>node->track.last_rexmit_req = time_longclock(); >>>>>>> >>>>>>> -if (!g_hash_table_remove(rexmit_hash_table, ri)){ >>>>>>> -cl_log(LOG_ERR, "%s: entry not found in rexmit_hash_table" >>>>>>> - "for seq/node(%ld %s)", >>>>>>> - __FUNCTION__, ri->seq, ri->node->nodename); >>>>>>> -return FALSE; >>>>>>> +value = g_hash_table_lookup(rexmit_hash_table, ri); >>>>>>> +if ( value != NULL) { >>>>>>> +sourceid = (unsigned long) value; >>>>>>> +Gmain_timeout_remove(sourceid); >>>>>>> + >>>>>>> +if (!g_hash_table_remove(rexmit_hash_table, ri)){ >>>>>>> +cl_log(LOG_ERR, "%s: entry not found in rexmit_hash_table" >>>>>>> + "for seq/node(%ld %s)", >>>>>>> + __FUNCTION__, ri->seq, ri->node->nodename); >>>>>>> +return FALSE; >>>>>>> +} >>>>>>>} >>>>>>> >>>>>>>schedule_rexmit_request(node, seq, max_rexmit_delay); >> -- >> : Lars Ellenberg >> : LINBIT | Your Way to High Availability >> : DRBD/HA support and consulting http://www.linbit.com >> >> 
DRBD® and LINBIT® are registered trademarks of LINBIT, Austria. >> ___ >> Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org >> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev >> Home Page: http://linux-ha.org/ >> > ___ > Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org > http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev > Home Page: http://linux-ha.org/ -- Alan Robertson - @OSSAlanR "Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] Core Dump When Sending to Other Node That's Resetting
Have you tried running this under valgrind? On 04/13/2012 05:22 PM, Nguyen Dinh Phong wrote: Hi, I wrote a wrapper using hbclient api for an application that manages the redundancy of our system. The application uses the wrapper to send/receive messages (string) between the primary and secondary. In our testing of reset and switch over, once in a while, there is core dump in the send with double free in libc, that I do not know if caused by my wrapper of hbclient api. /lib/libc.so.6[0xf7d71629] /lib/libc.so.6(cfree+0x59)[0xf7d719e9] /usr/lib/libplumb.so.2[0xf7e88dcf] /usr/lib/libplumb.so.2[0xf7e9a03e] /usr/lib/libplumb.so.2[0xf7e9a1a4] /usr/lib/libplumb.so.2[0xf7e9922f] /usr/lib/libplumb.so.2(msg2ipcchan+0xb8)[0xf7e891ea] /usr/lib/libhbclient.so.1[0xf7e6a736] /usr/lib/libha_lib.so(hb_send+0x204)[0xf7e61e15] ---> my wrapper I use send_ordered_nodemsg() to send and readmsg() to read (based on api_test.c). However in sample codes of ipfail or drbd, I saw the setting up of IPChannel and usage of msg2ipcchan(). Which is more appropriate? I'd also like to know if I should add more codes to handle node status change because the crashes always occur when the other node go reset. Snippet of my codes: 1. Initialization: if (mhm_hb->llc_ops->signon(mhm_hb, "ping")!= HA_OK) { // I pasted the common "ping", // plan to change to different name cl_log(LOG_ERR, "Cannot sign on with heartbeat"); ... 2. 
Send:

    int hb_send(ll_cluster_t *hb, char *dest, void *buf, size_t sz)
    {
        HA_Message *msg;

        if (hb == NULL)
            return HA_FAIL;
        msg = ha_msg_new(0);
        if (ha_msg_add(msg, F_TYPE, T_MHM_MSG) != HA_OK) {
            cl_log(LOG_ERR, "hb_send: cannot add field TYPE\n");
            ZAPMSG(msg);
            return HA_FAIL;
        }
        if (ha_msg_add(msg, F_ORIG, node_name) != HA_OK) {
            cl_log(LOG_ERR, "hb_send: cannot add field ORIG\n");
            ZAPMSG(msg);
            return HA_FAIL;
        }
        char *payload = malloc(sz+1);
        if (payload == NULL) {
            ZAPMSG(msg);
            return HA_FAIL;
        }
        memset(payload, 0, sz+1);    // Add a Null byte at the end
        memcpy(payload, buf, sz);
        if (ha_msg_add(msg, F_MHM_PAYLOAD, payload) != HA_OK) {
            cl_log(LOG_ERR, "hb_send: cannot add field PAYLOAD\n");
            ZAPMSG(msg);
            return HA_FAIL;
        }
        if (hb->llc_ops->send_ordered_nodemsg(hb, msg, peer_name) != HA_OK) {
            ZAPMSG(msg);
            return HA_FAIL;
        } else {
            ZAPMSG(msg);
            return sz;
        }
    }

3. Receive:

    int hb_recv(ll_cluster_t *hb, void *buf, size_t sz)
    {
        int msgcount = 0;
        HA_Message *reply;

        if (hb == NULL)
            return HA_FAIL;
        memset(buf, 0, sz);
        for (; (reply = hb->llc_ops->readmsg(hb, 1)) != NULL;) {   ---> Blocking receiving
            const char *type;
            const char *orig;
            const char *payload;
            ++msgcount;
            if ((type = ha_msg_value(reply, F_TYPE)) == NULL) {
                type = "?";
            }
            if ((orig = ha_msg_value(reply, F_ORIG)) == NULL) {
                orig = "?";
            }
            cl_log(LOG_DEBUG, "Got message %d of type [%s] from [%s]"
                   , msgcount, type, orig);
            if (strcmp(type, T_MHM_MSG) == 0) {
                payload = ha_msg_value(reply, F_MHM_PAYLOAD);
                int p_sz = strlen(payload);
                cl_log(LOG_DEBUG, "payload %s sz %d p_sz %d\n", payload, sz, p_sz);
                if (p_sz <= sz) {
                    char *tmp = (char *) buf;
                    strncpy(tmp, payload, p_sz);
                    cl_log(LOG_DEBUG, "return buf %s sz %d ret_val %d", buf, strlen(buf), p_sz);
                    ZAPMSG(reply);
                    return(p_sz);
                } else {
                    cl_log(LOG_ERR, "Receive buffer %d too small for payload %d", sz, p_sz);
                    ZAPMSG(reply);
                    return HA_FAIL;
                }
            }
            ZAPMSG(reply);   ---> Could we delete message that's not meant to our module, or should we let it go?
        }
        if (reply == NULL) {
            cl_log(LOG_ERR, "read_hb_msg returned NULL");
            cl_log(LOG_ERR, "REASON: %s", hb->llc_ops->errmsg(hb));
        }
        return 0;
    }

Thanks, Phong _______ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ -- Alan Robertson - @OSSAlanR "Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] Another problem with the heartbeat init script...
On 03/30/2012 04:58 PM, Lars Ellenberg wrote: > On Mon, Mar 26, 2012 at 02:28:42PM -0600, Alan Robertson wrote: >> Earlier I mentioned the problem with portreserve (which was apparently >> ignored?) > No. > But I caught a cold. Sorry to hear that :-(. Hope you're feeling better now. You have my full sympathies - since I had bronchitis that lasted over two weeks. > And you did not send a patch, did you? Good point. Sorry to be a whiner... I was hoping for a little more conversation in any case. >> Now I have run into another problem. When you set LRM parameters in >> /etc/sysconfig, the code assumes that the LRM will start within 20 >> seconds of starting heartbeat. That is not the case. >> >> lrmd was changed to getenv() its max-children meanwhile. >> >> You need to cherry pick that patch, or update glue. Good to hear that's changed. I put a patch of my own into my local copy - just extending the loop and making it not print those annoying '.'s or delay startup while waiting. So, I'm good locally - and there's no need for a patch for the future. I guess I need to make a workspace so I can submit patches properly (as you noted above). On the other hand, the good news is that by upping that limit to 16 and switching from a group to explicit dependencies, the failover time was cut from about 60 seconds to about 18 seconds - so I'm happy. -- Alan Robertson - @OSSAlanR "Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
[Linux-ha-dev] Another problem with the heartbeat init script...
Earlier I mentioned the problem with portreserve (which was apparently ignored?) Now I have run into another problem. When you set LRM parameters in /etc/sysconfig, the code assumes that the LRM will start within 20 seconds of starting heartbeat. That is not the case. If you have initdead set to 120 (for example) then it can be 120 seconds before it starts. If you also have autojoin any, then it will _always_ take >= 120 seconds before it starts. Delaying the startup of other services on the system while we wait for the initdead to expire is not a good idea. I suppose I should put together a patch on these items... -- Alan Robertson - @OSSAlanR "Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
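A minimal sketch of the kind of fix being discussed: the init script's wait for the LRM could honor a configurable deadline (e.g. derived from initdead) rather than assuming it starts within 20 seconds. Everything here is illustrative - `lrmd_is_ready`, `WAIT_MAX`, and `LRMD_READY` are placeholders, not the real init-script code:

```shell
# Placeholder readiness check; the real script would probe lrmd itself.
lrmd_is_ready() {
    [ -n "$LRMD_READY" ]
}

# Wait for lrmd up to a configurable deadline instead of a fixed 20s.
wait_for_lrmd() {
    WAIT_MAX=${WAIT_MAX:-120}   # seconds; could be set from initdead
    waited=0
    until lrmd_is_ready; do
        [ "$waited" -ge "$WAIT_MAX" ] && return 1   # gave up
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

The point is only that the deadline becomes data (settable from /etc/sysconfig) rather than a constant baked into the script.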
Re: [Linux-ha-dev] Patch: pgsql streaming replication
On 03/20/2012 03:40 AM, Lars Marowsky-Bree wrote: > On 2012-03-19T16:29:23, "Soffen, Matthew" wrote: > >> I believe that the reason for not using #bash is that it is NOT part of >> the default install on non-Linux systems. > That is what package dependencies are for. Matt's point is simple: Avoiding dependencies is far better than declaring them. There is nothing in bash which cannot be easily done in the standard POSIX shell. We have avoided these things in most of our RAs - and there is no reason to change that. -- Alan Robertson - @OSSAlanR "Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
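To illustrate the point: the bash-isms that show up most often in resource agents have straightforward POSIX equivalents. A generic sketch (not taken from the pgsql agent):

```shell
name="pgsql"

# bash-ism: if [[ $name == pg* ]]; then ...
# POSIX equivalent: a case pattern
case $name in
    pg*) backend="postgres" ;;
    *)   backend="other" ;;
esac

# bash-ism: ${name^^} (uppercase expansion)
# POSIX equivalent: tr
upper=$(printf '%s' "$name" | tr '[:lower:]' '[:upper:]')

echo "$backend $upper"   # prints: postgres PGSQL
```

Sticking to constructs like these keeps an RA runnable under /bin/sh on any POSIX system, with no dependency to declare.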
Re: [Linux-ha-dev] Occasionally heartbeat doesn't start...
In this case, it was actually rpcbind that grabbed the port. AFAIK, there is no way to tell it to use or not use a particular port - except to grab it first. That's what portreserve does. If it is run first, and given the right config files, it _will_ keep anyone else from using that port. It makes sense to put port 694 in /etc/portreserve/heartbeat as part of our package and include that invocation. If someone chooses a different port they can always edit that file. Redhat provides portreserve and starts it by default before rpcbind. If other distros don't provide it or use it - no harm comes from installing the file and attempting to run portrelease. But for those that provide it, it is a help. On 03/12/2012 05:43 AM, Lars Ellenberg wrote: > On Fri, Mar 09, 2012 at 11:52:56AM -0700, Alan Robertson wrote: >> Hi, >> >> I've been investigating an HA configuration for a customer. One >> time in testing heartbeat didn't start, because rpcbind had stolen >> its reserved port. Restarting rpcbind made it choose a different >> random port. This is definitely an interesting problem - even if it >> doesn't happen very often. >> >> The best solution to this, AFAIK is to make a file >> /etc/portreserve/heartbeat with this one line in it: >> 694/udp >> >> and then add portrelease heartbeat to the init script. > "rpcbind" used to be "portmap". > > You would need the portreserve daemon available, installed, > and started at the right time during your boot sequence. > So that's only a hackish workaround. > > On Debian (Ubuntu, other derivatives) you'd simply add a line > to /etc/bindresvport.blacklist. But that may fail as well, > there have been reports where this was ignored for some reason. > So that again is just a workaround. > > If you know exactly what will register with portmap (rpcbind), > you can tell those services to request fixed ports instead. > > Typically you do, and those are just a few nfs related services. 
> So just edit /etc/sysconfig/* or /etc/defaults/* > to e.g. include -o and -p options for rpc.statd, and similar. > > This really is a fix, as long as you know all services > that are started before heartbeat, and can tell them > to use specific ports. > -- Alan Robertson - @OSSAlanR "Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
[Linux-ha-dev] Occasionally heartbeat doesn't start...
Hi, I've been investigating an HA configuration for a customer. One time in testing heartbeat didn't start, because rpcbind had stolen its reserved port. Restarting rpcbind made it choose a different random port. This is definitely an interesting problem - even if it doesn't happen very often. The best solution to this, AFAIK is to make a file /etc/portreserve/heartbeat with this one line in it: 694/udp and then add portrelease heartbeat to the init script. Thoughts? -- Alan Robertson - @OSSAlanR "Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
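A sketch of the proposal above, with `CONFDIR` standing in for /etc/portreserve so the snippet can run unprivileged. On a real system, portreserve must be installed and started early in the boot sequence for the reservation to do anything:

```shell
# Reserve heartbeat's UDP port so rpcbind cannot grab it at boot.
CONFDIR=${CONFDIR:-/tmp/portreserve-demo}   # stands in for /etc/portreserve
mkdir -p "$CONFDIR"
printf '694/udp\n' > "$CONFDIR/heartbeat"

# Then, in the heartbeat init script, release the reservation just
# before heartbeat itself binds the port:
#   portrelease heartbeat
```

With the file shipped in the package, a site using a nonstandard port only has to edit that one file.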
Re: [Linux-ha-dev] In RHEL5 and RHEL6 about different HA_RSCTMP
On 12/07/2011 09:14 PM, nozawat wrote: > Hi Andrew > >> Right, but the location and its deletion date back over 8 years IIRC. > I seem to delete it in the following. > ->http://hg.linux-ha.org/heartbeat-STABLE_3_0/file/7e3a82377fa8/heartbeat/heartbeat.c > -- > 991 if (system("rm -fr " RSC_TMPDIR) != 0) { > 992 cl_log(LOG_INFO, "Removing %s failed, recreating.", > RSC_TMPDIR); > 993 } > -- > > Regards, > Tomo For what it's worth - this directory was added with the sole purpose of being a place to put files that would automatically be deleted when heartbeat started. It has been that way from the first day it was created. Andrew has this right (it was probably actually even longer than 8 years ago). I created it for pseudo-resources - things like MailTo that need to track their state so they can follow normal resource agent expectations. For that purpose, this behavior is right. You certainly don't want a machine to reboot and think that these pseudo-resources are still running. If you don't like that behavior, then put your status files somewhere else... -- Alan Robertson - @OSSAlanR "Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
[Linux-ha-dev] Assimilation Monitoring Project on Twitter
Hi, I'm giving a little more detailed status on the Assimilation project's development progress on Twitter - on @OSSAlanR. -- Alan Robertson - @OSSAlanR "Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] Additional changes made via DHCPD review process
I agree about avoiding the feature to sync config files. My typical recommendation is to use drbdlinks and put it on replicated or shared storage. In fact, I do that at home, and am doing it for a current customer. By the way, Sean has recently revised drbdlinks to support the OCF API. (In fact, it supports all of the OCF, heartbeat-v1 and LSB APIs). http://www.tummy.com/Community/software/drbdlinks/ You can find his source control for it on github: https://github.com/linsomniac/drbdlinks Quoting Florian Haas : > On Tue, Dec 6, 2011 at 4:44 PM, Dejan Muhamedagic wrote: >> Hi, >> >> On Tue, Dec 06, 2011 at 10:59:20AM -0400, Chris Bowlby wrote: >>> Hi Everyone, >>> >>> I would like to thank Florian, Andreas and Dejan for making >>> suggestions and pointing out some additional changes I should make. At >>> this point the following additional changes have been made: >>> >>> - A test case in the validation function for ocf_is_probe has been >>> reversed to ! ocf_is_probe, and the "test"/"[ ]" wrappers removed to >>> ensure the validation is not occurring if the partition is not mounted or >>> under a probe. >>> - An extraneous return code has been removed from the "else" clause of >>> the probe test, to ensure the rest of the validation can finish. >>> - The call to the DHCPD daemon itself during the start phase has been >>> wrapped with the ocf_run helper function, to ensure that is somewhat >>> standardized. >>> >>> The first two changes corrected the "Failed Action... Not installed" >>> issue on the secondary node, as well as the fail-over itself. I've been >>> able to fail over to secondary and primary nodes multiple times and the >>> service follows the rest of the grouped services. >>> >>> There are a few things I'd like to add to the script, now that the main >>> issues/code changes have been addressed, and they are as follows: >>> >>> - Add a means of copying /etc/dhcpd.conf from node1 to node2...nodeX >>> from within the script. 
The logic behind this is as follows: >> >> I'd say that this is admin's responsibility. There are tools such >> as csync2 which can deal with that. Doing it from the RA is >> possible, but definitely very error prone and I'd be very >> reluctant to do that. Note that we have many RAs which keep >> additional configuration in a file and none of them tries to keep >> the copies of that configuration in sync itself. > Seconded. Whatever configuration doesn't live _in_ the CIB proper, is > not Pacemaker's job to replicate. The admin gets to either sync files > manually across the nodes (csync2 greatly simplifies this; no need to > reinvent the wheel), or put the config files on storage that's > available to all cluster nodes. > > Cheers, > Florian > ___ > Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org > http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev > Home Page: http://linux-ha.org/ > ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] attrd and repeated changes
On 10/20/2011 03:41 AM, Philipp Marek wrote: > Hello, > > when constantly sending new data via attrd the changes are never used. > > > Example: > while sleep 1 > do attrd_updater -l reboot -d 5 -n rep_chg -U try$SECONDS > cibadmin -Ql | grep rep_chg > done > > This always returns the same value - the one that was given with more than 5 > seconds delay afterwards, so that the dampen interval wasn't broken by the > next change. > > > I've attached two draft patches; one for allowing the _first_ value in a > dampen interval to be used (effectively ignoring changes until this value is > written), and one for using the _last_ value in the dampen interval (by not > changing the dampen timer). [1] > > > *** Note: they are for discussion only! > *** I didn't test them, not even for compilation. > > > Perhaps this "bug" [2] was introduced with one of these changes (the hashes > are the GIT numbers) > > High: crmd: Bug lf#2528 - Introduce a slight delay when >creating a transition to allow attrd time to perform its updates > e7f5da92490844d190609931f434e08c0440da0f > > Low: attrd: Indicate when attrd clients are updating fields > 69b49b93ff6fd25ac91f589d8149f2e71a5114c5 > > > What is the correct way to handle multiple updates within the dampen > interval? Personally, I'd vote for the last value. I agree with you about this being a bug. -- Alan Robertson "Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] Stonith turns node names to lowercase
On 10/19/2011 04:11 AM, Lars Marowsky-Bree wrote: > On 2011-10-18T12:40:40, Florian Haas wrote: > >>>g_strdown(nodecopy); >>> >>> Is there a reason for this ? >> I suppose Dejan will accept a patch making this configurable. > Please, no. We fence by hostname; hostnames are case insensitive by > definition. Plugins need to handle that. More specifically - this patch was put in to make this work in the real world. In the real world, host names correspond to DNS names (or you will go crazy). DNS names are case-insensitive - and that's how it is. -- Alan Robertson "Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
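The practical consequence for plugin authors: compare node names case-insensitively, as DNS does. A tiny POSIX-shell illustration (the function names are mine, not from the stonith code):

```shell
# Normalize to lowercase before comparing, mirroring what the
# g_strdown() call in the stonith code does for hostnames.
to_lower() { printf '%s' "$1" | tr '[:upper:]' '[:lower:]'; }

# True when two node names refer to the same host, ignoring case.
node_matches() {
    [ "$(to_lower "$1")" = "$(to_lower "$2")" ]
}
```

So "Node1.Example.COM" and "node1.example.com" are treated as the same node, which is exactly the real-world behavior the patch enforces.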
Re: [Linux-ha-dev] endian oddity in cluster glue
On 09/05/2011 09:02 AM, Pádraig Brady wrote: > I was looking at extricating some logic from cluster-glue-libs > and noticed strangeness wrt the endianess checking. > CONFIG_BIG_ENDIAN is defined on my x86 machine, which is > due to configure.ac referencing a non existent byteorder_test.c > > To fix this I would suggest the following patch. > However, I'm wary that this may introduce compatibility > issues with generated md5 sums which is the only code > that inspects the above defines. If we stick with BIG_ENDIAN > always then there will be interoperability issues between > x86 and ppc hosts for example (which may not be an issue)? > > cheers, > Pádraig. Strange as it seems, there are some mixed PPC and x86 clusters out there... -- Alan Robertson "Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] Announcing - the Assimilation monitoring system - a sub-project of Linux-HA
On 08/17/2011 09:15 PM, Digimer wrote: > > Linux is fairly described as an ecosystem. Differing branches and > methods of solving a given problem are tried, and the one with the most > backing and merit wins. It's part of what makes open-source what it is. > So, from my point of view, best of luck to both. :) Thanks! And, I share your last point of view as well - I wish all parties good luck! -- Alan Robertson "Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] Announcing - the Assimilation monitoring system - a sub-project of Linux-HA
On 08/16/2011 05:08 PM, Angus Salkeld wrote: > On Fri, Aug 12, 2011 at 03:11:36PM -0600, Alan Robertson wrote: >> Hi, >> >> Back last November or so, I started work on a new monitoring project - >> for monitoring servers and services. >> >> Its aims are: >> - Scalable virtually without limit - tens of thousands of servers is >> not a problem >> - Easy installation and upkeep - includes integrated discovery of >> servers, services >> and switches - without setting off security alarms ;-) >> >> This project isn't ready for a public release yet (it's in a fairly >> early stage), but it seemed worthwhile to let others know that the >> project exists, and start getting folks to read over the code, and >> perhaps begin to play with it a bit as well. >> >> The project has two arenas of operation: >> nanoprobes - which run in (nearly) every monitored machine > Why not matahari (http://matahariproject.org/)? > >> Collective management - running in a central server (or HA cluster). > Quite similar to http://pacemaker-cloud.org/. Seems a > shame not to be working together. > > -Angus This is a set of ideas I've been working on for the last four years or so. My most grandiose vision of it I called a Data Center Operating System. This is about the same time that Amazon announced their first cloud offering (unknown to me). There are a few hints about it a couple of years ago in my blog. I heard a little about Andrew's project when I announced this back in November. Andrew has made it perfectly clear that he doesn't want to work with me (really, absolutely, abundantly, perfectly, crystal clear) and there is evidence that he doesn't work well with others besides me, so that's not a possibility. In the short term I'm not especially concerned with clouds - just with any collection of computers which range from 4 up to and above cloud scale. That includes clouds of course - but we'll get a lot more users at the small scale than we will at cloud scale. 
There are several reasons for this approach: - Existing monitoring software sucks. - Many more collections of computers besides clouds exist and need help - although this would work very well with clouds This problem has dimensions that a cloud environment doesn't have. In a cloud, all deployment is automated, so you can _know_ what is running where. In a more conventional data center, having a way to discover what's in your data center, and what's running on those servers is important. -- Alan Robertson "Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
[Linux-ha-dev] Announcing - the Assimilation monitoring system - a sub-project of Linux-HA
Hi, Back last November or so, I started work on a new monitoring project - for monitoring servers and services. Its aims are: - Scalable virtually without limit - tens of thousands of servers is not a problem - Easy installation and upkeep - includes integrated discovery of servers, services and switches - without setting off security alarms ;-) This project isn't ready for a public release yet (it's in a fairly early stage), but it seemed worthwhile to let others know that the project exists, and start getting folks to read over the code, and perhaps begin to play with it a bit as well. The project has two arenas of operation: nanoprobes - which run in (nearly) every monitored machine Collective management - running in a central server (or HA cluster). Current status: - edge switch discovery code is complete - scalable heartbeat code is complete - Lots more to do :-D Current code base is about 7K lines of C. You can find documentation for the project here: http://linux-ha.org/source-doc/assimilation/html/index.html An overview of the architecture is found under the "System Goals and Architecture" tab. If you want to dive into the structure of the code in a sort-of top-down way, you might explore under the "Modules" tab. This documentation is high level (project aims), medium level (class structure and descriptions), low level (APIs) and includes the code as well. It's all done with Doxygen - which worked really well for this. The code itself is available in the Linux-HA mercurial repository - which you can find here: http://hg.linux-ha.org/%7Cexperimental/assimilation/ This code includes all the documentation above - it's just not in quite as pretty or organized a format. My short term goal is to get the server monitoring completely up and usable. Current thinking is that the central Collective management code will be in Python, with the nanoprobes in 'C' (as they currently are). 
It is expected that service monitoring will make use of the LRM - when we get to that. -- Alan Robertson "Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] Uniquness OCF Parameters
On 06/17/2011 02:43 AM, Lars Ellenberg wrote: > On Thu, Jun 16, 2011 at 03:52:37PM -0600, Alan Robertson wrote: >> On 06/16/2011 02:51 AM, Lars Ellenberg wrote: >>> On Thu, Jun 16, 2011 at 09:48:20AM +0200, Florian Haas wrote: >>>> On 2011-06-16 09:03, Lars Ellenberg wrote: >>>>> With the current "unique=true/false", you cannot express that. >>>> Thanks. You learn something every day. :) >>> Sorry that I left off the "As you are well aware of," >>> introductory phrase. ;-) >>> >>> I just summarized the "problem": >>> >>>>> Depending on what we chose the meaning to be, >>>>> parameters marked "unique=true" would be required to >>>>> either be all _independently_ unique, >>>>> or be unique as a tuple. >>> And made a suggestion how to solve it: >>> >>>>> If we want to be able to express both, we need a different markup. >>>>> >>>>> Of course, we can move the markup out of the parameter description, >>>>> into an additional markup, that spells them out, >>>>> like. >>>>> >>>>> But using unique=0 as the current non-unique meaning, then >>>>> unique=, would >>>>> name the scope for this uniqueness requirement, >>>>> where parameters marked with the same such label >>>>> would form a unique tuple. >>>>> Enables us to mark multiple tuples, and individual parameters, >>>>> at the same time. >>> If we really think it _is_ a problem. >> If one wanted to, one could say >> unique=1,3 >> or >> unique=1 >> unique=3 >> >> Then parameters which share the same uniqueness list are part of the >> same uniqueness grouping. Since RAs today normally say unique=1, if one >> excluded the unique group 0 from being unique, then this could be done >> in a completely upwards-compatible way for nearly all resources. > That is what I suggested, yes. > Where unique=0 is basically "not mentioning the unique hint". Originally that's what I thought you said. But somehow read it differently later. Perhaps I got my comment authorship cross-wired. 
Wouldn't be hard to imagine ;-) -- Alan Robertson "Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] Uniquness OCF Parameters
On 06/16/2011 02:51 AM, Lars Ellenberg wrote: > On Thu, Jun 16, 2011 at 09:48:20AM +0200, Florian Haas wrote: >> On 2011-06-16 09:03, Lars Ellenberg wrote: >>> With the current "unique=true/false", you cannot express that. >> Thanks. You learn something every day. :) > Sorry that I left off the "As you are well aware of," > introductory phrase. ;-) > > I just summarized the "problem": > >>> Depending on what we chose the meaning to be, >>> parameters marked "unique=true" would be required to >>> either be all _independently_ unique, >>> or be unique as a tuple. > And made a suggestion how to solve it: > >>> If we want to be able to express both, we need a different markup. >>> >>> Of course, we can move the markup out of the parameter description, >>> into an additional markup, that spells them out, >>> like. >>> >>> But using unique=0 as the current non-unique meaning, then >>> unique=, would >>> name the scope for this uniqueness requirement, >>> where parameters marked with the same such label >>> would form a unique tuple. >>> Enables us to mark multiple tuples, and individual parameters, >>> at the same time. > If we really think it _is_ a problem. If one wanted to, one could say unique=1,3 or unique=1 unique=3 Then parameters which share the same uniqueness list are part of the same uniqueness grouping. Since RAs today normally say unique=1, if one excluded the unique group 0 from being unique, then this could be done in a completely upwards-compatible way for nearly all resources. -- Alan Robertson "Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
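In RA metadata, the grouped-uniqueness idea sketched in this thread might look like the following. This is hypothetical markup for discussion only - the OCF standard as it stands defines only a boolean unique attribute:

```xml
<!-- Hypothetical syntax: parameters sharing a nonzero group number
     form one multi-part unique tuple; 0 (or no attribute) = not unique.
     For sfex, the device:index pair would be unique as a tuple: -->
<parameter name="device" unique="1">
  <content type="string"/>
</parameter>
<parameter name="index" unique="1">
  <content type="string"/>
</parameter>
<!-- An independently unique parameter gets its own group: -->
<parameter name="pidfile" unique="2">
  <content type="string"/>
</parameter>
```

Since existing RAs only ever use unique="1", treating that single group as one tuple per RA would stay upwards-compatible, as noted above.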
Re: [Linux-ha-dev] Uniquness OCF Parameters
On 06/14/2011 07:21 AM, Dejan Muhamedagic wrote: > Hi Florian, > > On Tue, Jun 14, 2011 at 02:03:19PM +0200, Florian Haas wrote: >> On 2011-06-14 13:08, Dejan Muhamedagic wrote: >>> Hi Alan, >>> >>> On Mon, Jun 13, 2011 at 10:32:02AM -0600, Alan Robertson wrote: >>>> On 06/13/2011 04:12 AM, Simon Talbot wrote: >>>>> A couple of observations (I am sure there are more) on the uniqueness >>>>> flag for OCF script parameters: >>>>> >>>>> Would it be wise for the index parameter of the SFEX ocf script >>>>> to have its unique flag set to 1 so that the crm tool (and others) would >>>>> warn if one inadvertently tried to create two SFEX resource primitives >>>>> with the same index? >>>>> >>>>> Also, an example of the opposite, the Stonith/IPMI script, has parameters >>>>> such as interface, username and password with their unique flags set to >>>>> 1, causing erroneous warnings if you use the same interface, username or >>>>> password for multiple IPMI stonith primitives, which of course is often >>>>> the case in large clusters? >>>>> >>>> When we designed it, we intended that Unique applies to the complete set >>>> of parameters - not to individual parameters. It's like a multi-part >>>> unique key. It takes all 3 to create a unique instance (for the example >>>> you gave). >>> That makes sense. >> Does it really? Then what would be the point of having some params that >> are unique, and some that are not? Or would the tuple of _all_ >> parameters marked as unique be considered unique? > Consider the example above for sfex. It has a device and index > which together determine which part of the disk the RA should > use. Only the device:index tuple must be unique. Currently, > neither device nor index is a unique parameter (in the > meta-data). Otherwise we'd have false positives for the > following configuration: > > disk1:1 > disk1:2 > disk2:1 > disk2:2 > > Now, stuff such as configfile and pidfile obviously both must be > unique independently of each other. 
There are probably other > examples of both kinds. Although what you said makes sense and is obviously upwards compatible and useful - I don't think we thought it through to that detail originally. It was a long time ago, and we were trying to think all these kinds of things through - and I don't recall that kind of detail in the discussions. I think we did a pretty good job. It's one of the things I think we did right. That's not saying it's perfect - nothing is. But, it's not bad, and I think it has stood the test of time pretty well. We met face to face for these discussions, and not every word we said was archived for posterity ;-). We wrote up the specs afterwards - several weeks later due to things like getting the web site set up and so on. Folks from two proprietary stacks, people from the user community, and folks from the Linux-HA community met to do these things together. About 8-12 people total. I remember most of them. Although we invited Red Hat - they didn't send anyone. It's a testament to the spec that, in spite of not participating in the standard, they implemented it anyway. I was certainly pleased with that. -- Alan Robertson "Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] Uniquness OCF Parameters
On 06/14/2011 06:03 AM, Florian Haas wrote: > On 2011-06-14 13:08, Dejan Muhamedagic wrote: >> Hi Alan, >> >> On Mon, Jun 13, 2011 at 10:32:02AM -0600, Alan Robertson wrote: >>> On 06/13/2011 04:12 AM, Simon Talbot wrote: >>>> A couple of observations (I am sure there are more) on the uniqueness flag >>>> for OCF script parameters: >>>> >>>> Would it be wise for the index parameter of the SFEX ocf script to >>>> have its unique flag set to 1 so that the crm tool (and others) would warn >>>> if one inadvertently tried to create two SFEX resource primitives with the >>>> same index? >>>> >>>> Also, an example of the opposite, the Stonith/IPMI script, has parameters >>>> such as interface, username and password with their unique flags set to 1, >>>> causing erroneous warnings if you use the same interface, username or >>>> password for multiple IPMI stonith primitives, which of course is often >>>> the case in large clusters? >>>> >>> When we designed it, we intended that Unique applies to the complete set >>> of parameters - not to individual parameters. It's like a multi-part >>> unique key. It takes all 3 to create a unique instance (for the example >>> you gave). >> That makes sense. > Does it really? Then what would be the point of having some params that > are unique, and some that are not? Or would the tuple of _all_ > parameters marked as unique be considered unique? > I don't know what you think I said, but a multi-part key to a database is a tuple which consists of all marked parameters. You just said what I said in a different way. So we agree. -- Alan Robertson "Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] Uniquness OCF Parameters
On 06/13/2011 04:12 AM, Simon Talbot wrote: > A couple of observations (I am sure there are more) on the uniqueness flag > for OCF script parameters: > > Would it be wise for the index parameter of the SFEX ocf script to > have its unique flag set to 1 so that the crm tool (and others) would warn if > one inadvertently tried to create two SFEX resource primitives with the same > index? > > Also, an example of the opposite, the Stonith/IPMI script, has parameters > such as interface, username and password with their unique flags set to 1, > causing erroneous warnings if you use the same interface, username or > password for multiple IPMI stonith primitives, which of course is often the > case in large clusters? > When we designed it, we intended that Unique applies to the complete set of parameters - not to individual parameters. It's like a multi-part unique key. It takes all 3 to create a unique instance (for the example you gave). -- Alan Robertson "Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] Problem WARN: Gmain_timeout_dispatch Again
That's probably OK. If you're really having a problem, it should ordinarily show up before it causes a false failover. Then you can figure out whether you want to raise your timeout, or figure out what's causing the slow processing.

On 05/14/2011 09:08 AM, gilmarli...@agrovale.com.br wrote:
> Thanks again. deadtime 30 and warntime 15 - is this good?
>
> > BUT also either make warntime smaller or deadtime larger...

-- Alan Robertson
   al...@unix.sh

"Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] Problem WARN: Gmain_timeout_dispatch Again
BUT also either make warntime smaller or deadtime larger...

On 5/13/2011 7:48 PM, gilmarli...@agrovale.com.br wrote:
> Thank you for your attention.
> Your recommendation is to wait: as long as only the warnings keep
> appearing in the logs and the services do not migrate to the other
> server, I should just keep watching the log warnings.

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] Problem WARN: Gmain_timeout_dispatch Again
I typically make deadtime something like 3 times warntime. That way you'll get data before you get into trouble. When your heartbeats exceed warntime, you get information on how late they are. I would typically make deadtime AT LEAST twice the latest time you've ever seen reported by warntime.

If the worst case you ever saw was this 60 ms instead of 50 ms, I'd look somewhere else for the problem. However, it is possible that you have hardware trouble, or a kernel bug. Possible, but unlikely.

More logs are always good when looking at a problem like this. hb_report will get you lots of logs and so on for the next time it happens.

On 05/13/2011 11:44 AM, gilmarli...@agrovale.com.br wrote:
> Thanks for the help.
>
> I had a problem about 30 days before starting this thread: after two
> days, heartbeat reported that server1 had gone down, and the services
> migrated to server2. Now, with DRBD on eth1 and heartbeat on eth2, and
> warntime/deadtime changed to 15/20, I don't know whether it will happen
> again.
> Thanks

-- Alan Robertson

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
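[Editor's note in the style of the thread: the sizing rule discussed above - deadtime roughly 3x warntime, and at least twice the worst lateness ever observed - maps onto standard ha.cf directives. The keywords below are real heartbeat configuration directives; the concrete values and the address are only illustrative, not a recommendation for this particular cluster:]

```
# /etc/ha.d/ha.cf (fragment) - illustrative values only
keepalive 2              # send a heartbeat every 2 seconds
warntime 10              # log a warning when a heartbeat is this late
deadtime 30              # declare the peer dead (~3x warntime)
initdead 60              # allow extra time at boot (>= 2x deadtime is customary)
ucast eth2 192.168.1.2   # dedicated heartbeat interface, as in this thread
```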
Re: [Linux-ha-dev] Problem WARN: Gmain_timeout_dispatch Again
That's related to process dispatch time in the kernel. It might be the case that this expectation is a bit aggressive (mea culpa).

In the meantime, as long as those timings remain close to the expectations (60 vs 50 ms), I'd ignore them.

Those messages are meant to debug real-time problems - which you don't appear to be having.

-- Alan Robertson
   al...@unix.sh

On 05/12/2011 12:54 PM, gilmarli...@agrovale.com.br wrote:
> Hello!
> I'm using heartbeat version 3.0.3-2 on Debian squeeze with a dedicated
> gigabit ethernet interface for the heartbeat.
> But even this generates the following message:
> WARN: Gmain_timeout_dispatch: Dispatch function for send local status
> took too long to execute: 60 ms (> 50 ms) (GSource: 0x101c350)
> I'm using eth1 to synchronize DRBD and eth2 for HEARTBEAT.
> I tried increasing the values to deadtime 20 and warntime 15.
> The interface is an Intel Corporation 82575GB Gigabit Ethernet
> controller in Serv.1 and a Broadcom Corporation NetXtreme II BCM5709
> in Serv.2.
> Tested using two Broadcoms for the heartbeat, also without success.
>
> Thanks

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] New OCF RA: symlink
On 04/21/2011 01:30 AM, Dominik Klein wrote:
>> Am I too paranoid?
> I don't think you are. Some non-root user practically being able to
> remove any file is certainly a valid concern.
>
> Thing is: I needed an RA that configured a cronjob. Florian suggested
> writing "the" symlink RA instead, which could manage symlinks.
> Apparently there was an IRC discussion a couple weeks ago that I was
> not a part of.
>
> So while the symlink RA could also do what I needed, I tried to write
> that instead of the cronjob RA (which will also come, since it will
> cover some more functions than this one - but that's another story).
>
> So anyway, maybe those involved in the first discussion can comment on
> this too and share thoughts on how to solve things. Maybe they have
> already addressed these situations.

Drbdlinks was never converted to an OCF RA, that I recall. It handles cases like needing to restart the logging system when you change symlinks around - mainly for chrooted services. I've used it for many years.

You can find the source for it here:
http://www.tummy.com/Community/software/drbdlinks/

It's pretty well thought out, and works quite well. I'd certainly look it over before reinventing the wheel.

-- Alan Robertson

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
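[Editor's note: for readers following the symlink-RA discussion, the core of such an agent is just careful link management. The sketch below is hypothetical - it is not the RA under review, and the variable names `link` and `target` are invented for illustration - but it shows the start/monitor/stop skeleton, including the "only remove the link if it points where we put it" guard that the paranoia discussion is about:]

```shell
#!/bin/sh
# Hypothetical sketch of a symlink-managing agent's core actions.
# "$link" is the symlink to manage; "$target" is what it should point at.

symlink_start() {   # create or replace the link
    ln -sf "$target" "$link"
}

symlink_monitor() { # succeed only if the link exists and points at target
    [ -L "$link" ] && [ "$(readlink "$link")" = "$target" ]
}

symlink_stop() {    # remove the link only if it is ours (points at target)
    if symlink_monitor; then rm -f "$link"; fi
    ! [ -L "$link" ]
}

# demo
target=$(mktemp)
link=$(mktemp -u)        # a fresh pathname; the link itself
symlink_start
symlink_monitor && echo "monitor: running"
symlink_stop    && echo "stop: ok"
rm -f "$target"
```

The stop guard is the interesting part: an unconditional `rm -f "$link"` would let a misconfigured (or malicious) resource definition delete arbitrary files, which is exactly the concern raised above.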
Re: [Linux-ha-dev] [Pacemaker] Article on HA in the IBM cloud using Pacemaker and Heartbeat
On 01/28/2011 11:12 AM, Steven Dake wrote:
> On 01/28/2011 08:02 AM, Alan Robertson wrote:
>> Hi,
>>
>> I recently co-authored an article on HA in the IBM cloud using
>> Pacemaker and Heartbeat.
>>
>> http://www.ibm.com/developerworks/cloud/library/cl-highavailabilitycloud/
>>
>> The cool thing is that the IBM cloud supports virtual IPs. With most
>> of the other clouds you have to do DNS failover - which is sub-optimal
>> ;-). Of course, they added this after we harangued them ;-) - but
>> still it's very nice to have.
>>
>> It uses Heartbeat rather than Corosync because (for good reason)
>> clouds don't support multicast or broadcast.
>>
> Corosync works in non-broadcast/multicast modes. (The transport is
> called udpu.)

Thanks for the correction! It's always better to have one's facts straight (unlike how I did here).

-- Alan Robertson

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
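[Editor's note: the udpu transport Steven mentions is configured in corosync.conf roughly as below. This is a hedged sketch of the corosync 1.3-era syntax as best recalled - the addresses are placeholders, so check your version's corosync.conf(5) before relying on it. Each peer is listed explicitly as a member, so no multicast or broadcast is needed:]

```
totem {
    version: 2
    transport: udpu
    interface {
        ringnumber: 0
        bindnetaddr: 10.0.0.0
        mcastport: 5405
        member {
            memberaddr: 10.0.0.1
        }
        member {
            memberaddr: 10.0.0.2
        }
    }
}
```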
[Linux-ha-dev] Article on HA in the IBM cloud using Pacemaker and Heartbeat
Hi, I recently co-authored an article on HA in the IBM cloud using Pacemaker and Heartbeat. http://www.ibm.com/developerworks/cloud/library/cl-highavailabilitycloud/ The cool thing is that the IBM cloud supports virtual IPs. With most of the other clouds you have to do DNS failover - which is sub-optimal ;-). Of course, they added this after we harangued them ;-) - but still it's very nice to have. It uses Heartbeat rather than Corosync because (for good reason) clouds don't support multicast or broadcast. There will be a follow-up article on setting up DRBD in the cloud as well... Probably a month away or so... -- Alan Robertson "Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] nginx resource agent
On 12/14/2010 02:42 AM, Dejan Muhamedagic wrote:
> Hi Alan,
>
> On Mon, Dec 06, 2010 at 10:41:48AM -0700, Alan Robertson wrote:
>> Hi,
>>
>> Attached is a resource agent for the nginx web server/proxy package.
>>
>> http://en.wikipedia.org/wiki/Nginx
>> http://nginx.org/
>>
>> I'd like for it to be added to the next release of the resource
>> agents package.
> It's a pity that we cannot share code with the apache RA. We need
> to set up some place for that.
>
> Pushed to the repository.
>
>> #
>> # I'm not convinced this is a wonderful idea (AlanR)
>> #
>> for sig in SIGTERM SIGHUP SIGKILL
>> do
>>   if
>>     pgrep -f "$NGINXD.*$CONFIGFILE" >/dev/null
>>   then
>>     pkill -$sig -f "$NGINXD.*$CONFIGFILE" >/dev/null
>>     ocf_log info "nginxd children were signalled ($sig)"
>>     sleep 1
>>   else
>>     break
>>   fi
>> done
> Can't recall the details anymore - there was a bit of discussion
> on the matter a few years ago - but NTT insisted on killing httpd
> children. Or do you mind the implementation?

Hi Dejan,

I know it's been a long time. Sorry about that.

If I _hated_ the idea, I would have left it out. It definitely leaves me feeling a bit unsettled. If it causes a problem, it will no doubt eventually show up.

It looks like it's just masking a bug in Apache - that is, that giving it a shutdown request doesn't really work... Perhaps I shouldn't have kept it in the nginx code - since it does seem to be a bit specific to some circumstance in Apache... On the other hand, it shouldn't hurt anything either...

-- Alan Robertson

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
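[Editor's note: the signal-escalation loop quoted above can be demonstrated in isolation. This sketch is illustrative only - it targets a single PID with kill rather than pattern-matching children with pkill, so it can run safely anywhere - but the escalation order (TERM, then HUP, then KILL) and the alive-check-before-each-signal are the same idea:]

```shell
#!/bin/sh
# Escalate signals against one process until it exits.
escalate_kill() {
    pid=$1
    for sig in TERM HUP KILL
    do
        if kill -0 "$pid" 2>/dev/null   # still alive (or a zombie)?
        then
            kill -"$sig" "$pid" 2>/dev/null
            sleep 1                     # give it a moment to act on the signal
        else
            break
        fi
    done
    wait "$pid" 2>/dev/null             # reap our child so the PID really goes away
    ! kill -0 "$pid" 2>/dev/null        # success iff the process is gone
}

# demo: a plain sleep dies on the first TERM
sleep 300 &
victim=$!
escalate_kill "$victim" && echo "gone"
```

Note the `wait`: without it, a killed child of the shell can linger as a zombie that still answers `kill -0`, making the final check lie. (That caveat applies only when the target is our own child, as here.)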
Re: [Linux-ha-dev] OCF RA dev guide: final heads up
On 12/06/2010 09:35 AM, Dejan Muhamedagic wrote: > On a different matter: > > Perhaps it would be good to add a section about ocf-tester. Or > would you consider that out of scope? Let me second that request. If you don't know about ocf-tester, then you don't really know much about building OCF RAs (IMHO). -- Alan Robertson "Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] nginx resource agent
Let the list know how it works out. It has some improvements over how the original Apache resource agent works.

On 12/06/2010 10:59 AM, Raoul Bhatia [IPAX] wrote:
> On 12/06/2010 06:41 PM, Alan Robertson wrote:
>> Hi,
>>
>> Attached is a resource agent for the nginx web server/proxy package.
>>
>> http://en.wikipedia.org/wiki/Nginx
>> http://nginx.org/
>>
>> I'd like for it to be added to the next release of the resource
>> agents package.
> nice - thank you! :)
>
> cheers,
> raoul

-- Alan Robertson

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
[Linux-ha-dev] nginx resource agent
Hi,

Attached is a resource agent for the nginx web server/proxy package.

http://en.wikipedia.org/wiki/Nginx
http://nginx.org/

I'd like for it to be added to the next release of the resource agents package.

Thanks!

-- Alan Robertson
"Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce

#!/bin/sh
#
#	High-Availability nginx OCF resource agent
#
#	nginx
#
#	Description:	starts/stops nginx servers.
#
#	Author:		Alan Robertson
#			Dejan Muhamedagic
#	This code is based significantly on the apache resource agent
#
#	Support:	linux...@lists.linux-ha.org
#
#	License:	GNU General Public License (GPL)
#
#	Copyright:	(C) 2002-2010 International Business Machines
#
#	Our parsing of the nginx config files is very rudimentary.
#	It'll work with lots of different configurations - but not every
#	possible configuration.
#
#	Patches are being accepted ;-)
#
#	OCF parameters:
#	 OCF_RESKEY_configfile
#	 OCF_RESKEY_nginx
#	 OCF_RESKEY_port
#	 OCF_RESKEY_options
#	 OCF_RESKEY_status10regex
#	 OCF_RESKEY_status10url
#	 OCF_RESKEY_client
#	 OCF_RESKEY_testurl
#	 OCF_RESKEY_test20regex
#	 OCF_RESKEY_test20conffile
#	 OCF_RESKEY_test20name
#	 OCF_RESKEY_external_monitor30_cmd
#
#	TO DO:
#	  More extensive tests of extended monitor actions
#	  Look at the --with-http_stub_status_module for validating
#	    the configuration? (or is that automatically done?)
#	    Checking could certainly result in better error messages.
#	  Allow for the fact that the config file and so on might all be
#	    on shared disks - this affects the validate-all option.

: ${OCF_FUNCTIONS_DIR=$OCF_ROOT/resource.d/heartbeat}
. ${OCF_FUNCTIONS_DIR}/.ocf-shellfuncs

HA_VARRUNDIR=${HA_VARRUN}

###
#
#	Configuration options - usually you don't need to change these
#
###
#
NGINXDLIST="/usr/sbin/nginx /usr/local/sbin/nginx"

# default options for http clients
# NB: We _always_ test a local resource, so it should be
# safe to connect from the local interface.
WGETOPTS="-O- -q -L --no-proxy --bind-address=127.0.0.1"
CURLOPTS="-o - -Ss -L --interface lo"

LOCALHOST="http://localhost"
NGINXDOPTS=""
#
#	End of Configuration options
###

CMD=`basename $0`

#	The config-file-pathname is the pathname to the configuration
#	file for this web server. Various appropriate defaults are
#	assumed if no config file is specified.

usage() {
  cat <<-!
	usage: $0 action

	action:
	  start		start nginx
	  stop		stop nginx
	  reload	reload the nginx configuration
	  status	return the status of web server, running or stopped
	  monitor	return TRUE if the web server appears to be working.
			For this to be supported you must configure mod_status
			and give it a server-status URL - or configure what URL
			you wish to be monitored. You have to have installed
			either curl or wget for this to work.
	  meta-data	show meta data message
	  validate-all	validate the instance parameters
	!
  exit $1
}

#
# run the http client
#
curl_func() {
	cl_opts="$CURLOPTS $test_httpclient_opts"
	if [ x != "x$test_user" ]
	then
		echo "-u $test_user:$test_password" | curl -K - $cl_opts "$1"
	else
		curl $cl_opts "$1"
	fi
}
wget_func() {
	auth=""
	cl_opts="$WGETOPTS $test_httpclient_opts"
	[ x != "x$test_user" ] && auth="--http-user=$test_user --http-passwd=$test_password"
	wget $auth $cl_opts "$1"
}
#
# rely on whatever the user provided
#
userdefined() {
	$test_httpclient $test_httpclient_opts "$1"
}

#
# find a good http client
#
findhttpclient() {
	# prefer curl if present...
	if [ "x$CLIENT" != x ]
	then
		echo "$CLIENT"
	elif which curl >/dev/null 2>&1
	then
		echo "curl"
	elif which wget >/dev/null 2>&1
	then
		echo "wget"
	else
		return 1
	fi
}

gethttpclient() {
	[ -z "$test_httpclient" ] && test_httpclient=$ourhttpclient
	case "$test_httpclient" in
	curl|wget) ech
Re: [Linux-ha-dev] Thinking about a new communications plugin
On 11/27/2010 04:19 PM, Lars Ellenberg wrote:
> On Sun, Nov 28, 2010 at 12:03:23AM +0100, Lars Ellenberg wrote:
>> But until then, you could probably already have implemented your
>> original proposal in the cumulative man hours spent writing and
>> reading this thread, and I'm sure I will get used. So please, just go
>> ahead.
> tztztz.
> Though possibly I get used, too, sometimes, I obviously meant
> ..., and I'm sure _it_ will be used.
> And I'm going to be one of those that use it, probably...

It wasn't that bad to read it all.

I hadn't realized the messages had gotten so large. We put in compression exactly to deal with this situation. All that bulky XML is extremely compressible. I didn't write that part of the code, and hadn't noticed that it did all that excessive compression/decompression. But you will note that this only really happens during a cluster transition. Most of the time nothing happens - and nothing but heartbeats go over the network - or has that changed too?

On a completely different subject, I'm modernizing my home production cluster: switching to Ubuntu, replacing the motherboard with one with a multi-core CPU, replacing hard drives, adding striping. I was planning on putting the DRBD metadata on an SSD - but there seems to be some incompatibility between the SSD I bought and Linux and/or my motherboard. On the other hand, the SSD works nicely with non-Linux disk testing utilities. Sigh...

-- Alan Robertson

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] Thinking about a new communications plugin
On 11/24/2010 01:41 PM, Lars Ellenberg wrote:
> On Wed, Nov 24, 2010 at 11:43:05AM -0800, Bob Schatz wrote:
>> Lars,
>>
>> Please take my opinions with a grain of salt. I am just trying to
>> share my experiences. I am not sure if they apply here. I appreciate
>> all of the hard work involved in LinuxHA and Pacemaker! Just to tell
>> you where I am coming from while I count down the minutes before a
>> holiday here in the states...
>> ...
>> My take away from it was the following (at least what I remember):
>> 1. To increase reliability add less features and rewrite areas prone
>>    to bugs or absolutely...
>> 2. Patch the existing code as opposed to coming out with more frequent
>>    releases
> well, whether to count "patch level" or "micro release" is not a
> technical difference, though it may be of huge importance on a
> "political" level. Unless, of course, you meant feature releases...
> that may be a different thing.
>> 3. Come up with a couple of recipes on how to do a couple of common
>>    system administration tasks like adding a patch, migrating an
>>    application regardless of two nodes or more than 3, etc
>>
>> I am not sure how this maps to LinuxHA/Pacemaker. It may be a
>> different market.
> Or it may not. We'll see.
>> I thought I should share my experiences to see how it maps to what
>> others think. I may be off base.
> I just pointed out that adding another communication plugin to
> heartbeat is one thing, but if the purpose of that new plugin was to
> allow more nodes to join the cluster, then we should be aware of the
> current limitations in the heartbeat messaging layer when used with
> pacemaker and many nodes.
>
> If I limit myself to a small number of nodes, then this plugin to
> allow re-configuration of unicast peers is not really necessary,
> anyways.
>
> The heartbeat messaging layer currently is not fit for many nodes.
> Whether corosync is, really, I cannot say.
>
> How much is "many"? That depends on several things, but mostly on the
> resulting size of the cib (if used with pacemaker).
>
> Why many? Because "everyone" wants to go "cloud", and (ab)using a
> cluster manager to manage resources in a cloud seems an obvious thing
> to (at least) try.
>
> Neither of these affects Pacemaker, directly.
>
> I'm not going to start any new features in heartbeat, unless someone
> specifically pays linbit to do so ;-)

I was talking about a half-dozen or so nodes. Not to /implement/ a cloud, but to /run in/ a cloud someone else implemented.

-- Alan Robertson

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
[Linux-ha-dev] Thinking about a new communications plugin
Hi, I've been thinking about a new unicast communications plugin that would work slightly differently from the current ucast plugin. It would take a filename giving the hostnames or ipv4 or ipv6 unicast addresses that one wants to send heartbeats to. When heartbeat receives a SIGHUP, this plugin would reread this file and reconfigure the hosts to send heartbeats to. This would mean that there would be no reason to have to restart heartbeat just to add or delete a host from the list being sent heartbeats. Some environments (notably clouds) don't allow either broadcasts or multicasts. This would allow those environments to be able to add and delete hosts to the cluster without having to restart heartbeat - as occurs now... [and I'd like to support ipv6 for heartbeats]. Any thoughts about this? Would anyone else like such a plugin? -- Alan Robertson "Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
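[Editor's note: the plugin itself would be C inside heartbeat, but the proposed behavior - parse a file of peer addresses, and re-read it on SIGHUP instead of restarting - is easy to sketch in shell. Everything below is illustrative: the peers-file format (one hostname/IPv4/IPv6 address per line, blank lines and `#` comments ignored) is an assumption, not something the proposal specified:]

```shell
#!/bin/sh
# Sketch: maintain a peer list from a file, reloadable via SIGHUP.

load_peers() {
    # strip comments, trailing whitespace, and blank lines
    PEERS=$(sed -e 's/#.*$//' -e 's/[[:space:]]*$//' -e '/^$/d' "$PEERFILE")
}

trap 'load_peers' HUP   # reconfigure peers without a restart

# demo: build a sample peers file and load it
PEERFILE=$(mktemp)
cat > "$PEERFILE" <<'EOF'
# cluster peers - one address per line
10.0.0.1
node2.example.com   # trailing comment
fe80::1
EOF

load_peers
echo "$PEERS"
```

The key point of the proposal is the trap line: an administrator edits the file, sends the daemon a SIGHUP, and the unicast destination set changes in place - no heartbeat restart, and no multicast/broadcast required.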
Re: [Linux-ha-dev] a scalable "membership" and LRM proxy proposal
On 11/18/2010 02:47 AM, Andrew Beekhof wrote: > Its not that I'm against your proposal, I just don't know of enough > resources to build, test and stabilize a new communication protocol. > In that context, an off-the-shelf component that gives us a couple of > magnitudes worth of additional scaling looks pretty attractive - and > should provide some valuable feedback for how to take it to the next > level. Right. I don't think this is as complex as for example the original heartbeat protocol (with error recovery and so on). Time will tell - assuming I have enough time to do anything with it at all ;-). But, I won't be dealing with the overhead my company imposes on official efforts - which will at least triple my effectiveness ;-). -- Alan Robertson "Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] a scalable "membership" and LRM proxy proposal
On 11/16/2010 08:23 AM, Alan Robertson wrote:
>
> One of the cool things about the proposal I made is that the overlords
> incur near-zero ongoing overhead to monitor a very, very big network,
> and no network congestion. The work of doing this monitoring is spread
> pretty evenly among all the nodes in the system, such that no node has
> to keep track of more than a handful of peers (most only have two
> peers - it looks like it could be bounded to 4 peers worst case).
> Ring-structured heartbeat communication looks like it should work out
> very well.
>
To make this clearer, I put up a diagram showing how the ring communication should work on the blog page -
http://techthoughts.typepad.com/managing_computers/2010/10/big-clusters-scalable-membership-proposal.html

The image alone is here:
http://techthoughts.typepad.com/.a/6a00e54ed61e0788330133f5e75d90970b-pi

-- Alan Robertson

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
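[Editor's note: the "most nodes only have two peers" property of a ring is simple arithmetic. This tiny sketch (illustrative only - the real proposal's topology may differ in its details) computes which peers a node heartbeats with if the N nodes are arranged in a single ring and numbered 0..N-1:]

```shell
#!/bin/sh
# ring_neighbors: print the two ring peers of node $1 out of $2 nodes (0-based)
ring_neighbors() {
    i=$1; n=$2
    echo $(( (i + n - 1) % n )) $(( (i + 1) % n ))   # predecessor successor
}

ring_neighbors 0 6   # prints: 5 1
ring_neighbors 3 6   # prints: 2 4
```

Each node watches only its two ring neighbors, so the monitoring load stays constant per node no matter how large N grows - which is exactly the near-zero-overhead claim in the quoted paragraph.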
Re: [Linux-ha-dev] a scalable "membership" and LRM proxy proposal
Thanks for this information. This is _SOOO_ much better than trying to dig it all out of the web site.

On 11/16/2010 03:04 AM, Andrew Beekhof wrote:
>> Alan Robertson wrote:
>> I was hoping for something a little more lightweight - although I
>> clearly understand the benefits of "it already exists" and having some
>> credible claims to security as a goal (since nothing is ever "secure").
>>
>> I wonder if you really want that kind of very strongly guaranteed
>> message delivery
> Not always, possibly not ever.
> But happily this is configurable, so we'd only ask for those
> guarantees if we needed them.

That's good to know. Is this an Apache extension, or is this part of the standard? Does Qpid support IPv6?

>> - since messages sent to a node that crashes before
>> receiving them are delivered after it comes back up. But, of course,
>> there's always a way to work around things that don't do what you need
>> them to. Presumably you'd also need to clean those messages out of the
>> queues of all senders if the node is going away permanently - at least
>> once you figure that out... Messages to clients seem to better match
>> the semantics of RDS. Messages back to "overlords" could use AMQP
>> without obvious corresponding issues.
>>
>> I wonder about latency - particularly when federated - and taking
>> garbage collection into account... I see that QPID claims to be
>> "extremely fast". It probably is pretty fast for a large and complex
>> Java program.
> Here are the numbers from their website:
>
> Red Hat MRG product built on Qpid has shown 760,000 msg/sec ingress on
> an 8-way box or 6,000,000 msg/sec

Is there something missing from this sentence, or am I just dense? I'm guessing that this is intended to imply that it can process 760K msgs/sec per CPU, giving a projected 6M msgs/sec for an 8-way...

> Latencies have been recorded as low as 180-250us (.18ms-.3ms) for TCP
> round trip and 60-80us for RDMA round trip using the C++ broker

For latencies, something more like 99th-percentile guarantees is a better measure than best-case latencies.

And, if it uses TCP, then the overhead of holding 10K TCP connections open at once seems a bit high - just to do nothing most of the time... This model is different from the design point for this protocol. I expect that most of the time these connections would sit idle.

One of the cool things about the proposal I made is that the overlords incur near-zero ongoing overhead to monitor a very, very big network, and no network congestion. The work of doing this monitoring is spread pretty evenly among all the nodes in the system, such that no node has to keep track of more than a handful of peers (most only have two peers - it looks like it could be bounded to 4 peers worst case). Ring-structured heartbeat communication looks like it should work out very well.

>> Nevertheless, I see the attraction. Not sure it's what I want, but
>> since I don't know yet quite what I want - that would be hard to say :-).
> Yep, nothing forcing everyone down the same path.

Got that. I see advantages to having at least some common APIs/libraries/interfaces/something. Cross-pollination of ideas is good. Sharing code and having alternatives is better - if not too expensive in code, organizational overhead, and emotional energy.

Thanks for taking the time to share ideas and educate me,

-- Alan Robertson

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] a scalable "membership" and LRM proxy proposal
Hi, I missed the "through federation" part. Sorry... As a point of comparison - the proposal as described on my blog does not require federation. Probably at least as scalable, and it's very probable that it's lower latency - and it's pretty much dead certain that it's lower traffic on the network. I assume that QMF is the Qpid Management Framework found here? https://cwiki.apache.org/qpid/qpid-management-framework.html I was hoping for something a little more lightweight - although I clearly understand the benefits of "it already exists" and having some credible claims to security as a goal (since nothing is ever "secure"). I wonder if you really want that kind of very strongly guaranteed message delivery - since messages sent to a node that crashes before receiving them are delivered after it comes back up. But, of course, there's always a way to work around things that don't do what you need for them to. Presumably you'd also need to clean up those messages out of the queues of all senders if the node is going away permanently - at least once you figure that out... Messages to clients seem to better match the semantics of RDS. Messages back to "overlords" could use AMQP without obvious corresponding issues. I wonder about latency - particularly when federated - and taking garbage collection into account... I see that QPID claims to be "extremely fast". It probably is pretty fast for a large and complex Java program. Nevertheless, I see the attraction. Not sure it's what I want, but since I don't know yet quite what I want - that would be hard to say :-). -- Alan Robertson al...@unix.sh On 11/11/2010 05:28 AM, Andrew Beekhof wrote: > Some of your thinking mirrors our own. > > What we're moving towards is indeed two tiers of membership. > One being a small but fully meshed set of, to use your terminology, > "Overlords" running a traditional cluster stack. > The other being a much larger set of independent nodes or VMs running > only an lrm-like proxy. 
> > Members of the second tier have no knowledge of each other's > existence, nor even of the cluster itself. > > The transport layer we plan on using to talk to these nodes is QMF > (which implements AMQP). > QMF has the nice properties of being cross-platform (ie. windows), > standards based and something that already exists. > We also know that it is secure, fast, and scales well through federation. > Happily it also gives us node up/down information "for free". > > As Lars mentioned, a Matahari agent (essentially the lrm with a QMF > interface on top) is intended to act as the proxy. > He also mentioned container resources, but this was a red herring. > Whether the entities running Matahari are also guests being managed by > Pacemaker is irrelevant. They can equally be physical machines or > cloud instances. > > The Matahari and QMF pieces are both generically useful components > with no ties to Pacemaker. > There will still need to be integration done to hook up the node > liveliness and add the ability to send resource commands via the QMF > bus. What form this work takes will depend on which parts of > Pacemaker are being used in the overall architecture. > > On Thu, Nov 4, 2010 at 3:48 PM, Alan Robertson wrote: >> I've been thinking about the idea of very highly scalable membership, and >> also about the LRM proxy function which is currently being performed by >> Pacemaker. Towards this end I wrote up a high-level design (or >> architecture, or design philosophy or something) for such a scalable >> membership/LRM proxy service. The design is not specific to working with >> Pacemaker - it could work with Pacemaker, or a number of other kinds of >> management entities. >> >> The kind of membership outlined here would be (in Pacemaker terms) sort of a >> second-class membership - which has advantages and disadvantages. 
>> >> The blog post can be found here: >> http://techthoughts.typepad.com/managing_computers/2010/10/big-clusters-scalable-membership-proposal.html >> >> Please feel free to comment on it on the blog, or on the mailing list. I've >> reproduced the blog posting below: >> >> Really Big Clusters: A Scalable membership proposal >> >> This blog entry is a bit different than previous entries - I'm proposing >> some enhanced capabilities to go with the LRM and friends from the Linux-HA >> project. I will update this entry on an ongoing basis to match my current >> thinking about this proposal. >> >> This post outlines a proposed server liveness ("membership") design which is >> intended to scale up to tens of thousands of servers to be managed as an entity.
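The queued-message concern raised above - liveness messages sent to a crashed node being delivered after it reboots - is often worked around at the application layer by timestamping messages and discarding stale ones on receipt. A minimal sketch of that idea (the function name and the 5-second staleness window are illustrative assumptions, not code from any project discussed here):

```python
import time

STALE_AFTER = 5.0  # seconds; illustrative threshold, tune per heartbeat interval

def fresh_messages(messages, now=None):
    """Drop liveness messages that were queued while the receiver was down.

    Each message is a (sent_at, payload) pair; anything older than
    STALE_AFTER is assumed to describe a state that may no longer hold.
    """
    now = time.time() if now is None else now
    return [payload for sent_at, payload in messages
            if now - sent_at <= STALE_AFTER]

# Messages queued for a node that was down for a minute:
queued = [(100.0, "node7 alive"), (158.0, "node7 alive")]
print(fresh_messages(queued, now=160.0))  # only the recent report survives
```

This trades guaranteed delivery for freshness, which matches liveness reporting: an old "I'm alive" message is worse than no message at all.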
Re: [Linux-ha-dev] a scalable "membership" and LRM proxy proposal
This sounds like a reasonable approach - since it's very similar to one I'd been advocating for a number of years ;-) Liveness being provided directly to the top tier through QMF interfaces sounds unlikely to be scalable to the degree I'm looking for. If a node receives 10s of thousands of "I'm alive" messages per second, that sounds wasteful at the least... But, I haven't read the specs - so maybe this is all happily taken care of. I'll go read some specs. Thanks for the info! -- Alan Robertson al...@unix.sh On 11/11/2010 05:28 AM, Andrew Beekhof wrote: > Some of your thinking mirrors our own. > > What we're moving towards is indeed two tiers of membership. > One being a small but fully meshed set of, to use your terminology, > "Overlords" running a traditional cluster stack. > The other being a much larger set of independent nodes or VMs running > only an lrm-like proxy. > > Members of the second tier have no knowledge of each other's > existence, nor even of the cluster itself. > > The transport layer we plan on using to talk to these nodes is QMF > (which implements AMQP). > QMF has the nice properties of being cross-platform (ie. windows), > standards based and something that already exists. > We also know that it is secure, fast, and scales well through federation. > Happily it also gives us node up/down information "for free". > > As Lars mentioned, a Matahari agent (essentially the lrm with a QMF > interface on top) is intended to act as the proxy. > He also mentioned container resources, but this was a red herring. > Whether the entities running Matahari are also guests being managed by > Pacemaker is irrelevant. They can equally be physical machines or > cloud instances. > > The Matahari and QMF pieces are both generically useful components > with no ties to Pacemaker. > There will still need to be integration done to hook up the node > liveliness and add the ability to send resource commands via the QMF > bus. 
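The "10s of thousands of 'I'm alive' messages per second" worry above can be made concrete with back-of-envelope arithmetic: a flat design has every node report straight to the top tier, while a two-tier design lets local aggregators each watch a handful of nodes and forward only one summary. All figures below are illustrative assumptions:

```python
def top_tier_msgs_per_sec(nodes, interval_s, aggregator_fanout=None):
    """Messages per second arriving at the top tier.

    Flat: every node heartbeats straight up once per interval.
    Two-tier: each aggregator watches `aggregator_fanout` nodes locally
    and sends one summary per interval.
    """
    if aggregator_fanout is None:
        return nodes / interval_s
    aggregators = -(-nodes // aggregator_fanout)  # ceiling division
    return aggregators / interval_s

nodes, interval = 50_000, 1.0
print(top_tier_msgs_per_sec(nodes, interval))      # flat: 50000.0 msgs/sec
print(top_tier_msgs_per_sec(nodes, interval, 40))  # two-tier: 1250.0 msgs/sec
```

A fan-out of 40 cuts top-tier load by 40x; whether QMF federation achieves something equivalent is exactly the question left open in the message above.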
What form this work takes will depend on which parts of > Pacemaker are being used in the overall architecture. > > On Thu, Nov 4, 2010 at 3:48 PM, Alan Robertson wrote: >> I've been thinking about the idea of very highly scalable membership, and >> also about the LRM proxy function which is currently being performed by >> Pacemaker. Towards this end I wrote up a high-level design (or >> architecture, or design philosophy or something) for such a scalable >> membership/LRM proxy service. The design is not specific to working with >> Pacemaker - it could work with Pacemaker, or a number of other kinds of >> management entities. >> >> The kind of membership outlined here would be (in Pacemaker terms) sort of a >> second-class membership - which has advantages and disadvantages. >> >> The blog post can be found here: >> http://techthoughts.typepad.com/managing_computers/2010/10/big-clusters-scalable-membership-proposal.html >> >> Please feel free to comment on it on the blog, or on the mailing list. I've >> reproduced the blog posting below: >> >> Really Big Clusters: A Scalable membership proposal >> >> This blog entry is a bit different than previous entries - I'm proposing >> some enhanced capabilities to go with the LRM and friends from the Linux-HA >> project. I will update this entry on an ongoing basis to match my current >> thinking about this proposal. >> >> This post outlines a proposed server liveness ("membership") design which is >> intended to scale up to tens of thousands of servers to be managed as an >> entity. >> >> Scalability depends on a lot of factors - processor overhead, network >> bandwidth, and network load. A highly scalable system will take all of >> these factors into account. From the perspective of the server software >> author (like, for example, me), one of the easiest to overlook is network >> load. 
Network load depends on a number of factors - number of packets, size >> of the packets, how many switches or routers it has to go through, and how >> many endpoints will receive the packet. To best accomplish this task, it is >> desirable that the majority of "normal" traffic be network topology aware. >> To scale up to very large collections of computers, it is also necessary that >> as much as possible be monitored as locally as possible. In addition, since >> switching gear is not optimized for multicast packets, and multicast packets >> consume significant resources when compared to unicast packets, it is >> desirable to avoid using multicast packets during normal operation. >>
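One common way to get the "monitor as locally as possible, unicast only" property described above is to partition nodes by switch and have each node watch its ring successor on the same switch, so steady-state heartbeats never cross a switch boundary. A sketch of that partitioning (the topology data layout is invented for illustration; it is not from the proposal's code):

```python
from collections import defaultdict

def ring_watch_pairs(node_to_switch):
    """Assign each node a unicast heartbeat target on its own switch.

    Nodes on the same switch form a ring; each monitors its successor,
    so normal-operation traffic stays local and needs no multicast.
    """
    by_switch = defaultdict(list)
    for node, switch in sorted(node_to_switch.items()):
        by_switch[switch].append(node)
    pairs = {}
    for members in by_switch.values():
        for i, node in enumerate(members):
            pairs[node] = members[(i + 1) % len(members)]
    return pairs

topology = {"n1": "sw-a", "n2": "sw-a", "n3": "sw-a", "n4": "sw-b", "n5": "sw-b"}
print(ring_watch_pairs(topology))  # rings per switch: n1->n2->n3->n1, n4<->n5
```

Only the per-switch "designated reporter" would then need to send traffic upward, which is where the switch-topology information (perhaps via LLDP) earns its keep.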
Re: [Linux-ha-dev] a scalable "membership" and LRM proxy proposal
Quoting Lars Marowsky-Bree : > On 2010-11-04T08:48:58, Alan Robertson wrote: > > This is something that's come up several times in the past ("containers" > of resources), and something that seems to be neatly addressed by the > current work on Matahari. > > http://repos.fedorapeople.org/repos/beekhof/matahari > http://fedorahosted.org/matahari > http://matahari-dev.blogspot.com Thanks for the links! I'll read them. I don't understand my post as being related to containers of resources. The resources aspect of the proposal is the smallest part. So, I'm a little puzzled by your reply. I must not be understanding it correctly. There have been some refinements on my blog that aren't reflected in this post - but the concept hasn't changed. -- Alan Robertson al...@unix.sh ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
[Linux-ha-dev] a scalable "membership" and LRM proxy proposal
*Potential problems with this design* * Providing switch topology information to the Overlord(s) is a potential difficulty. There are some protocols implemented by some switches which provide information similar to what's required. Further investigation is required to determine how difficult discovery and auto-configuration of network topology will be. The Link Layer Discovery Protocol (LLDP <http://en.wikipedia.org/wiki/Link_Layer_Discovery_Protocol>) may be of help here. * The scope of the problem being addressed is very large. The addition of layers of monitoring will likely introduce difficulties in debugging problems. Infrastructure for determining the origin of anomalous behavior should be provided. * The flexibility in setting up monitoring topologies adds complexity to the system (even though it keeps it out of the LRM proxy). * There is the possibility that the Overlord will receive conflicting reports regarding the liveness of a particular node. This potential lack of consensus will add complexity to the corner cases in the Overlord. *Open Issues* * There are many possible topologies for handling the upper layer Minion workload distribution. It is not yet clear what the set of good choices are, and what the tradeoffs are between those different topologies. * It is /possible/ that it will prove desirable for this same infrastructure to collect statistical information for the Overlords. Conversations with experts in high-performance computing and examination of tools like Ganglia will likely prove helpful in better understanding this problem. *Concluding Thoughts* This proposal only provides liveness information and distributed control for a large collection of managed servers. In many respects, this is perhaps the easiest component to scale up. It is a long way from a complete cloud infrastructure, clustering software, or enterprise-scope system management package. 
Scaling the Overlord components above this layer to the same standard of size is a very interesting and challenging task. I suspect that this design is sufficiently scalable that other components in the system architecture are likely to be the limiting factors in system scaling. -- Alan Robertson "Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
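The conflicting-liveness-reports problem listed under "Potential problems with this design" is commonly resolved with a quorum rule: declare a node alive or dead only when a strict majority of its monitors agree, and otherwise have the Overlord probe directly. A sketch of such a policy (the rule and names are illustrative, not from the proposal):

```python
from collections import Counter

def liveness_verdict(reports, quorum=0.5):
    """Resolve conflicting liveness reports about a single node.

    `reports` maps monitor name -> True (alive) / False (dead).
    A state is declared only when a strict majority of monitors agree;
    otherwise "uncertain" is returned so the Overlord can probe directly.
    """
    counts = Counter(reports.values())
    total = len(reports)
    for state, n in counts.items():
        if n > total * quorum:
            return "alive" if state else "dead"
    return "uncertain"

print(liveness_verdict({"m1": True, "m2": True, "m3": False}))  # alive
print(liveness_verdict({"m1": True, "m2": False}))              # uncertain
```

The "uncertain" branch is where the corner-case complexity the post predicts actually lands: the Overlord still needs a tiebreak action, but at least the ambiguity is made explicit.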
[Linux-ha-dev] Linux-HA server(s) down for an hour on 23 March 2010
From noon until 1 PM US Mountain Daylight time (1800-1900 UTC) on 23 March 2010, the servers supporting the Linux-HA web site and Mercurial source control will be down for server migration. This is not expected to take more than an hour. Thanks go to the good folks at tummy.com for continuing to provide and maintain servers for Linux-HA! Sorry for the inconvenience, -- Alan Robertson "Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] Linux-HA Leadership Announcement
Quoting Dejan Muhamedagic <[EMAIL PROTECTED]>: Hi Alan, I definitely hope that you won't disappear completely from the project now either and that your future engagement will lead you again to the linux-ha project and community. I am personally very thankful to you for the time you spent, the patience you had (which was not always easy), and the knowledge you shared while guiding me through the project. I'm still around, I just don't have the time to dedicate to it that it needs. I love the project, and care a lot about its customers. As for what the future will bring - no one knows - but I'll try and influence things so I can help with it. -- Alan Robertson [EMAIL PROTECTED] ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
[Linux-ha-dev] Linux-HA Leadership Announcement
After more than 10 years as the Linux-HA project leader, I've decided to create a new leadership structure. One of my original success criteria for the project was that it eventually would not need me. In the last few years, it has seemed more and more likely that we'd reached this plateau of success - and the time has come to put that supposition to the test. Effective today, I am appointing a team of three people to lead and govern the project going forward. These three outstanding people have proved themselves key contributors to the project, and are ready and willing to take over the reins of leadership - and lead the project into the future. These people are: Keisuke MORI <[EMAIL PROTECTED]> Dave Blaschke <[EMAIL PROTECTED]> Lars Marowsky-Bree <[EMAIL PROTECTED]> As for me, my current assignment in IBM doesn't permit me to spend full time on the project, but I will continue to promote and contribute to the project as time permits. Should future circumstances permit it, I expect that I will increase my efforts on the project again. Congratulations to Mori-san, Dave and Lars! They're working out their new roles, scheduling releases, and so on. Expect to hear from them soon! -- Alan Robertson [EMAIL PROTECTED] Linux-HA founder, Linux-HA project leader emeritus ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
[Linux-ha-dev] 10-year anniversary of the Linux-HA project
The 10-year anniversary of the first working code was this week. I announced it 10 years ago yesterday. I should have announced it yesterday, but yesterday was a travel day for me, and it slipped my mind. Here's the announcement of that first code: http://lists.linux-ha.org/pipermail/linux-ha/1998-March/76.html Thanks to all the many people over these ten years who've made this project a success! -- Alan Robertson <[EMAIL PROTECTED]> "Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
[Linux-ha-dev] Announcing! Release 2.1.3 of Linux-HA is now available!
oved CCM interaction + Low: CIB: Minor logging improvement + High: crmd: Improvements to shutdown sequence (particularly in response to failures) + CTS: Low: Ignore the correct BadNews message in ResourceRecover + CTS: Low: Tell NearQuorumPointTest to look for Pat:DC_IDLE before declaring success + CTS: Low: Optimize filtering of BadNews + Admin: Low: Bug 1603 - Allow CIB digest files to be verified + RA: apache - make status quieter + [RA] eDir88: include the stop option + OSDL bug 1666: in BSC, make sure temp rsc dir exists for RAs + Contrib: dopd - Fix usage of crm_log_init() by code that shouldn't be using it + Tools: ocf-tester - use the default value for OCF_ROOT if it exists + RA: IPaddr2 - Make the check for the modprobe/iptables utilities conditional on the IP being cloned + CRM: Update crm/cib feature sets and the set of tags/attributes used for feature set detection + crmd: Simplify the detection of active actions and resources at shutdown + PE: Use failcount to handle failed stops and starts + TE: Set failcount to INFINITY for resources that fail to start or stop + CRM: Remove debug code that should not have been committed + PE: Add regression test for previous commit + PE: Regression: Allow M/S resources to be promoted based solely on rsc_location constraints + PE: Fix up the tests now that compare_version() functions correctly (as of cs: 7d69ef94a258) + CRM: Fix compare_version() to actually work correctly on a regular basis + PE: Update testcases to include all_stopped (added in cs: 800c2fec24ee) + crmd: Bug 1655 - crmd can't exit when the PE or TE is killed from underneath it + Tools: Bug 1653 - Misc attrd/attrd_updater cleanups + Tools: Bug 1653 - Further changes to prevent use of NULL when no attribute is specified + CRM: Make logging setup consistent and do not log the command-line to stderr + RA: Delay (v1) - Remove extra characters from call to ra_execocf + Tools: Bug 1653 - attrd crashes when no attribute is specified + OCF: Provide the 
location of /sbin as used by some agents (HA_SBIN_DIR) + PE: Move the creation of stonith shutdown constraints to native_internal_constraints() + crmd: Only remap monitor operation status to LRM_OP_DONE under the correct conditions + PE: Handle two new actions in text2task + CTS: Give stonith devices plenty of time to start + PE: Include description for the remove-after-stop option + PE: Streamline STONITH ordering. Make sure 'all_stopped' depends on all STONITH ops. + PE: Aggregate the startup of fencing resources into a stonith_up pseudo-action + PE: STONITH Shutdown ordering + Bugzilla 1657: Speed up BasicSanityCheck and also make logging inheritance more uniform. + OSDL 1449 / Novell 291749: GUI should not overwrite more specific settings of contained resources. + Remove autoconf and friends on make distclean -- PS: Special thanks to Dejan for making up this change log - it's an annoying and thankless task. -- Alan Robertson <[EMAIL PROTECTED]> "Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] heartbeat 2.x on IPv6?
Tomokazu Omura wrote: > Hello, > > Has anyone developed a ping6 plugin for heartbeat 2.x ? > We've tried out a heartbeat cluster on an IPv6 environment where > we were able to make an virtual IP go up but was unable to get the > ping node function running. Guess the same question goes to the > ipfail as well. > > Sorry if this email was sent to everyone twice... No one has done this yet. Our ipv6 capabilities are currently limited to bringing up an IPv6 address. There are a number of things that ought to be done to make everything ipv6 compatible. Thanks for pointing out another one! I'd certainly consider patches for making this work right. Thanks! -- Alan Robertson <[EMAIL PROTECTED]> "Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
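Pending a real ping6 plugin, a sketch of the shape such a ping-node check could take: invoke the system ping6 once and treat exit status 0 as reachable. This is an illustration, not the plugin itself; the `-c`/`-W` flags are the common Linux iputils ones and may differ on other platforms:

```python
import subprocess

def build_ping6_cmd(addr, count=1, timeout_s=2):
    """Build an IPv6 reachability probe command (iputils-style flags).

    -c: number of echo requests; -W: per-reply timeout in seconds.
    Flag names are the common Linux iputils ones; verify on your platform.
    """
    return ["ping6", "-c", str(count), "-W", str(timeout_s), addr]

def is_reachable(addr):
    """Return True if a single ICMPv6 echo gets a reply (exit status 0)."""
    result = subprocess.run(build_ping6_cmd(addr),
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode == 0

# Link-local addresses need a zone index to pick the interface:
print(build_ping6_cmd("fe80::1%eth0"))
```

A real plugin would of course plug into the heartbeat ping-node machinery rather than shell out per check, but the reachability semantics would be the same.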
Re: [Linux-ha-dev] Problem with heartbeat gui
Dejan Muhamedagic wrote: > Hi, > > On Thu, Dec 13, 2007 at 10:27:53AM +0100, Fernando Iglesias wrote: >> Hi all, >> >> I've a little problem with heartbeat gui, i'll try introduce it. I've got a >> two nodes cluster and I should be able to connect using GUI installed in a >> third machine ( admin cluster machine), I set the login parameters (I've >> checked they're right ) and try to connect but I've no response, I only have >> one "Updating data from server" and one "No data available" messages. After >> a few hours I've no changes. >> >> Any guess about this problem? > > Not unless you post logs and version information. Also, there is a 10-minute screencast video on using the GUI which gives a variety of tips on authorization and so on. You might look at it. http://linux-ha.org/Education/Newbie/IPaddrScreencast -- Alan Robertson <[EMAIL PROTECTED]> "Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: AW: [Linux-ha-dev] Call for testers: 2.1.3
Spindler Michael wrote: > Hi, > >>>>> This problem has been solved. My packaging box didn't have all >>>>> necessary packages for building GUI rpm. When I added >> them it was >>>>> able to build haclient (GUI) and that find-lang.sh tool >> worked fine. >>>>> I didn't find the problem with pegasus on my CentOS 5.0 >> but I have >>>>> 32 bit version, and the problem was reported for 64 bit. >>>> >>>> OK. >>>> >>>> So, this step should only be included if --enable-mgmt, I guess? >>>> >>> Right. It establishes language settings for the GUI, so it's >> not needed >>> if GUI isn't needed. >> We are trying to build it on RedHat(Red Hat Enterprise Linux >> ES release 4 (Nahant Update 4)), and a problem remains before us. >> Please check Mori-san's patch again. >> http://developerbugs.linux-foundation.org//attachment.cgi?id=1109 >> >> -if test "x${CIMOM}" = "x"; then >> -if test "x${CIMOM}" = "x"; then >> -AC_CHECK_PROG([CIMOM], [cimserver], [pegasus]) >> +if test "x${enable_cim_provider}" = "xyes"; then # >> maybe, here # >> +if test "x${CIMOM}" = "x"; then >> +if test "x${CIMOM}" = "x"; then >> >> I attached the configure.log >> > > fyi: I was able to build the rpms on RedHat AS 4 without any problems. There were two bugs in the configure stuff: 1) It got the package name for pegasus wrong for Red Hat 2) It didn't work if you had pegasus installed but didn't enable the CIM provider. -- Alan Robertson <[EMAIL PROTECTED]> "Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] build RPM without pegasus on RHEL4
Junko IKEDA wrote: Hi, I got the latest dev including the fix about RPM, http://hg.linux-ha.org/dev/rev/7e3c4ea27853 but I'm still facing the following error. error: Macro %CMPI_PROVIDER_DIR has empty body error: Failed build dependencies: pegasus is needed by heartbeat-2.1.3-1.x86_64 gmake: *** [rpm] Error 1 redha-release; Red Hat Enterprise Linux ES release 4 (Nahant Update 4) Kernel-release; 2.6.9-42.ELsmp I wonder why RedHat AS 4 is OK? There is something about tog-pegasus, but no just pegasus... # rpm -qa | grep pegasus tog-pegasus-devel-2.5.1-2.EL4 tog-pegasus-2.5.1-2.EL4 OK. I now understand the failed dependency issue, but not yet the empty macro body. You should have some messages in your build output like this: CIM server = "${CIMOM}"]) CIM providers dir= "${CMPI_PROVIDER_DIR}"] CMPI header files= "${CMPI_HEADER_PATH}"] Can you tell me what these messages from your build say? Better yet, I just committed the package name fix to 'dev', why don't you update and send me your whole build output - to my email address? If you want to do an egrep for 'CIM|CMPI' in your build output and post that to the list, that would also be a reasonable thought. But, still please send the complete build output to my email address. -- Alan Robertson <[EMAIL PROTECTED]> "Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [mgmt]Rewriting order and colocation configurations
Lars Marowsky-Bree wrote: On 2007-12-09T02:50:52, Yan Gao <[EMAIL PROTECTED]> wrote: Thanks! It's a good tool. By now, haclient doesn't generate an xml file. Ideally, haclient should generate a valid xml, and then transfer to mgmtd. Xinwei and I think that the current protocol is too complicated and has many limitations. We want to simplify the protocol and improve the applicability so that it's convenient to implement full features according to the dtd. Right; the client very likely should directly talk to the CIB daemon. (Which already supports this; we may need a way to apply ACLs in the future though.) You're preaching to the choir ;-) I have extended the metadata to explicitly include enumeration values. I think this would help a lot for the kinds of validations I think you're doing. In my implementation, I've adopted the enumeration values specified in the dtd to be used for the list of combobox options. It is quite possible that a DTD is not powerful enough to adequately describe the syntax and semantics of the CIB. The DTD is, simply put, the oldest and least complex schema format for XML, and happened to be what I knew when I conjured up the original one ;-) XML Schema, Relax NG (or others I know even less about) may be more appropriate standards to describe the CIB as we have it today, and as it evolves further. This may be preferable to needing to duplicate this in home-grown fashion. A lint-like tool is still a good idea, but it should be built on top of this, IMHO. /me redirects this whining into /dev/null Feel free to investigate the best tools, define the DTD-replacement, and get Andrew to adopt it - incorporating it into crm_verify. "Show me the code" is the Linux way after all... If it has python support, likely all I'll have to do to use it is import a few more python classes at the top, and add a half-dozen lines to ciblint to read in the DTD-replacement, and call its validation function. 
When all is said and done, it won't eliminate more than a few hundred lines of code in ciblint (which is now more than 2K lines and still growing). -- Alan Robertson <[EMAIL PROTECTED]> "Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] Call for testers: 2.1.3
Alan Robertson wrote: Known Problem in 2.1.3: The STONITHd test seems to fail if fencing is enabled. I suspect this of being a testing quirk rather than a new problem. I'm working it. Now fixed in 'dev' and 'test'. -- Alan Robertson <[EMAIL PROTECTED]> "Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [mgmt]Rewriting order and colocation configurations
Yan Gao wrote: On Sat, 2007-12-08 at 07:54 -0700, Alan Robertson wrote: Yan Gao wrote: I'm rewriting the order and colocation configurations of mgmt. Following features will be implemented: 1. Get the crm.dtd file from server end. 2. Dynamically adding gtk widgets for attributes of a type of element completely according to the dtd definition. 3. The added widgets are different for CDATA or enum type of attributes to ensure the inputted values will be legal for dtd. 4. Marking out the default values. 5. The widgets for required or optional attributes will be put into different tables. 6. Dynamically generating appropriate description for current setting. Hence all information in dtd can be exploited. In other words, it'll provide full features. I've been thinking of building up a general model for kinds of elements. And I'll try to unify the style of adding objects and viewing objects. The following are some screenshots. Any comments will be appreciated. I didn't look at your screenshots, but the ideas sound wonderful. Sorry, I added the screenshots in attachments. Also, you might look at using crmlint to validate the CIB you generate. There is also information in ciblint which should definitely be of value to you. I'm kind of focused on the release at the moment, so if you would also CC me directly, if you want to discuss this, that would help me. Thanks! Thanks! It's a good tool. By now, haclient doesn't generate an xml file. In the GUI, original comboboxes will be added for the attributes of enum type. It'll just allow users to select a valid value from a list and prevent them from inputting an invalid value. Any other validation checking hasn't been implemented yet. I have extended the metadata to explicitly include enumeration values. I think this would help a lot for the kinds of validations I think you're doing. -- Alan Robertson <[EMAIL PROTECTED]> "Openness is the foundation and preservative of friendship... 
Let me claim from you at all times your undisguised opinions." - William Wilberforce ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
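The enumeration values the GUI turns into combobox options can be pulled out of a DTD with very little code. A minimal stdlib sketch using a regular expression - real DTD parsing is more involved, and this handles only the simple single-attribute `<!ATTLIST elem attr (a|b|c) ...>` form; the DTD fragment below is loosely modeled on the CIB's ordering constraint, not copied from crm.dtd:

```python
import re

# Matches: <!ATTLIST element attr (v1|v2|...) ...   (single-attribute form only)
ATTLIST_ENUM = re.compile(r"<!ATTLIST\s+(\w+)\s+(\w+)\s+\(([^)]+)\)")

def enum_attributes(dtd_text):
    """Map (element, attribute) -> list of allowed enumeration values.

    Only handles one enumerated attribute per ATTLIST declaration;
    a real implementation would use a proper DTD parser.
    """
    return {
        (elem, attr): [v.strip() for v in values.split("|")]
        for elem, attr, values in ATTLIST_ENUM.findall(dtd_text)
    }

dtd = '''
<!ATTLIST rsc_order type (before|after) "after">
<!ATTLIST rsc_colocation score CDATA #REQUIRED>
'''
print(enum_attributes(dtd))  # {('rsc_order', 'type'): ['before', 'after']}
```

CDATA attributes are correctly skipped, since they carry no enumeration to offer in a combobox.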
Re: [Linux-ha-dev] stonith plugin through HMC web interface
Xinwei Hu wrote: 2007/12/8, Alan Robertson <[EMAIL PROTECTED]>: Xinwei Hu wrote: Hi all, The ibmhmc stonith plugin doesn't work with the web interface of HMC. The attachment is a workable version of stonith plugin through HMC web interface. It depends on curl and /bin/sh. It'll be great if someone can help to review and include it upstream then. Why don't you want to use the current HMC interface? Cause we don't have the standalone HMC machine. Where does the web server you're contacting run? -- Alan Robertson <[EMAIL PROTECTED]> "Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [mgmt]Rewriting order and colocation configurations
Yan Gao wrote: I'm rewriting the order and colocation configurations of mgmt. Following features will be implemented: 1. Get the crm.dtd file from server end. 2. Dynamically adding gtk widgets for attributes of a type of element completely according to the dtd definition. 3. The added widgets are different for CDATA or enum type of attributes to ensure the inputted values will be legal for dtd. 4. Marking out the default values. 5. The widgets for required or optional attributes will be put into different tables. 6. Dynamically generating appropriate description for current setting. Hence all information in dtd can be exploited. In other words, it'll provide full features. I've been thinking of building up a general model for kinds of elements. And I'll try to unify the style of adding objects and viewing objects. The following are some screenshots. Any comments will be appreciated. I didn't look at your screenshots, but the ideas sound wonderful. Also, you might look at using crmlint to validate the CIB you generate. There is also information in ciblint which should definitely be of value to you. I'm kind of focused on the release at the moment, so if you would also CC me directly, if you want to discuss this, that would help me. Thanks! -- Alan Robertson <[EMAIL PROTECTED]> "Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] Call for testers: 2.1.3
Known Problem in 2.1.3: The STONITHd test seems to fail if fencing is enabled. I suspect this of being a testing quirk rather than a new problem. I'm working it. -- Alan Robertson <[EMAIL PROTECTED]> "Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
[Linux-ha-dev] Request for feedback: Checking CIBs for errors...
Hi, One of the problems I've run into both personally and working with customers is finding errors in CIBs. I see this kind of thing a good bit on the mailing list too. So, to help with this and maybe lighten the burden on the mailing list and Andrew and Dejan and Lars, I wrote a command which checks for errors in CIB files - which will be included in 2.1.3 when that comes out. It's called ciblint, and you can read some more about it, and see where to get a copy to try out here: http://linux-ha.org/ciblint It is _not_ a finished product yet, but even as a work-in-progress, it does a number of interesting things. It is intentionally picky - and I _intend_ for it to be picky in the right ways - but of course there are no doubt errors in it. If there is an old way of doing something, and a new-and-more-correct way, I intend for it to insist on the new way. So, some things it complains about may be perfectly acceptable to the CRM, but are not preferred. I'm pretty sure that some of the things the GUI does will fall into that category. It can also provide you a good bit of information about the legal values to put in various places (-l and -A options). Although I've learned more about the CIB while writing this script, I know I still have more to learn. I'm looking for constructive feedback on it. Here are a few specific kinds of feedback that would be especially helpful: - Did it find anything useful for you? - Do you think it's incorrect (not just pedantic) in some cases? - Do you have any suggestions for errors you've made or seen that you think it should catch? - What corrections do you have for any of the explanatory text -- in particular from the -A option? - Any other constructive suggestions would be welcome. - Comments about how stupid I am for having something wrong or what an incredibly stupid idea this is will be cheerfully redirected to /dev/null Sample CIBs for these various kinds of feedback would be especially appreciated. 
It's a python script, and my current thinking for a todo list is in the text of the script. It currently does a sudo to run lrmadmin to grab some metadata from the LRM. That will eventually change. [lrmadmin shouldn't require you to be root for the things I'm asking it to do] -- Alan Robertson <[EMAIL PROTECTED]> "Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-HA] Re: [Linux-ha-dev] ANNOUNCE: Project Organization - CRM to become its own project
Because of the surprise timing of this announcement, right in the last phases of a release, and during a time when I'm supposed to be on vacation, I'm postponing discussion on this until at least Monday to give me a chance to get testing back on track. Although I did get some hints that this _might_ happen, I certainly didn't know it was going to, or when - I need a few days to get a better perspective on it. So for now, I ask that we postpone discussion on this matter. -- Alan Robertson <[EMAIL PROTECTED]>
Re: [Linux-ha-dev] stonith plugin through HMC web interface
Xinwei Hu wrote: Hi all, The ibmhmc stonith plugin doesn't work with the web interface of HMC. The attachment is a workable version of the stonith plugin working through the HMC web interface. It depends on curl and /bin/sh. It'll be great if someone can help to review it and include it upstream. Why don't you want to use the current HMC interface? -- Alan Robertson <[EMAIL PROTECTED]>
Re: [Linux-ha-dev] Call for testers: 2.1.3
Sebastian Reitenbach wrote: Hi Alan, High-Availability Linux Development List wrote: We are in the final weeks of testing for release 2.1.3 - which has been delayed to the week of Dec 19. Please help us test this upcoming new release! You can get a source tar ball for it from any of these links: http://hg.linux-ha.org/test/archive/tip.tar.bz2 http://hg.linux-ha.org/test/archive/tip.zip http://hg.linux-ha.org/test/archive/tip.tar.gz The latest fixes applied to the 'test' repository can be found here: http://hg.linux-ha.org/test/summary We have already tested this release a good bit, but it needs more testing before release. We do a lot of automated testing, more is always welcome [http://linux-ha.org/CTS]. However, what you can probably help the most with is ad-hoc testing. Don't forget to file bugs on 2.1.3 when you find them. You can find our bugzilla at these URLs: http://developerbugs.linux-foundation.org//query.cgi?product=Linux-HA http://developerbugs.linux-foundation.org//enter_bug.cgi?product=Linux-HA Thanks in advance for your help and cooperation!

Maybe I'm a bit late, but are there any objections to getting these into 2.1.3, or are they left for next year: http://developerbugs.linux-foundation.org/show_bug.cgi?id=1659 http://developerbugs.linux-foundation.org/show_bug.cgi?id=1761 http://developerbugs.linux-foundation.org/show_bug.cgi?id=1743 http://developerbugs.linux-foundation.org/show_bug.cgi?id=1731

I'm planning on picking those up - I've just been busy. It's not too late for those kinds of patches - they're pretty low-risk. -- Alan Robertson <[EMAIL PROTECTED]>
Re: [Linux-ha-dev] ANNOUNCE: Project Organization - CRM to become its own project
Andrew's contributions to the Linux-HA community will be missed. I am sad that he has unilaterally decided to leave Linux-HA and fork his code into a separate project. I have suspected that this was coming for a number of months, but as you probably have guessed, Andrew won't reply to emails I send him, or answer the phone when I call - except when I hide my caller id. I wish I had known how to fix that without him feeling he needed to leave the project.

I'm not sure what this means yet. It may mean that we're in for a time of difficult coordination that I find hard to imagine working - because the need for coordination with a separate project will be higher than if it were in the same project - and communication and coordination was the problem in the first place. D-: Or it may mean that we'll be looking for someone to pick up maintaining the CRM. D-: I really do not know. But if anyone is interested in picking up the CRM, do let me know.

In any case, for the convenience of the project, I expect that we'll be mirroring his work from his new project on our Mercurial repository - at least until we get this figured out. I'll let you know when that's set up. Or maybe Andrew leaving and going his own way will work out for the best, and things will be better than I could possibly imagine. Anyone who has ideas on how this can be made to happen, please email me. -- Alan Robertson <[EMAIL PROTECTED]>
Re: [Linux-ha-dev] Heartbeat - Dev: changeset 11642:54085bc025ce
Andrew Beekhof wrote: http://hg.linux-ha.org/dev/rev/54085bc025ce "This was (at least) caused by a bug in the ssh plugin." uhhh... no. the plugin behaved correctly - it's _supposed_ to report failure when it can't complete the stonith operation. "risk: near-zero - changes were not made to any production code" again, no. you know full well that people use the ssh agents and that the change is incredibly dangerous for those people. if you want to make these cts-specific hacks, please create a new agent called external/cts or perhaps external/broken and do them there. anything else is just irresponsible.

What I know full well is that I have never wavered in strongly advising against using the ssh plugin in production. The SSH plugin was written specifically for CTS - nothing else. It was written because my machines kept blowing out power supplies, etc. from being stonithed with a real power switch thousands of times in CTS. It has always been documented as a test tool ONLY. At one time Lars and I discussed leaving it out of what's shipped in the plugin library, but that made life too messy, so we left it in and documented it as not-for-production. This has been discussed dozens of times over the last 6 or 7 years, and the recommendation every time it's come up has been to never use it in production.

Also note that this is NOT the "ssh" plugin, but the "external/ssh" plugin. The "ssh" plugin is unchanged. The external/ssh plugin was written to exercise the new "external" stonith module, and comes with the same caveat: "Never use it in production". From your strong reaction to this change, I'm guessing that you might have advised some people to use it in production... I stand by my recommendation that it never be used in production, but given what seems to be implied about your recommendations, I can make that last set of changes optional based on a parameter to the RA, which we can then supply in CTS. 
livedangerously=yes No point in having three stonith agents that do the same thing - we already have two. These changes are in changeset 11643:35a4edc666b8, which has now been pushed into 'dev'. -- Alan Robertson <[EMAIL PROTECTED]>
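The opt-in proposed above could look something like the following sketch. The livedangerously=yes parameter name comes from the message itself; the function names and return strings are hypothetical illustrations, not the real stonith RA interface:

```python
# Sketch of gating a test-only behavior behind an explicit parameter, so
# that nobody trips over it by accident in production. The parameter
# handling and return strings here are hypothetical, not the real agent.

def live_dangerously(params):
    """Return True only when the caller explicitly opted in."""
    return params.get("livedangerously", "no").lower() == "yes"

def stonith_reset(node, params):
    if not live_dangerously(params):
        # Production path: refuse the unsafe test-only shortcut.
        return "refused: test-only behavior not enabled for %s" % node
    # Test path (what CTS would request by passing the parameter).
    return "reset %s via test-only shortcut" % node

print(stonith_reset("x3650c", {}))                          # refused
print(stonith_reset("x3650c", {"livedangerously": "yes"}))  # allowed
```

The point of the design is that the dangerous behavior requires a deliberate, self-documenting configuration step, rather than a separate near-duplicate agent.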
Re: [Linux-ha-dev] Call for testers: 2.1.3
Serge Dubrouski wrote: This problem has been solved. My packaging box didn't have all the necessary packages for building the GUI rpm. When I added them, it was able to build haclient (GUI) and that find-lang.sh tool worked fine. I didn't find the problem with pegasus on my CentOS 5.0, but I have the 32-bit version, and the problem was reported for 64-bit. OK. So, this step should only be included if --enable-mgmt, I guess? -- Alan Robertson <[EMAIL PROTECTED]>
Re: [Linux-ha-dev] Call for testers: 2.1.3
Serge Dubrouski wrote: It also builds on FC6 but not on CentOS. Whiinnne... /me straightens up. Thanks for the info! -- Alan Robertson <[EMAIL PROTECTED]>
Re: [Linux-ha-dev] Call for testers: 2.1.3
Dejan Muhamedagic wrote: Hi, On Wed, Dec 05, 2007 at 02:24:14PM -0700, Serge Dubrouski wrote: Hi, Alan - make rpm on CentOS 5.0 fails with this error message: + /usr/lib/rpm/redhat/find-lang.sh /var/tmp/heartbeat-2.1.3-2-root-root haclient Don't have any redhat/centos, but on SuSE the usage says: $ /usr/lib/rpm/find-lang.sh Usage: /usr/lib/rpm/find-lang.sh TOP_DIR PACKAGE_NAME [prefix] and there is no package named haclient. The specfile says: %find_lang haclient Is that good? Thanks, Dejan No translations found for haclient in /var/tmp/heartbeat-2.1.3-2-root-root error: Bad exit status from /var/tmp/rpm-tmp.93013 (%install) I ran into some issues with that, but thought they were fixed now... Maybe the RH and SUSE versions have to be different? It certainly builds fine for me on FC7 and on SUSE 10.1. -- Alan Robertson <[EMAIL PROTECTED]>
Re: [Linux-ha-dev] errors with 76c25be5c854
Tadashiro Yoshida wrote: Hi, We detected some errors in the ComponentFail test while running CTS with a dev version. It might be a problem with CTS's message handling. Please check whether something is going wrong. Dev version: 76c25be5c854

# python CTSlab.py -v2 -r -c --facility local7 -L /var/log/ha-log-local7 500 2>&1 | tee cts.log
---
Nov 26 19:22:09 x3650a CTS: Running test ComponentFail (x3650b) [16]
Nov 26 19:22:10 x3650b heartbeat: [27967]: WARN: Managed /usr/lib64/heartbeat/stonithd process 27980 killed by signal 9 [SIGKILL - Kill, unblockable].
Nov 26 19:22:10 x3650b heartbeat: [27967]: ERROR: Respawning client "/usr/lib64/heartbeat/stonithd":
Nov 26 19:22:10 x3650b heartbeat: [27967]: info: Starting child client "/usr/lib64/heartbeat/stonithd"(0,0)
Nov 26 19:22:10 x3650b stonithd: [30753]: notice: /usr/lib64/heartbeat/stonithd start up successfully.
:
Nov 26 19:32:41 x3650a CTS: Patterns not found: ['x3650c crmd:.*LOST:.* x3650b ', 'Updating node state to member for x3650b']
Nov 26 19:32:41 x3650a CTS: Test ComponentFail failed [reason:Didn't find all expected patterns]
Nov 26 19:32:41 x3650a CTS: Test ComponentFail (x3650b) [FAILED]
---

I think this was a pattern problem in the messages-to-ignore, which I believe is now fixed. http://hg.linux-ha.org/dev/rev/e4a4c6fd5649

Besides, it seems there are some failures in the stonith testing, although the final message says it succeeded.
---
Nov 26 19:54:11 x3650a CTS: BadNews: Nov 26 19:46:26 x3650b stonithd: [26162]: CRIT: command ssh -q -x -n -l root "x3650c" "echo 'sleep 2; /sbin/reboot -nf' | SHELL=/bin/sh at now >/dev/null 2>&1" failed
Nov 26 19:54:11 x3650a CTS: BadNews: Nov 26 19:46:53 x3650b stonithd: [23258]: ERROR: Failed to STONITH the node x3650c: optype=RESET, op_result=TIMEOUT
Nov 26 19:54:11 x3650a CTS: BadNews: Nov 26 19:46:53 x3650b tengine: [26116]: ERROR: tengine_stonith_callback: Stonith of x3650c failed (2)... aborting transition.
---

I need to change my testing setup to look at this. 
I'd heard a rumor that this was happening, but it wasn't happening to me, and no bugzilla was filed. But, I'm pretty sure it's an indication of a fault in the stonith ssh module. I changed the code to fail-fast, which is vastly safer when you don't have STONITH available, and not harmful when you have real STONITH. However, if the ssh STONITH module can't connect to the machine it will show a failure like this one. So, I think the thing to do is to figure out how to report success in this case - in the testing STONITH module. -- Alan Robertson <[EMAIL PROTECTED]>
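The "Patterns not found" failure above comes from CTS scanning the captured logs for a list of required regular expressions and reporting the ones that never matched. A minimal sketch of that idea follows; the log lines and patterns are shortened stand-ins for the demo, not CTS's real tables:

```python
# Illustrative sketch of the CTS pattern check: scan log text for a set
# of required regexes and return the subset that matched no line.
import re

def unmatched_patterns(log_text, patterns):
    """Return the regex patterns that match no line of the log."""
    missing = []
    for pat in patterns:
        if not any(re.search(pat, line) for line in log_text.splitlines()):
            missing.append(pat)
    return missing

log = ("x3650c crmd: LOST: x3650b\n"
       "Updating node state for x3650b\n")
required = [r"LOST:.*x3650b",
            "Updating node state to member for x3650b"]
print("Patterns not found:", unmatched_patterns(log, required))
```

This is why an overly specific expected pattern (like the "to member" wording above) makes a test fail even when the underlying event actually occurred, and why fixing the pattern tables is a legitimate fix.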
[Linux-ha-dev] Call for testers: 2.1.3
We are in the final weeks of testing for release 2.1.3 - which has been delayed to the week of Dec 19. Please help us test this upcoming new release! You can get a source tar ball for it from any of these links:
http://hg.linux-ha.org/test/archive/tip.tar.bz2
http://hg.linux-ha.org/test/archive/tip.zip
http://hg.linux-ha.org/test/archive/tip.tar.gz

The latest fixes applied to the 'test' repository can be found here: http://hg.linux-ha.org/test/summary

We have already tested this release a good bit, but it needs more testing before release. We do a lot of automated testing, more is always welcome [http://linux-ha.org/CTS]. However, what you can probably help the most with is ad-hoc testing. Don't forget to file bugs on 2.1.3 when you find them. You can find our bugzilla at these URLs:
http://developerbugs.linux-foundation.org//query.cgi?product=Linux-HA
http://developerbugs.linux-foundation.org//enter_bug.cgi?product=Linux-HA

Thanks in advance for your help and cooperation! -- Alan Robertson <[EMAIL PROTECTED]>
[Linux-ha-dev] Re: Heartbeat - Dev: changeset 11628:e4a4c6fd5649
Andrew Beekhof wrote: this commit is wrong - only the children indicated in the process definition are allowed to die please revert this change asap http://hg.linux-ha.org/dev/rev/e4a4c6fd5649

Well... That's not what happens in reality, and as far as I can tell it's expected. When one of your processes dies, it creates a cascading chain of other dying processes which are connected to it via IPC, and which die when it dies. As a result, when something important like the CIB dies, virtually any/every one of your processes can die as a result. Which one(s) die before the node suicides depends on the timing. The key causative factors of this are:
- Your processes don't suicide directly.
- It appears that file descriptor notification pretty often happens before death-of-child signals.

So, a process (let's say the CIB) dies, and then one or more of its many local peers (CRM, pengine, attrd, tengine, etc.) discovers that it has disconnected. It in turn dies, and depending on the relative timing of when the log message gets sent out or the suicide occurs, the log messages may be received by the remote logging daemon - or not. What have I missed here? -- Alan Robertson <[EMAIL PROTECTED]>
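The race described above - noticing a dead peer as an EOF on its IPC file descriptor before any death-of-child signal arrives - can be demonstrated with a tiny fork-and-pipe example. This is a POSIX-only illustration of the mechanism, not heartbeat's actual IPC code:

```python
# Demonstrate noticing a peer's death via its IPC fd: a child exits and
# the parent sees the pipe hit EOF through select(), independently of
# any SIGCHLD delivery.
import os
import select

def notice_peer_death(timeout=5.0):
    """Fork a child that exits at once; return True if the parent first
    observes the death as an EOF on the shared pipe."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        os.close(r)
        os._exit(0)          # child "dies" without writing anything
    os.close(w)
    # The read end becomes readable (EOF) once the last writer is gone.
    ready, _, _ = select.select([r], [], [], timeout)
    saw_eof = bool(ready) and os.read(r, 1) == b""
    os.close(r)
    os.waitpid(pid, 0)       # reap the child afterward
    return saw_eof

print("peer disconnect noticed via fd EOF:", notice_peer_death())
```

A peer process sitting in its event loop sees exactly this EOF when the CIB dies, and if its own policy is "exit on disconnect", the cascade follows.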
[Linux-ha-dev] Re: [Linux-HA] Recovering from "unexpected bad things" - is STONITH the answer?
Kevin Tomlinson wrote: On Tue, 2007-11-06 at 10:25 -0700, Alan Robertson wrote: We now have the ComponentFail test in CTS. Thanks Lars for getting it going! And, in the process, it's showing up some kinds of problems that we hadn't been looking for before. A couple of examples of such problems can be found here: http://old.linux-foundation.org/developer_bugzilla/show_bug.cgi?id=1762 http://old.linux-foundation.org/developer_bugzilla/show_bug.cgi?id=1732

The question that comes up is this: For problems that should "never" happen, like death of one of our core/key processes, is an immediate reboot of the machine the right recovery technique?

The advantages of such a choice include:
- It is fast
- It will invoke recovery paths that we exercise a lot in testing
- It is MUCH simpler than trying to recover from all these cases, and therefore almost certainly more reliable

The disadvantages of such a choice include:
- It is crude, and very annoying
- It probably shouldn't be invoked for single-node clusters (?)
- It could be criticized as being lazy
- It shouldn't be invoked if there is another simple and correct method
- Continual rebooting becomes a possibility...

We do not have a policy of doing this throughout the project; what we have is a few places where we do it. I propose that we should consider making a uniform policy decision for the project - and specifically decide to use ungraceful reboots as our recovery method for "key" processes dying (for example: CCM, heartbeat, CIB, CRM). It should work for those cases where people don't configure in watchdogs or explicitly define any STONITH devices, and also independently of quorum policies - because AFAIK it seems like the right choice, and there's no technical reason not to do so. My inclination is to think that this is a good approach to take for problems that in our best-guess judgment "shouldn't happen". I'm bringing this to both lists, so that we can hear comments both from developers and users. Comments please... 
I would say the "right thing" would depend on your cluster implementation and what is considered the right thing to do for the applications that the cluster is monitoring. I would propose that this action should be administrator configurable. From a user point of view, with the cluster that we are implementing we would expect any cluster failure (internal) to either get itself back up and running or just send out an alert "Help me, I'm not working"... as we would want our applications to continue running on the nodes. ** We don't want a service outage just because the cluster is no longer monitoring our applications. ** We would expect to get a 24x7 call out, Sev1, and then log on to the cluster and see what was happening (configured alerting). Our applications only want a service outage if the node itself has issues, not the cluster.

Here's the issue: The solution as I see it is to do one of:
a) reboot the node and clear the problem with certainty
b) continue on and risk damaging your disks
c) write some new code to recover from specific cases more gracefully, and then test it thoroughly
d) try to figure out how to propagate the failure to the top layer of the cluster, and hope you get the notice there soon enough that it can "freeze" the cluster before the code reacts to the apparent failure and begins to try to recover from it

In the current code, sometimes you'll get behavior (a), sometimes you'll get behavior (b), and sometimes you'll get behavior (c). In the particular case described by bug 1762, failure to reboot the node did indeed start the same resource twice. In a cluster where you have shared disk (like yours, for example), that would probably trash the filesystem. Not a good plan unless you're tired of your current job ;-). I'd like to take most/all of the cases where you might get behavior (b) and cause them to use behavior (a). If writing correct code and testing it were free, then (c) would obviously be the right choice. 
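Option (a) - fail-fast on the death of a key process - reduces to a trivial policy function. The process names and action strings below are illustrative only; a real implementation would live in the cluster's process-management code and would call reboot(2) or kick a hardware watchdog rather than return a string:

```python
# Hedged sketch of policy (a): when a "key" process dies, escalate to a
# node-level restart instead of limping along; non-key clients just get
# respawned. The reboot is simulated with a string for the demo.
KEY_PROCESSES = {"ccm", "heartbeat", "cib", "crmd"}  # illustrative set

def on_process_death(name):
    """Decide the recovery action for a dead managed process."""
    if name in KEY_PROCESSES:
        # Behavior (a): crude but certain - clear all state at once.
        return "REBOOT NODE (key process %s died)" % name
    # Non-key clients can simply be respawned in place.
    return "respawn %s" % name

print(on_process_death("cib"))
print(on_process_death("mgmtd"))
```

The simplicity is the argument: a handful of lines whose recovery path is exercised on every test run, versus bespoke recovery code for each of the failure modes discussed above.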
Quite honestly, I don't know how to do (d) in a reliable way at all. It's much more difficult than it sounds. Among other reasons, it relies on the components you're telling to freeze things to work correctly. Since resource freezes happen at the top level of the system, and the top layers need all the layers under them to work correctly, getting this right seems to be the kind of approach you could make into your life's work - and still never get it right. Case (c) has to be handled on a case by case basis, where you write and test the code for a particular failure case. IMHO the only feasible _general_ answer is (a). There are an infinite number of things that can go wrong. So,