Angus Salkeld wrote:
> On Tue, May 25, 2010 at 10:02:06AM -0700, Alan Jones wrote:
>> This seems like a good design for services that cannot tolerate restart.
> Well, SAM gives you the option of restarting the process or a watchdog
> (others too).
In reality, SAM gives you the option to:
- quit the process and mark it as failed if the health check (HC) fails
- or (and this one seems more interesting for you) restart the failed
  process; if you then decide to take the watchdog (wd) action, just call
  sam_mark_failed.
>
>> However, Pacemaker is designed to restart - so registering a watchdog
>> for it doesn't make sense. We clearly need a watchdog on the corosync
>> daemon and may need one on whatever is restarting Pacemaker (corosync
>> also?). It is also interesting to ask where in corosync the watchdog is
>> petted.
> At the moment I have put this in a timer, which isn't too bad as it is
> driven off of the poll loop.
>
>> Petting the watchdog should indicate that corosync is live in some higher
>> sense and not blocked on socket calls, for example.
>
> I suspect this might make corosync hit too many false positives.
>
> There is a totempg_callback_token_create() call that will run your function
> when a token is sent or received (depending on the option you pass to it).
> We could hook this up to pet the watchdog. But what happens on a lossy
> network? How do you set the tolerance?
>
> I'll have a think about it though, perhaps Steve has some suggestions.
>
> -Angus
>
>> Alan
>>
>> On Mon, May 24, 2010 at 5:29 PM, Angus Salkeld <[email protected]> wrote:
>> >
>
> _______________________________________________
> Openais mailing list
> [email protected]
> https://lists.linux-foundation.org/mailman/listinfo/openais
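
One way to picture the tolerance question raised in the thread: the token callback only records when a token was last seen, and the poll-loop timer pets the watchdog only while that timestamp is recent enough. The following is a minimal sketch in Python, not corosync code. `TokenGatedWatchdog`, `on_token`, `on_timer` and `tolerance` are hypothetical names chosen for illustration; in corosync the token hook would presumably be a C callback registered through totempg_callback_token_create(), and the timer would be the existing poll-loop timer.

```python
import time


class TokenGatedWatchdog:
    """Pet a watchdog from a timer only while token activity is recent.

    Hypothetical sketch: none of these names exist in corosync.
    """

    def __init__(self, tolerance, pet, clock=time.monotonic):
        self.tolerance = tolerance  # max seconds without a token (assumed knob)
        self.pet = pet              # callable that actually pets the watchdog
        self.clock = clock          # injectable clock, which eases testing
        self.last_token = clock()   # treat start-up as a token event

    def on_token(self):
        """Would be called from the token sent/received callback."""
        self.last_token = self.clock()

    def on_timer(self):
        """Would be called from the poll-loop timer; returns True if petted."""
        if self.clock() - self.last_token <= self.tolerance:
            self.pet()
            return True
        # Stop petting; the watchdog fires once its own timeout expires.
        return False
```

With this split, a lost token does not immediately stop the petting; only a gap longer than `tolerance` does. On a lossy network the tolerance would presumably need to span several token timeouts, and the sketch leaves that choice open.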
