Re: [Openais] kernel watchdog timer for corosync

Jan Friesse Wed, 26 May 2010 00:25:51 -0700

Angus Salkeld wrote:
> On Tue, May 25, 2010 at 10:02:06AM -0700, Alan Jones wrote:
>> This seems like a good design for services that cannot tolerate restart.
> Well, SAM gives you the option of restarting the process or using the watchdog (and others).


In reality, SAM gives you the option to:
- quit the process and mark it as failed if the health check (HC) fails,
- or (and this one seems more interesting for you) restart the failed
process and, if you decide to take watchdog action, just call sam_mark_failed.
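To make the two options above concrete, here is a toy C model of them. All names (recovery_policy, sam_model, on_hc_failure, after_restart) are hypothetical and this is not the libsam API. It only mirrors the behaviour described above, and it assumes the application counts its own restarts and gives up after a fixed budget. Check sam.h for the real calls, flags and semantics.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of the SAM choices described above.
 * NOT the libsam API; names are made up for illustration. */
typedef enum { POLICY_QUIT, POLICY_RESTART } recovery_policy;
typedef enum { OUTCOME_RESTART, OUTCOME_QUIT_MARKED_FAILED } hc_outcome;

typedef struct {
    recovery_policy policy;
    bool marked_failed;   /* set by the app, like calling sam_mark_failed() */
    unsigned restarts;    /* how many times the process has been restarted */
} sam_model;

/* Outcome when a health check (HC) is missed. */
hc_outcome on_hc_failure(sam_model *m)
{
    /* Quit policy, or the app already gave up: quit and stay failed. */
    if (m->policy == POLICY_QUIT || m->marked_failed)
        return OUTCOME_QUIT_MARKED_FAILED;
    m->restarts++;
    return OUTCOME_RESTART;
}

/* App-side decision after a restart (assumed policy): once the restart
 * budget is spent, mark ourselves failed so that whatever watches the
 * failed state can take the watchdog action. */
void after_restart(sam_model *m, unsigned max_restarts)
{
    assert(max_restarts > 0);
    if (m->restarts >= max_restarts)
        m->marked_failed = true;   /* analogue of sam_mark_failed() */
}
```

With POLICY_RESTART and a budget of 2, the first two missed health checks restart the process. After that the process is marked failed, which is the point where a watchdog action could be hooked in (an assumption of this sketch, not something it claims SAM does on its own).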

> 
>> However, Pacemaker is designed to restart - so registering a watchdog
>> for it doesn't make sense.  We clearly need a watchdog on the corosync
>> daemon and may need one on whatever is restarting Pacemaker (corosync
>> also?).  It is also interesting to ask where in corosync the watchdog is
>> petted.
> At the moment I have put this in a timer, which isn't too bad as it is
> driven off of the poll loop.
> 
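A minimal sketch of petting from a timer driven off the poll loop, assuming a POSIX system. pet_watchdog() and run_loop() are hypothetical. On Linux the real pet would be a write to /dev/watchdog (or the WDIOC_KEEPALIVE ioctl); that is deliberately left out so the sketch cannot reboot anything. The property worth noticing is that the pet only happens when the loop comes back around, so a handler that blocks also delays the pet.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <stddef.h>
#include <time.h>

/* Monotonic clock in milliseconds, immune to wall-clock jumps. */
static int64_t now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Hypothetical pet: only counts. Real code would write to /dev/watchdog. */
static unsigned pets;
static void pet_watchdog(void) { pets++; }

/* Service fds for run_ms milliseconds, petting every interval_ms from the
 * loop's own timer. Returns how many times the watchdog was petted. */
unsigned run_loop(struct pollfd *fds, nfds_t nfds, int interval_ms, int run_ms)
{
    assert(interval_ms > 0 && run_ms >= 0);
    const int64_t start = now_ms();
    const int64_t end = start + run_ms;
    int64_t next_pet = start + interval_ms;

    pets = 0;
    for (;;) {
        int64_t now = now_ms();
        if (now >= end)
            break;
        if (now >= next_pet) {
            pet_watchdog();
            /* Schedule from "now" so a stall is not hidden by catch-up pets. */
            next_pet = now + interval_ms;
            continue;
        }
        /* Sleep in poll() until the next pet or the end of the run. */
        int64_t wait = next_pet < end ? next_pet - now : end - now;
        if (poll(fds, nfds, (int)wait) > 0) {
            /* Dispatch ready fds here. If a handler blocks for longer than
             * the hardware timeout, no pet happens and the watchdog fires. */
        }
    }
    return pets;
}
```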
>> Petting the watchdog should indicate that corosync is live in some higher
>> sense and not, for example, blocked on socket calls.
> 
> I suspect this might make corosync hit too many false positives.
> 
> There is a totempg_callback_token_create() call that will run your function
> when a token is sent or received (depending on the option you pass to it).
> We could hook this up to pet the watchdog. But what happens on a lossy
> network? How do you set the tolerance?
> 
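One possible way to frame the tolerance question, sketched under assumptions: the token callback only records that a token was seen, and the periodic timer keeps petting the watchdog as long as no more than max_quiet_ticks ticks have passed since the last token. token_wd and its functions are hypothetical, not corosync code, and the hookup to totempg_callback_token_create() is not shown because its exact signature isn't given here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical token-driven petting policy; not corosync code. */
typedef struct {
    unsigned max_quiet_ticks; /* timer ticks allowed to pass without a token */
    unsigned quiet_ticks;     /* ticks since the last token was seen */
    unsigned pets;            /* how many times the watchdog was petted */
} token_wd;

void token_wd_init(token_wd *w, unsigned max_quiet_ticks)
{
    assert(max_quiet_ticks > 0);
    w->max_quiet_ticks = max_quiet_ticks;
    w->quiet_ticks = 0;
    w->pets = 0;
}

/* Would be called from the token callback (token sent or received). */
void token_wd_on_token(token_wd *w)
{
    w->quiet_ticks = 0;
}

/* Called from the periodic timer; returns true if the watchdog was petted. */
bool token_wd_tick(token_wd *w)
{
    if (w->quiet_ticks < w->max_quiet_ticks) {
        w->quiet_ticks++;
        w->pets++;   /* real code would pet the hardware watchdog here */
        return true;
    }
    return false;    /* too long without a token: stop petting, let it fire */
}
```

The tolerance then becomes an explicit trade-off: with a tick period T, a lossy network gets up to max_quiet_ticks × T of grace, and a genuinely stuck corosync is detected that much later (plus the hardware watchdog timeout).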
> I'll have a think about it though; perhaps Steve has some suggestions.
> 
> -Angus
> 
>> Alan
>>
>> On Mon, May 24, 2010 at 5:29 PM, Angus Salkeld <[email protected]> wrote:
>>
> 
> _______________________________________________
> Openais mailing list
> [email protected]
> https://lists.linux-foundation.org/mailman/listinfo/openais
