On 10/14/2016 11:21 AM, [email protected] wrote:
> Hi Klaus,
> Hi All,
>
> I tried a prototype of the watchdog using the WD service.
> -
> https://github.com/HideoYamauchi/pacemaker/commit/3ee97b76e0212b1790226864dfcacd1a327dbcc9
>
> Please comment.

Thank you Hideo for providing the prototype.
I added the patch to my build and it seems to be working as expected.
A few thoughts triggered by this approach:

- We have to alert the corosync people: in a chat, Jan Friesse pointed
  out to me that the wd-service is planned to be removed in corosync 3.x.
  This is especially delicate because the binding is very loose, so, as
  is, the patched pacemaker builds against a corosync with the wd-service
  disabled without any complaints...
- As of now, if you enable the wd-service in the corosync build, it is on
  by default and would presumably hog the watchdog (there is apparently a
  pull request that makes it default to off).
- With my earlier suggestion in this thread of adding an API to sbd, I was
  also aiming at closer observation of pacemaker_remoted (remote nodes
  don't have corosync running). I guess it would be possible to run
  corosync with a static config as a single-node cluster bound to
  localhost for that purpose. I read the thread about corosync-remote,
  and if that happens, the special handling for pacemaker-remote might
  become obsolete anyway...
- To let this approach live alongside sbd, sbd could be made to use the
  corosync API for watchdog purposes as well, instead of opening the
  watchdog directly. This shouldn't be a big deal for sbd observing a
  pacemaker node, as the cluster-watcher (the part of sbd that sends
  cpg-pings to corosync) already builds against corosync. For the
  blockdevice part of sbd, which is basically generic, it might be an
  issue though.

Regards,
Klaus

>
>
> Best Regards,
> Hideo Yamauchi.
>
>
> ----- Original Message -----
>> From: "[email protected]" <[email protected]>
>> To: "[email protected]" <[email protected]>
>> Cc:
>> Date: 2016/10/11, Tue 17:58
>> Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>>
>> Hi Klaus,
>>
>> Thank you for your comment.
>>
>> I am making a prototype patch using the WD service.
>>
>> Please wait a little.
>>
>> Best Regards,
>> Hideo Yamauchi.
>>
>>
>> ----- Original Message -----
>>> From: Klaus Wenninger <[email protected]>
>>> To: [email protected]
>>> Cc:
>>> Date: 2016/10/10, Mon 21:03
>>> Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>>> On 10/07/2016 11:10 PM, [email protected] wrote:
>>>> Hi All,
>>>>
>>>> Our users may not necessarily use sbd.
>>>>
>>>> I confirmed that corosync's WD service offers one way of not
>>>> using sbd.
>>>> Using CMAP, pacemaker's processes can be watched by the WD
>>>> service, which then drives the watchdog.
>>>
>>> I have to have a look at that...
>>> But if we establish some in-between layer in pacemaker, we could have
>>> this as one of the possibilities besides e.g. sbd (with an enhanced API),
>>> going for a watchdog device directly, ...
>>>
>>>>
>>>> We can prepare a patch for pacemaker.
>>> It is always helpful to discuss/clarify an idea once some code is available...
>>>
>>>> Is the discussion about using the WD service over?
>>> Not from my pov. Just a day off ;-)
>>>
>>>>
>>>> Best Regards,
>>>> Hideo Yamauchi.
>>>>
>>>>
>>>> ----- Original Message -----
>>>>> From: Klaus Wenninger <[email protected]>
>>>>> To: Ulrich Windl <[email protected]>; [email protected]
>>>>> Cc:
>>>>> Date: 2016/10/7, Fri 17:47
>>>>> Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>>>>> On 10/07/2016 08:14 AM, Ulrich Windl wrote:
>>>>>>>>> Klaus Wenninger <[email protected]> wrote on 06.10.2016 at 18:03 in message <[email protected]>:
>>>>>>> On 10/05/2016 04:22 PM, [email protected] wrote:
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>>>> If a user uses sbd, can the cluster avoid the problem of a SIGSTOPped crmd?
>>>>>>>>>
>>>>>>>>> As pointed out earlier, maybe crmd should feed a watchdog.
>>>>>>>>> Then stopping crmd will reboot the node (unless the watchdog fails).
>>>>>>>> Thank you for your comment.
>>>>>>>>
>>>>>>>> We are examining a watchdog for crmd, too.
>>>>>>>> I will comment again once our examination has progressed.
>>>>>>> I was thinking of doing a small test implementation going
>>>>>>> a little in the direction Lars Ellenberg had been pointing out.
>>>>>>> A couple of thoughts I had so far:
>>>>>>>
>>>>>>> - add an API to sbd (via DBus or libqb, favoring libqb atm)
>>>>>>>   that an application can use to create a watchdog within sbd
>>>>>> Why does it have to be done within sbd?
>>>>> Not necessarily; it could be spun out into a project of its own,
>>>>> or something already existing could be used.
>>>>> I remember having added a dbus interface to
>>>>> https://sourceforge.net/projects/watchdog/ for a project once.
>>>>> If you have a suggestion, I'm open.
>>>>> Building on sbd would have the advantage of a smooth start:
>>>>>
>>>>> - cluster-watcher and pacemaker-watcher are there already and
>>>>>   can be replaced/moved over time
>>>>> - the lifecycle of the daemon (when it is started/stopped) is
>>>>>   already something that is in the code and in people's minds
>>>>>>> - for a start, the parameters would be a name and a timeout
>>>>>>>
>>>>>>> - the first use case would be crmd observation
>>>>>>>
>>>>>>> - later on we could think of removing pacemaker dependencies
>>>>>>>   from sbd by moving the actual implementation of
>>>>>>>   pacemaker-watcher, and probably cluster-watcher as well,
>>>>>>>   into pacemaker, using the new API
>>>>>>>
>>>>>>> - this of course creates an sbd dependency within pacemaker,
>>>>>>>   so it would make sense to offer a simpler, self-contained
>>>>>>>   implementation within pacemaker as an alternative
>>>>>> I think the watchdog interface is so simple that you don't need
>>>>>> a relay for it. The only limit I can imagine is the number of
>>>>>> watchdogs available on some specific hardware.
>>>>> That is the point ;-)
>>>>>>>   Thus it would be favorable to have the dependency
>>>>>>>   within a non-compulsory pacemaker-rpm, so that we can
>>>>>>>   offer an alternative that doesn't use sbd, maybe at the
>>>>>>>   cost of being less reliable, or one that owns a hardware
>>>>>>>   watchdog by itself on systems where that is still unused:
>>>>>>>
>>>>>>>   - e.g. via some kind of plugin (Andrew, forgive me,
>>>>>>>     no pils ;-) )
>>>>>>>   - or via an additional daemon
>>>>>>>
>>>>>>> What did you have in mind?
>>>>>>> Maybe it makes sense to synchronize...
>>>>>>>
>>>>>>> Regards,
>>>>>>> Klaus
>>>>>>>
>>>>>>>> Best Regards,
>>>>>>>> Hideo Yamauchi.
>>>>>>>>
>>>>>>>>
>>>>>>>> ----- Original Message -----
>>>>>>>>> From: Ulrich Windl <[email protected]>
>>>>>>>>> To: [email protected]; [email protected]
>>>>>>>>> Cc:
>>>>>>>>> Date: 2016/10/5, Wed 23:08
>>>>>>>>> Subject: Antw: Re: [ClusterLabs] Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>>>>>>>>>>>> <[email protected]> wrote on 21.09.2016 at 11:52 in message <[email protected]>:
>>>>>>>>>> Hi All,
>>>>>>>>>>
>>>>>>>>>> Has a final conclusion been reached about this problem?
>>>>>>>>>> If a user uses sbd, can the cluster avoid the problem of a SIGSTOPped crmd?
>>>>>>>>> As pointed out earlier, maybe crmd should feed a watchdog. Then stopping
>>>>>>>>> crmd will reboot the node (unless the watchdog fails).
>>>>>>>>>
>>>>>>>>>> We are interested in this problem, too.
>>>>>>>>>>
>>>>>>>>>> Best Regards,
>>>>>>>>>>
>>>>>>>>>> Hideo Yamauchi.
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> Users mailing list: [email protected]
>>>>>>>>>> http://clusterlabs.org/mailman/listinfo/users
>>>>>>>>>> Project Home: http://www.clusterlabs.org
>>>>>>>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>>>>>>> Bugs: http://bugs.clusterlabs.org
