Hi Klaus,
Hi All,

I made a prototype of a watchdog using the WD service.

 - https://github.com/HideoYamauchi/pacemaker/commit/3ee97b76e0212b1790226864dfcacd1a327dbcc9

Please comment.

Best Regards,
Hideo Yamauchi.

----- Original Message -----
> From: "[email protected]" <[email protected]>
> To: "[email protected]" <[email protected]>
> Cc:
> Date: 2016/10/11, Tue 17:58
> Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>
> Hi Klaus,
>
> Thank you for your comment.
>
> I am making a patch that is a prototype using the WD service.
>
> Please wait a little.
>
> Best Regards,
> Hideo Yamauchi.
>
> ----- Original Message -----
>> From: Klaus Wenninger <[email protected]>
>> To: [email protected]
>> Cc:
>> Date: 2016/10/10, Mon 21:03
>> Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>>
>> On 10/07/2016 11:10 PM, [email protected] wrote:
>>> Hi All,
>>>
>>> Our users may not necessarily use sbd.
>>>
>>> I confirmed that using the WD service of corosync is one method that does not require sbd.
>>> Pacemaker monitors the pacemaker processes through the WD service using CMAP and can trigger the watchdog.
>>
>> Have to have a look at that...
>> But if we establish some in-between-layer in pacemaker we could have this
>> as one of the possibilities besides e.g. sbd (with enhanced API), going for
>> a watchdog-device directly, ...
>>
>>> We can prepare a patch for pacemaker.
>>
>> Always helpful to discuss/clarify an idea once some code is available ...
>>
>>> Has the discussion about using the WD service already concluded?
>>
>> Not from my pov. Just a day off ;-)
>>
>>> Best Regards,
>>> Hideo Yamauchi.
>>>
>>> ----- Original Message -----
>>>> From: Klaus Wenninger <[email protected]>
>>>> To: Ulrich Windl <[email protected]>; [email protected]
>>>> Cc:
>>>> Date: 2016/10/7, Fri 17:47
>>>> Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>>>>
>>>> On 10/07/2016 08:14 AM, Ulrich Windl wrote:
>>>>> >>> Klaus Wenninger <[email protected]> wrote on 06.10.2016 at 18:03 in
>>>>> message <[email protected]>:
>>>>>> On 10/05/2016 04:22 PM, [email protected] wrote:
>>>>>>> Hi All,
>>>>>>>
>>>>>>>>> If a user uses sbd, can the cluster avoid the problem of a SIGSTOP of crmd?
>>>>>>>>
>>>>>>>> As pointed out earlier, maybe crmd should feed a watchdog. Then stopping crmd
>>>>>>>> will reboot the node (unless the watchdog fails).
>>>>>>> Thank you for your comment.
>>>>>>>
>>>>>>> We are examining a watchdog for crmd, too.
>>>>>>> I will comment further once the examination has advanced.
>>>>>> Was thinking of doing a small test implementation going
>>>>>> a little in the direction Lars Ellenberg had been pointing out.
>>>>>>
>>>>>> a couple of thoughts I had so far:
>>>>>>
>>>>>> - add an API (via DBus or libqb - favoring libqb atm) to sbd
>>>>>>   an application can use to create a watchdog within sbd
>>>>> Why has it to be done within sbd?
>>>> Not necessarily, could be spawned out as well into an own project or
>>>> something already existent could be taken.
>>>> Remember to have added a dbus-interface to
>>>> https://sourceforge.net/projects/watchdog/ for a project once.
>>>> If you have a suggestion I'm open.
>>>> Going off sbd would have the advantage of a smooth start:
>>>>
>>>> - cluster/pacemaker-watcher are there already and can
>>>>   be replaced/moved over time
>>>> - the lifecycle of the daemon (when started/stopped) is
>>>>   already something that is in the code and in the people's minds
>>>>
>>>>>> - parameters for the first are a name and a timeout
>>>>>>
>>>>>> - first use-case would be crmd observation
>>>>>>
>>>>>> - later on we could think of removing pacemaker dependencies
>>>>>>   from sbd by moving the actual implementation of
>>>>>>   pacemaker-watcher and probably cluster-watcher as well
>>>>>>   into pacemaker - using the new API
>>>>>>
>>>>>> - this of course creates sbd dependency within pacemaker so
>>>>>>   that it would make sense to offer a simpler and self-contained
>>>>>>   implementation within pacemaker as an alternative
>>>>> I think the watchdog interface is so simple that you don't need a relay for it.
>>>>> The only limit I can imagine is the number of watchdogs available of some specific hardware.
>>>> That is the point ;-)
>>>>>>   thus it would be favorable to have the dependency
>>>>>>   within a non-compulsory pacemaker-rpm so that
>>>>>>   we can offer an alternative that doesn't use sbd
>>>>>>   at maybe the cost of being less reliable or one
>>>>>>   that owns a hardware-watchdog by itself for systems
>>>>>>   where this is still unused.
>>>>>>
>>>>>>   - e.g. via some kind of plugin (Andrew forgive me - no pils ;-) )
>>>>>>   - or via an additional daemon
>>>>>>
>>>>>> What did you have in mind?
>>>>>> Maybe it makes sense to synchronize...
>>>>>>
>>>>>> Regards,
>>>>>> Klaus
>>>>>>
>>>>>>> Best Regards,
>>>>>>> Hideo Yamauchi.
>>>>>>>
>>>>>>> ----- Original Message -----
>>>>>>>> From: Ulrich Windl <[email protected]>
>>>>>>>> To: [email protected]; [email protected]
>>>>>>>> Cc:
>>>>>>>> Date: 2016/10/5, Wed 23:08
>>>>>>>> Subject: Antw: Re: [ClusterLabs] Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>>>>>>>>
>>>>>>>> >>> <[email protected]> wrote on 21.09.2016 at 11:52
>>>>>>>> in message <[email protected]>:
>>>>>>>>> Hi All,
>>>>>>>>>
>>>>>>>>> Was a final conclusion reached about this problem?
>>>>>>>>>
>>>>>>>>> If a user uses sbd, can the cluster avoid the problem of a SIGSTOP of crmd?
>>>>>>>>
>>>>>>>> As pointed out earlier, maybe crmd should feed a watchdog. Then stopping crmd
>>>>>>>> will reboot the node (unless the watchdog fails).
>>>>>>>>
>>>>>>>>> We are interested in this problem, too.
>>>>>>>>>
>>>>>>>>> Best Regards,
>>>>>>>>>
>>>>>>>>> Hideo Yamauchi.

_______________________________________________
Users mailing list: [email protected]
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
