Hi Klaus,
Hi Jan,

Thank you for your comments.

I will wait a little longer for other comments.
We will discuss this matter next week.

Best Regards,
Hideo Yamauchi.

----- Original Message -----
> From: Jan Friesse <[email protected]>
> To: [email protected]; Cluster Labs - All topics related to open-source clustering welcomed <[email protected]>
> Cc:
> Date: 2016/10/20, Thu 15:46
> Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>
>>
>> On 10/14/2016 11:21 AM, [email protected] wrote:
>>> Hi Klaus,
>>> Hi All,
>>>
>>> I tried prototype of watchdog using WD service.
>>> - https://github.com/HideoYamauchi/pacemaker/commit/3ee97b76e0212b1790226864dfcacd1a327dbcc9
>>>
>>> Please comment.
>> Thank you Hideo for providing the prototype.
>> Added the patch to my build and it seems to
>> be working as expected.
>>
>> A few thoughts triggered by this approach:
>>
>> - we have to alert the corosync-people as in
>>   a chat with Jan Friesse he pointed me to the
>>   fact that for corosync 3.x the wd-service was
>>   planned to be removed
>
> Actually I didn't express myself correctly. What I wanted to say was
> "I'm considering idea of removing it", simply because it's disabled in
> downstream.
>
> BUT keep in mind that removing functionality = ask community to find out
> if there is not somebody actively using it.
>
> And because there is active users and future use case, removing of wd is
> not an option.
>
>
>>
>>   especially delicate as the binding is very loose
>>   so that - as is - it builds against a corosync with
>>   disabled wd-service without any complaints...
>>
>> - as of now if you enable wd-service in the
>>   corosync-build it is on by default and would
>>   be hogging the watchdog presumably
>>   (there is obviously a pull request that makes
>>   it default to off)
>>
>> - with my thoughts about adding an API to
>>   sbd previously in the thread I was trying to
>>   target closer observation of pacemaker_remoted
>>   as well (remote-nodes don't have corosync
>>   running)
>>
>>   I guess it would be possible to run corosync
>>   with a static config as single-node cluster
>>   bound to localhost for that purpose.
>>
>>   I read the thread about corosync-remote and
>>   that happening might make the special-handling
>>   for pacemaker-remote obsolete anyway ...
>>
>> - to enable the approach to live alongside
>>   sbd it would be possible to make sbd use
>>   the corosync-API as well for watchdog purposes
>>   instead of opening the watchdog directly
>>
>>   This shouldn't be a big deal for sbd used to
>>   observe a pacemaker-node as cluster-watcher
>>   (the part of sbd that sends cpg-pings to corosync)
>>   already builds against corosync.
>>   The blockdevice-part of sbd being basically
>>   generic it might be an issue though.
>>
>> Regards,
>> Klaus
>>
>>>
>>>
>>> Best Regards,
>>> Hideo Yamauchi.
>>>
>>>
>>> ----- Original Message -----
>>>> From: "[email protected]" <[email protected]>
>>>> To: "[email protected]" <[email protected]>
>>>> Cc:
>>>> Date: 2016/10/11, Tue 17:58
>>>> Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>>>>
>>>> Hi Klaus,
>>>>
>>>> Thank you for comment.
>>>>
>>>> I make the patch which is prototype using WD service.
>>>>
>>>> Please wait a little.
>>>>
>>>> Best Regards,
>>>> Hideo Yamauchi.
>>>>
>>>>
>>>>
>>>>
>>>> ----- Original Message -----
>>>>> From: Klaus Wenninger <[email protected]>
>>>>> To: [email protected]
>>>>> Cc:
>>>>> Date: 2016/10/10, Mon 21:03
>>>>> Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>>>>> On 10/07/2016 11:10 PM, [email protected] wrote:
>>>>>> Hi All,
>>>>>>
>>>>>> Our user may not necessarily use sbd.
>>>>>>
>>>>>> I confirmed that there was a method using WD service of corosync as one method not to use sbd.
>>>>>> Pacemaker watches the process of pacemaker by WD service using CMAP and can carry out watchdog.
>>>>>
>>>>> Have to have a look at that...
>>>>> But if we establish some in-between-layer in pacemaker we could have this
>>>>> as one of the possibilities besides e.g. sbd (with enhanced API), going for
>>>>> a watchdog-device directly, ...
>>>>>
>>>>>>
>>>>>> We can set up a patch of pacemaker.
>>>>> Always helpful to discuss/clarify an idea once some code is available ...
>>>>>
>>>>>> Was the discussion of using WD service over so far?
>>>>> Not from my pov. Just a day off ;-)
>>>>>
>>>>>>
>>>>>> Best Regard,
>>>>>> Hideo Yamauchi.
>>>>>>
>>>>>>
>>>>>> ----- Original Message -----
>>>>>>> From: Klaus Wenninger <[email protected]>
>>>>>>> To: Ulrich Windl <[email protected]>; [email protected]
>>>>>>> Cc:
>>>>>>> Date: 2016/10/7, Fri 17:47
>>>>>>> Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>>>>>>> On 10/07/2016 08:14 AM, Ulrich Windl wrote:
>>>>>>>>>>> Klaus Wenninger <[email protected]> wrote on 06.10.2016 at 18:03 in
>>>>>>>> message <[email protected]>:
>>>>>>>>> On 10/05/2016 04:22 PM, [email protected] wrote:
>>>>>>>>>> Hi All,
>>>>>>>>>>
>>>>>>>>>>>> If a user uses sbd, can the cluster evade a problem of SIGSTOP of crmd?
>>>>>>>>>>>
>>>>>>>>>>> As pointed out earlier, maybe crmd should feed a watchdog. Then stopping crmd
>>>>>>>>>>> will reboot the node (unless the watchdog fails).
>>>>>>>>>> Thank you for comment.
>>>>>>>>>>
>>>>>>>>>> We examine watchdog of crmd, too.
>>>>>>>>>> In addition, I comment after examination advanced.
>>>>>>>>> Was thinking of doing a small test implementation going
>>>>>>>>> a little in the direction Lars Ellenberg had been pointing out.
>>>>>>>>> a couple of thoughts I had so far:
>>>>>>>>>
>>>>>>>>> - add an API (via DBus or libqb - favoring libqb atm) to sbd
>>>>>>>>>   an application can use to create a watchdog within sbd
>>>>>>>> Why has it to be done within sbd?
>>>>>>> Not necessarily, could be spawned out as well into an own project or
>>>>>>> something already existent could be taken.
>>>>>>> Remember to have added a dbus-interface to
>>>>>>> https://sourceforge.net/projects/watchdog/ for a project once.
>>>>>>> If you have a suggestion I'm open.
>>>>>>> Going off sbd would have the advantage of a smooth start:
>>>>>>>
>>>>>>> - cluster/pacemaker-watcher are there already and can
>>>>>>>   be replaced/moved over time
>>>>>>> - the lifecycle of the daemon (when started/stopped) is
>>>>>>>   already something that is in the code and in the people's minds
>>>>>>>>> - parameters for the first are a name and a timeout
>>>>>>>>>
>>>>>>>>> - first use-case would be crmd observation
>>>>>>>>>
>>>>>>>>> - later on we could think of removing pacemaker dependencies
>>>>>>>>>   from sbd by moving the actual implementation of
>>>>>>>>>   pacemaker-watcher and probably cluster-watcher as well
>>>>>>>>>   into pacemaker - using the new API
>>>>>>>>>
>>>>>>>>> - this of course creates sbd dependency within pacemaker so
>>>>>>>>>   that it would make sense to offer a simpler and self-contained
>>>>>>>>>   implementation within pacemaker as an alternative
>>>>>>>> I think the watchdog interface is so simple that you don't need a relay
>>>>>>>> for it. The only limit I can imagine is the number of watchdogs available of
>>>>>>>> some specific hardware.
>>>>>>> That is the point ;-)
>>>>>>>>> thus it would be favorable to have the dependency
>>>>>>>>> within a non-compulsory pacemaker-rpm so that
>>>>>>>>> we can offer an alternative that doesn't use sbd
>>>>>>>>> at maybe the cost of being less reliable or one
>>>>>>>>> that owns a hardware-watchdog by itself for systems
>>>>>>>>> where this is still unused.
>>>>>>>>>
>>>>>>>>> - e.g. via some kind of plugin (Andrew forgive me - no pils ;-) )
>>>>>>>>> - or via an additional daemon
>>>>>>>>>
>>>>>>>>> What did you have in mind?
>>>>>>>>> Maybe it makes sense to synchronize...
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Klaus
>>>>>>>>>
>>>>>>>>>> Best Regards,
>>>>>>>>>> Hideo Yamauchi.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ----- Original Message -----
>>>>>>>>>>> From: Ulrich Windl <[email protected]>
>>>>>>>>>>> To: [email protected]; [email protected]
>>>>>>>>>>> Cc:
>>>>>>>>>>> Date: 2016/10/5, Wed 23:08
>>>>>>>>>>> Subject: Antw: Re: [ClusterLabs] Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>>>>>>>>>>>>>> <[email protected]> wrote on 21.09.2016 at 11:52
>>>>>>>>>>> in message
>>>>>>>>>>> <[email protected]>:
>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>
>>>>>>>>>>>> Was the final conclusion given about this problem?
>>>>>>>>>>>> If a user uses sbd, can the cluster evade a problem of SIGSTOP of crmd?
>>>>>>>>>>> As pointed out earlier, maybe crmd should feed a watchdog. Then stopping crmd
>>>>>>>>>>> will reboot the node (unless the watchdog fails).
>>>>>>>>>>>
>>>>>>>>>>>> We are interested in this problem, too.
>>>>>>>>>>>>
>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>
>>>>>>>>>>>> Hideo Yamauchi.
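
For illustration only: the idea quoted in this thread (crmd feeds a watchdog, so a frozen crmd stops feeding and the node is reset once the timeout passes) can be sketched as a small simulation. This is not the prototype patch and not the sbd, corosync or pacemaker API. SoftwareWatchdog, FakeClock and first_expiry are hypothetical names made up for this sketch; only the "name and timeout" parameters follow the proposal quoted above, and a fake clock stands in for real time and real hardware.

```python
import time


class SoftwareWatchdog:
    """Toy stand-in for a watchdog: it expires if not fed within `timeout` seconds."""

    def __init__(self, name, timeout, clock=time.monotonic):
        self.name = name            # hypothetical: identifies the observed daemon
        self.timeout = timeout      # seconds allowed between two feeds
        self._clock = clock
        self._last_fed = clock()

    def feed(self):
        """Called periodically by a healthy daemon."""
        self._last_fed = self._clock()

    def expired(self):
        """True once more than `timeout` seconds have passed since the last feed."""
        return self._clock() - self._last_fed > self.timeout


class FakeClock:
    """Deterministic clock so the sketch does not depend on wall time."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


def first_expiry(timeout, frozen_at, horizon):
    """Simulate one-second ticks; the daemon stops feeding from `frozen_at` on.

    Returns the simulated time at which the watchdog expires, or None if it
    never expires within `horizon` ticks.
    """
    clock = FakeClock()
    wd = SoftwareWatchdog("crmd", timeout, clock=clock)
    for _ in range(horizon):
        clock.advance(1.0)
        if clock.now < frozen_at:
            wd.feed()               # healthy daemon: feed on every tick
        if wd.expired():
            return clock.now        # a real watchdog would reset the node here
    return None
```

With these assumptions, first_expiry(timeout=5, frozen_at=10, horizon=60) returns 15.0: the last feed happens at t=9 and the 5-second timeout is first exceeded at t=15. A daemon that never freezes (frozen_at=float("inf")) never triggers the watchdog.
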
_______________________________________________
Users mailing list: [email protected]
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
