Hi Klaus,
Hi Jan,
Hi All,

There do not seem to be any objections to using corosync's WD service for the watchdog.
I will start working on an official patch next week.
Best Regards,
Hideo Yamauchi.


----- Original Message -----
> From: "renayama19661...@ybb.ne.jp" <renayama19661...@ybb.ne.jp>
> To: Cluster Labs - All topics related to open-source clustering welcomed <users@clusterlabs.org>
> Cc:
> Date: 2016/10/26, Wed 17:46
> Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>
> Hi Klaus,
> Hi Jan,
> Hi All,
>
> Our team discussed the watchdog using the WD service and concluded:
>
> 1) The WD service is not going to be removed.
> 2) For pacemaker_remote nodes, it can be used by running corosync bound to localhost.
> 3) Contention for the watchdog device has to be taken into account.
> 4) Because we are considering setups that do not use sbd, we are not planning,
>    for the moment, to add an interface similar to the corosync API to sbd.
>
> The user would choose between the method using sbd and the method using the WD service.
> Having two methods may cause some confusion, but the WD service is valuable for
> users who do not use sbd.
>
> We want to include the watchdog using the WD service in Pacemaker.
> I intend to make an official patch.
>
> What do you think?
>
> Best Regards,
> Hideo Yamauchi.
>
>
>
> ----- Original Message -----
>> From: "renayama19661...@ybb.ne.jp" <renayama19661...@ybb.ne.jp>
>> To: Cluster Labs - All topics related to open-source clustering welcomed <users@clusterlabs.org>
>> Cc:
>> Date: 2016/10/20, Thu 19:08
>> Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>>
>> Hi Klaus,
>> Hi Jan,
>>
>> Thank you for your comments.
>>
>> I will wait a little longer for other comments.
>> We will discuss this matter next week.
>>
>> Best Regards,
>> Hideo Yamauchi.
>>
>>
>>
>> ----- Original Message -----
>>> From: Jan Friesse <jfrie...@redhat.com>
>>> To: kwenn...@redhat.com; Cluster Labs - All topics related to open-source clustering welcomed <users@clusterlabs.org>
>>> Cc:
>>> Date: 2016/10/20, Thu 15:46
>>> Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>>>
>>>>
>>>> On 10/14/2016 11:21 AM, renayama19661...@ybb.ne.jp wrote:
>>>>> Hi Klaus,
>>>>> Hi All,
>>>>>
>>>>> I tried a prototype of the watchdog using the WD service:
>>>>> - https://github.com/HideoYamauchi/pacemaker/commit/3ee97b76e0212b1790226864dfcacd1a327dbcc9
>>>>>
>>>>> Please comment.
>>>> Thank you Hideo for providing the prototype.
>>>> Added the patch to my build and it seems to
>>>> be working as expected.
>>>>
>>>> A few thoughts triggered by this approach:
>>>>
>>>> - we have to alert the corosync-people as in
>>>> a chat with Jan Friesse he pointed me to the
>>>> fact that for corosync 3.x the wd-service was
>>>> planned to be removed
>>>
>>> Actually I didn't express myself correctly. What I wanted to say was
>>> "I'm considering the idea of removing it", simply because it's
>>> disabled in downstream.
>>>
>>> BUT keep in mind that removing functionality means asking the community
>>> whether somebody is actively using it.
>>>
>>> And because there are active users and a future use case, removing wd is
>>> not an option.
>>>
>>>
>>>>
>>>> especially delicate as the binding is very loose
>>>> so that - as is - it builds against a corosync with
>>>> disabled wd-service without any complaints...
>>>>
>>>> - as of now if you enable wd-service in the
>>>> corosync-build it is on by default and would
>>>> be hogging the watchdog presumably
>>>> (there is obviously a pull request that makes
>>>> it default to off)
>>>>
>>>> - with my thoughts about adding an API to
>>>> sbd previously in the thread I was trying to
>>>> target closer observation of pacemaker_remoted
>>>> as well (remote-nodes don't have corosync
>>>> running)
>>>>
>>>> I guess it would be possible to run corosync
>>>> with a static config as single-node cluster
>>>> bound to localhost for that purpose.
>>>>
>>>> I read the thread about corosync-remote and
>>>> that happening might make the special-handling
>>>> for pacemaker-remote obsolete anyway ...
>>>>
>>>> - to enable the approach to live alongside
>>>> sbd it would be possible to make sbd use
>>>> the corosync-API as well for watchdog purposes
>>>> instead of opening the watchdog directly
>>>>
>>>> This shouldn't be a big deal, as the part of sbd
>>>> used to observe a pacemaker-node as cluster-watcher
>>>> (the part that sends cpg-pings to corosync)
>>>> already builds against corosync.
>>>> With the blockdevice-part of sbd being basically
>>>> generic, it might be an issue there though.
>>>>
>>>> Regards,
>>>> Klaus
>>>>
>>>>>
>>>>>
>>>>> Best Regards,
>>>>> Hideo Yamauchi.
>>>>>
>>>>>
>>>>> ----- Original Message -----
>>>>>> From: "renayama19661...@ybb.ne.jp" <renayama19661...@ybb.ne.jp>
>>>>>> To: "users@clusterlabs.org" <users@clusterlabs.org>
>>>>>> Cc:
>>>>>> Date: 2016/10/11, Tue 17:58
>>>>>> Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>>>>>>
>>>>>> Hi Klaus,
>>>>>>
>>>>>> Thank you for your comment.
>>>>>>
>>>>>> I am making a prototype patch that uses the WD service.
>>>>>>
>>>>>> Please wait a little.
>>>>>>
>>>>>> Best Regards,
>>>>>> Hideo Yamauchi.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> ----- Original Message -----
>>>>>>> From: Klaus Wenninger <kwenn...@redhat.com>
>>>>>>> To: users@clusterlabs.org
>>>>>>> Cc:
>>>>>>> Date: 2016/10/10, Mon 21:03
>>>>>>> Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>>>>>>> On 10/07/2016 11:10 PM, renayama19661...@ybb.ne.jp wrote:
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> Our users may not necessarily use sbd.
>>>>>>>>
>>>>>>>> I confirmed that corosync's WD service offers one way to
>>>>>>>> do without sbd.
>>>>>>>> Using CMAP, Pacemaker can have the WD service watch
>>>>>>>> Pacemaker's processes and trigger the watchdog.
>>>>>>>
>>>>>>> Have to have a look at that...
>>>>>>> But if we establish some in-between-layer in pacemaker we could have this
>>>>>>> as one of the possibilities besides e.g. sbd (with enhanced API), going for
>>>>>>> a watchdog-device directly, ...
>>>>>>>
>>>>>>>>
>>>>>>>> We can prepare a patch for Pacemaker.
>>>>>>> Always helpful to discuss/clarify an idea once some code is available ...
>>>>>>>
>>>>>>>> Has the discussion about using the WD service ended?
>>>>>>> Not from my pov. Just a day off ;-)
>>>>>>>
>>>>>>>>
>>>>>>>> Best Regards,
>>>>>>>> Hideo Yamauchi.
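The "in-between-layer" Klaus mentions (and the sbd API idea elsewhere in this thread, whose first parameters are a name and a timeout) can be pictured as a multiplexer: several named clients each get their own timeout, and a single real watchdog is petted only while all of them are healthy. The following is a hypothetical Python sketch, not Pacemaker, sbd or corosync code; `WatchdogMux`, `WatchedClient` and `pet_backend` are invented names, and `pet_backend` merely stands in for whatever backend would actually keep the node alive (sbd, corosync's WD service, or the watchdog device itself).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class WatchedClient:
    timeout: float   # seconds allowed between two feeds
    last_fed: float  # time of the most recent feed


@dataclass
class WatchdogMux:
    """Hypothetical in-between layer: many named clients, one real watchdog.

    pet_backend is an assumption standing in for the real backend
    (sbd, corosync's WD service, or /dev/watchdog). Times are passed in
    explicitly to keep the logic testable; a daemon would use time.monotonic().
    """
    pet_backend: Callable[[], None]
    clients: Dict[str, WatchedClient] = field(default_factory=dict)

    def register(self, name: str, timeout: float, now: float) -> None:
        # Name and timeout mirror the parameters proposed in the thread.
        if name in self.clients:
            raise ValueError(f"client {name!r} is already registered")
        self.clients[name] = WatchedClient(timeout=timeout, last_fed=now)

    def unregister(self, name: str) -> None:
        # A clean shutdown removes the client instead of letting it expire.
        self.clients.pop(name, None)

    def feed(self, name: str, now: float) -> None:
        # Called periodically by a healthy client, e.g. from crmd's main loop.
        self.clients[name].last_fed = now

    def expired(self, now: float) -> List[str]:
        # Names of clients that have not fed within their own timeout.
        return sorted(name for name, c in self.clients.items()
                      if now - c.last_fed > c.timeout)

    def tick(self, now: float) -> bool:
        # Pet the real watchdog only while every client is healthy. Once a
        # client has expired (e.g. it was frozen with SIGSTOP), stop petting
        # so the underlying watchdog eventually resets the node.
        if self.expired(now):
            return False
        self.pet_backend()
        return True
```

For example, with crmd registered at time 0 with a 5-second timeout and never fed again, `tick(now=3.0)` pets the backend and returns True, while `tick(now=6.0)` returns False and `expired(6.0)` reports `["crmd"]`. Keeping a single owner of the real device is one way around the limited number of hardware watchdogs raised in the thread.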
>>>>>>>>
>>>>>>>>
>>>>>>>> ----- Original Message -----
>>>>>>>>> From: Klaus Wenninger <kwenn...@redhat.com>
>>>>>>>>> To: Ulrich Windl <ulrich.wi...@rz.uni-regensburg.de>; users@clusterlabs.org
>>>>>>>>> Cc:
>>>>>>>>> Date: 2016/10/7, Fri 17:47
>>>>>>>>> Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>>>>>>>>> On 10/07/2016 08:14 AM, Ulrich Windl wrote:
>>>>>>>>>>>>> Klaus Wenninger <kwenn...@redhat.com> wrote on 06.10.2016 at 18:03 in
>>>>>>>>>> message <3980cfdd-ebd9-1597-f6bd-a1ca808f7...@redhat.com>:
>>>>>>>>>>> On 10/05/2016 04:22 PM, renayama19661...@ybb.ne.jp wrote:
>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>
>>>>>>>>>>>>>> If a user uses sbd, can the cluster avoid the problem of crmd being stopped with SIGSTOP?
>>>>>>>>>>>>>
>>>>>>>>>>>>> As pointed out earlier, maybe crmd should feed a watchdog. Then stopping crmd
>>>>>>>>>>>>> will reboot the node (unless the watchdog fails).
>>>>>>>>>>>> Thank you for your comment.
>>>>>>>>>>>>
>>>>>>>>>>>> We are also looking into a watchdog for crmd.
>>>>>>>>>>>> I will comment again once our investigation has progressed.
>>>>>>>>>>> Was thinking of doing a small test implementation going
>>>>>>>>>>> a little in the direction Lars Ellenberg had been pointing out.
>>>>>>>>>>> A couple of thoughts I had so far:
>>>>>>>>>>>
>>>>>>>>>>> - add an API (via DBus or libqb - favoring libqb atm) to sbd
>>>>>>>>>>> an application can use to create a watchdog within sbd
>>>>>>>>>> Why does it have to be done within sbd?
>>>>>>>>> Not necessarily, could be spawned out as well into an own project or
>>>>>>>>> something already existent could be taken.
>>>>>>>>> I remember having added a dbus-interface to
>>>>>>>>> https://sourceforge.net/projects/watchdog/ for a project once.
>>>>>>>>> If you have a suggestion I'm open.
>>>>>>>>> Building on sbd would have the advantage of a smooth start:
>>>>>>>>>
>>>>>>>>> - cluster/pacemaker-watcher are there already and can
>>>>>>>>> be replaced/moved over time
>>>>>>>>> - the lifecycle of the daemon (when started/stopped) is
>>>>>>>>> already something that is in the code and in the people's minds
>>>>>>>>>>>
>>>>>>>>>>> - parameters for the first are a name and a timeout
>>>>>>>>>>>
>>>>>>>>>>> - first use-case would be crmd observation
>>>>>>>>>>>
>>>>>>>>>>> - later on we could think of removing pacemaker dependencies
>>>>>>>>>>> from sbd by moving the actual implementation of
>>>>>>>>>>> pacemaker-watcher and probably cluster-watcher as well
>>>>>>>>>>> into pacemaker - using the new API
>>>>>>>>>>>
>>>>>>>>>>> - this of course creates an sbd dependency within pacemaker so
>>>>>>>>>>> that it would make sense to offer a simpler and self-contained
>>>>>>>>>>> implementation within pacemaker as an alternative
>>>>>>>>>> I think the watchdog interface is so simple that you don't need a relay
>>>>>>>>>> for it. The only limit I can imagine is the number of watchdogs available
>>>>>>>>>> on some specific hardware.
>>>>>>>>> That is the point ;-)
>>>>>>>>>>> thus it would be favorable to have the dependency
>>>>>>>>>>> within a non-compulsory pacemaker-rpm so that
>>>>>>>>>>> we can offer an alternative that doesn't use sbd
>>>>>>>>>>> at maybe the cost of being less reliable or one
>>>>>>>>>>> that owns a hardware-watchdog by itself for systems
>>>>>>>>>>> where this is still unused.
>>>>>>>>>>>
>>>>>>>>>>> - e.g. via some kind of plugin (Andrew forgive me -
>>>>>>>>>>> no pils ;-) )
>>>>>>>>>>> - or via an additional daemon
>>>>>>>>>>>
>>>>>>>>>>> What did you have in mind?
>>>>>>>>>>> Maybe it makes sense to synchronize...
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Klaus
>>>>>>>>>>>
>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>> Hideo Yamauchi.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> ----- Original Message -----
>>>>>>>>>>>>> From: Ulrich Windl <ulrich.wi...@rz.uni-regensburg.de>
>>>>>>>>>>>>> To: users@clusterlabs.org; renayama19661...@ybb.ne.jp
>>>>>>>>>>>>> Cc:
>>>>>>>>>>>>> Date: 2016/10/5, Wed 23:08
>>>>>>>>>>>>> Subject: Antw: Re: [ClusterLabs] Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>>>>>>>>>>>>>>>> <renayama19661...@ybb.ne.jp> wrote on 21.09.2016 at 11:52
>>>>>>>>>>>>> in message <876439.61305...@web200311.mail.ssk.yahoo.co.jp>:
>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Has a final conclusion been reached on this problem?
>>>>>>>>>>>>>> If a user uses sbd, can the cluster avoid the problem of crmd being stopped with SIGSTOP?
>>>>>>>>>>>>> As pointed out earlier, maybe crmd should feed a watchdog. Then stopping crmd
>>>>>>>>>>>>> will reboot the node (unless the watchdog fails).
>>>>>>>>>>>>>
>>>>>>>>>>>>>> We are interested in this problem, too.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hideo Yamauchi.
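Ulrich's suggestion that crmd feed a watchdog relies on the standard Linux watchdog device semantics: opening the device arms it, every write resets the timer, and a process frozen with SIGSTOP simply stops writing, so the timer expires and the node resets. A minimal Python sketch of such a feeder, assuming the Linux watchdog character-device interface; `DeviceWatchdog` is a hypothetical name, setting the timeout via ioctl is deliberately left out, and the magic-close behavior applies only to drivers that support it when nowayout is not set.

```python
import os


class DeviceWatchdog:
    """Hypothetical minimal feeder for a Linux watchdog character device."""

    def __init__(self, path: str = "/dev/watchdog") -> None:
        # Opening the device arms the watchdog. The kernel allows only one
        # opener at a time (a second open fails with EBUSY), which is why
        # sbd, corosync's WD service and a crmd feeder cannot all own it.
        self.fd = os.open(path, os.O_WRONLY)

    def pet(self) -> None:
        # Any write resets the hardware timer; the byte value does not
        # matter apart from the magic character 'V' used in close().
        os.write(self.fd, b"\0")

    def close(self, disarm: bool = True) -> None:
        # Writing 'V' right before close asks a driver that supports
        # "magic close" to stop the timer (no effect with nowayout).
        # Closing without it leaves such a driver armed, so the node
        # resets unless something reopens the device and keeps feeding it.
        if disarm:
            os.write(self.fd, b"V")
        os.close(self.fd)
```

In crmd's case, `pet()` would be called from its main loop at an interval comfortably below the device timeout; if crmd is stopped with SIGSTOP, the writes stop and the node is reset, which is the behavior Ulrich describes.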
_______________________________________________
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org