Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
On 10/07/2016 11:10 PM, renayama19661...@ybb.ne.jp wrote:
> Hi All,
>
> Our user may not necessarily use sbd.
>
> I confirmed that there was a method using WD service of corosync as one
> method not to use sbd.
> Pacemaker watches the process of pacemaker by WD service using CMAP and can
> carry out watchdog.

Have to have a look at that...
But if we establish some in-between-layer in pacemaker we could have this
as one of the possibilities besides e.g. sbd (with enhanced API),
going for a watchdog-device directly, ...

>
> We can set up a patch of pacemaker.

Always helpful to discuss/clarify an idea once some code is available ...

> Was the discussion of using WD service over so far?

Not from my pov. Just a day off ;-)

>
> Best Regard,
> Hideo Yamauchi.
>
>
> - Original Message -
>> From: Klaus Wenninger <kwenn...@redhat.com>
>> To: Ulrich Windl <ulrich.wi...@rz.uni-regensburg.de>; users@clusterlabs.org
>> Cc:
>> Date: 2016/10/7, Fri 17:47
>> Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is
>> frozen, cluster decisions are delayed infinitely
>>
>> On 10/07/2016 08:14 AM, Ulrich Windl wrote:
>>>>>> Klaus Wenninger <kwenn...@redhat.com> wrote on 06.10.2016 at 18:03 in
>>> message <3980cfdd-ebd9-1597-f6bd-a1ca808f7...@redhat.com>:
>>>> On 10/05/2016 04:22 PM, renayama19661...@ybb.ne.jp wrote:
>>>>> Hi All,
>>>>>
>>>>>>> If a user uses sbd, can the cluster evade a problem of SIGSTOP of crmd?
>>>>>>
>>>>>> As pointed out earlier, maybe crmd should feed a watchdog. Then stopping crmd
>>>>>> will reboot the node (unless the watchdog fails).
>>>>> Thank you for comment.
>>>>>
>>>>> We examine watchdog of crmd, too.
>>>>> In addition, I comment after examination advanced.
>>>> Was thinking of doing a small test implementation going
>>>> a little in the direction Lars Ellenberg had been pointing out.
>>>>
>>>> a couple of thoughts I had so far:
>>>>
>>>> - add an API (via DBus or libqb - favoring libqb atm) to sbd
>>>>   an application can use to create a watchdog within sbd
>>> Why has it to be done within sbd?
>> Not necessarily, could be spawned out as well into an own project or
>> something already existent could be taken.
>> Remember to have added a dbus-interface to
>> https://sourceforge.net/projects/watchdog/ for a project once.
>> If you have a suggestion I'm open.
>> Going off sbd would have the advantage of a smooth start:
>>
>> - cluster/pacemaker-watcher are there already and can
>>   be replaced/moved over time
>> - the lifecycle of the daemon (when started/stopped) is
>>   already something that is in the code and in the people's minds
>>
>>>> - parameters for the first are a name and a timeout
>>>>
>>>> - first use-case would be crmd observation
>>>>
>>>> - later on we could think of removing pacemaker dependencies
>>>>   from sbd by moving the actual implementation of
>>>>   pacemaker-watcher and probably cluster-watcher as well
>>>>   into pacemaker - using the new API
>>>>
>>>> - this of course creates sbd dependency within pacemaker so
>>>>   that it would make sense to offer a simpler and self-contained
>>>>   implementation within pacemaker as an alternative
>>> I think the watchdog interface is so simple that you don't need a relay
>>> for it. The only limit I can imagine is the number of watchdogs available of
>>> some specific hardware.
>> That is the point ;-)
>>>>   thus it would be favorable to have the dependency
>>>>   within a non-compulsory pacemaker-rpm so that
>>>>   we can offer an alternative that doesn't use sbd
>>>>   at maybe the cost of being less reliable or one
>>>>   that owns a hardware-watchdog by itself for systems
>>>>   where this is still unused.
>>>>
>>>>   - e.g. via some kind of plugin (Andrew forgive me -
>>>>     no pils ;-) )
>>>>   - or via an additional daemon
>>>>
>>>> What did you have in mind?
>>>> Maybe it makes sense to synchronize...
>>>>
>>>> Regards,
>>>> Klaus
>>>>
>>>>> Best
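
The idea discussed in this message, an API where an application registers a watchdog by name and timeout while a single daemon owns the one hardware watchdog, could look roughly like the sketch below. It is only an illustration under stated assumptions: `WatchdogRegistry`, `create`, `feed`, `expired` and `may_feed_hardware` are hypothetical names, not part of sbd, pacemaker or libqb, and the real transport (DBus or libqb IPC) is left out entirely.

```python
import time


class WatchdogRegistry:
    """Hypothetical multiplexer: many named software watchdogs, one hardware watchdog.

    The owning daemon would keep feeding the hardware watchdog only while
    every registered client has fed its own software watchdog in time.
    """

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so the logic can be tested without sleeping
        self._timeout = {}       # name -> timeout in seconds
        self._last_fed = {}      # name -> clock value of the last feed

    def create(self, name, timeout_s):
        """Register a software watchdog; the two parameters are a name and a timeout."""
        if timeout_s <= 0:
            raise ValueError("timeout must be positive")
        self._timeout[name] = timeout_s
        self._last_fed[name] = self._clock()

    def feed(self, name):
        """Called periodically by the observed process (e.g. crmd)."""
        if name not in self._timeout:
            raise KeyError(name)
        self._last_fed[name] = self._clock()

    def remove(self, name):
        """Clean deregistration, e.g. on an orderly shutdown of the client."""
        self._timeout.pop(name, None)
        self._last_fed.pop(name, None)

    def expired(self):
        """Names of all watchdogs that were not fed within their timeout."""
        now = self._clock()
        return sorted(
            name for name, timeout in self._timeout.items()
            if now - self._last_fed[name] > timeout
        )

    def may_feed_hardware(self):
        """True while no client has timed out; otherwise the node should be reset."""
        return not self.expired()
```

A process frozen by SIGSTOP simply stops calling `feed`, so its entry shows up in `expired()` after its timeout, the daemon stops feeding the hardware watchdog, and the node reboots. That is the behaviour the thread asks for in the frozen-crmd case.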
[ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>>> Klaus Wenninger <kwenn...@redhat.com> wrote on 06.10.2016 at 18:03 in
message <3980cfdd-ebd9-1597-f6bd-a1ca808f7...@redhat.com>:
> On 10/05/2016 04:22 PM, renayama19661...@ybb.ne.jp wrote:
>> Hi All,
>>
>>>> If a user uses sbd, can the cluster evade a problem of SIGSTOP of crmd?
>>>
>>> As pointed out earlier, maybe crmd should feed a watchdog. Then stopping crmd
>>> will reboot the node (unless the watchdog fails).
>>
>> Thank you for comment.
>>
>> We examine watchdog of crmd, too.
>> In addition, I comment after examination advanced.
>
> Was thinking of doing a small test implementation going
> a little in the direction Lars Ellenberg had been pointing out.
>
> a couple of thoughts I had so far:
>
> - add an API (via DBus or libqb - favoring libqb atm) to sbd
>   an application can use to create a watchdog within sbd

Why has it to be done within sbd?

>
> - parameters for the first are a name and a timeout
>
> - first use-case would be crmd observation
>
> - later on we could think of removing pacemaker dependencies
>   from sbd by moving the actual implementation of
>   pacemaker-watcher and probably cluster-watcher as well
>   into pacemaker - using the new API
>
> - this of course creates sbd dependency within pacemaker so
>   that it would make sense to offer a simpler and self-contained
>   implementation within pacemaker as an alternative

I think the watchdog interface is so simple that you don't need a relay for
it. The only limit I can imagine is the number of watchdogs available of some
specific hardware.

>
>   thus it would be favorable to have the dependency
>   within a non-compulsory pacemaker-rpm so that
>   we can offer an alternative that doesn't use sbd
>   at maybe the cost of being less reliable or one
>   that owns a hardware-watchdog by itself for systems
>   where this is still unused.
>
>   - e.g. via some kind of plugin (Andrew forgive me -
>     no pils ;-) )
>   - or via an additional daemon
>
> What did you have in mind?
> Maybe it makes sense to synchronize...
>
> Regards,
> Klaus
>
>>
>>
>> Best Regards,
>> Hideo Yamauchi.
>>
>>
>>
>> - Original Message -
>>> From: Ulrich Windl <ulrich.wi...@rz.uni-regensburg.de>
>>> To: users@clusterlabs.org; renayama19661...@ybb.ne.jp
>>> Cc:
>>> Date: 2016/10/5, Wed 23:08
>>> Subject: Antw: Re: [ClusterLabs] Antw: Re: When the DC crmd is frozen,
>>> cluster decisions are delayed infinitely
>>>
>>>>>> <renayama19661...@ybb.ne.jp> wrote on 21.09.2016 at 11:52
>>> in message
>>> <876439.61305...@web200311.mail.ssk.yahoo.co.jp>:
>>>> Hi All,
>>>>
>>>> Was the final conclusion given about this problem?
>>>>
>>>> If a user uses sbd, can the cluster evade a problem of SIGSTOP of crmd?
>>> As pointed out earlier, maybe crmd should feed a watchdog. Then stopping crmd
>>> will reboot the node (unless the watchdog fails).
>>>
>>>> We are interested in this problem, too.
>>>>
>>>> Best Regards,
>>>>
>>>> Hideo Yamauchi.

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
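
To illustrate the remark that the watchdog interface is simple, here is a minimal sketch of the feeding side. It assumes the usual Linux watchdog character-device behaviour: any write counts as a keepalive, and writing the character `V` just before closing disarms the timer, provided the driver supports magic close and "nowayout" is not set. The function names `feed_while_healthy` and `disarm` are made up for this example, and the code is written against any file-like object so it can be tried without a real device.

```python
import time

MAGIC_CLOSE = b"V"  # Linux watchdog "magic close" character (driver support assumed)


def feed_while_healthy(dev, is_healthy, interval_s, sleep=time.sleep):
    """Write one keepalive byte to dev every interval_s while is_healthy() is true.

    Returns the number of keepalives written. After this returns nothing
    feeds the device any more, so an armed hardware watchdog would expire
    and reset the node.
    """
    fed = 0
    while is_healthy():
        dev.write(b"\0")   # any write is treated as a keepalive
        dev.flush()
        fed += 1
        sleep(interval_s)
    return fed


def disarm(dev):
    """Orderly shutdown: ask the driver to stop the timer before the device is closed."""
    dev.write(MAGIC_CLOSE)
    dev.flush()
```

On a real system `dev` would be something like `open("/dev/watchdog", "wb", buffering=0)`, which arms the timer as soon as it is opened, so this should only be tried on a machine that may reboot. Because a device node can normally be opened by only one process, several daemons cannot each feed the same hardware watchdog directly, which is the limit on the number of watchdogs mentioned in the message.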