On 09/08/2016 11:51 AM, Shermal Fernando wrote:
> Hi Jehan-Guillaume,
>
> Sorry for disturbing you. This is really important for us to pass this test
> on the pacemaker resiliency and robustness.
> To my understanding, it's the pacemakerd who feeds the watchdog. If only the
> crmd is hung, fencing will not work. Am I correct here?
sbd is observing pacemaker (basically by interfacing with corosync and
reading the cib - both obviously not affected by your test-scenario) and
is feeding the watchdog if everything seems ok.

> Regards,
> Shermal Fernando
>
> -----Original Message-----
> From: Jehan-Guillaume de Rorthais [mailto:j...@dalibo.com]
> Sent: Thursday, September 08, 2016 3:12 PM
> To: Shermal Fernando
> Cc: Cluster Labs - All topics related to open-source clustering welcomed
> Subject: Re: [ClusterLabs] Antw: Re: When the DC crmd is frozen, cluster
> decisions are delayed infinitely
>
> On Thu, 8 Sep 2016 08:58:15 +0000
> Shermal Fernando <sherma...@millenniumit.com> wrote:
>
>> Hi Jehan-Guillaume,
>>
>> Does this mean the watchdog will self-terminate the machine when the crm
>> daemon is frozen?
> This means that if the machine is under such a load that Pacemaker is not
> able to feed the watchdog, the watchdog will fence the machine itself.
>
>> -----Original Message-----
>> From: Jehan-Guillaume de Rorthais [mailto:j...@dalibo.com]
>> Sent: Thursday, September 08, 2016 12:52 PM
>> To: Digimer
>> Cc: Cluster Labs - All topics related to open-source clustering
>> welcomed
>> Subject: Re: [ClusterLabs] Antw: Re: When the DC crmd is frozen,
>> cluster decisions are delayed infinitely
>>
>> On Thu, 8 Sep 2016 15:55:50 +0900
>> Digimer <li...@alteeve.ca> wrote:
>>
>>> On 08/09/16 03:47 PM, Ulrich Windl wrote:
>>>>>>> Shermal Fernando <sherma...@millenniumit.com> wrote on
>>>>>>> 08.09.2016 at 06:41 in message
>>>> <8ce6e8d87f896546b9c65ed80d30a4336578c...@lg-spmb-mbx02.lseg.stockex.local>:
>>>>> The whole cluster will fail if the DC (crm daemon) is frozen due
>>>>> to CPU starvation or hanging while trying to perform an IO operation.
>>>>> Please share some thoughts on this issue.
>>>> What is "the whole cluster will fail"? If the DC times out, some
>>>> recovery will take place.
>>> Yup.
>>> The starved node should be declared lost by corosync, the
>>> remaining nodes reform and if they're still quorate, the hung node
>>> should be fenced. Recovery occurs and life goes on.
>> +1
>>
>> And fencing might either come from outside, or just from the server
>> itself using watchdog.
>
> _______________________________________________
> Users mailing list: Users@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org