Hi Jehan-Guillaume,

Sorry for disturbing you. It is really important for us to pass this test of Pacemaker's resiliency and robustness. To my understanding, it is pacemakerd that feeds the watchdog. So if only the crmd is hung, fencing will not work. Am I correct here?

Regards,
Shermal Fernando

-----Original Message-----
From: Jehan-Guillaume de Rorthais [mailto:[email protected]]
Sent: Thursday, September 08, 2016 3:12 PM
To: Shermal Fernando
Cc: Cluster Labs - All topics related to open-source clustering welcomed
Subject: Re: [ClusterLabs] Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely

On Thu, 8 Sep 2016 08:58:15 +0000
Shermal Fernando <[email protected]> wrote:

> Hi Jehan-Guillaume,
>
> Does this mean the watchdog will self-terminate the machine when the crm
> daemon is frozen?

This means that if the machine is under such a load that Pacemaker is not able to feed the watchdog, the watchdog will fence the machine itself.

> -----Original Message-----
> From: Jehan-Guillaume de Rorthais [mailto:[email protected]]
> Sent: Thursday, September 08, 2016 12:52 PM
> To: Digimer
> Cc: Cluster Labs - All topics related to open-source clustering
> welcomed
> Subject: Re: [ClusterLabs] Antw: Re: When the DC crmd is frozen,
> cluster decisions are delayed infinitely
>
> On Thu, 8 Sep 2016 15:55:50 +0900
> Digimer <[email protected]> wrote:
>
> > On 08/09/16 03:47 PM, Ulrich Windl wrote:
> > >>>> Shermal Fernando <[email protected]> wrote on
> > >>>> 08.09.2016 at 06:41 in message
> > > <8ce6e8d87f896546b9c65ed80d30a4336578c...@lg-spmb-mbx02.lseg.stockex.local>:
> > >> The whole cluster will fail if the DC (crm daemon) is frozen due
> > >> to CPU starvation or hanging while trying to perform an I/O operation.
> > >> Please share some thoughts on this issue.
> > >
> > > What is "the whole cluster will fail"? If the DC times out, some
> > > recovery will take place.
> >
> > Yup. The starved node should be declared lost by corosync, the
> > remaining nodes reform and if they're still quorate, the hung node
> > should be fenced. Recovery occurs and life goes on.
>
> +1
>
> And fencing might either come from outside, or just from the server
> itself using watchdog.
