>>> Klaus Wenninger <[email protected]> wrote on 08.09.2016 at 09:13 in message <[email protected]>:
> On 09/08/2016 08:55 AM, Digimer wrote:
>> On 08/09/16 03:47 PM, Ulrich Windl wrote:
>>>>>> Shermal Fernando <[email protected]> wrote on 08.09.2016 at 06:41 in message
>>> <8ce6e8d87f896546b9c65ed80d30a4336578c...@lg-spmb-mbx02.lseg.stockex.local>:
>>>> The whole cluster will fail if the DC (crm daemon) is frozen due to CPU
>>>> starvation or hanging while trying to perform a IO operation.
>>>> Please share some thoughts on this issue.
>>> What is "the whole cluster will fail"? If the DC times out, some recovery
>>> will take place.
>> Yup. The starved node should be declared lost by corosync, the remaining
>> nodes reform and if they're still quorate, the hung node should be
>> fenced. Recovery occur and life goes on.
> Didn't happen in my test (SIGSTOP to crmd).
> Might be a configuration mistake though...
> Even had sbd with a watchdog active (amongst
> other - real - fencing devices).
> Thinking if it might make sense so tickle the
> crmd-API from sbd-pacemaker-watcher ...

OK, so we are mixing up "DC" and crmd. crmd is just one part of the DC. I guess
that if corosync is up and happy but crmd is silent, the cluster simply thinks
that the DC has nothing to say. But I still wonder what will happen when crmd
is supposed to send a reply to a command.

>>
>> Unless you don't have fencing, then may $deity of mercy. ;)
>>
>
>
> _______________________________________________
> Users mailing list: [email protected]
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
