Hi!

I'm wondering about this:
vmhost1-fsl.bcn is shutting down

That doesn't read like a STONITH, but like a regular shutdown (which may hang).

The other thing that reads strange for a two-node cluster is this:
[11130] vmhost0-fsl.jsc.nasa.gov corosync notice [TOTEM ] A new membership (192.168.1.140:184) was formed. Members left: 2

This sounds odd, too:
[11130] vmhost0-fsl.jsc.nasa.gov corosync warning [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.

Regards,
Ulrich

>>> "Dickerson, Charles Chuck (JSC-EG)[Jacobs Technology, Inc.]"
<charles.e.dicker...@nasa.gov> wrote on 11.05.2018 at 19:02 in message
<0c5150d42e2b3f43b83ec3f62b3b8ee421d5f...@ndjsmbx201.ndc.nasa.gov>:
> I have attached the /var/log/cluster/corosync.log here.
>
> The fenced node continues to be rebooted even after the stonith timeout.
> The only way I have of stopping the reboot cycle is to completely stop the
> cluster on the remaining node.
>
> Stonith should be able to detect that the fenced node was successfully
> rebooted and stop trying to fence it. I have done this using both the cycle
> method and the onoff method; both methods have the same result.
>
> Chuck Dickerson
> Jacobs
> JSC - EG3
> (281) 244-5895
>
> -----Original Message-----
> From: Users [mailto:users-boun...@clusterlabs.org] On Behalf Of Ulrich Windl
> Sent: Friday, May 11, 2018 8:47 AM
> To: users@clusterlabs.org
> Subject: [ClusterLabs] Antw: stonith continues to reboot server once fencing
> occurs
>
> Hi!
>
> Could it be that the node reboots faster than the stonith timeout? So the
> node will unexpectedly come up...
>
> Without logs it's hard to say.
>
> Regards,
> Ulrich
>
>>>> "Dickerson, Charles Chuck (JSC-EG)[Jacobs Technology, Inc.]"
> <charles.e.dicker...@nasa.gov> wrote on 11.05.2018 at 15:32 in message
> <0c5150d42e2b3f43b83ec3f62b3b8ee421d5f...@ndjsmbx201.ndc.nasa.gov>:
>> I have a 2 node cluster; once fencing occurs, the fenced node is
>> continually rebooted every time it comes up.
>>
>> Configuration: 2 identical nodes - CentOS 7.4, pacemaker 1.1.18, pcs
>> 0.9.162, fencing configured using fence_ipmilan. The cluster is set to
>> ignore quorum and stonith is enabled. Firewalld has been disabled.
>>
>> I can manually issue the fence_ipmilan command and the specified node
>> is rebooted, comes back up and fence_ipmilan sees this and reports success.
>>
>> If fencing is initiated via the "pcs stonith fence" command,
>> stonith_admin command, or by disrupting the communication between the
>> nodes, the proper node is rebooted, but the stonith_admin command
>> times out and never sees the node as rebooted. The node is then
>> rebooted every time it comes back up on the network. The status
>> remains UNCLEAN in pcs status.
>>
>> Chuck Dickerson
>> Jacobs
>> JSC - EG3
>> (281) 244-5895
>
> _______________________________________________
> Users mailing list: Users@clusterlabs.org
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org