On 27.02.2021 09:05, Eric Robinson wrote:
>> -----Original Message-----
>> From: Users <[email protected]> On Behalf Of Andrei
>> Borzenkov
>> Sent: Friday, February 26, 2021 1:25 PM
>> To: [email protected]
>> Subject: Re: [ClusterLabs] Our 2-Node Cluster with a Separate Qdevice Went
>> Down Anyway?
>>
>> On 26.02.2021 21:58, Eric Robinson wrote:
>>>> -----Original Message-----
>>>> From: Users <[email protected]> On Behalf Of Andrei
>>>> Borzenkov
>>>> Sent: Friday, February 26, 2021 11:27 AM
>>>> To: [email protected]
>>>> Subject: Re: [ClusterLabs] Our 2-Node Cluster with a Separate Qdevice
>>>> Went Down Anyway?
>>>>
>>>> On 26.02.2021 19:19, Eric Robinson wrote:
>>>>> At 5:16 am Pacific time Monday, one of our cluster nodes failed and
>>>>> its mysql services went down. The cluster did not automatically recover.
>>>>>
>>>>> We're trying to figure out:
>>>>>
>>>>>
>>>>> 1. Why did it fail?
>>>>
>>>> Pacemaker only registered loss of connection between two nodes. You
>>>> need to investigate why it happened.
>>>>
>>>>> 2. Why did it not automatically recover?
>>>>>
>>>>> The cluster did not recover until we manually executed...
>>>>>
>>>>
>>>> *Cluster* never failed in the first place. Specific resource may. Do
>>>> not confuse things more than is necessary.
>>>>
>>>>> # pcs resource cleanup p_mysql_622
>>>>>
>>>>
>>>> Because this resource failed to stop and this is fatal.
>>>>
>>>>> Feb 22 05:16:30 [91682] 001db01a pengine: notice: LogAction: * Stop p_mysql_622 ( 001db01a ) due to no quorum
>>>>
>>>> Remaining node lost quorum and decided to stop resources
>>>>
>>>
>>> I consider this a cluster failure, exacerbated by a resource failure. We can
>>> investigate why resource p_mysql_622 failed to stop, but it seems the
>>> underlying problem is the loss of quorum.
>>
>> This problem is outside of pacemaker scope. You are shooting the messenger
>> here.
>>
>
> I appreciate your patience here. Here is my confusion. We have three
> devices--two database servers and a qdevice. Unless two devices lost
> connection with the network at the same time, the cluster should not have
> lost quorum.

No, you misunderstand how qdevice works. qdevice is not a passive witness:
when the cluster is split into multiple partitions, qdevice decides which
partition should remain active and provides votes to this partition so it
remains quorate. All other partitions will go out of quorum. So even if only
the connection between the two nodes was lost, but both nodes retained their
connection to the qnetd server, one node is expected to go out of quorum.

> If node 001db01a lost all connectivity (and therefore quorum), then I
> understand that the default Pacemaker action would be to stop its services.
> However, that does not explain why node 001db01b did not take over and start
> the services, as it would still have had quorum.

You really need to show your corosync and pacemaker configuration.

>
>>> That should not have happened with the qdevice in the mix, should it?
>>>
>>
>> Huh? It is up to you to provide infrastructure where qdevice connection
>> never fails. Again this is outside of pacemaker scope.
>>
>
> Does something in the logs indicate that BOTH database nodes lost quorum?

Not that I can see; the second node apparently remained in quorum.

> Are you suggesting that Azure's network went down and all the devices lost
> communication with each other, and that's why quorum was lost?

Communication between the two pacemaker nodes was definitely lost, at least
from pacemaker's point of view. Communication to the qnetd server may have
been lost, but it does not change the end result: one node was expected to go
out of quorum.

>
>>> I'm confused about what is supposed to happen here. If the root cause is
>>> that node 001db01a briefly lost all communication with the network (just
>>> guessing), then it should have taken no action, including STONITH, since
>>> there would be no quorum.
>>
>> Read pacemaker documentation. Default action when node goes out of
>> quorum is to stop all resources.
>>
>>> (There is no physical STONITH device anyway, as both nodes are in Azure.)
>>> Meanwhile, node 001db01b would still have had quorum (itself plus the
>>> qdevice), and should have assumed ownership of the resources and started
>>> them, or no?
>>
>> I commented on this in another mail. pacemaker documentation does not
>> really describe what happens, and blindly restarting all resources locally
>> would easily lead to data corruption.
>>
>> Having STONITH would solve your problem. 001db01b would have fenced
>> 001db01a and restarted all resources.
>>
>> Without STONITH it is not possible in general to handle split brain and
>> resource stop failures. You do not know what is left active and what not so
>> it is not safe to attempt to restart resources elsewhere.
>
> The nodes are using DRBD. Since that has its own split-brain detection, I
> don't think there is a concern about data corruption as there would be with
> shared storage.

To the best of my knowledge, to *resolve* a DRBD split brain you need fencing.
But I do not have first-hand experience with DRBD, so I cannot comment here.

> In a scenario where node 001db01a loses connectivity, 001db01b still has
> quorum because of the vote from the qdevice. It should promote DRBD and start
> the mysql services. If 001db01a subsequently comes back online, then both
> DRBD devices go into standalone and the services go back down, but there's no
> corruption. You then do a manual split-brain recovery (discard data on
> 001db01a) and you're back up.
>
> I don't see how STONITH makes things stable in this scenario. If all the
> nodes lose quorum, would they take STONITH action? If so, which node would
> win? I'm worried about enabling STONITH because unless we understand why the
> nodes lost quorum, don't we run the risk of random unwanted STONITH events?

Quorum is not a replacement for fencing. Actually, an HA cluster does not need
quorum at all; all that it needs is fencing.
All of the two-node heartbeat/pacemaker clusters I have been using for the
past decade had no-quorum-policy=ignore, and the corosync two_node option also
does exactly that: it *fakes* quorum just to please the default pacemaker
no-quorum-policy value.

Quorum provides one possible way to choose which node(s) should be left
running. It still does not mean it is safe to take over resources from the
remaining nodes. Even without shared storage, consider the trivial case of a
duplicated IP address.

What quorum makes possible is self-fencing. Nodes that go out of quorum commit
suicide, and *that* enables the quorate partition to assume "clean state" and
start takeover (and *NOT* the fact that the remaining partition is quorate).
Most commercial HA managers I am aware of work with self-fencing and do not
even offer the possibility to use anything else. This is probably what made
the quorum idea so deeply ingrained in people's minds: every piece of
documentation you read talks about the need to have quorum without actually
explaining *why* you need to have quorum.

In the case of pacemaker, the quorate partition will initiate fencing of the
other nodes, and only after fencing has been successful will it continue with
taking over their resources. Pacemaker also supports self-fencing via the SBD
watchdog if no external fencing mechanism is possible.
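
The qdevice behaviour described in this message (on a split, the qdevice gives
its vote to exactly one partition and every other partition goes out of
quorum) can be illustrated with a toy model. The Python below is only a
sketch, not corosync's actual algorithm: the function name
quorate_partitions, the one-vote-per-node rule, and the tie-breaker (larger
partition first, then lowest node name) are all assumptions made for the
example.

```python
from typing import Dict, FrozenSet, List, Optional


def quorate_partitions(
    partitions: List[FrozenSet[str]],
    reaches_qnetd: Dict[str, bool],
    qdevice_votes: int = 1,
) -> Dict[FrozenSet[str], bool]:
    """Toy model: report, for each partition, whether it would be quorate.

    Assumptions (not taken from corosync itself): every node has one vote,
    and the qdevice hands its vote(s) to a single partition only.
    """
    total_nodes = sum(len(p) for p in partitions)
    expected_votes = total_nodes + qdevice_votes
    needed = expected_votes // 2 + 1  # strict majority of expected votes

    # Only a partition with a node that still reaches qnetd can get the vote.
    candidates = [p for p in partitions if any(reaches_qnetd[n] for n in p)]

    # Hypothetical tie-breaker: larger partition first, then lowest node name.
    winner: Optional[FrozenSet[str]] = None
    if candidates:
        winner = min(candidates, key=lambda p: (-len(p), min(p)))

    return {
        p: len(p) + (qdevice_votes if p == winner else 0) >= needed
        for p in partitions
    }


if __name__ == "__main__":
    split = [frozenset({"001db01a"}), frozenset({"001db01b"})]

    # Node-to-node link lost, both nodes still reach qnetd.
    print(quorate_partitions(split, {"001db01a": True, "001db01b": True}))

    # Node-to-node link lost and 001db01a also lost its link to qnetd.
    print(quorate_partitions(split, {"001db01a": False, "001db01b": True}))
```

In both split scenarios of this model exactly one partition stays quorate,
which is the point made above: losing only the node-to-node link is enough for
one node to go out of quorum. The model says nothing about whether it is safe
for the quorate side to take over resources; that still requires fencing.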
