>>> Ken Gaillot <kgail...@redhat.com> wrote on 14.04.2021 at 18:35 in message <00635dba0dfc70430d4fd7820677b47d242d65d2.ca...@redhat.com>:
[...]
>>
>> Startup fencing is pacemaker default (startup-fencing cluster
>> option).
>
> Start-up fencing will have the desired effect in >2 node cluster, but
> in 2-node cluster the corosync wait_for_all option is key.

This is another good example where pacemaker is (maybe for historic
reasons) more complicated than necessary (IMHO):
Why not have a single "cluster-formation-timeout" that waits for nodes
to join when initially forming a cluster (i.e. while the starting node
has no quorum yet)?
So if that timeout expires and there is still no quorum (subject to
other configuration parameters), the node will commit suicide
(self-fencing, preferably to "off" instead of "reboot").
Of course any two-node cluster would need some tie-breaker (like
grabbing an exclusive lock on shared storage).

>
> If wait_for_all is true (which is the default when two_node is set),
> then a node that comes up alone will wait until it sees the other node
> at least once before becoming quorate. This prevents an isolated node
> from coming up and fencing a node that's happily running.
>
> Setting wait_for_all to false will make an isolated node immediately
> become quorate. It will do what you want, which is fence the other node
> and take over resources, but the danger is that this node is the one
> that's having trouble (e.g. can't see the other node due to a network
> card issue). The healthy node could fence the unhealthy node, which
> might then reboot and come up and shoot the healthy node.
>
> There's no direct equivalent of a delay before becoming quorate, but I
> don't think that helps -- the boot time acts as a sort of random delay,
> and a delay doesn't help the issue of an unhealthy node shooting a
> healthy one.
>
> My recommendation would be to set wait_for_all to true as long as both
> nodes are known to be healthy. Once an unhealthy node is down and
> expected to stay down, set wait_for_all to false on the healthy node so
> it can reboot and bring the cluster up. (The unhealthy node will still
> have wait_for_all=true, so it won't cause any trouble even if it comes
> up.)
>
[...]

Regards,
Ulrich
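
For reference, here is a sketch of the corosync.conf quorum section that the quoted recommendation refers to. It assumes a two-node cluster using votequorum. The rest of corosync.conf (totem, nodelist, logging) is omitted, and the values are illustrative rather than taken from anyone's actual configuration in this thread:

```
quorum {
    provider: corosync_votequorum
    # Two-node mode: the cluster stays quorate with a single vote.
    two_node: 1
    # Already implied by two_node: 1; written out only to make it visible.
    # A node that starts alone waits until it has seen its peer once.
    wait_for_all: 1
}
```

Following the recommendation quoted above, wait_for_all: 0 would be set only on the healthy node, and only once the other node is down and expected to stay down.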