On 8/16/20 11:40 AM, Andrei Borzenkov wrote:
> 16.08.2020 04:25, Reid Wahl wrote:
>>
>>> - considering that I have both nodes with stonith against the other node,
>>> once the two nodes can communicate, how can I be sure the two nodes will
>>> not try to stonith each other?
>>>
>> The simplest option is to add a delay attribute (e.g., delay=10) to one of
>> the stonith devices. That way, if both nodes want to fence each other, the
>> node whose stonith device has a delay configured will wait for the delay to
>> expire before executing the reboot action.

If your fence-agent supports a delay attribute you can of course use that.
As this isn't available with every fence-agent, or looks different
depending on the fence-agent, we've introduced pcmk_delay_max &
pcmk_delay_base. These are applied prior to actually calling the
fence-agent and thus are always available and always look the same.
The delay is going to be some random time between pcmk_delay_base and
pcmk_delay_max.
This takes us to another approach to reducing the chances of a fatal
fence-race. Assuming both nodes detect the cause of the fence-race at
around the same time, just adding a random delay will very likely prevent
them from killing each other.
This is especially interesting when there is no clear / easy way to
determine which of the nodes is more important at this time.

>>
> Current pacemaker (2.0.4) also supports priority-fencing-delay option
> that computes delay based on which resources are active on specific
> node, so favoring node with "more important" resources.
>
>> Alternatively, you can set up corosync-qdevice, using a separate system
>> running qnetd server as a quorum arbitrator.
>>
> Any solution that is based on node suicide is prone to complete cluster
> loss. In particular, in two node cluster with qdevice surviving node
> will commit suicide is qnetd is not accessible.

I don't think that what Reid suggested was for nodes that lose quorum to
commit suicide right away. You can use quorum simply as a means of
preventing the fence-races otherwise inherent to 2-node-clusters.

>
> As long as external stonith is reasonably reliable it is much preferred
> to any solution based on quorum (unless you have very specific
> requirements and can tolerate running remaining nodes in "frozen" mode
> to limit unavailability).

Well, we can name the predominant scenario in which one might not want to
depend on fencing-devices like ipmi: if you want to cover a scenario where
the nodes don't just lose corosync connectivity, but access from one node
to the fencing device of the other is interrupted as well, you probably
won't get around an approach that involves some kind of arbitrator.

>
> And before someone jumps in - SBD falls into "solution based on suicide"
> as well.

Got your point without that hint ;-)
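
To put a rough number on the random-delay argument earlier in this mail,
here is a toy Python simulation. It is not how pacemaker implements
anything: the uniform draw, the fixed fence_time (how long a fence action
needs to take effect) and the function name are assumptions made purely
for illustration.

```python
import random


def mutual_fence_probability(delay_base, delay_max, fence_time,
                             trials=100_000, seed=0):
    """Estimate how often two nodes end up fencing each other.

    Toy model (assumption, not pacemaker's implementation): each node
    waits a delay drawn uniformly from [delay_base, delay_max] and then
    fences its peer; the fence needs fence_time seconds to take effect.
    Both nodes die if the second node fires before the first fence lands.
    """
    rng = random.Random(seed)
    both_die = 0
    for _ in range(trials):
        delay_a = rng.uniform(delay_base, delay_max)
        delay_b = rng.uniform(delay_base, delay_max)
        if abs(delay_a - delay_b) < fence_time:
            both_die += 1
    return both_die / trials


# Hypothetical numbers: fencing takes 2s to land.
no_delay = mutual_fence_probability(0.0, 0.0, fence_time=2.0)
with_delay = mutual_fence_probability(0.0, 15.0, fence_time=2.0)
print(f"no random delay:    {no_delay:.2f}")
print(f"random delay 0-15s: {with_delay:.2f}")
```

With these made-up numbers the random window takes the chance of a mutual
kill from certain to roughly one in four; a wider window relative to the
fencing time lowers it further.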

Klaus

_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
