Sorry for getting back to you so late.

On Fri, 25 May 2018 11:58:59 -0600
Casey & Gina <caseyandg...@icloud.com> wrote:
> > On May 25, 2018, at 7:01 AM, Casey Allen Shobe <caseyandg...@icloud.com>
> > wrote:
> >
> >> Actually, why is Pacemaker fencing the standby node just because a
> >> resource fails to start there? I thought only the master should be fenced
> >> if it were assumed to be broken.
>
> This is probably the most important thing to ask outside of the PAF resource
> agent which many may not be as fluent with as pacemaker itself, and perhaps
> the most indicative of me setting something up incorrectly outside of that
> resource agent.
>
> My understanding of fencing was that pacemaker would only fence a node if it
> was the master but had stopped responding, to avoid a split-brain situation.
> Why would pacemaker ever fence a standby node with no resources currently
> allocated to it?

So, as discussed on IRC, and for the mailing list archives, here is the
answer:

  https://clusterlabs.github.io/PAF/administration.html#failover

In short: after a failure (either on a primary or on a standby), you MUST fix
things on the node before starting Pacemaker on it. If you don't, PAF will
detect an incoherent state and raise an error, which will most likely lead
Pacemaker to fence your node again.

For instance, after a primary crash, you have to resync the old primary as a
standby of the new master before starting Pacemaker on that node and handing
it back to PAF. This is really important if you don't want to end up with a
silently corrupted standby in your cluster.

Cheers,
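The "resync before starting Pacemaker" rule can be illustrated with a small dry-run sketch. It only builds and prints commands, it does not run them, and it is not the procedure from the PAF documentation. It assumes the crashed former primary is rebuilt from scratch with pg_basebackup (pg_rewind is another option), that PostgreSQL is already stopped on that node, and that the host name, data directory and replication user are hypothetical placeholders. pg_basebackup would normally be run as the postgres system user.

```python
import shlex
from typing import List


def resync_plan(new_primary: str, pgdata: str, repl_user: str) -> List[List[str]]:
    """Return, in order, the commands to rebuild a crashed former primary
    as a standby of the new primary. Nothing is executed here.

    All three parameters are hypothetical placeholders for your own setup.
    """
    return [
        # Move the stale data directory aside: pg_basebackup requires an
        # empty or non-existent target directory.
        ["mv", pgdata, pgdata + ".old"],
        # Clone the new primary. -X stream copies the WAL generated during
        # the backup; -R writes the standby/recovery settings.
        ["pg_basebackup", "-h", new_primary, "-U", repl_user,
         "-D", pgdata, "-X", "stream", "-R"],
        # Only once the node is a clean standby, hand it back to Pacemaker
        # (and therefore to PAF).
        ["systemctl", "start", "pacemaker"],
    ]


if __name__ == "__main__":
    for cmd in resync_plan("pg-new-primary", "/var/lib/pgsql/data", "replicator"):
        print(shlex.join(cmd))
```

The point of the ordering is the one made above: Pacemaker is started last, so PAF never sees the stale, incoherent instance and has no reason to raise an error that ends in fencing.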