On Mon, Sep 4, 2023 at 1:45 PM David Dolan <daithido...@gmail.com> wrote:
>
> Hi Klaus,
>
> With default quorum options I've performed the following on my 3 node cluster
>
> Bring down cluster services on one node - the running services migrate to
> another node
> Wait 3 minutes
> Bring down cluster services on one of the two remaining nodes - the surviving
> node in the cluster is then fenced
>
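For reference, the sequence described above can be reproduced with pcs roughly
as in the sketch below. This is only an illustration: the node names
node1/node2/node3 are taken from the quorum output quoted further down in the
thread, and the 180-second pause is an assumption standing in for the
"wait 3 minutes" step.

  # run from the node that is meant to survive (node1 here)
  pcs cluster stop node2   # first node down - resources should migrate
  sleep 180                # the "wait 3 minutes" step between failures
  pcs cluster stop node3   # second node down - node1 is expected to keep running
  pcs status               # check whether resources are still active on node1
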
Is it fenced or is it reset? It is not the same. The default for
no-quorum-policy is "stop". So either you have "no-quorum-policy" set to
"suicide", or the node is reset by something outside of pacemaker. That
"something" may initiate fencing too.

> Instead of the surviving node being fenced, I hoped that the services would
> migrate and run on that remaining node.
>
> Just looking for confirmation that my understanding is ok and if I'm missing
> something?
>
> Thanks
> David
>
> On Thu, 31 Aug 2023 at 11:59, David Dolan <daithido...@gmail.com> wrote:
>>
>> I just tried removing all the quorum options, setting back to defaults, so no
>> last_man_standing or wait_for_all.
>> I still see the same behaviour where the third node is fenced if I bring
>> down services on two nodes.
>> Thanks
>> David
>>
>> On Thu, 31 Aug 2023 at 11:44, Klaus Wenninger <kwenn...@redhat.com> wrote:
>>>
>>> On Thu, Aug 31, 2023 at 12:28 PM David Dolan <daithido...@gmail.com> wrote:
>>>>
>>>> On Wed, 30 Aug 2023 at 17:35, David Dolan <daithido...@gmail.com> wrote:
>>>>>
>>>>>> > Hi All,
>>>>>> >
>>>>>> > I'm running Pacemaker on Centos7
>>>>>> > Name        : pcs
>>>>>> > Version     : 0.9.169
>>>>>> > Release     : 3.el7.centos.3
>>>>>> > Architecture: x86_64
>>>>>> >
>>>>>> Besides the pcs version, the versions of the other cluster-stack
>>>>>> components could be interesting (pacemaker, corosync).
>>>>>
>>>>> rpm -qa | egrep "pacemaker|pcs|corosync|fence-agents"
>>>>> fence-agents-vmware-rest-4.2.1-41.el7_9.6.x86_64
>>>>> corosynclib-2.4.5-7.el7_9.2.x86_64
>>>>> pacemaker-cluster-libs-1.1.23-1.el7_9.1.x86_64
>>>>> fence-agents-common-4.2.1-41.el7_9.6.x86_64
>>>>> corosync-2.4.5-7.el7_9.2.x86_64
>>>>> pacemaker-cli-1.1.23-1.el7_9.1.x86_64
>>>>> pacemaker-1.1.23-1.el7_9.1.x86_64
>>>>> pcs-0.9.169-3.el7.centos.3.x86_64
>>>>> pacemaker-libs-1.1.23-1.el7_9.1.x86_64
>>>>>>
>>>>>> > I'm performing some cluster failover tests in a 3 node cluster. We have 3
>>>>>> > resources in the cluster.
>>>>>> > I was trying to see if I could get it working if 2 nodes fail at different
>>>>>> > times. I'd like the 3 resources to then run on one node.
>>>>>> >
>>>>>> > The quorum options I've configured are as follows
>>>>>> > [root@node1 ~]# pcs quorum config
>>>>>> > Options:
>>>>>> >   auto_tie_breaker: 1
>>>>>> >   last_man_standing: 1
>>>>>> >   last_man_standing_window: 10000
>>>>>> >   wait_for_all: 1
>>>>>> >
>>>>>> Not sure if the combination of auto_tie_breaker and last_man_standing
>>>>>> makes sense.
>>>>>> And as you have a cluster with an odd number of nodes, auto_tie_breaker
>>>>>> should be disabled anyway I guess.
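For context, a corosync.conf quorum section matching that suggestion
(last_man_standing kept, auto_tie_breaker left out) would look roughly like
the sketch below. The window value is the one from the configuration quoted
above; everything else is an assumption, not taken from the original cluster.

  quorum {
      provider: corosync_votequorum
      last_man_standing: 1
      last_man_standing_window: 10000
      wait_for_all: 1
      # auto_tie_breaker deliberately not set (defaults to 0), as suggested above
  }

Depending on the pcs version, "pcs quorum update auto_tie_breaker=0" may be
able to apply such a change while the cluster is stopped; otherwise edit
corosync.conf on every node and restart corosync.
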
>>>>>
>>>>> Ah ok, I'll try removing auto_tie_breaker and leaving last_man_standing.
>>>>>>
>>>>>> > [root@node1 ~]# pcs quorum status
>>>>>> > Quorum information
>>>>>> > ------------------
>>>>>> > Date:             Wed Aug 30 11:20:04 2023
>>>>>> > Quorum provider:  corosync_votequorum
>>>>>> > Nodes:            3
>>>>>> > Node ID:          1
>>>>>> > Ring ID:          1/1538
>>>>>> > Quorate:          Yes
>>>>>> >
>>>>>> > Votequorum information
>>>>>> > ----------------------
>>>>>> > Expected votes:   3
>>>>>> > Highest expected: 3
>>>>>> > Total votes:      3
>>>>>> > Quorum:           2
>>>>>> > Flags:            Quorate WaitForAll LastManStanding AutoTieBreaker
>>>>>> >
>>>>>> > Membership information
>>>>>> > ----------------------
>>>>>> >     Nodeid      Votes    Qdevice Name
>>>>>> >          1          1         NR node1 (local)
>>>>>> >          2          1         NR node2
>>>>>> >          3          1         NR node3
>>>>>> >
>>>>>> > If I stop the cluster services on node 2 and 3, the groups all fail over to
>>>>>> > node 1 since it is the node with the lowest ID.
>>>>>> > But if I stop them on node1 and node2 or node1 and node3, the cluster
>>>>>> > fails.
>>>>>> >
>>>>>> > I tried adding this line to corosync.conf and I could then bring down the
>>>>>> > services on node 1 and 2 or node 2 and 3, but if I left node 2 until last,
>>>>>> > the cluster failed:
>>>>>> > auto_tie_breaker_node: 1 3
>>>>>> >
>>>>>> > This line had the same outcome as using 1 3:
>>>>>> > auto_tie_breaker_node: 1 2 3
>>>>>> >
>>>>>> Giving multiple auto_tie_breaker nodes doesn't make sense to me but rather
>>>>>> sounds dangerous, if that configuration is possible at all.
>>>>>>
>>>>>> Maybe the misbehavior of last_man_standing is due to this (maybe not
>>>>>> recognized) misconfiguration.
>>>>>> Did you wait long enough between letting the 2 nodes fail?
>>>>>
>>>>> I've done it so many times that I believe so. But I'll try removing the
>>>>> auto_tie_breaker config, leaving last_man_standing. I'll also make sure
>>>>> I leave a couple of minutes between bringing down the nodes and post back.
>>>>
>>>> Just confirming I removed the auto_tie_breaker config and tested. Quorum
>>>> configuration is as follows:
>>>> Options:
>>>>   last_man_standing: 1
>>>>   last_man_standing_window: 10000
>>>>   wait_for_all: 1
>>>>
>>>> I waited 2-3 minutes between stopping cluster services on two nodes via
>>>> pcs cluster stop.
>>>> The remaining cluster node is then fenced. I was hoping the remaining node
>>>> would stay online running the resources.
>>>
>>> Yep - that would've been my understanding as well.
>>> But honestly I've never used last_man_standing in this context - wasn't
>>> even aware that it was offered without qdevice, nor have I checked how it
>>> is implemented.
>>>
>>> Klaus
>>>>
>>>>>>
>>>>>> Klaus
>>>>>>
>>>>>> > So I'd like it to fail over when any combination of two nodes fail, but
>>>>>> > I've only had success when the middle node isn't last.
>>>>>> >
>>>>>> > Thanks
>>>>>> > David
>>>>>>

_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
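As a follow-up to the fenced-vs-reset question raised at the top of this
message, a rough way to check which of the two actually happened, and which
no-quorum-policy is in effect, could be the commands below. This is only a
sketch for the pacemaker 1.1 / pcs 0.9 versions listed in the thread; the
node name is taken from the quoted quorum output.

  # show the configured no-quorum-policy (if it is unset, the default "stop" applies)
  pcs property show no-quorum-policy

  # ask the fencer whether it recorded a fencing action against the node
  stonith_admin --history node3

  # otherwise look for an external reset (watchdog, hypervisor, power) in the logs
  grep -iE "fence|stonith|reboot" /var/log/messages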