> > Hi All,
> >
> > I'm running Pacemaker on Centos7
> > Name        : pcs
> > Version     : 0.9.169
> > Release     : 3.el7.centos.3
> > Architecture: x86_64
> >
>
> Besides the pcs version, the versions of the other cluster-stack
> components (pacemaker, corosync) could be interesting.
>

rpm -qa | egrep "pacemaker|pcs|corosync|fence-agents"
fence-agents-vmware-rest-4.2.1-41.el7_9.6.x86_64
corosynclib-2.4.5-7.el7_9.2.x86_64
pacemaker-cluster-libs-1.1.23-1.el7_9.1.x86_64
fence-agents-common-4.2.1-41.el7_9.6.x86_64
corosync-2.4.5-7.el7_9.2.x86_64
pacemaker-cli-1.1.23-1.el7_9.1.x86_64
pacemaker-1.1.23-1.el7_9.1.x86_64
pcs-0.9.169-3.el7.centos.3.x86_64
pacemaker-libs-1.1.23-1.el7_9.1.x86_64
> > I'm performing some cluster failover tests in a 3 node cluster. We have
> > 3 resources in the cluster.
> > I was trying to see if I could get it working if 2 nodes fail at
> > different times. I'd like the 3 resources to then run on one node.
> >
> > The quorum options I've configured are as follows:
> >
> > [root@node1 ~]# pcs quorum config
> > Options:
> >   auto_tie_breaker: 1
> >   last_man_standing: 1
> >   last_man_standing_window: 10000
> >   wait_for_all: 1
> >
>
> Not sure if the combination of auto_tie_breaker and last_man_standing
> makes sense. And as you have a cluster with an odd number of nodes,
> auto_tie_breaker should be disabled anyway, I guess.

Ah ok, I'll try removing auto_tie_breaker and leaving last_man_standing.

> > [root@node1 ~]# pcs quorum status
> > Quorum information
> > ------------------
> > Date:             Wed Aug 30 11:20:04 2023
> > Quorum provider:  corosync_votequorum
> > Nodes:            3
> > Node ID:          1
> > Ring ID:          1/1538
> > Quorate:          Yes
> >
> > Votequorum information
> > ----------------------
> > Expected votes:   3
> > Highest expected: 3
> > Total votes:      3
> > Quorum:           2
> > Flags:            Quorate WaitForAll LastManStanding AutoTieBreaker
> >
> > Membership information
> > ----------------------
> >     Nodeid      Votes    Qdevice Name
> >          1          1         NR node1 (local)
> >          2          1         NR node2
> >          3          1         NR node3
> >
> > If I stop the cluster services on node 2 and 3, the groups all fail over
> > to node 1 since it is the node with the lowest ID.
> > But if I stop them on node1 and node2, or node1 and node3, the cluster
> > fails.
> >
> > I tried adding this line to corosync.conf and I could then bring down
> > the services on node 1 and 2 or node 2 and 3, but if I left node 2 until
> > last, the cluster failed:
> > auto_tie_breaker_node: 1 3
> >
> > This line had the same outcome as using 1 3:
> > auto_tie_breaker_node: 1 2 3
> >
>
> Giving multiple auto_tie_breaker nodes doesn't make sense to me, and
> rather sounds dangerous, if that configuration is possible at all.
>
> Maybe the misbehavior of last_man_standing is due to this (maybe not
> recognized) misconfiguration.
> Did you wait long enough between letting the 2 nodes fail?

I've done it so many times, so I believe so. But I'll try removing the
auto_tie_breaker config, leaving last_man_standing. I'll also make sure I
leave a couple of minutes between bringing down the nodes, and post back.
(A rough sketch of the config I'm aiming for is at the bottom of this mail.)

> Klaus
>
> > So I'd like it to fail over when any combination of two nodes fails, but
> > I've only had success when the middle node isn't last.
> >
> > Thanks
> > David
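For reference, here's roughly what I expect the quorum section of
corosync.conf to look like after removing auto_tie_breaker; just a sketch
based on the options shown above (nodelist and totem sections omitted):

quorum {
    provider: corosync_votequorum
    # auto_tie_breaker dropped as suggested; keep last_man_standing
    last_man_standing: 1
    # wait 10 seconds (10000 ms) after a node loss before votequorum
    # recalculates expected_votes
    last_man_standing_window: 10000
    # all three nodes must be seen once at startup before the cluster
    # becomes quorate for the first time
    wait_for_all: 1
}

If I remember right, pcs can make the same change without hand-editing the
file, e.g. something like "pcs quorum update auto_tie_breaker=0" with the
cluster stopped on all nodes, but I'd double-check the exact syntax against
the pcs man page for 0.9.169 before relying on it.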