Hi Strahil,

corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
        id      = 172.17.150.20
        status  = ring 0 active with no faults
RING ID 1
        id      = 10.0.6.20
        status  = ring 1 active with no faults
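Both rings currently report no faults. For what it's worth (assuming the
corosync 2.x RRP that rrp_mode: passive suggests), a ring that corosync
has marked FAULTY stays faulty until it is reset manually, so after a
flapping link the state can be cleared and re-checked with:

corosync-cfgtool -r   # reset redundant ring state cluster-wide after a fault
corosync-cfgtool -s   # confirm both rings are active again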
corosync-quorumtool -s
Quorum information
------------------
Date:             Fri Feb 14 14:41:11 2020
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          1
Ring ID:          1/96
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      2
Quorum:           1
Flags:            2Node Quorate WaitForAll

Membership information
----------------------
    Nodeid      Votes Name
         1          1 node1.primary.network (local)
         2          1 node2.primary.network

On the surviving node, the 10.0.6.21 interface flip-flopped (though
nothing was detected on the other node), and that is what started it
all off. We have no firewall running.

Cheers,
Alastair.

On Fri, 14 Feb 2020, Strahil Nikolov wrote:

> On February 14, 2020 12:41:58 PM GMT+02:00, "BASDEN, ALASTAIR G."
> <a.g.bas...@durham.ac.uk> wrote:
>> Hi,
>> I wonder whether anyone could give me some advice about a stonith
>> configuration.
>>
>> We have 2 nodes, which form an HA cluster.
>>
>> These have 3 networks:
>> A generic network over which they are accessed (e.g. ssh)
>> (node1.primary.network, node2.primary.network)
>> A directly connected cable between them (10.0.6.20, 10.0.6.21).
>> A management network, on which ipmi is (172.16.150.20, 172.16.150.21)
>>
>> We have done:
>> pcs cluster setup --name hacluster node1.primary.network,10.0.6.20
>> node2.primary.network,10.0.6.21 --token 20000
>> pcs cluster start --all
>> pcs property set no-quorum-policy=ignore
>> pcs property set stonith-enabled=true
>> pcs property set symmetric-cluster=true
>> pcs stonith create node1_ipmi fence_ipmilan ipaddr="172.16.150.20"
>> lanplus=true login="root" passwd="password"
>> pcmk_host_list="node1.primary.network" power_wait=10
>> pcs stonith create node2_ipmi fence_ipmilan ipaddr="172.16.150.21"
>> lanplus=true login="root" passwd="password"
>> pcmk_host_list="node2.primary.network" power_wait=10
>>
>> /etc/corosync/corosync.conf has:
>> totem {
>>     version: 2
>>     cluster_name: hacluster
>>     secauth: off
>>     transport: udpu
>>     rrp_mode: passive
>>     token: 20000
>> }
>>
>> nodelist {
>>     node {
>>         ring0_addr: node1.primary.network
>>         ring1_addr: 10.0.6.20
>>         nodeid: 1
>>     }
>>
>>     node {
>>         ring0_addr: node2.primary.network
>>         ring1_addr: 10.0.6.21
>>         nodeid: 2
>>     }
>> }
>>
>> quorum {
>>     provider: corosync_votequorum
>>     two_node: 1
>> }
>>
>> logging {
>>     to_logfile: yes
>>     logfile: /var/log/cluster/corosync.log
>>     to_syslog: no
>> }
>>
>> What I find is that if there is a problem with the directly connected
>> cable, the nodes stonith each other, even though the generic network
>> is fine.
>>
>> What I would expect is that they would only shoot each other when
>> both networks are down (generic and directly connected).
>>
>> Any ideas?
>>
>> Thanks,
>> Alastair.
>
> What is the output of:
> corosync-cfgtool -s
> corosync-quorumtool -s
>
> Also check the logs of the surviving node for clues.
>
> What about the firewall?
> Have you enabled the 'high-availability' service in firewalld on all
> zones for your interfaces?
>
> Best Regards,
> Strahil Nikolov
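One common mitigation for the fence race described above, sketched here
with the stonith resource names from the original post (the 15-second
value is an illustrative assumption, not from the thread): give one
fence device a static delay, so that when both nodes try to shoot at
once the undelayed side wins and only one node is fenced. With
two_node: 1 and no-quorum-policy=ignore, both nodes stay quorate after
losing contact with each other, so nothing else arbitrates the race.

# Hypothetical tie-breaker: delay fencing OF node1 by 15 seconds, so
# node1 survives a simultaneous fence race (value is an assumption).
pcs stonith update node1_ipmi delay=15

The same effect can be had with the Pacemaker-level pcmk_delay_base
parameter on one of the two fence devices.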