Hi,

When setting up a cluster with just one node (with auto_tie_breaker and DLM) and then incrementally adding more, I got some unexpected fencing if the 2nd node doesn't join the cluster soon enough.

What I also found surprising is that once the cluster has ever seen 2 nodes, turning off the 2nd node works fine and doesn't cause fencing (still using auto_tie_breaker).


I have a hardware watchdog, and I can reproduce the problem with these (or older) versions and the following sequence of steps:

corosync-2.4.0-9.el7.x86_64
pacemaker-1.1.16-12.el7.x86_64
sbd-1.3.0-3.el7.x86_64
pcs-0.9.158-6.el7.x86_64

pcs cluster destroy
rm -f /var/lib/corosync/*
pcs cluster auth -u hacluster cluster1 cluster2
pcs cluster setup --name cluster cluster1 --auto_tie_breaker=1
pcs stonith sbd enable
pcs cluster start --all
pcs property set no-quorum-policy=ignore
# or pcs property set no-quorum-policy=freeze
# or pcs property set no-quorum-policy=suicide
pcs resource create dlm ocf:pacemaker:controld op monitor interval=30s on-fail=fence clone interleave=true ordered=true
while ! dlm_tool join testls; do sleep 1; done
crm_mon -1
pcs cluster node add cluster2 &
journalctl --follow


What am I doing wrong, and how can I avoid fencing?
I thought that setting no-quorum-policy to ignore would prevent this (with just one node, I don't really need fencing until the 2nd node is actually up), but if there are any active DLM lockspaces, that doesn't seem to be the case.
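For reference, here is what I assume the quorum section of /etc/corosync/corosync.conf looks like after "pcs cluster node add cluster2" runs (this is my guess at what pcs generates, not something I've copied verbatim, so the exact values are worth double-checking):

```
# /etc/corosync/corosync.conf, quorum section (my assumption of what
# pcs writes after adding the second node)
quorum {
    provider: corosync_votequorum
    auto_tie_breaker: 1
    # Once cluster2 is in the nodelist, the expected vote count
    # effectively goes from 1 to 2, so the single running node is
    # suddenly inquorate until cluster2 actually joins -- which I
    # suspect is the window where DLM triggers the fencing I see.
    expected_votes: 2
}
```

If that guess is right, it would explain why the timing of the 2nd node's join matters.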

Thanks,
--Edwin

_______________________________________________
Users mailing list: [email protected]
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
