Additional information from the charm:
Without cluster_count set to NUM_UNITS, a race occurs: the relation to
the last hacluster node is not yet established, so the charm attempts to
start corosync and pacemaker with only n-1 of n nodes.
The last node is aware of only one relation when there should be 2:
relation-list -r hanode:0
hacluster/0
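The check that cluster_count is meant to enable can be sketched roughly as
follows. This is a minimal illustration, not the charm's actual code: the
function name has_all_peers and its arguments are hypothetical, and it
assumes cluster_count is the total number of units including the local one.

```python
def has_all_peers(peer_units, cluster_count):
    """Return True once this unit can see every other cluster member.

    peer_units: unit names visible on the peer relation (hypothetical
        input, e.g. the output of relation-list; excludes the local unit).
    cluster_count: expected total number of units, including this one
        (assumed meaning of the charm option).
    """
    # The local unit is not listed among its own peers, so add 1.
    return len(peer_units) + 1 >= cluster_count


# The last node in this report sees only hacluster/0 in a 3-unit cluster,
# so it should wait rather than start corosync and pacemaker.
print(has_all_peers(["hacluster/0"], 3))
print(has_all_peers(["hacluster/0", "hacluster/1"], 3))
```

With a gate like this, a unit that sees too few peers defers configuration
until a later relation hook fires, instead of starting the services early.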
corosync.conf looks like the following when there should be 3 nodes:
nodelist {
    node {
        ring0_addr: 10.5.35.235
        nodeid: 1000
    }
    node {
        ring0_addr: 10.5.35.237
        nodeid: 1001
    }
}
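One way to spot this condition on a unit is to count the node entries in
the rendered corosync.conf and compare against the expected unit count.
The sketch below is only an illustration: count_corosync_nodes is a
hypothetical helper, and it assumes each node is written as a "node {"
line, as in the excerpt above.

```python
import re


def count_corosync_nodes(conf_text):
    """Count `node { ... }` entries in corosync.conf text.

    Matches lines that start with `node {` (ignoring indentation); the
    enclosing `nodelist {` line does not match.
    """
    return len(re.findall(r"^\s*node\s*\{", conf_text, flags=re.MULTILINE))


# The nodelist from this report, which is missing the third node.
SAMPLE = """
nodelist {
    node {
        ring0_addr: 10.5.35.235
        nodeid: 1000
    }
    node {
        ring0_addr: 10.5.35.237
        nodeid: 1001
    }
}
"""

expected_units = 3  # assumed value of cluster_count for this deployment
found = count_corosync_nodes(SAMPLE)
print(found, "of", expected_units, "nodes configured")
```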
The services themselves (not the charm) fail:
corosync logs thousands of RETRANSMIT errors.
pacemaker eventually times out after waiting on corosync.
I am adding more documentation to encourage setting cluster_count and
updating the amulet tests to set it.
--
https://bugs.launchpad.net/bugs/1654403
Title:
Race condition in hacluster charm that leaves pacemaker down