This may have been fixed in version 1.1.15-1 of the pacemaker package. Before commit 071796e, "Restart=on-failure" was patched out of the pacemaker service file. I've attached the diff of the commit that reverted that change.
** Patch added: "pacemaker-071796e.diff"
   https://bugs.launchpad.net/ubuntu/+source/corosync/+bug/1654403/+attachment/4803560/+files/pacemaker-071796e.diff

--
You received this bug notification because you are a member of Ubuntu High Availability Team, which is subscribed to corosync in Ubuntu.
https://bugs.launchpad.net/bugs/1654403

Title:
  Race condition in hacluster charm that leaves pacemaker down

Status in corosync package in Ubuntu:
  New
Status in hacluster package in Juju Charms Collection:
  Triaged

Bug description:
  Symptom: one or more hacluster nodes are left in an executing state. The process list on the affected nodes shows the command 'crm node list' stuck in an infinite loop, and pacemaker is not started. On the nodes that do complete 'crm node list' and the other crm commands, pacemaker is started.

  See the artefacts from this run:
  https://openstack-ci-reports.ubuntu.com/artifacts/test_charm_pipeline/openstack/charm-percona-cluster/417131/1/1873/index.html

  Hypothesis: a race causes 'crm node list' to be executed before pacemaker has started. It is also possible that something causes pacemaker to fail to start. Suggestion: check pacemaker health before running any crm commands.
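The health check suggested in the bug description could look roughly like the sketch below. It assumes the charm can shell out to `systemctl`; `pacemaker_is_active`, `wait_for_pacemaker` and `run_crm` are hypothetical names for illustration, not functions from the hacluster charm. The idea is to poll until systemd reports the pacemaker unit as active, give up after a timeout, and only then issue the crm command, so a crm call is never made against a pacemaker that has not started.

```python
import subprocess
import time
from typing import Callable, Sequence


def pacemaker_is_active() -> bool:
    """Hypothetical check: True if systemd reports the pacemaker unit active."""
    try:
        # `systemctl is-active --quiet` exits 0 only when the unit is active.
        result = subprocess.run(
            ["systemctl", "is-active", "--quiet", "pacemaker"], check=False
        )
    except FileNotFoundError:
        # No systemctl on this host: treat pacemaker as not healthy.
        return False
    return result.returncode == 0


def wait_for_pacemaker(
    is_healthy: Callable[[], bool] = pacemaker_is_active,
    timeout: float = 120.0,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll is_healthy() until it returns True or `timeout` seconds elapse."""
    deadline = clock() + timeout
    while True:
        if is_healthy():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def run_crm(args: Sequence[str], **wait_kwargs) -> str:
    """Hypothetical wrapper: run `crm <args>` only once pacemaker looks healthy."""
    if not wait_for_pacemaker(**wait_kwargs):
        raise RuntimeError("pacemaker did not become active; not running crm")
    return subprocess.check_output(["crm", *args], text=True)
```

For example, `run_crm(["node", "list"])` would wait up to two minutes for pacemaker before running `crm node list`, and raise instead of hanging if it never comes up. Passing the check, sleep and clock functions as parameters keeps the loop testable without a live cluster. Note that `is-active` only confirms that the unit is running; it says nothing about quorum or whether the node has joined the cluster, so a stricter check may be needed in practice.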

