[Ubuntu-ha] [Bug 1654403] Re: Race condition in hacluster charm that leaves pacemaker down

Corey Bryant Thu, 12 Jan 2017 12:06:53 -0800

This may have been fixed as of the 1.1.15-1 version of the pacemaker
package. Prior to commit 071796e, "Restart=on-failure" was patched out.
I've attached the diff of the commit that restored it.


** Patch added: "pacemaker-071796e.diff"
   https://bugs.launchpad.net/ubuntu/+source/corosync/+bug/1654403/+attachment/4803560/+files/pacemaker-071796e.diff

-- 
You received this bug notification because you are a member of Ubuntu
High Availability Team, which is subscribed to corosync in Ubuntu.
https://bugs.launchpad.net/bugs/1654403

Title:
  Race condition in hacluster charm that leaves pacemaker down

Status in corosync package in Ubuntu:
  New
Status in hacluster package in Juju Charms Collection:
  Triaged

Bug description:
  Symptom: one or more hacluster nodes are left in an executing state.
  The process list on the affected nodes shows the command 'crm node list'
  stuck in an infinite loop, and pacemaker is not started. On nodes where
  'crm node list' and the other crm commands complete, pacemaker is started.

  See the artefacts from this run:
  https://openstack-ci-reports.ubuntu.com/artifacts/test_charm_pipeline/openstack/charm-percona-cluster/417131/1/1873/index.html

  Hypothesis: There is a race that leads to crm node list being executed
  before pacemaker is started. It is also possible that something causes
  pacemaker to fail to start.

  Suggest a check for pacemaker health before any crm commands are run.
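
  A check of this kind could be sketched in Python roughly as follows.
  This is only an illustration, not code from the hacluster charm: the
  function name wait_for_pacemaker, its parameters, and the choice of
  'crm_mon -1' as the probe (assumed here to exit non-zero while pacemaker
  is unreachable) are all assumptions.

```python
import subprocess
import time


def wait_for_pacemaker(check_cmd=("crm_mon", "-1"), timeout=60.0, interval=2.0,
                       run=subprocess.call, sleep=time.sleep,
                       clock=time.monotonic):
    """Poll check_cmd until it exits 0 or the timeout elapses.

    Hypothetical helper: the default probe is an assumption, and the
    run/sleep/clock hooks exist only so the loop can be exercised
    without a real cluster. Returns True if the probe succeeded.
    """
    deadline = clock() + timeout
    while True:
        try:
            # Discard probe output; only the exit status matters here.
            rc = run(list(check_cmd),
                     stdout=subprocess.DEVNULL,
                     stderr=subprocess.DEVNULL)
        except OSError:
            # Probe binary missing or not executable: treat as "not ready".
            rc = 1
        if rc == 0:
            return True
        if clock() >= deadline:
            # Give up instead of looping forever.
            return False
        sleep(interval)
```

  A caller would run wait_for_pacemaker() before 'crm node list' and fail
  the hook with a clear error when it returns False, so a node where
  pacemaker never starts reports an error rather than hanging.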

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/corosync/+bug/1654403/+subscriptions

_______________________________________________
Mailing list: https://launchpad.net/~ubuntu-ha
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~ubuntu-ha
More help   : https://help.launchpad.net/ListHelp
