[Bug 1654403] Re: Race condition in hacluster charm that leaves pacemaker down
** Also affects: corosync (Ubuntu Xenial) Importance: Undecided Status: New ** Changed in: corosync (Ubuntu Xenial) Status: New => Incomplete ** Changed in: corosync (Ubuntu) Status: Incomplete => Fix Released -- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu. https://bugs.launchpad.net/bugs/1654403 Title: Race condition in hacluster charm that leaves pacemaker down To manage notifications about this bug go to: https://bugs.launchpad.net/charm-hacluster/+bug/1654403/+subscriptions -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 1654403] Re: Race condition in hacluster charm that leaves pacemaker down
For this particular bug, we have no description of why corosync was taking too long to start, only that it did, along with all the workarounds made to pacemaker initialization and charm handling. Given that, I'm marking corosync as Incomplete for now, while I gather all the work to be done in the HA packages. Please re-open this if you disagree, so we can discuss this bug again. Thank you!

** Changed in: corosync (Ubuntu) Status: New => Incomplete
[Bug 1654403] Re: Race condition in hacluster charm that leaves pacemaker down
Hi,
Corey mentioned a while ago that 1.1.15 might have this fixed. You have all the context: is it?
If so, would the status for corosync be:
- Yakkety/Zesty: Fixed
- Xenial: SRU needed
Or is this totally solved by the charm changes you submitted? Or ...

TL;DR: please help me understand what might be left on the corosync task of this bug :-)
[Bug 1654403] Re: Race condition in hacluster charm that leaves pacemaker down
** Changed in: charm-hacluster Status: Fix Committed => Fix Released
[Bug 1654403] Re: Race condition in hacluster charm that leaves pacemaker down
** Changed in: charm-hacluster Milestone: None => 17.02
[Bug 1654403] Re: Race condition in hacluster charm that leaves pacemaker down
** Changed in: charm-hacluster Importance: Undecided => High
** Changed in: charm-hacluster Status: New => Fix Committed
** Changed in: charm-hacluster Assignee: (unassigned) => David Ames (thedac)
** Changed in: hacluster (Juju Charms Collection) Status: Fix Committed => Invalid
[Bug 1654403] Re: Race condition in hacluster charm that leaves pacemaker down
** Changed in: hacluster (Juju Charms Collection) Status: Triaged => Fix Committed
** Changed in: hacluster (Juju Charms Collection) Assignee: (unassigned) => David Ames (thedac)
[Bug 1654403] Re: Race condition in hacluster charm that leaves pacemaker down
Additional information from the charm:

Without cluster_count set to NUM_UNITS, a race occurs in which the relation to the last hacluster node is not yet set, leading to an attempt to start corosync and pacemaker with only n-1 of n nodes.

The last node is aware of only one relation when there should be 2:

    relation-list -r hanode:0
    hacluster/0

corosync.conf looks like the following when there should be 3 nodes:

    nodelist {
        node {
            ring0_addr: 10.5.35.235
            nodeid: 1000
        }
        node {
            ring0_addr: 10.5.35.237
            nodeid: 1001
        }
    }

The services themselves (not the charm) fail: corosync logs thousands of RETRANSMIT errors, and pacemaker eventually times out after waiting on corosync.

Adding more documentation to push the setting of cluster_count, and updating the amulet tests to include it.
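The n-1 of n condition described above could be detected before the services are started. What follows is only a minimal sketch, not the charm's actual code: the function names `count_corosync_nodes` and `cluster_is_complete` are hypothetical, and it assumes the rendered corosync.conf text and the configured cluster_count are already at hand.

```python
import re

# The two-node nodelist quoted in this comment, used as sample input.
SAMPLE_CONF = """
nodelist {
    node {
        ring0_addr: 10.5.35.235
        nodeid: 1000
    }
    node {
        ring0_addr: 10.5.35.237
        nodeid: 1001
    }
}
"""


def count_corosync_nodes(conf_text):
    """Count `node { ... }` entries in corosync.conf text.

    The pattern requires whitespace or `{` right after `node`, so it does
    not match `nodelist {` or `nodeid:`.
    """
    return len(re.findall(r"\bnode\s*\{", conf_text))


def cluster_is_complete(conf_text, cluster_count):
    """Return True if the nodelist has at least cluster_count nodes."""
    return count_corosync_nodes(conf_text) >= cluster_count


if __name__ == "__main__":
    # With the sample above and an expected cluster of 3, this reports False.
    print(cluster_is_complete(SAMPLE_CONF, 3))
```

A check of this kind would let the caller defer starting corosync and pacemaker until the last relation has been set.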
[Bug 1654403] Re: Race condition in hacluster charm that leaves pacemaker down
** Tags added: patch
[Bug 1654403] Re: Race condition in hacluster charm that leaves pacemaker down
David, you could try adding "Restart=on-failure" back to the init file as a test. If it works, we could look into backporting that to xenial, however I'm hesitant to do that until we know better why they dropped the restart bits in the first place.
[Bug 1654403] Re: Race condition in hacluster charm that leaves pacemaker down
This may have been fixed as of the 1.1.15-1 version of the pacemaker package. Prior to commit 071796e, "Restart=on-failure" was patched out. I've attached the diff of the commit that reverted that.

** Patch added: "pacemaker-071796e.diff" https://bugs.launchpad.net/ubuntu/+source/corosync/+bug/1654403/+attachment/4803560/+files/pacemaker-071796e.diff
[Bug 1654403] Re: Race condition in hacluster charm that leaves pacemaker down
Corey, This is Mitaka on Xenial. I suspect that the package remains the same on Xenial for the other OpenStack releases. I'll try and confirm this.
[Bug 1654403] Re: Race condition in hacluster charm that leaves pacemaker down
David, what release of ubuntu/openstack does this affect? I'd like to see if we can get a package update in a PPA for you to test with.
[Bug 1654403] Re: Race condition in hacluster charm that leaves pacemaker down
Root cause:

1) When corosync is restarted, it may take up to a minute for it to finish setting up.
2) The systemd timeout value is exceeded.

    Jan 10 18:57:49 juju-39e3e2-percona-3 systemd[1]: Failed to start Corosync Cluster Engine.
    Jan 10 18:57:49 juju-39e3e2-percona-3 systemd[1]: corosync.service: Unit entered failed state.
    Jan 10 18:57:49 juju-39e3e2-percona-3 systemd[1]: corosync.service: Failed with result 'timeout'.

3) Pacemaker is then started. The pacemaker systemd script has a dependency on corosync, which may still be in the process of coming up.
4) Pacemaker fails to start due to the dependency.

    Jan 10 18:57:49 juju-39e3e2-percona-3 systemd[1]: pacemaker.service: Job pacemaker.service/start failed with result 'dependency'.

5) Pacemaker remains down.
6) Subsequently, the charm checks pacemaker health by running `crm node list` in a loop until it succeeds.
7) Because pacemaker remains down, this is an infinite loop.

Solutions:

1) Adding corosync to this bug for a systemd script timeout change.
2) The charm needs to better validate the restart of the services and better communicate to the end user when an error has occurred.

Current work in progress: https://review.openstack.org/#/c/419204/

** Also affects: corosync (Ubuntu) Importance: Undecided Status: New
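One way to avoid the infinite loop in steps 6 and 7 is to bound the health check. The following is a hedged sketch, not the change under review: the function name `wait_for_pacemaker`, the retry count, and the delay are all assumptions chosen for illustration. Only the `crm node list` command comes from the report.

```python
import subprocess
import time


def wait_for_pacemaker(cmd=("crm", "node", "list"), retries=12, delay=5.0,
                       run=subprocess.call, sleep=time.sleep):
    """Return True once `cmd` exits 0, or False after `retries` attempts.

    `run` and `sleep` are injectable so the loop can be exercised without
    a real cluster; the defaults (12 tries, 5 s apart) are illustrative.
    """
    for attempt in range(1, retries + 1):
        try:
            if run(list(cmd)) == 0:
                return True
        except OSError:
            # The crm binary is missing or not executable; count it as a
            # failed attempt rather than crashing the caller.
            pass
        if attempt < retries:
            sleep(delay)
    return False
```

A caller could then report a blocked or error status to the end user when the function returns False, instead of retrying forever.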