[Ubuntu-ha] [Bug 1884149] Update Released
The verification of the Stable Release Update for haproxy has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

--
You received this bug notification because you are a member of Ubuntu High Availability Team, which is subscribed to haproxy in Ubuntu.
https://bugs.launchpad.net/bugs/1884149

Title:
  haproxy crashes in __pool_get_first if unique-id-header is used

Status in HAProxy: Fix Released
Status in haproxy package in Ubuntu: Fix Released
Status in haproxy source package in Bionic: Fix Released
Status in haproxy package in Debian: Fix Released

Bug description:

  [Impact]

  * The handling of locks in haproxy could lead to a state where, between idle HTTP connections, a connection was wrongly indicated as destroyed. In that case the code went on and accessed a just-freed resource. As upstream puts it: "It can have random implications between requests as it may lead a wrong connection's polling to be re-enabled or disabled for example, especially with threads."

  * Backport the fix from upstream's 1.8 stable branch.

  [Test Case]

  * It is a race and might be hard to trigger. A haproxy config to sit in front of three webservers can be seen below. Setting up three Apaches locally didn't trigger the same bug, but we know it is timing sensitive.

  * Simon (anbox) has a setup which reliably triggers this and will run the tests there.

  * The bad case will trigger a crash as reported below.

  [Regression Potential]

  * This change is in >=Disco and has no further bugs reported against it (no follow-on change), which should make it rather safe. Also, no other change has touched that file context in 1.8 stable since then.
  The change is to the locking of connections, so if regressions were to appear, they would be expected in the handling of concurrent connections.

  [Other Info]

  * Strictly speaking it is a race, so triggering it depends on load and machine CPU/IO speed.

  ---

  Version 1.8.8-1ubuntu0.10 of haproxy in Ubuntu 18.04 (bionic) crashes with:

  Thread 2.1 "haproxy" received signal SIGSEGV, Segmentation fault.
  [Switching to Thread 0xf77b1010 (LWP 17174)]
  __pool_get_first (pool=0xaac6ddd0, pool=0xaac6ddd0) at include/common/memory.h:124
  124     include/common/memory.h: No such file or directory.
  (gdb) bt
  #0  __pool_get_first (pool=0xaac6ddd0, pool=0xaac6ddd0) at include/common/memory.h:124
  #1  pool_alloc_dirty (pool=0xaac6ddd0) at include/common/memory.h:154
  #2  pool_alloc (pool=0xaac6ddd0) at include/common/memory.h:229
  #3  conn_new () at include/proto/connection.h:655
  #4  cs_new (conn=0x0) at include/proto/connection.h:683
  #5  connect_conn_chk (t=0xaacb8820) at src/checks.c:1553
  #6  process_chk_conn (t=0xaacb8820) at src/checks.c:2135
  #7  process_chk (t=0xaacb8820) at src/checks.c:2281
  #8  0xaabca0b4 in process_runnable_tasks () at src/task.c:231
  #9  0xaab76f44 in run_poll_loop () at src/haproxy.c:2399
  #10 run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:2461
  #11 0xaaad79ec in main (argc=<optimized out>, argv=0xaac61b30) at src/haproxy.c:3050

  when running on an ARM64 system.
  The haproxy.cfg looks like this:

  global
      log /dev/log local0
      log /dev/log local1 notice
      maxconn 4096
      user haproxy
      group haproxy
      spread-checks 0
      tune.ssl.default-dh-param 1024
      ssl-default-bind-ciphers ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-DSS-AES128-GCM-SHA256:kEDH+AESGCM:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA:ECDHE-ECDSA-AES256-SHA:DHE-RSA-AES128-SHA256:!DHE-RSA-AES128-SHA:DHE-DSS-AES128-SHA256:DHE-RSA-AES256-SHA256:DHE-DSS-AES256-SHA:!DHE-RSA-AES256-SHA:AES128-GCM-SHA256:AES256-GCM-SHA384:AES128-SHA256:AES256-SHA256:AES128-SHA:AES256-SHA:AES:!CAMELLIA:DES-CBC3-SHA:!aNULL:!eNULL:!EXPORT:!DES:!RC4:!MD5:!PSK:!aECDH:!EDH-DSS-DES-CBC3-SHA:!EDH-RSA-DES-CBC3-SHA:!KRB5-DES-CBC3-SHA

  defaults
      log global
      mode tcp
      option httplog
      option dontlognull
      retries 3
      timeout queue 2
      timeout client 5
      timeout connect 5000
      timeout server 5

  frontend anbox-stream-gateway-lb-5-80
      bind 0.0.0.0:80
      default_backend api_http
      mode http
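The crash above comes from taking entries off a shared memory-pool free list without consistent locking, so one thread can act on a connection another thread has already freed. The following is a toy Python model of that pattern, not haproxy's actual code: the unlocked getter has a check-then-pop window like the buggy path, while the locked getter takes the entry atomically, as the backported fix does for the connection lock.

```python
import threading

class ConnPool:
    """Toy model of a shared free-list pool such as __pool_get_first."""

    def __init__(self):
        self.free_list = []          # freed objects, available for reuse
        self.lock = threading.Lock()

    def pool_free(self, conn):
        with self.lock:
            self.free_list.append(conn)

    def get_first_unlocked(self):
        # Race window: between the emptiness check and the pop, another
        # thread may free or grab the same entry -- the bug's pattern.
        if self.free_list:
            return self.free_list.pop()
        return None

    def get_first_locked(self):
        # Fix pattern: check and take under the pool lock, atomically.
        with self.lock:
            return self.free_list.pop() if self.free_list else None

pool = ConnPool()
pool.pool_free("conn-1")
print(pool.get_first_locked())   # takes the freed entry safely: conn-1
print(pool.get_first_locked())   # pool now empty -> None
```

This only illustrates why the window between "is the connection destroyed?" and "reuse it" must be covered by the same lock; the real fix operates on haproxy's connection polling state, not on a Python list.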
[Ubuntu-ha] [Bug 1877280] Re: attrd can segfault on exit
Hello Dan, or anyone else affected,

Accepted pacemaker into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/pacemaker/1.1.14-2ubuntu1.8 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package, and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

** Changed in: pacemaker (Ubuntu Xenial)
   Status: In Progress => Fix Committed

** Tags added: verification-needed verification-needed-xenial

--
You received this bug notification because you are a member of Ubuntu High Availability Team, which is subscribed to pacemaker in Ubuntu.
https://bugs.launchpad.net/bugs/1877280

Title:
  attrd can segfault on exit

Status in pacemaker package in Ubuntu: Fix Released
Status in pacemaker source package in Xenial: Fix Committed

Bug description:

  [impact]
  pacemaker's attrd may segfault on exit.

  [test case]
  this is a follow-on to bug 1871166; the patches added there prevented one segfault but this one emerged. As with that bug, I can't reproduce this myself, but the original reporter is able to reproduce it intermittently.

  [regression potential]
  any regression would likely impact the exit path of attrd, possibly causing a segfault or other incorrect exit.

  [scope]
  this is needed only for Xenial. This is fixed upstream by commit 3c62fb1d0d, which is included in Bionic and later.

  [other info]
  this is a follow-on to bug 1871166.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/pacemaker/+bug/1877280/+subscriptions

___
Mailing list: https://launchpad.net/~ubuntu-ha
Post to     : ubuntu-ha@lists.launchpad.net
Unsubscribe : https://launchpad.net/~ubuntu-ha
More help   : https://help.launchpad.net/ListHelp
[Ubuntu-ha] [Bug 1871166] Update Released
The verification of the Stable Release Update for pacemaker has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

--
You received this bug notification because you are a member of Ubuntu High Availability Team, which is subscribed to pacemaker in Ubuntu.
https://bugs.launchpad.net/bugs/1871166

Title:
  lrmd crashes

Status in pacemaker package in Ubuntu: Fix Released
Status in pacemaker source package in Xenial: Fix Released

Bug description:

  [impact]
  lrmd crashes and dumps core.

  [test case]
  I can not reproduce it, but it is reproducible in the specific setup of the person reporting the bug to me.

  [regression potential]
  this patches the cancel/cleanup part of the code, so regressions would likely involve possible memory leaks (instead of use-after-free segfaults), failure to correctly cancel or clean up operations, or other failures during the cancel action.

  [scope]
  this is fixed by commits:
    933d46ef20591757301784773a37e06b78906584
    94a4c58f675d163085a055f59fd6c3a2c9f57c43
    dc36d4375c049024a6f9e4d2277a3e6444fad05b
    deabcc5a6aa93dadf0b20364715b559a5b9848ac
    b85037b75255061a41d0ec3fd9b64f271351b43e
  which are all included starting with version 1.1.17, and Bionic includes version 1.1.18, so this is fixed already in Bionic and later. This is needed only for Xenial.

  [other info]
  As mentioned in the test case section, I do not have a setup where I'm able to reproduce this, but I can ask the initial reporter to test and verify the fix, and they have verified that a test build fixed the problem for them.

  Also, the upstream commits removed two symbols, which I elided from the backported patches; those symbols are still available and, while it is unlikely there were any users of those symbols outside pacemaker itself, this change should not break any possible external users. See the patch 0002 header in the upload for more detail.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/pacemaker/+bug/1871166/+subscriptions
[Ubuntu-ha] [Bug 1866119] Update Released
The verification of the Stable Release Update for pacemaker has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

--
You received this bug notification because you are a member of Ubuntu High Availability Team, which is subscribed to pacemaker in Ubuntu.
https://bugs.launchpad.net/bugs/1866119

Title:
  [bionic] fence_scsi not working properly with Pacemaker 1.1.18-2ubuntu1.1

Status in pacemaker package in Ubuntu: Fix Released
Status in pacemaker source package in Bionic: Fix Released

Bug description:

  OBS: This bug was originally part of LP: #1865523 but it was split out.

  SRU: pacemaker

  [Impact]

  * fence_scsi is not currently working in a shared-disk environment.

  * all clusters relying on fence_scsi and/or fence_scsi + watchdog won't be able to start the fencing agents OR, in worst-case scenarios, the fence_scsi agent might start but won't make SCSI reservations on the shared SCSI disk.

  * this bug takes care of pacemaker 1.1.18 issues with fence_scsi, since the latter was fixed in LP: #1865523.
  [Test Case]

  * having a 3-node setup, nodes called "clubionic01, clubionic02, clubionic03", with a shared SCSI disk (fully supporting persistent reservations) /dev/sda, and with corosync and pacemaker operational and running, one might try:

  rafaeldtinoco@clubionic01:~$ crm configure
  crm(live)configure# property stonith-enabled=on
  crm(live)configure# property stonith-action=off
  crm(live)configure# property no-quorum-policy=stop
  crm(live)configure# property have-watchdog=true
  crm(live)configure# commit
  crm(live)configure# end
  crm(live)# end

  rafaeldtinoco@clubionic01:~$ crm configure primitive fence_clubionic \
      stonith:fence_scsi params \
      pcmk_host_list="clubionic01 clubionic02 clubionic03" \
      devices="/dev/sda" \
      meta provides=unfencing

  And see the following errors:

  Failed Actions:
  * fence_clubionic_start_0 on clubionic02 'unknown error' (1): call=6, status=Error, exitreason='', last-rc-change='Wed Mar 4 19:53:12 2020', queued=0ms, exec=1105ms
  * fence_clubionic_start_0 on clubionic03 'unknown error' (1): call=6, status=Error, exitreason='', last-rc-change='Wed Mar 4 19:53:13 2020', queued=0ms, exec=1109ms
  * fence_clubionic_start_0 on clubionic01 'unknown error' (1): call=6, status=Error, exitreason='', last-rc-change='Wed Mar 4 19:53:11 2020', queued=0ms, exec=1108ms

  and corosync.log will show:

  warning: unpack_rsc_op_failure: Processing failed op start for fence_clubionic on clubionic01: unknown error (1)

  [Regression Potential]

  * LP: #1865523 shows fence_scsi fully operational after the SRU for that bug is done.

  * LP: #1865523 used pacemaker 1.1.19 (vanilla) in order to fix fence_scsi.

  * There are changes to the cluster resource manager daemon, the local resource manager daemon and the policy engine. Of all the changes, the policy engine fix is the biggest, but still not big for an SRU. This could cause the policy engine, and thus cluster decisions, to malfunction.
  * All patches are based on upstream fixes made right after Pacemaker-1.1.18 (the version used by Ubuntu Bionic) and were tested with fence_scsi to make sure they fixed the issues.

  [Other Info]

  * Original Description:

  Trying to set up a cluster with an iSCSI shared disk, using fence_scsi as the fencing mechanism, I realized that fence_scsi is not working in Ubuntu Bionic. I first thought it was related to the Azure environment (LP: #1864419), where I was trying this environment, but then, trying locally, I figured out that somehow pacemaker 1.1.18 is not fencing the shared SCSI disk properly.

  Note: I was able to "backport" vanilla 1.1.19 from upstream and fence_scsi worked. I then tried 1.1.18 without all the quilt patches and it didn't work either. I think that bisecting 1.1.18 <-> 1.1.19 might tell us which commit fixed the behaviour needed by the fence_scsi agent.

  (k)rafaeldtinoco@clubionic01:~$ crm conf show
  node 1: clubionic01.private
  node 2: clubionic02.private
  node 3: clubionic03.private
  primitive fence_clubionic stonith:fence_scsi \
      params pcmk_host_list="10.250.3.10 10.250.3.11 10.250.3.12" devices="/dev/sda" \
      meta provides=unfencing
  property cib-bootstrap-options: \
      have-watchdog=false \
      dc-version=1.1.18-2b07d5c5a9 \
      cluster-infrastructure=corosync \
      cluster-name=clubionic \
      stonith-enabled=on \
      stonith-action=off \
      no-quorum-policy=stop \
      symmetric-cluster=true

  (k)rafaeldtinoco@clubionic02:~$ sudo crm_mon -1
  Stack:
[Ubuntu-ha] [Bug 1871166] Re: lrmd crashes
This looks good, though it is quite a lot of code change (and refactoring) for a bug without a clear reproduction scenario. Would it be possible to, along with the verification to be done by the reporting person, perform some sanity runs to make sure the cancel/cleanup parts of the code did not regress? Thanks!

** Changed in: pacemaker (Ubuntu Xenial)
   Status: In Progress => Fix Committed

** Tags added: verification-needed verification-needed-xenial

--
You received this bug notification because you are a member of Ubuntu High Availability Team, which is subscribed to pacemaker in Ubuntu.
https://bugs.launchpad.net/bugs/1871166

Title:
  lrmd crashes

Status in pacemaker package in Ubuntu: Fix Released
Status in pacemaker source package in Xenial: Fix Committed

Bug description:

  [impact]
  lrmd crashes and dumps core.

  [test case]
  I can not reproduce it, but it is reproducible in the specific setup of the person reporting the bug to me.

  [regression potential]
  this patches the cancel/cleanup part of the code, so regressions would likely involve possible memory leaks (instead of use-after-free segfaults), failure to correctly cancel or clean up operations, or other failures during the cancel action.

  [scope]
  this is fixed by commits:
    933d46ef20591757301784773a37e06b78906584
    94a4c58f675d163085a055f59fd6c3a2c9f57c43
    dc36d4375c049024a6f9e4d2277a3e6444fad05b
    deabcc5a6aa93dadf0b20364715b559a5b9848ac
    b85037b75255061a41d0ec3fd9b64f271351b43e
  which are all included starting with version 1.1.17, and Bionic includes version 1.1.18, so this is fixed already in Bionic and later. This is needed only for Xenial.

  [other info]
  As mentioned in the test case section, I do not have a setup where I'm able to reproduce this, but I can ask the initial reporter to test and verify the fix, and they have verified that a test build fixed the problem for them.

  Also, the upstream commits removed two symbols, which I elided from the backported patches; those symbols are still available and, while it is unlikely there were any users of those symbols outside pacemaker itself, this change should not break any possible external users. See the patch 0002 header in the upload for more detail.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/pacemaker/+bug/1871166/+subscriptions
[Ubuntu-ha] [Bug 1871166] Please test proposed package
Hello Dan, or anyone else affected,

Accepted pacemaker into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/pacemaker/1.1.14-2ubuntu1.7 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package, and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

--
You received this bug notification because you are a member of Ubuntu High Availability Team, which is subscribed to pacemaker in Ubuntu.
https://bugs.launchpad.net/bugs/1871166

Title:
  lrmd crashes

Status in pacemaker package in Ubuntu: Fix Released
Status in pacemaker source package in Xenial: Fix Committed

Bug description:

  [impact]
  lrmd crashes and dumps core.

  [test case]
  I can not reproduce it, but it is reproducible in the specific setup of the person reporting the bug to me.

  [regression potential]
  this patches the cancel/cleanup part of the code, so regressions would likely involve possible memory leaks (instead of use-after-free segfaults), failure to correctly cancel or clean up operations, or other failures during the cancel action.

  [scope]
  this is fixed by commits:
    933d46ef20591757301784773a37e06b78906584
    94a4c58f675d163085a055f59fd6c3a2c9f57c43
    dc36d4375c049024a6f9e4d2277a3e6444fad05b
    deabcc5a6aa93dadf0b20364715b559a5b9848ac
    b85037b75255061a41d0ec3fd9b64f271351b43e
  which are all included starting with version 1.1.17, and Bionic includes version 1.1.18, so this is fixed already in Bionic and later. This is needed only for Xenial.

  [other info]
  As mentioned in the test case section, I do not have a setup where I'm able to reproduce this, but I can ask the initial reporter to test and verify the fix, and they have verified that a test build fixed the problem for them.

  Also, the upstream commits removed two symbols, which I elided from the backported patches; those symbols are still available and, while it is unlikely there were any users of those symbols outside pacemaker itself, this change should not break any possible external users. See the patch 0002 header in the upload for more detail.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/pacemaker/+bug/1871166/+subscriptions
[Ubuntu-ha] [Bug 1848902] Update Released
The verification of the Stable Release Update for haproxy has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

--
You received this bug notification because you are a member of Ubuntu High Availability Team, which is subscribed to haproxy in Ubuntu.
https://bugs.launchpad.net/bugs/1848902

Title:
  haproxy in bionic can get stuck

Status in haproxy package in Ubuntu: Fix Released
Status in haproxy source package in Bionic: Fix Released

Bug description:

  [Impact]

  * The master process will exit with the status of the last worker. When the worker is killed with SIGTERM, it is expected to get 143 as an exit status. Therefore, we consider this exit status as normal from a systemd point of view. If it happens when not stopping, the systemd unit is configured to always restart, so it has no adverse effect.

  * Backport the upstream fix - adding another accepted RC to the systemd service.

  [Test Case]

  * Install haproxy and have it running, then SIGTERM it a lot. With the fix, systemd restarts the service each time (until the restart limit kicks in); in the bad case the service just stays down and no restart is even attempted.

  $ apt install haproxy
  $ for x in {1..100}; do pkill -TERM -x haproxy ; sleep 0.1 ; done
  $ systemctl status haproxy

  The above is a hacky way to trigger some A/B behavior on the fix. It isn't perfect, as systemd restart counters will kick in and you essentially check a secondary symptom.
  In addition, I'd recommend running the following:

  $ apt install haproxy
  $ for x in {1..1000}; do pkill -TERM -x haproxy ; sleep 0.001 ; systemctl reset-failed haproxy.service ; done
  $ systemctl status haproxy

  You can do so with even smaller sleeps; that should keep the service up and running (this isn't changing with the fix, but should work with the new code).

  [Regression Potential]

  * This eventually is a conffile modification, so if there are other modifications done by the user they will get a prompt. But that isn't a regression. I checked the code and I can't think of another RC=143 case that would, due to this change, no longer be detected as an error. I really think that, other than the update itself triggering a restart (as usual for services), there is no further regression potential to this.

  [Other Info]

  * Fix already active in the IS hosted cloud without issues for a while.
  * Reports (comment #5) also show that others use this in production.

  ---

  On a Bionic/Stein cloud, after a network partition, we saw several units (glance, swift-proxy and cinder) fail to start haproxy, like so:

  root@juju-df624b-6-lxd-4:~# systemctl status haproxy.service
  ● haproxy.service - HAProxy Load Balancer
     Loaded: loaded (/lib/systemd/system/haproxy.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Sun 2019-10-20 00:23:18 UTC; 1h 35min ago
       Docs: man:haproxy(1)
             file:/usr/share/doc/haproxy/configuration.txt.gz
    Process: 2002655 ExecStart=/usr/sbin/haproxy -Ws -f $CONFIG -p $PIDFILE $EXTRAOPTS (code=exited, status=143)
    Process: 2002649 ExecStartPre=/usr/sbin/haproxy -f $CONFIG -c -q $EXTRAOPTS (code=exited, status=0/SUCCESS)
   Main PID: 2002655 (code=exited, status=143)

  Oct 20 00:16:52 juju-df624b-6-lxd-4 systemd[1]: Starting HAProxy Load Balancer...
  Oct 20 00:16:52 juju-df624b-6-lxd-4 systemd[1]: Started HAProxy Load Balancer.
  Oct 20 00:23:18 juju-df624b-6-lxd-4 systemd[1]: Stopping HAProxy Load Balancer...
  Oct 20 00:23:18 juju-df624b-6-lxd-4 haproxy[2002655]: [WARNING] 292/001652 (2002655) : Exiting Master process...
  Oct 20 00:23:18 juju-df624b-6-lxd-4 haproxy[2002655]: [ALERT] 292/001652 (2002655) : Current worker 2002661 exited with code 143
  Oct 20 00:23:18 juju-df624b-6-lxd-4 haproxy[2002655]: [WARNING] 292/001652 (2002655) : All workers exited. Exiting... (143)
  Oct 20 00:23:18 juju-df624b-6-lxd-4 systemd[1]: haproxy.service: Main process exited, code=exited, status=143/n/a
  Oct 20 00:23:18 juju-df624b-6-lxd-4 systemd[1]: haproxy.service: Failed with result 'exit-code'.
  Oct 20 00:23:18 juju-df624b-6-lxd-4 systemd[1]: Stopped HAProxy Load Balancer.
  root@juju-df624b-6-lxd-4:~#

  The Debian maintainer came up with the following patch for this:
  https://www.mail-archive.com/haproxy@formilux.org/msg30477.html

  It was added to the 1.8.10-1 Debian upload and merged into upstream 1.8.13. Unfortunately Bionic is on 1.8.8-1ubuntu0.4 and doesn't have this patch. Please consider pulling this patch into an SRU for
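The recurring status=143 in the log above is not haproxy-specific: it is the conventional 128 + signal-number encoding for a process terminated by SIGTERM (signal 15), which is why the fix simply adds 143 to the accepted exit statuses. A small sketch of the arithmetic (a throwaway shell stands in for the haproxy worker):

```python
import signal
import subprocess

# A child shell that SIGTERMs itself, like a worker stopped by the master.
proc = subprocess.run(["sh", "-c", "kill -TERM $$"])

# subprocess reports death-by-signal as a negative return code ...
assert proc.returncode == -signal.SIGTERM

# ... which a shell (or systemd) folds into an exit status of 128 + 15.
print(128 + int(signal.SIGTERM))  # 143
```

This is only the generic POSIX convention; the SRU's actual change is the SuccessExitStatus-style whitelist in the packaged systemd unit, per the Debian patch linked above.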
[Ubuntu-ha] [Bug 1815101] Update Released
The verification of the Stable Release Update for systemd has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

--
You received this bug notification because you are a member of Ubuntu High Availability Team, which is subscribed to keepalived in Ubuntu.
https://bugs.launchpad.net/bugs/1815101

Title:
  [master] Restarting systemd-networkd breaks keepalived, heartbeat, corosync, pacemaker (interface aliases are restarted)

Status in Keepalived Charm: New
Status in netplan: Confirmed
Status in heartbeat package in Ubuntu: Won't Fix
Status in keepalived package in Ubuntu: In Progress
Status in systemd package in Ubuntu: In Progress
Status in keepalived source package in Bionic: Confirmed
Status in systemd source package in Bionic: Confirmed
Status in keepalived source package in Disco: Confirmed
Status in systemd source package in Disco: Confirmed
Status in keepalived source package in Eoan: In Progress
Status in systemd source package in Eoan: Fix Released

Bug description:

  [impact]

  - ALL related HA software has a small problem if interfaces are being managed by systemd-networkd: NIC restarts/reconfigurations always wipe all interface aliases when the HA software is not expecting it (there is no coordination between them).

  - keepalived, smb ctdb and pacemaker all suffer from this. Pacemaker is smarter in this case because it has a service monitor that will restart the virtual IP resource, on the affected node & NIC, before considering it a real failure, but other HA services might consider it a real failure when it is not.
  [test case]

  - comment #14 is a full test case: have a 3-node pacemaker cluster, as in that example, and cause a networkd service restart: it will trigger a failure of the virtual IP resource monitor.

  - another example is given in the original description for keepalived. Both suffer from the same issue (and other HA software as well).

  [regression potential]

  - this backports the KeepConfiguration parameter, which adds some significant complexity to networkd's configuration and behavior, and could lead to regressions in correctly configuring the network at networkd start, incorrectly maintaining configuration at networkd restart, or losing network state at networkd stop.

  - Any regressions are most likely to occur during networkd start, restart, or stop, and most likely to involve missing or incorrect ip address(es).

  - the change is based on upstream patches adding the exact feature we needed to fix this issue & it will be integrated with a netplan change to add the needed stanza to the systemd NIC configuration file (KeepConfiguration=).

  [other info]

  original description:
  ---
  Configure netplan for interfaces, for example (a working config with IP addresses obfuscated):

  network:
    ethernets:
      eth0:
        addresses: [192.168.0.5/24]
        dhcp4: false
        nameservers:
          search: [blah.com, other.blah.com, hq.blah.com, cust.blah.com, phone.blah.com]
          addresses: [10.22.11.1]
      eth2:
        addresses:
          - 12.13.14.18/29
          - 12.13.14.19/29
        gateway4: 12.13.14.17
        dhcp4: false
        nameservers:
          search: [blah.com, other.blah.com, hq.blah.com, cust.blah.com, phone.blah.com]
          addresses: [10.22.11.1]
      eth3:
        addresses: [10.22.11.6/24]
        dhcp4: false
        nameservers:
          search: [blah.com, other.blah.com, hq.blah.com, cust.blah.com, phone.blah.com]
          addresses: [10.22.11.1]
      eth4:
        addresses: [10.22.14.6/24]
        dhcp4: false
        nameservers:
          search: [blah.com, other.blah.com, hq.blah.com, cust.blah.com, phone.blah.com]
          addresses: [10.22.11.1]
      eth7:
        addresses: [9.5.17.34/29]
        dhcp4: false
        optional: true
        nameservers:
          search: [blah.com, other.blah.com, hq.blah.com, cust.blah.com, phone.blah.com]
          addresses: [10.22.11.1]
    version: 2

  Configure keepalived (again, a working config with IP addresses obfuscated):

  global_defs                  # Block id
  {
      notification_email
      {
          sysadm...@blah.com
      }
      notification_email_from keepali...@system3.hq.blah.com
      smtp_server 10.22.11.7   # IP
      smtp_connect_timeout 30  # integer, seconds
      router_id system3        # string identifying the machine,
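For context, the KeepConfiguration= stanza mentioned in the regression-potential section is the networkd option this SRU backports. A hypothetical drop-in for one of the interfaces above might look like this (the file name and interface are illustrative; in practice netplan would generate the file):

```ini
# /etc/systemd/network/10-eth0.network  (illustrative path)
[Match]
Name=eth0

[Network]
Address=192.168.0.5/24
# Keep statically assigned addresses (e.g. VIP aliases added by keepalived)
# across a networkd restart instead of wiping them.
KeepConfiguration=static
```

Other accepted values include dhcp and yes, covering DHCP-learned addresses as well; static is the case relevant to HA virtual IPs.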
[Ubuntu-ha] [Bug 1815101] Re: [master] Restarting systemd-networkd breaks keepalived, heartbeat, corosync, pacemaker (interface aliases are restarted)
Hello Leroy, or anyone else affected,

Accepted systemd into eoan-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/systemd/242-7ubuntu3.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed-eoan to verification-done-eoan. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-eoan. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

** Changed in: systemd (Ubuntu Eoan)
   Status: In Progress => Fix Committed

** Tags added: verification-needed verification-needed-eoan

--
You received this bug notification because you are a member of Ubuntu High Availability Team, which is subscribed to keepalived in Ubuntu.
https://bugs.launchpad.net/bugs/1815101

Title:
  [master] Restarting systemd-networkd breaks keepalived, heartbeat, corosync, pacemaker (interface aliases are restarted)

Status in Keepalived Charm: New
Status in netplan: Confirmed
Status in heartbeat package in Ubuntu: Triaged
Status in keepalived package in Ubuntu: In Progress
Status in systemd package in Ubuntu: In Progress
Status in heartbeat source package in Bionic: Triaged
Status in keepalived source package in Bionic: Confirmed
Status in systemd source package in Bionic: Confirmed
Status in heartbeat source package in Disco: Triaged
Status in keepalived source package in Disco: Confirmed
Status in systemd source package in Disco: Confirmed
Status in heartbeat source package in Eoan: Triaged
Status in keepalived source package in Eoan: In Progress
Status in systemd source package in Eoan: Fix Committed

Bug description:

  [impact]

  - ALL related HA software has a small problem if interfaces are being managed by systemd-networkd: NIC restarts/reconfigurations always wipe all interface aliases when the HA software is not expecting it (there is no coordination between them).

  - keepalived, smb ctdb and pacemaker all suffer from this. Pacemaker is smarter in this case because it has a service monitor that will restart the virtual IP resource, on the affected node & NIC, before considering it a real failure, but other HA services might consider it a real failure when it is not.

  [test case]

  - comment #14 is a full test case: have a 3-node pacemaker cluster, as in that example, and cause a networkd service restart: it will trigger a failure of the virtual IP resource monitor.

  - another example is given in the original description for keepalived. Both suffer from the same issue (and other HA software as well).
[regression potential]

- this backports the KeepConfiguration parameter, which adds significant complexity to networkd's configuration and behavior. That could lead to regressions in correctly configuring the network at networkd start, incorrectly maintaining configuration at networkd restart, or losing network state at networkd stop.
- Any regressions are most likely to occur during networkd start, restart, or stop, and most likely to involve missing or incorrect IP address(es).
- the change is based on upstream patches adding the exact feature we needed to fix this issue, and it will be integrated with a netplan change that adds the needed stanza (KeepConfiguration=) to the systemd NIC configuration file.

[other info]

original description:

---

Configure netplan for interfaces, for example (a working config with IP addresses obfuscated):

network:
  ethernets:
    eth0:
      addresses: [192.168.0.5/24]
      dhcp4: false
      nameservers:
        search: [blah.com, other.blah.com, hq.blah.com, cust.blah.com, phone.blah.com]
        addresses: [10.22.11.1]
    eth2:
      addresses:
        - 12.13.14.18/29
        - 12.13.14.19/29
      gateway4: 12.13.14.17
      dhcp4: false
      nameservers:
        search: [blah.com, other.blah.com, hq.blah.com, cust.blah.com, phone.blah.com]
        addresses: [10.22.11.1]
    eth3:
      addresses: [10.22.11.6/24]
      dhcp4: false
      nameservers:
        search: [blah.com, other.blah.com, hq.blah.com,
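For illustration, the networkd configuration that netplan generates for such an interface would gain a stanza like the one below. The file path and the choice of KeepConfiguration=static are assumptions for this sketch; the parameter's exact value depends on the netplan change mentioned above:

```ini
# /run/systemd/network/10-netplan-eth0.network (hypothetical path/name)
[Match]
Name=eth0

[Network]
Address=192.168.0.5/24
DNS=10.22.11.1
# Backported parameter: keep statically configured addresses and routes
# across a systemd-networkd restart instead of flushing them, so VIP
# aliases added by HA software survive the restart.
KeepConfiguration=static
```

With this in place, restarting systemd-networkd no longer removes addresses that keepalived or pacemaker added to the interface.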
[Ubuntu-ha] [Bug 1841936] Update Released
The verification of the Stable Release Update for haproxy has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

--
You received this bug notification because you are a member of Ubuntu High Availability Team, which is subscribed to haproxy in Ubuntu.

https://bugs.launchpad.net/bugs/1841936

Title:
  Rebuild openssl 1.1.1 to pick up TLSv1.3 (bionic) and unbreak existing builds against 1.1.1 (DH key size)

Status in HAProxy: Fix Released
Status in haproxy package in Ubuntu: Fix Committed
Status in haproxy source package in Bionic: Fix Committed
Status in haproxy source package in Disco: Fix Released
Status in haproxy source package in Eoan: Fix Released
Status in haproxy source package in Focal: Fix Committed

Bug description:

[Impact-Bionic]

* openssl 1.1.1 has been backported to Bionic for its longer upstream support period.
* That allows consuming packages to gain the extra feature of TLSv1.3 seemingly "for free": a no-change rebuild is enough to pick it up.

[Impact Disco-Focal]

* openssl >=1.1.1 is already in Disco-Focal, so haproxy was already built against it. That made it pick up TLSv1.3, but also a related bug that broke the ability to control the DHE key: it was always in "ECDH auto" mode, so the daemon no longer followed the config. Upgraders would regress by having their DH key behavior changed unexpectedly.
[Test Case]

A)
* run "haproxy -vv" and check that the reported TLS versions include 1.3

B)
* download https://github.com/drwetter/testssl.sh
* Install haproxy
* ./testssl.sh --pfs :443
* Check the reported DH key/group (should stay 1024)
* Check if settings work to bump it, like tune.ssl.default-dh-param 2048 in /etc/haproxy/haproxy.cfg

[Regression Potential-Bionic]

* This should be low; the code already runs against the .so of the newer openssl library. This would only make it recognize the newer TLS support. I'd expect more trouble as-is, with the somewhat big delta between what it was built against vs what it runs with, than afterwards.
* [1] and [2] indicate that any config that was made for TLSv1.2 [1] would not apply to v1.3, which would be configured via [2]. It is good to have no entry for [2] yet, as following the defaults of openssl is safest: those get updated as new insights/CVEs become known. But this could IMHO be the "regression that I'd expect": one explicitly configured the v1.2 settings, and once both ends support v1.3 that might be auto-negotiated. One can then set "force-tlsv12", but that is an administrative action [3].
* Yet AFAIK this fine-grained control [2] for TLSv1.3 only exists in >=1.8.15 [4], and Bionic is on haproxy 1.8.8. So any user of TLSv1.3 in Bionic haproxy would have to stay without it. There are further TLSv1.3 handling enhancements [5] but also fixes [6] which aren't in 1.8.8 in Bionic. So one could say enabling this enables an inferior TLSv1.3, and one might better not enable it; for an SRU the bar for not breaking old behavior is intentionally high. I tried to provide as much background as possible; the decision is up to the SRU team.

[Regression Potential-Disco-Focal]

* The fixes let the admin regain control of the DH key configuration, which is the fix. But remember that the default config didn't specify any.
Therefore we have two scenarios:

a) An admin had set custom DH parameters which were ignored. He had no chance to control them and needs the fix. He might have been under the impression that his keys were safe (there is a CVE against small ones), and only now is he really safe. -> gain high, regression low

b) An admin had not set anything; the default config is meant for compatibility, and the program reported "I'm using 1024, but you should set it higher". But what really happened was ECDH auto mode, which has longer keys and different settings. Those systems will be "fixed" by finally following the config, but that means the key will only "now", after the fix, be vulnerable. -> from their POV a formerly secure setup becomes vulnerable

I'd expect that any professional setup uses explicit config, as it has seen the warning since day #1, and any kind of deployment recipe should use big keys. So the majority of users should be in (a).

c) And OTOH there are people like the reporter
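As a sketch of the explicit configuration in scenario (a) above — the value is illustrative, not taken from the bug report — the DH parameter size is raised in the global section of /etc/haproxy/haproxy.cfg:

```
global
    # Raise the DHE parameter size from the 1024-bit compatibility
    # default, so haproxy stops warning at startup and small-DH
    # (Logjam-style) attacks do not apply. With the fix, this
    # directive is honored again instead of being silently ignored.
    tune.ssl.default-dh-param 2048
```

Admins in scenario (b), who never set this directive, regain the (weaker) documented default behavior after the fix, which is exactly the regression discussed above.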
[Ubuntu-ha] [Bug 1819046] Update Released
The verification of the Stable Release Update for pacemaker has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

--
You received this bug notification because you are a member of Ubuntu High Availability Team, which is subscribed to pacemaker in Ubuntu.

https://bugs.launchpad.net/bugs/1819046

Title:
  Systemd unit file reads settings from wrong path

Status in pacemaker package in Ubuntu: Fix Released
Status in pacemaker source package in Xenial: Fix Released

Bug description:

[Impact]

The systemd unit file doesn't read any settings by default.

[Description]

The unit file shipped with the Xenial pacemaker package tries to read environment settings from /etc/sysconfig/ instead of /etc/default/. The result is that settings defined in /etc/default/pacemaker are not effective. Since the /etc/default/pacemaker file is created with default values when the pacemaker package is installed, we should source that in the systemd unit file.
[Test Case]

1) Deploy a Xenial container:
   $ lxc launch ubuntu:xenial pacemaker

2) Update container and install pacemaker:
   root@pacemaker:~# apt update && apt install pacemaker -y

3) Change default pacemaker log location:
   root@pacemaker:~# echo "PCMK_logfile=/tmp/pacemaker.log" >> /etc/default/pacemaker

4) Restart pacemaker service and verify that log file exists:
   root@pacemaker:~# systemctl restart pacemaker.service
   root@pacemaker:~# ls -l /tmp/pacemaker.log
   ls: cannot access '/tmp/pacemaker.log': No such file or directory

After fixing the systemd unit, changes to /etc/default/pacemaker get picked up correctly:

   root@pacemaker:~# ls -l /tmp/pacemaker.log
   -rw-rw 1 hacluster haclient 27335 Mar 7 20:46 /tmp/pacemaker.log

[Regression Potential]

The regression potential for this should be very low, since the configuration file is already being created by default and other systemd unit files are using the /etc/default config. In case the file doesn't exist or the user removed it, the "-" prefix will gracefully ignore the missing file according to the systemd.exec manual [0]. Nonetheless, the new package will be tested with autopkgtests and the fix will be validated in a reproduction environment.

[0] https://www.freedesktop.org/software/systemd/man/systemd.exec.html

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/pacemaker/+bug/1819046/+subscriptions

___
Mailing list: https://launchpad.net/~ubuntu-ha
Post to     : ubuntu-ha@lists.launchpad.net
Unsubscribe : https://launchpad.net/~ubuntu-ha
More help   : https://help.launchpad.net/ListHelp
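The fix described above amounts to pointing the unit's EnvironmentFile at the Debian/Ubuntu path. A minimal sketch of the relevant [Service] lines follows (the rest of the unit is omitted; the "-" prefix is what makes a missing file non-fatal, per systemd.exec):

```ini
# /lib/systemd/system/pacemaker.service (excerpt; other directives omitted)
[Service]
# Before the fix, the unit read the Red Hat-style path, which does
# not exist on Ubuntu:
#   EnvironmentFile=-/etc/sysconfig/pacemaker
# After the fix, it sources the file the Ubuntu package actually ships:
EnvironmentFile=-/etc/default/pacemaker
```

After editing the unit, a `systemctl daemon-reload` followed by a service restart makes settings such as PCMK_logfile take effect.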
[Ubuntu-ha] [Bug 1819046] Re: Systemd unit file reads settings from wrong path
Hello Heitor, or anyone else affected,

Accepted pacemaker into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/pacemaker/1.1.14-2ubuntu1.5 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

** Changed in: pacemaker (Ubuntu Xenial)
   Status: In Progress => Fix Committed

** Tags added: verification-needed verification-needed-xenial

--
You received this bug notification because you are a member of Ubuntu High Availability Team, which is subscribed to pacemaker in Ubuntu.

https://bugs.launchpad.net/bugs/1819046

Title:
  Systemd unit file reads settings from wrong path

Status in pacemaker package in Ubuntu: Fix Released
Status in pacemaker source package in Xenial: Fix Committed

Bug description:

[Impact]

The systemd unit file doesn't read any settings by default.

[Description]

The unit file shipped with the Xenial pacemaker package tries to read environment settings from /etc/sysconfig/ instead of /etc/default/.
The result is that settings defined in /etc/default/pacemaker are not effective. Since the /etc/default/pacemaker file is created with default values when the pacemaker package is installed, we should source that in the systemd unit file.

[Test Case]

1) Deploy a Xenial container:
   $ lxc launch ubuntu:xenial pacemaker

2) Update container and install pacemaker:
   root@pacemaker:~# apt update && apt install pacemaker -y

3) Change default pacemaker log location:
   root@pacemaker:~# echo "PCMK_logfile=/tmp/pacemaker.log" >> /etc/default/pacemaker

4) Restart pacemaker service and verify that log file exists:
   root@pacemaker:~# systemctl restart pacemaker.service
   root@pacemaker:~# ls -l /tmp/pacemaker.log
   ls: cannot access '/tmp/pacemaker.log': No such file or directory

After fixing the systemd unit, changes to /etc/default/pacemaker get picked up correctly:

   root@pacemaker:~# ls -l /tmp/pacemaker.log
   -rw-rw 1 hacluster haclient 27335 Mar 7 20:46 /tmp/pacemaker.log

[Regression Potential]

The regression potential for this should be very low, since the configuration file is already being created by default and other systemd unit files are using the /etc/default config. In case the file doesn't exist or the user removed it, the "-" prefix will gracefully ignore the missing file according to the systemd.exec manual [0]. Nonetheless, the new package will be tested with autopkgtests and the fix will be validated in a reproduction environment.

[0] https://www.freedesktop.org/software/systemd/man/systemd.exec.html

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/pacemaker/+bug/1819046/+subscriptions
[Ubuntu-ha] [Bug 1804069] Update Released
The verification of the Stable Release Update for haproxy has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

--
You received this bug notification because you are a member of Ubuntu High Availability Team, which is subscribed to haproxy in Ubuntu.

https://bugs.launchpad.net/bugs/1804069

Title:
  haproxy fails on arm64 due to alignment error

Status in haproxy package in Ubuntu: Fix Released
Status in haproxy source package in Bionic: Fix Committed
Status in haproxy source package in Cosmic: Fix Released

Bug description:

[Impact]

haproxy as shipped with bionic and cosmic doesn't work on arm64 architectures, crashing the moment it serves a request.
[Test Case]

* install haproxy and apache in an up-to-date ubuntu release you are testing, in an arm64 system:

  sudo apt update && sudo apt dist-upgrade -y && sudo apt install haproxy apache2 -y

* Create /etc/haproxy/haproxy.cfg with the following contents:

  global
      chroot /var/lib/haproxy
      user haproxy
      group haproxy
      daemon
      maxconn 4096

  defaults
      log global
      option dontlognull
      option redispatch
      retries 3
      timeout client 50s
      timeout connect 10s
      timeout http-request 5s
      timeout server 50s
      maxconn 4096

  frontend test-front
      bind *:8080
      mode http
      default_backend test-back

  backend test-back
      mode http
      stick store-request src
      stick-table type ip size 256k expire 30m
      server test-1 localhost:80

* in one terminal, keep tailing the (still nonexistent) haproxy log file:

  tail -F /var/log/haproxy.log

* in another terminal, restart haproxy:

  sudo systemctl restart haproxy

* The haproxy log will become live, and already show errors:

  Jan 24 19:22:23 cosmic-haproxy-1804069 haproxy[2286]: [WARNING] 023/191958 (2286) : Exiting Master process...
  Jan 24 19:22:23 cosmic-haproxy-1804069 haproxy[2286]: [ALERT] 023/191958 (2286) : Current worker 2287 exited with code 143
  Jan 24 19:22:23 cosmic-haproxy-1804069 haproxy[2286]: [WARNING] 023/191958 (2286) : All workers exited. Exiting... (143)

* Run wget to try to fetch the apache frontpage, via haproxy, limited to one attempt. It will fail:

  $ wget -t1 http://localhost:8080
  --2019-01-24 19:23:51-- http://localhost:8080/
  Resolving localhost (localhost)... 127.0.0.1
  Connecting to localhost (localhost)|127.0.0.1|:8080... connected.
  HTTP request sent, awaiting response... No data received. Giving up.
  $ echo $?
  4

* the haproxy logs will show errors:

  Jan 24 19:24:36 cosmic-haproxy-1804069 haproxy[6411]: [ALERT] 023/192351 (6411) : Current worker 6412 exited with code 135
  Jan 24 19:24:36 cosmic-haproxy-1804069 haproxy[6411]: [ALERT] 023/192351 (6411) : exit-on-failure: killing every workers with SIGTERM
  Jan 24 19:24:36 cosmic-haproxy-1804069 haproxy[6411]: [WARNING] 023/192351 (6411) : All workers exited. Exiting... (135)

* Update the haproxy package and try the wget one more time. This time it will work, and the haproxy logs will stay silent:

  $ wget -t1 http://localhost:8080
  --2019-01-24 19:26:14-- http://localhost:8080/
  Resolving localhost (localhost)... 127.0.0.1
  Connecting to localhost (localhost)|127.0.0.1|:8080... connected.
  HTTP request sent, awaiting response... 200 OK
  Length: 10918 (11K) [text/html]
  Saving to: ‘index.html’

  index.html 100%[>] 10.66K --.-KB/s in 0s

  2019-01-24 19:26:14 (75.3 MB/s) - ‘index.html’ saved [10918/10918]

[Regression Potential]

Patch was applied upstream in 1.8.15 and is available in the same form in the latest 1.8.17 release. The patch is a bit low-level, but seems to have been well understood.

[Other Info]

After writing the testing instructions for this bug, I decided they could easily be converted to a DEP8 test, which I did and included in this SRU. This new test, very simple but effective, shows that arm64 is working, and that the other architectures didn't break.

[Original Description]

This fault was reported via the haproxy mailing list:
https://www.mail-archive.com/hapr...@formilux.org/msg31749.html

And then patched in the haproxy code here:
https://github.com/haproxy/haproxy/commit/52dabbc4fad338233c7f0c96f977a43f8f81452a

Without this patch haproxy is not functional on aarch64/arm64. Experimental deployments of openstack-ansible on arm64 fail because of this bug, and without a fix applied to the ubuntu
[Ubuntu-ha] [Bug 1755061] Re: HAProxyContext on Ubuntu 14.04 generates config that fails to start on boot
The version of haproxy in the proposed pocket of Trusty that was purported to fix this bug report has been removed because the bugs that were to be fixed by the upload were not verified in a timely (105 days) fashion.

** Tags removed: verification-needed-trusty
** Changed in: haproxy (Ubuntu Trusty)
   Status: Fix Committed => Won't Fix

--
You received this bug notification because you are a member of Ubuntu High Availability Team, which is subscribed to haproxy in Ubuntu.

https://bugs.launchpad.net/bugs/1755061

Title:
  HAProxyContext on Ubuntu 14.04 generates config that fails to start on boot

Status in haproxy package in Ubuntu: Fix Released
Status in haproxy source package in Trusty: Won't Fix

Bug description:

[Impact]

Valid haproxy configuration directives don't work on trusty, as /run/haproxy does not survive reboots and is not re-created on daemon start.

[Test Case]

sudo apt install haproxy

Configure /etc/haproxy.cfg with an admin socket in /run/haproxy:

global
    log /var/lib/haproxy/dev/log local0
    log /var/lib/haproxy/dev/log local1 notice
    maxconn 2
    user haproxy
    group haproxy
    spread-checks 0
    stats socket /var/run/haproxy/admin.sock mode 600 level admin
    stats timeout 2m

Restart haproxy (this will fail, as /{,var/}run/haproxy does not exist).

[Regression Potential]

Minimal - the same fix is in later package revisions.

[Original Bug Report]

While testing upgrades of an Ubuntu 14.04 deployment of OpenStack from ~15.04 to 17.11 charms, I noticed that a number of the OpenStack charmed services failed to start haproxy when I rebooted their units: cinder, glance, keystone, neutron-api, nova-cloud-controller, and swift-proxy. The following was in /var/log/boot.log:

[ALERT] 069/225906 (1100) : cannot bind socket for UNIX listener (/var/run/haproxy/admin.sock). Aborting.
[ALERT] 069/225906 (1100) : [/usr/sbin/haproxy.main()] Some protocols failed to start their listeners! Exiting.
* Starting haproxy haproxy [fail]

The charm created /var/run/haproxy, but since /var/run (really /run) is a tmpfs, this did not survive the reboot and so haproxy could not create the socket. I compared the haproxy.cfg the charm creates with the default config shipped by the Ubuntu 16.04 haproxy package, and it seems that charmhelpers/contrib/openstack/templates/haproxy.cfg is closely based on the package, including the admin.sock directive. However, on Ubuntu 16.04, /etc/init.d/haproxy ensures that /var/run/haproxy exists before it starts haproxy:

[agnew(work)] diff -u haproxy-1.4.24/debian/haproxy.init haproxy-1.6.3/debian/haproxy.init
--- haproxy-1.4.24/debian/haproxy.init 2015-12-16 03:55:29.0 +1300
+++ haproxy-1.6.3/debian/haproxy.init 2015-12-31 20:10:38.0 +1300
[...]
@@ -50,6 +41,10 @@
 haproxy_start()
 {
+    [ -d "$RUNDIR" ] || mkdir "$RUNDIR"
+    chown haproxy:haproxy "$RUNDIR"
+    chmod 2775 "$RUNDIR"
+
     check_haproxy_config
     start-stop-daemon --quiet --oknodo --start --pidfile "$PIDFILE" \
[...]

charm-helpers or the OpenStack charms or both should be updated so that haproxy will start on boot when running on Ubuntu 14.04.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/haproxy/+bug/1755061/+subscriptions
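The init-script fix shown in the diff above can be sketched as a standalone snippet. The directory path below is a hypothetical stand-in for demonstration; the real script uses /var/run/haproxy and chowns it to haproxy:haproxy, which requires root and the haproxy user to exist:

```shell
#!/bin/sh
set -eu
# Re-create the runtime directory before starting the daemon: /run is
# a tmpfs, so anything created there is lost on reboot.
RUNDIR="${RUNDIR:-/tmp/haproxy-rundir-demo}"   # real script: /var/run/haproxy

[ -d "$RUNDIR" ] || mkdir "$RUNDIR"
# chown haproxy:haproxy "$RUNDIR"   # needs root and the haproxy user
chmod 2775 "$RUNDIR"                # setgid dir: sockets inherit the group

ls -ld "$RUNDIR"
```

Running this before start-stop-daemon guarantees the stats socket's parent directory exists on every boot, which is exactly what the Trusty package was missing.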
[Ubuntu-ha] [Bug 1755061] Proposed package removed from archive
The version of haproxy in the proposed pocket of Trusty that was purported to fix this bug report has been removed because the bugs that were to be fixed by the upload were not verified in a timely (105 days) fashion.

** Tags removed: verification-needed

--
You received this bug notification because you are a member of Ubuntu High Availability Team, which is subscribed to haproxy in Ubuntu.

https://bugs.launchpad.net/bugs/1755061

Title:
  HAProxyContext on Ubuntu 14.04 generates config that fails to start on boot

Status in haproxy package in Ubuntu: Fix Released
Status in haproxy source package in Trusty: Won't Fix

Bug description:

[Impact]

Valid haproxy configuration directives don't work on trusty, as /run/haproxy does not survive reboots and is not re-created on daemon start.

[Test Case]

sudo apt install haproxy

Configure /etc/haproxy.cfg with an admin socket in /run/haproxy:

global
    log /var/lib/haproxy/dev/log local0
    log /var/lib/haproxy/dev/log local1 notice
    maxconn 2
    user haproxy
    group haproxy
    spread-checks 0
    stats socket /var/run/haproxy/admin.sock mode 600 level admin
    stats timeout 2m

Restart haproxy (this will fail, as /{,var/}run/haproxy does not exist).

[Regression Potential]

Minimal - the same fix is in later package revisions.

[Original Bug Report]

While testing upgrades of an Ubuntu 14.04 deployment of OpenStack from ~15.04 to 17.11 charms, I noticed that a number of the OpenStack charmed services failed to start haproxy when I rebooted their units: cinder, glance, keystone, neutron-api, nova-cloud-controller, and swift-proxy. The following was in /var/log/boot.log:

[ALERT] 069/225906 (1100) : cannot bind socket for UNIX listener (/var/run/haproxy/admin.sock). Aborting.
[ALERT] 069/225906 (1100) : [/usr/sbin/haproxy.main()] Some protocols failed to start their listeners! Exiting.

* Starting haproxy haproxy [fail]

The charm created /var/run/haproxy, but since /var/run (really /run) is a tmpfs, this did not survive the reboot and so haproxy could not create the socket.
I compared the haproxy.cfg the charm creates with the default config shipped by the Ubuntu 16.04 haproxy package, and it seems that charmhelpers/contrib/openstack/templates/haproxy.cfg is closely based on the package, including the admin.sock directive. However, on Ubuntu 16.04, /etc/init.d/haproxy ensures that /var/run/haproxy exists before it starts haproxy:

[agnew(work)] diff -u haproxy-1.4.24/debian/haproxy.init haproxy-1.6.3/debian/haproxy.init
--- haproxy-1.4.24/debian/haproxy.init 2015-12-16 03:55:29.0 +1300
+++ haproxy-1.6.3/debian/haproxy.init 2015-12-31 20:10:38.0 +1300
[...]
@@ -50,6 +41,10 @@
 haproxy_start()
 {
+    [ -d "$RUNDIR" ] || mkdir "$RUNDIR"
+    chown haproxy:haproxy "$RUNDIR"
+    chmod 2775 "$RUNDIR"
+
     check_haproxy_config
     start-stop-daemon --quiet --oknodo --start --pidfile "$PIDFILE" \
[...]

charm-helpers or the OpenStack charms or both should be updated so that haproxy will start on boot when running on Ubuntu 14.04.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/haproxy/+bug/1755061/+subscriptions
[Ubuntu-ha] [Bug 1744062] Re: [SRU] L3 HA: multiple agents are active at the same time
Hello Corey, or anyone else affected,

Accepted keepalived into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/keepalived/1:1.2.24-1ubuntu0.16.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

** Changed in: keepalived (Ubuntu Xenial)
   Status: Triaged => Fix Committed

** Tags added: verification-needed-xenial

--
You received this bug notification because you are a member of Ubuntu High Availability Team, which is subscribed to keepalived in Ubuntu.
https://bugs.launchpad.net/bugs/1744062

Title:
  [SRU] L3 HA: multiple agents are active at the same time

Status in Ubuntu Cloud Archive: Triaged
Status in Ubuntu Cloud Archive mitaka series: Triaged
Status in Ubuntu Cloud Archive queens series: Fix Committed
Status in neutron: New
Status in keepalived package in Ubuntu: Fix Released
Status in neutron package in Ubuntu: Invalid
Status in keepalived source package in Xenial: Fix Committed
Status in neutron source package in Xenial: Invalid
Status in keepalived source package in Bionic: Fix Released
Status in neutron source package in Bionic: Invalid

Bug description:

[Impact]

This is the same issue reported in https://bugs.launchpad.net/neutron/+bug/1731595; however, that one is marked 'Fix Released' and the issue is still occurring, and I can't change it back to 'New', so it seems best to just open a new bug.

It seems as if this bug surfaces due to load issues. While the fix provided by Venkata in https://bugs.launchpad.net/neutron/+bug/1731595 (https://review.openstack.org/#/c/522641/) should help clean things up at the time of l3 agent restart, issues seem to come back later down the line in some circumstances. xavpaice mentioned he saw multiple routers active at the same time when they had 464 routers configured on 3 neutron gateway hosts using L3HA, with each router scheduled to all 3 hosts. However, jhebden mentions that things seem stable at the 400 L3HA router mark, and it's worth noting this is the same deployment xavpaice was referring to.

keepalived has a patch upstream in 1.4.0 that fixes the removal of left-over addresses if keepalived aborts. That patch will be cherry-picked into the Ubuntu keepalived packages.

[Test Case]

The following SRU process will be followed:
https://wiki.ubuntu.com/OpenStackUpdates

In order to avoid regressions for existing consumers, the OpenStack team will run their continuous integration tests against the packages that are in -proposed.
A successful run of all available tests will be required before the proposed packages can be let into -updates. The OpenStack team will be in charge of attaching the output summary of the executed tests. The OpenStack team members will not mark ‘verification-done’ until this has happened.

[Regression Potential]

The regression potential is lowered as the fix is cherry-picked without change from upstream. In order to mitigate the regression potential, the results of the aforementioned tests are attached to this bug.

[Discussion]

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1744062/+subscriptions
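For context, an L3HA router's keepalived instance manages its VIPs roughly as sketched below (interface names, the virtual router ID, and addresses are illustrative, not taken from the affected deployment). The bug is that addresses installed via virtual_ipaddress could be left behind on the interface when keepalived aborted, so two agents could hold the VIP at once:

```
vrrp_instance VR_1 {
    state BACKUP            # neutron starts every router instance as BACKUP
    interface ha-eth0       # hypothetical HA keepalive interface
    virtual_router_id 1
    priority 50
    nopreempt
    virtual_ipaddress {
        10.0.0.1/24 dev qg-eth1    # hypothetical gateway VIP
    }
}
```

The cherry-picked upstream fix makes keepalived clean up such left-over addresses, so a restarted instance no longer finds a stale VIP claiming mastership.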
[Ubuntu-ha] [Bug 1744062] Update Released
The verification of the Stable Release Update for keepalived has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

--
You received this bug notification because you are a member of Ubuntu High Availability Team, which is subscribed to keepalived in Ubuntu.

https://bugs.launchpad.net/bugs/1744062

Title:
  [SRU] L3 HA: multiple agents are active at the same time

Status in Ubuntu Cloud Archive: Triaged
Status in Ubuntu Cloud Archive mitaka series: Triaged
Status in Ubuntu Cloud Archive queens series: Fix Committed
Status in neutron: New
Status in keepalived package in Ubuntu: Fix Released
Status in neutron package in Ubuntu: Invalid
Status in keepalived source package in Xenial: Triaged
Status in neutron source package in Xenial: Invalid
Status in keepalived source package in Bionic: Fix Committed
Status in neutron source package in Bionic: Invalid

Bug description:

[Impact]

This is the same issue reported in https://bugs.launchpad.net/neutron/+bug/1731595; however, that one is marked 'Fix Released' and the issue is still occurring, and I can't change it back to 'New', so it seems best to just open a new bug.

It seems as if this bug surfaces due to load issues. While the fix provided by Venkata in https://bugs.launchpad.net/neutron/+bug/1731595 (https://review.openstack.org/#/c/522641/) should help clean things up at the time of l3 agent restart, issues seem to come back later down the line in some circumstances. xavpaice mentioned he saw multiple routers active at the same time when they had 464 routers configured on 3 neutron gateway hosts using L3HA, with each router scheduled to all 3 hosts.
However, jhebden mentions that things seem stable at the 400 L3HA router mark, and it's worth noting this is the same deployment xavpaice was referring to.

keepalived has a patch upstream in 1.4.0 that fixes the removal of left-over addresses if keepalived aborts. That patch will be cherry-picked into the Ubuntu keepalived packages.

[Test Case]

The following SRU process will be followed:
https://wiki.ubuntu.com/OpenStackUpdates

In order to avoid regressions for existing consumers, the OpenStack team will run their continuous integration tests against the packages that are in -proposed. A successful run of all available tests will be required before the proposed packages can be let into -updates. The OpenStack team will be in charge of attaching the output summary of the executed tests. The OpenStack team members will not mark ‘verification-done’ until this has happened.

[Regression Potential]

The regression potential is lowered as the fix is cherry-picked without change from upstream. In order to mitigate the regression potential, the results of the aforementioned tests are attached to this bug.

[Discussion]

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1744062/+subscriptions
[Ubuntu-ha] [Bug 1744062] Re: [SRU] L3 HA: multiple agents are active at the same time
Hello Corey, or anyone else affected,

Accepted keepalived into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/keepalived/1:1.3.9-1ubuntu0.18.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

** Changed in: keepalived (Ubuntu Bionic)
   Status: Triaged => Fix Committed

** Tags added: verification-needed verification-needed-bionic

--
You received this bug notification because you are a member of Ubuntu High Availability Team, which is subscribed to keepalived in Ubuntu.
https://bugs.launchpad.net/bugs/1744062 Title: [SRU] L3 HA: multiple agents are active at the same time Status in Ubuntu Cloud Archive: Triaged Status in Ubuntu Cloud Archive mitaka series: Triaged Status in Ubuntu Cloud Archive ocata series: Triaged Status in Ubuntu Cloud Archive pike series: Triaged Status in Ubuntu Cloud Archive queens series: Triaged Status in neutron: New Status in keepalived package in Ubuntu: Fix Released Status in neutron package in Ubuntu: New Status in keepalived source package in Xenial: Triaged Status in neutron source package in Xenial: New Status in keepalived source package in Bionic: Fix Committed Status in neutron source package in Bionic: New Bug description: [Impact] This is the same issue reported in https://bugs.launchpad.net/neutron/+bug/1731595, however that is marked as 'Fix Released' and the issue is still occurring and I can't change back to 'New' so it seems best to just open a new bug. It seems as if this bug surfaces due to load issues. While the fix provided by Venkata in https://bugs.launchpad.net/neutron/+bug/1731595 (https://review.openstack.org/#/c/522641/) should help clean things up at the time of l3 agent restart, issues seem to come back later down the line in some circumstances. xavpaice mentioned he saw multiple routers active at the same time when they had 464 routers configured on 3 neutron gateway hosts using L3HA, and each router was scheduled to all 3 hosts. However, jhebden mentions that things seem stable at the 400 L3HA router mark, and it's worth noting this is the same deployment that xavpaice was referring to. keepalived has a patch upstream in 1.4.0 that provides a fix for removing left-over addresses if keepalived aborts. That patch will be cherry-picked to Ubuntu keepalived packages. 
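The left-over-address symptom can be spotted mechanically: a given L3HA router VIP should be configured on at most one gateway host at a time. The following is a minimal sketch of such a check, assuming the `ip -o addr show` output of each gateway has been collected into one stream; the VIP value is a made-up example, not taken from the bug report.

```shell
#!/bin/sh
# Count how many gateway dumps still carry the router VIP. Input is the
# concatenated `ip -o addr show` output from each gateway host; the VIP
# below is a hypothetical example.
VIP="169.254.0.1/24"

count_active() {
    # grep -c exits non-zero on zero matches; swallow that so the
    # printed count (0) still propagates.
    grep -c "inet ${VIP}" || true
}

# Sample data standing in for the collected dumps:
sample='2: eth0    inet 169.254.0.1/24 scope global eth0
2: eth0    inet 10.20.0.5/24 scope global eth0'

n=$(printf '%s\n' "$sample" | count_active)
echo "gateways holding ${VIP}: ${n}"
# A count greater than 1 means multiple routers are active (this bug).
```

In a real deployment the dumps would be gathered over ssh from each neutron gateway before being piped through the counter.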
[Test Case] The following SRU process will be followed: https://wiki.ubuntu.com/OpenStackUpdates In order to avoid regression of existing consumers, the OpenStack team will run their continuous integration test against the packages that are in -proposed. A successful run of all available tests will be required before the proposed packages can be let into -updates. The OpenStack team will be in charge of attaching the output summary of the executed tests. The OpenStack team members will not mark ‘verification-done’ until this has happened. [Regression Potential] The regression potential is lowered as the fix is cherry-picked without change from upstream. In order to mitigate the regression potential, the results of the aforementioned tests are attached to this bug. [Discussion] To manage notifications about this bug go to: https://bugs.launchpad.net/cloud-archive/+bug/1744062/+subscriptions ___ Mailing list: https://launchpad.net/~ubuntu-ha Post to : ubuntu-ha@lists.launchpad.net Unsubscribe : https://launchpad.net/~ubuntu-ha More help : https://help.launchpad.net/ListHelp
[Ubuntu-ha] [Bug 1316970] Update Released
The verification of the Stable Release Update for pacemaker has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions. -- You received this bug notification because you are a member of Ubuntu High Availability Team, which is subscribed to pacemaker in Ubuntu. https://bugs.launchpad.net/bugs/1316970 Title: g_dbus memory leak in lrmd Status in pacemaker package in Ubuntu: Fix Released Status in pacemaker source package in Trusty: Fix Released Bug description: [Impact] The lrmd daemon has a memory leak in Trusty when managing an upstart resource. This affects pacemaker 1.1.10 and glib2.0 2.40.2-0ubuntu1 (a new LP bug [1] was filed for glib2.0). Please note that the patch for pacemaker was written by myself. [Test Case] https://pastebin.ubuntu.com/p/fqK6Cx3SKK/ (you can check the memory leak with this script) [Regression] Restarting the daemon after upgrading this package will be needed. This patch adds a free for dynamically allocated memory that was never freed, so it solves the memory leak. [Others] This patch is from myself, with testing. Please review carefully whether it is OK. [1] https://bugs.launchpad.net/ubuntu/+source/glib2.0/+bug/1750741 [Original Description] I'm running Pacemaker 1.1.10+git20130802-1ubuntu1 on Ubuntu Saucy (13.10) and have encountered a memory leak in lrmd. The details of the bug are covered in this thread (http://oss.clusterlabs.org/pipermail/pacemaker/2014-May/021689.html) but to summarise, the Pacemaker developers believe the leak is caused by the g_dbus API, the use of which was removed in Pacemaker 1.11. I've also attached the Valgrind output from the run that exposed the issue. 
Given that this issue affects production stability (a periodic restart of Pacemaker is required), will a version of 1.11 be released for Trusty? (I'm happy to upgrade the OS to Trusty to get it). If not, can you advise which version of the OS will be the first to take 1.11 please? To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/pacemaker/+bug/1316970/+subscriptions ___ Mailing list: https://launchpad.net/~ubuntu-ha Post to : ubuntu-ha@lists.launchpad.net Unsubscribe : https://launchpad.net/~ubuntu-ha More help : https://help.launchpad.net/ListHelp
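The check script itself is only available behind the pastebin link above. As a rough stand-in, lrmd's resident set size can be sampled over time; steady growth under a constant upstart-resource workload points at the leak described in this bug. The process name, interval, and sample count below are assumptions, not part of the original report.

```shell
#!/bin/sh
# Print the VmRSS (in kB) of a process from /proc; a leaking lrmd shows
# this number climbing monotonically between samples.
rss_of() {
    awk '/^VmRSS:/ {print $2}' "/proc/$1/status"
}

# Sample lrmd every 60 seconds, ten times, if it is running.
if pid=$(pidof lrmd 2>/dev/null); then
    for i in 1 2 3 4 5 6 7 8 9 10; do
        printf '%s VmRSS=%s kB\n' "$(date +%T)" "$(rss_of "$pid")"
        sleep 60
    done
fi
```

Valgrind, as used by the original reporter, remains the more precise tool; this loop only shows whether the leak is happening, not where.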
[Ubuntu-ha] [Bug 1316970] Re: g_dbus memory leak in lrmd
Hello Greg, or anyone else affected, Accepted pacemaker into trusty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/pacemaker/1.1.10+git20130802-1ubuntu2.5 in a few hours, and then in the -proposed repository. Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users. If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed-trusty to verification-done-trusty. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-trusty. In either case, without details of your testing we will not be able to proceed. Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance! ** Changed in: pacemaker (Ubuntu Trusty) Status: In Progress => Fix Committed ** Tags added: verification-needed verification-needed-trusty -- You received this bug notification because you are a member of Ubuntu High Availability Team, which is subscribed to pacemaker in Ubuntu. https://bugs.launchpad.net/bugs/1316970 Title: g_dbus memory leak in lrmd Status in pacemaker package in Ubuntu: Fix Released Status in pacemaker source package in Trusty: Fix Committed Bug description: [Impact] The lrmd daemon has a memory leak in Trusty when managing an upstart resource. This affects pacemaker 1.1.10 and glib2.0 2.40.2-0ubuntu1 (a new LP bug [1] was filed for glib2.0). Please note that the patch for pacemaker was written by myself. [Test Case] https://pastebin.ubuntu.com/p/fqK6Cx3SKK/ (you can check the memory leak with this script) [Regression] Restarting the daemon after upgrading this package will be needed. This patch adds a free for dynamically allocated memory that was never freed, so it solves the memory leak. 
[Others] This patch is from myself, with testing. Please review carefully whether it is OK. [1] https://bugs.launchpad.net/ubuntu/+source/glib2.0/+bug/1750741 [Original Description] I'm running Pacemaker 1.1.10+git20130802-1ubuntu1 on Ubuntu Saucy (13.10) and have encountered a memory leak in lrmd. The details of the bug are covered in this thread (http://oss.clusterlabs.org/pipermail/pacemaker/2014-May/021689.html) but to summarise, the Pacemaker developers believe the leak is caused by the g_dbus API, the use of which was removed in Pacemaker 1.11. I've also attached the Valgrind output from the run that exposed the issue. Given that this issue affects production stability (a periodic restart of Pacemaker is required), will a version of 1.11 be released for Trusty? (I'm happy to upgrade the OS to Trusty to get it). If not, can you advise which version of the OS will be the first to take 1.11 please? To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/pacemaker/+bug/1316970/+subscriptions ___ Mailing list: https://launchpad.net/~ubuntu-ha Post to : ubuntu-ha@lists.launchpad.net Unsubscribe : https://launchpad.net/~ubuntu-ha More help : https://help.launchpad.net/ListHelp
[Ubuntu-ha] [Bug 1740892] Update Released
The verification of the Stable Release Update for corosync has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions. -- You received this bug notification because you are a member of Ubuntu High Availability Team, which is subscribed to corosync in Ubuntu. https://bugs.launchpad.net/bugs/1740892 Title: corosync upgrade on 2018-01-02 caused pacemaker to fail Status in OpenStack hacluster charm: Invalid Status in corosync package in Ubuntu: Fix Released Status in pacemaker package in Ubuntu: Fix Released Status in corosync source package in Trusty: Won't Fix Status in pacemaker source package in Trusty: Won't Fix Status in corosync source package in Xenial: Fix Released Status in pacemaker source package in Xenial: Fix Released Status in corosync source package in Artful: Fix Released Status in pacemaker source package in Artful: Fix Released Status in corosync source package in Bionic: Fix Released Status in corosync package in Debian: New Bug description: [Impact] When corosync and pacemaker are both installed, a corosync upgrade causes pacemaker to fail. pacemaker then needs to be restarted manually to work again; it won't recover by itself. [Test Case] 1) Have corosync (< 2.3.5-3ubuntu2) and pacemaker (< 1.1.14-2ubuntu1.3) installed. 2) Make sure corosync and pacemaker are running via the systemctl status command. 3) Upgrade corosync. 4) Check corosync and pacemaker via the systemctl status command again. You will notice pacemaker is dead (inactive) and doesn't recover unless a systemctl start pacemaker is done manually. [Regression Potential] Regression potential is low; it doesn't change corosync/pacemaker core functionality. 
This patch makes things go more smoothly at the packaging level during a corosync upgrade where pacemaker is installed/involved. This is particularly useful where the system has "unattended-upgrades" enabled (software upgrades without supervision) and no sysadmin is available to start pacemaker manually, because this isn't a scheduled maintenance. For the symbol tag change in Artful to (optional), please refer to comment #60 from slangasek. For the asctime change in Artful, please refer to comment #51 and comment #52. Note that both Artful changes in pacemaker above are only necessary for the package to build (even as-is, without this patch). They aren't required for the patch to work, only for the source package to build. [Other Info] XENIAL Merge-proposal: https://code.launchpad.net/~nacc/ubuntu/+source/corosync/+git/corosync/+merge/336338 https://code.launchpad.net/~nacc/ubuntu/+source/pacemaker/+git/pacemaker/+merge/336339 [Original Description] During upgrades on 2018-01-02, corosync and its libs were upgraded: (from a trusty/mitaka cloud) Upgrade: libcmap4:amd64 (2.3.3-1ubuntu3, 2.3.3-1ubuntu4), corosync:amd64 (2.3.3-1ubuntu3, 2.3.3-1ubuntu4), libcfg6:amd64 (2.3.3-1ubuntu3, 2.3.3-1ubuntu4), libcpg4:amd64 (2.3.3-1ubuntu3, 2.3.3-1ubuntu4), libquorum5:amd64 (2.3.3-1ubuntu3, 2.3.3-1ubuntu4), libcorosync-common4:amd64 (2.3.3-1ubuntu3, 2.3.3-1ubuntu4), libsam4:amd64 (2.3.3-1ubuntu3, 2.3.3-1ubuntu4), libvotequorum6:amd64 (2.3.3-1ubuntu3, 2.3.3-1ubuntu4), libtotem-pg5:amd64 (2.3.3-1ubuntu3, 2.3.3-1ubuntu4) During this process, it appears that the pacemaker service is restarted and it errors: syslog:Jan 2 16:09:33 juju-machine-0-lxc-4 pacemakerd[1994]: notice: crm_update_peer_state: pcmk_quorum_notification: Node juju-machine-1-lxc-3[1001] - state is now lost (was member) syslog:Jan 2 16:09:34 juju-machine-0-lxc-4 pacemakerd[1994]: notice: crm_update_peer_state: pcmk_quorum_notification: Node juju-machine-1-lxc-3[1001] - state is now member 
(was lost) syslog:Jan 2 16:14:32 juju-machine-0-lxc-4 pacemakerd[1994]: error: cfg_connection_destroy: Connection destroyed syslog:Jan 2 16:14:32 juju-machine-0-lxc-4 pacemakerd[1994]: notice: pcmk_shutdown_worker: Shuting down Pacemaker syslog:Jan 2 16:14:32 juju-machine-0-lxc-4 pacemakerd[1994]: notice: stop_child: Stopping crmd: Sent -15 to process 2050 syslog:Jan 2 16:14:32 juju-machine-0-lxc-4 pacemakerd[1994]: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2) syslog:Jan 2 16:14:32 juju-machine-0-lxc-4 pacemakerd[1994]: error: mcp_cpg_destroy: Connection destroyed Also affects xenial/ocata. To manage notifications about this bug go to: https://bugs.launchpad.net/charm-hacluster/+bug/1740892/+subscriptions ___ Mailing list:
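The manual recovery in step 4 of the test case can be wrapped in a small guard, for example in a post-upgrade hook. This is an illustrative sketch only, not the fix that was shipped (the shipped fix works at the packaging level); the decision logic is split into a function so it can be exercised on its own.

```shell
#!/bin/sh
# After a corosync upgrade, start pacemaker again if the upgrade left
# it dead.
pacemaker_needs_start() {
    # $1 is the output of `systemctl is-active pacemaker`
    [ "$1" != "active" ]
}

status=$(systemctl is-active pacemaker 2>/dev/null || true)
status=${status:-unknown}
if pacemaker_needs_start "$status"; then
    echo "pacemaker is '$status', starting it manually"
    # systemctl start pacemaker   # uncomment on a real cluster node
fi
```

`systemctl is-active` prints the unit state ("active", "inactive", "failed") and exits non-zero for anything but active, which is why the output is captured rather than the exit code alone.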
[Ubuntu-ha] [Bug 1740892] Re: corosync upgrade on 2018-01-02 caused pacemaker to fail
Hello Drew, or anyone else affected, Accepted pacemaker into artful-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/pacemaker/1.1.17+really1.1.16-1ubuntu2 in a few hours, and then in the -proposed repository. Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users. If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed-artful to verification-done-artful. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-artful. In either case, without details of your testing we will not be able to proceed. Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance! ** Changed in: pacemaker (Ubuntu Artful) Status: In Progress => Fix Committed ** Changed in: corosync (Ubuntu Xenial) Status: In Progress => Fix Committed ** Tags added: verification-needed-xenial -- You received this bug notification because you are a member of Ubuntu High Availability Team, which is subscribed to corosync in Ubuntu. 
https://bugs.launchpad.net/bugs/1740892 Title: corosync upgrade on 2018-01-02 caused pacemaker to fail Status in OpenStack hacluster charm: Invalid Status in corosync package in Ubuntu: Fix Released Status in pacemaker package in Ubuntu: Fix Released Status in corosync source package in Trusty: Won't Fix Status in pacemaker source package in Trusty: Won't Fix Status in corosync source package in Xenial: Fix Committed Status in pacemaker source package in Xenial: Fix Committed Status in corosync source package in Artful: Fix Committed Status in pacemaker source package in Artful: Fix Committed Status in corosync source package in Bionic: Fix Released Status in corosync package in Debian: New Bug description: [Impact] When corosync and pacemaker are both installed, a corosync upgrade causes pacemaker to fail. pacemaker then needs to be restarted manually to work again; it won't recover by itself. [Test Case] 1) Have corosync (< 2.3.5-3ubuntu2) and pacemaker (< 1.1.14-2ubuntu1.3) installed. 2) Make sure corosync and pacemaker are running via the systemctl status command. 3) Upgrade corosync. 4) Check corosync and pacemaker via the systemctl status command again. You will notice pacemaker is dead (inactive) and doesn't recover unless a systemctl start pacemaker is done manually. [Regression Potential] Regression potential is low; it doesn't change corosync/pacemaker core functionality. This patch makes things go more smoothly at the packaging level during a corosync upgrade where pacemaker is installed/involved. This is particularly useful where the system has "unattended-upgrades" enabled (software upgrades without supervision) and no sysadmin is available to start pacemaker manually, because this isn't a scheduled maintenance. For the symbol tag change in Artful to (optional), please refer to comment #60 from slangasek. For the asctime change in Artful, please refer to comment #51 and comment #52. 
Note that both Artful changes in pacemaker above are only necessary for the package to build (even as-is, without this patch). They aren't required for the patch to work, only for the source package to build. [Other Info] XENIAL Merge-proposal: https://code.launchpad.net/~nacc/ubuntu/+source/corosync/+git/corosync/+merge/336338 https://code.launchpad.net/~nacc/ubuntu/+source/pacemaker/+git/pacemaker/+merge/336339 [Original Description] During upgrades on 2018-01-02, corosync and its libs were upgraded: (from a trusty/mitaka cloud) Upgrade: libcmap4:amd64 (2.3.3-1ubuntu3, 2.3.3-1ubuntu4), corosync:amd64 (2.3.3-1ubuntu3, 2.3.3-1ubuntu4), libcfg6:amd64 (2.3.3-1ubuntu3, 2.3.3-1ubuntu4), libcpg4:amd64 (2.3.3-1ubuntu3, 2.3.3-1ubuntu4), libquorum5:amd64 (2.3.3-1ubuntu3, 2.3.3-1ubuntu4), libcorosync-common4:amd64 (2.3.3-1ubuntu3, 2.3.3-1ubuntu4), libsam4:amd64 (2.3.3-1ubuntu3, 2.3.3-1ubuntu4), libvotequorum6:amd64 (2.3.3-1ubuntu3, 2.3.3-1ubuntu4), libtotem-pg5:amd64 (2.3.3-1ubuntu3, 2.3.3-1ubuntu4) During this process, it appears that the pacemaker service is restarted and it errors: syslog:Jan 2 16:09:33 juju-machine-0-lxc-4 pacemakerd[1994]: notice: crm_update_peer_state: pcmk_quorum_notification: Node juju-machine-1-lxc-3[1001] - state is now lost (was member) syslog:Jan 2 16:09:34 juju-machine-0-lxc-4 pacemakerd[1994]: notice: crm_update_peer_state: pcmk_quorum_notification: Node juju-machine-1-lxc-3[1001] - state is now member (was lost) syslog:Jan 2 16:14:32 juju-machine-0-lxc-4
[Ubuntu-ha] [Bug 1740892] Re: corosync upgrade on 2018-01-02 caused pacemaker to fail
Hello Drew, or anyone else affected, Accepted corosync into artful-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/corosync/2.4.2-3ubuntu0.17.10.1 in a few hours, and then in the -proposed repository. Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users. If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed-artful to verification-done-artful. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-artful. In either case, without details of your testing we will not be able to proceed. Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance! ** Changed in: corosync (Ubuntu Artful) Status: In Progress => Fix Committed ** Tags added: verification-needed verification-needed-artful -- You received this bug notification because you are a member of Ubuntu High Availability Team, which is subscribed to corosync in Ubuntu. 
https://bugs.launchpad.net/bugs/1740892 Title: corosync upgrade on 2018-01-02 caused pacemaker to fail Status in OpenStack hacluster charm: Invalid Status in corosync package in Ubuntu: Fix Released Status in pacemaker package in Ubuntu: Fix Released Status in corosync source package in Trusty: Won't Fix Status in pacemaker source package in Trusty: Won't Fix Status in corosync source package in Xenial: In Progress Status in pacemaker source package in Xenial: In Progress Status in corosync source package in Artful: Fix Committed Status in pacemaker source package in Artful: In Progress Status in corosync source package in Bionic: Fix Released Status in corosync package in Debian: New Bug description: [Impact] When corosync and pacemaker are both installed, a corosync upgrade causes pacemaker to fail. pacemaker then needs to be restarted manually to work again; it won't recover by itself. [Test Case] 1) Have corosync (< 2.3.5-3ubuntu2) and pacemaker (< 1.1.14-2ubuntu1.3) installed. 2) Make sure corosync and pacemaker are running via the systemctl status command. 3) Upgrade corosync. 4) Check corosync and pacemaker via the systemctl status command again. You will notice pacemaker is dead (inactive) and doesn't recover unless a systemctl start pacemaker is done manually. [Regression Potential] Regression potential is low; it doesn't change corosync/pacemaker core functionality. This patch makes things go more smoothly at the packaging level during a corosync upgrade where pacemaker is installed/involved. This is particularly useful where the system has "unattended-upgrades" enabled (software upgrades without supervision) and no sysadmin is available to start pacemaker manually, because this isn't a scheduled maintenance. For the symbol tag change in Artful to (optional), please refer to comment #60 from slangasek. For the asctime change in Artful, please refer to comment #51 and comment #52. 
Note that both Artful changes in pacemaker above are only necessary for the package to build (even as-is, without this patch). They aren't required for the patch to work, only for the source package to build. [Other Info] XENIAL Merge-proposal: https://code.launchpad.net/~nacc/ubuntu/+source/corosync/+git/corosync/+merge/336338 https://code.launchpad.net/~nacc/ubuntu/+source/pacemaker/+git/pacemaker/+merge/336339 [Original Description] During upgrades on 2018-01-02, corosync and its libs were upgraded: (from a trusty/mitaka cloud) Upgrade: libcmap4:amd64 (2.3.3-1ubuntu3, 2.3.3-1ubuntu4), corosync:amd64 (2.3.3-1ubuntu3, 2.3.3-1ubuntu4), libcfg6:amd64 (2.3.3-1ubuntu3, 2.3.3-1ubuntu4), libcpg4:amd64 (2.3.3-1ubuntu3, 2.3.3-1ubuntu4), libquorum5:amd64 (2.3.3-1ubuntu3, 2.3.3-1ubuntu4), libcorosync-common4:amd64 (2.3.3-1ubuntu3, 2.3.3-1ubuntu4), libsam4:amd64 (2.3.3-1ubuntu3, 2.3.3-1ubuntu4), libvotequorum6:amd64 (2.3.3-1ubuntu3, 2.3.3-1ubuntu4), libtotem-pg5:amd64 (2.3.3-1ubuntu3, 2.3.3-1ubuntu4) During this process, it appears that the pacemaker service is restarted and it errors: syslog:Jan 2 16:09:33 juju-machine-0-lxc-4 pacemakerd[1994]: notice: crm_update_peer_state: pcmk_quorum_notification: Node juju-machine-1-lxc-3[1001] - state is now lost (was member) syslog:Jan 2 16:09:34 juju-machine-0-lxc-4 pacemakerd[1994]: notice: crm_update_peer_state: pcmk_quorum_notification: Node juju-machine-1-lxc-3[1001] - state is now member (was lost) syslog:Jan 2 16:14:32 juju-machine-0-lxc-4 pacemakerd[1994]: error: cfg_connection_destroy: Connection destroyed syslog:Jan 2
[Ubuntu-ha] [Bug 1739033] Update Released
The verification of the Stable Release Update for corosync has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions. -- You received this bug notification because you are a member of Ubuntu High Availability Team, which is subscribed to corosync in Ubuntu. https://bugs.launchpad.net/bugs/1739033 Title: Corosync: Assertion 'sender_node != NULL' failed when bind iface is ready after corosync boots Status in corosync package in Ubuntu: Fix Released Status in corosync source package in Trusty: Fix Committed Status in corosync source package in Xenial: Fix Released Status in corosync source package in Zesty: Fix Released Status in corosync source package in Artful: Fix Released Bug description: [Impact] Corosync sigaborts if it starts before the interface it has to bind to is ready. On boot, if no interface in the bindnetaddr range is up/configured, corosync binds to lo (127.0.0.1). Once an applicable interface is up, corosync crashes with the following error message: corosync: votequorum.c:2019: message_handler_req_exec_votequorum_nodeinfo: Assertion `sender_node != NULL' failed. Aborted (core dumped) The last log entries show that the interface is trying to join the cluster: Dec 19 11:36:05 [22167] xenial-pacemaker corosync debug [TOTEM ] totemsrp.c:2089 entering OPERATIONAL state. Dec 19 11:36:05 [22167] xenial-pacemaker corosync notice [TOTEM ] totemsrp.c:2095 A new membership (169.254.241.10:444) was formed. 
Members joined: 704573706

During the quorum calculation, the generated nodeid (704573706) for the node is being used instead of the nodeid specified in the configuration file (1), and the assert fails because the nodeid is not present in the member list. Corosync should use the correct nodeid and continue running after the interface is up, as shown in a fixed corosync boot:

Dec 19 11:50:56 [4824] xenial-corosync corosync notice [TOTEM ] totemsrp.c:2095 A new membership (169.254.241.10:80) was formed. Members joined: 1

[Environment]
Xenial 16.04.3
Packages:
ii corosync 2.3.5-3ubuntu1 amd64 cluster engine daemon and utilities
ii libcorosync-common4:amd64 2.3.5-3ubuntu1 amd64 cluster engine common library

[Test Case]
Config:

totem {
    version: 2
    member { memberaddr: 169.254.241.10 }
    member { memberaddr: 169.254.241.20 }
    transport: udpu
    crypto_cipher: none
    crypto_hash: none
    nodeid: 1
    interface {
        ringnumber: 0
        bindnetaddr: 169.254.241.0
        mcastport: 5405
        ttl: 1
    }
}
quorum {
    provider: corosync_votequorum
    expected_votes: 2
}
nodelist {
    node { ring0_addr: 169.254.241.10 nodeid: 1 }
    node { ring0_addr: 169.254.241.20 nodeid: 2 }
}

1. ifdown interface (169.254.241.10)
2. start corosync (/usr/sbin/corosync -f)
3. ifup interface

[Regression Potential]
This patch affects corosync boot; the regression potential is for other problems during corosync startup and/or configuration parsing. 
[Other info]
# Upstream corosync commit:
https://github.com/corosync/corosync/commit/aab55a004bb12ebe78db341dc56759dfe710c1b2
# git describe aab55a004bb12ebe78db341dc56759dfe710c1b2
v2.3.5-45-gaab55a0
# rmadison corosync
corosync | 2.3.3-1ubuntu1 | trusty | source, amd64, arm64, armhf, i386, powerpc, ppc64el
corosync | 2.3.3-1ubuntu3 | trusty-updates | source, amd64, arm64, armhf, i386, powerpc, ppc64el
corosync | 2.3.5-3ubuntu1 | xenial | source, amd64, arm64, armhf, i386, powerpc, ppc64el, s390x
corosync | 2.4.2-3build1 | zesty | source, amd64, arm64, armhf, i386, ppc64el, s390x
corosync | 2.4.2-3build1 | artful | source, amd64, arm64, armhf, i386, ppc64el, s390x
corosync | 2.4.2-3build1 | bionic | source, amd64, arm64, armhf, i386, ppc64el, s390x
To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/corosync/+bug/1739033/+subscriptions ___ Mailing list: https://launchpad.net/~ubuntu-ha Post to : ubuntu-ha@lists.launchpad.net Unsubscribe : https://launchpad.net/~ubuntu-ha More help : https://help.launchpad.net/ListHelp
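The assert fires because the membership carries a generated nodeid (704573706 above) instead of a configured one. When eyeballing "Members joined:" lines, it helps to have the configured ids at hand; this small sketch pulls them out of a nodelist like the one quoted in the [Test Case] (the temp-file path is arbitrary, and the pattern would also match a totem-level nodeid if the full config were fed in).

```shell
#!/bin/sh
# List the nodeids declared in a corosync.conf nodelist; a healthy
# "Members joined:" entry should name one of these, not a generated id.
list_nodeids() {
    grep -o 'nodeid: [0-9][0-9]*' "$1" | awk '{print $2}'
}

# The nodelist from the [Test Case] above:
cat > /tmp/corosync-nodelist.conf <<'EOF'
nodelist {
    node { ring0_addr: 169.254.241.10 nodeid: 1 }
    node { ring0_addr: 169.254.241.20 nodeid: 2 }
}
EOF

list_nodeids /tmp/corosync-nodelist.conf   # prints 1 and 2
```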