The MP review seems to be getting close to approval, so I added an SRU Template here.

** Description changed:

+ [Impact]
+ 
+  * The libvirt service reports that it is ready, but it has not spawned the
+    libvirt socket yet. Dependent services fail. There was an SRU (#1455608)
+    meant to fix that, but it has many deficiencies (not considering config,
+    giving up after 10 seconds, being an unconditional sleep 2, adding up
+    to 2 seconds to a service stop while in post-start).
+ 
+  * This is the backport and improvement of a change that was brought to
+    Yakkety already, but there, due to systemd, it doesn't matter too much.
+ 
+ [Test Case]
+ 
+  * There are two very different ways to "test" this, due to the overload
+    based scenario where this really becomes important.
+ 
+  * Version #1 - being lame
+    One can just modify the upstart script and exchange the check for the
+    socket with /bin/true.
+    That way it waits forever, which allows you to check the log entries,
+    the abort responsiveness and similar.
+ 
+  * Version #2 - recreating the case
+    - This mostly means the system has to be very slow and overloaded.
+      You can either just slow down the system (e.g. run a qemu with nice
+      MAX) or stress your host with other things burning CPU/memory/disk.
+    - We worked with adding autostart guests (see comment #35), but that
+      actually takes place after the socket is created. The reported case
+      had a RAID rebuilding.
+    - TL;DR get your system slow enough so that libvirt exceeds 10 seconds
+      to start properly (the old limit is 5*2 seconds)
+ 
+ [Regression Potential]
+ 
+  * I'd think that there might exist (super rare) cases where the post-start
+    now does spin forever. But by the definition at
+    http://upstart.ubuntu.com/cookbook/#post-start this is correct. It is
+    started (yes) but not yet ready. Yet this might appear as a regression
+    to some.
+  * Other than that, this should clearly fix more issues than it (hopefully
+    not) causes.
+ 
+ [Other Info]
+ 
+  * n/a
+ 
+ 
+ --- END SRU Template ---
+ 
+ 
  [ problem description ]
  sockfile_check_retries was first introduced by #1455608 to prevent the
  failure case of the sockfile not being ready, but it defaulted to a
  hard-coded value of "5", which might be too short for a busy system boot.

  #1455608 - https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1455608
- 
  [ step to reproduce ]
  Set up a clean install system (Ubuntu Server 14.04.4 LTS) and assemble the
  OS disk as RAID-1. Boot up some guest instances (count > 10,
  start-at-boot), force a shutdown of the host by pressing the power button
  for 3s ~ 5s, or via an IPMI command, then power on afterward.

  It may sometimes fail to get the sockfile ready in the "post-start"
  script, with a line of error in /var/log/syslog:

  ==> kernel: [ 313.059830] init: libvirt-bin post-start process (2430) terminated with status 1 <==

  Since multiple VMs were reading/writing before the non-graceful shutdown,
  the RAID devices need to re-sync after boot, which leads to a slow
  response. The start-up script for libvirt-bin can only wait 5 cycles, with
  a 2 second wait for each cycle, so it will time out after 10s and exit
  with "1".
- 
  [ possible solution ]
  Extend the retry count for the sockfile wait, and make it possible to
  change it by editing the `/etc/default/libvirt-bin` file.

  <please see the patch file as attachment>
- 
  [ sysinfo ]
  $ lsb_release -a
  No LSB modules are available.
  Distributor ID: Ubuntu
  Description: Ubuntu 14.04.4 LTS
  Release: 14.04
  Codename: trusty

  $ uname -a
  Linux host2 4.2.0-35-generic #40~14.04.1-Ubuntu SMP Fri Mar 18 16:37:35 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
- 
  [ related issue ]
  #1386465 - https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1386465

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1571209

Title:
  Sockfile check retries too short for a busy system boot

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1571209/+subscriptions

-- 
ubuntu-bugs mailing list
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
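
For readers who want a concrete picture of the "possible solution" in the description above (a retry count that can be overridden instead of the hard-coded 5 cycles of 2 seconds), here is a minimal sketch. It is not the attached patch and not the final upstart job: the function name wait_until, its argument order, and the way the default is applied are assumptions made for illustration. Only the variable name sockfile_check_retries and the old 5*2 second limit come from the report.

```shell
#!/bin/sh
# Illustrative sketch only, NOT the patch attached to the bug.
# wait_until is a hypothetical helper: it runs a check command up to
# <retries> times, sleeping <interval> seconds after each failed check.

wait_until() {
    retries="$1"     # maximum number of checks before giving up
    interval="$2"    # seconds to sleep after a failed check
    shift 2          # the remaining arguments are the check command

    attempt=0
    while [ "$attempt" -lt "$retries" ]; do
        if "$@"; then
            return 0     # check succeeded, e.g. the socket exists
        fi
        attempt=$((attempt + 1))
        sleep "$interval"
    done
    return 1             # gave up, like the old post-start exiting with 1
}

# Keep the old hard-coded value (5) as the default, but let an earlier
# assignment (for example one sourced from /etc/default/libvirt-bin)
# take precedence.
: "${sockfile_check_retries:=5}"
```

A post-start script could then call something like `wait_until "$sockfile_check_retries" 2 test -S /var/run/libvirt/libvirt-sock`, where the socket path is an assumption about the local setup. With the defaults this gives up after roughly 10 seconds, which is the behaviour the report describes as too short on a host that is busy re-syncing RAID.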
