From cloud-init's point of view, the solution now implemented makes sense:
run it before apt-daily-upgrade. However, I wanted to add that there are
other use cases as well, such as SSM documents being executed on
instances. These can be executed in batch at any time and may also
require installing packages, and can thus interfere with these
unattended upgrades.
The execution of documents is not linked directly to cloud-init and may
happen after the instance has booted, so it falls into the other
category: it needs some kind of queuing system, or at least a centralized
way to obtain a lock before using apt. At the moment there are dozens of
different ways to get a mutex for running apt, but so far we have not
found a bulletproof one that works *every time*.
So maybe this does not really fit into this ticket, but I wanted to
point out that this is only a partial fix for a bigger problem.
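As a rough illustration of the "centralized way to obtain a lock" idea, here is a minimal sketch that serializes callers through one shared lock file using flock(1) from util-linux. The function name with_apt_lock, the variables LOCK_FILE and LOCK_TIMEOUT, and the lock path are all assumptions made up for this example; they are not part of apt, cloud-init or SSM. It only coordinates callers that go through the wrapper and does not coordinate with apt-daily, which takes its own locks, so it is not the bulletproof mechanism asked for.

```shell
#!/bin/sh
# Hypothetical cooperative wrapper: every caller that wants to use apt
# runs its command through with_apt_lock. Names and path are illustrative.
LOCK_FILE="${LOCK_FILE:-/run/lock/apt-wrapper.lock}"
LOCK_TIMEOUT="${LOCK_TIMEOUT:-300}"

with_apt_lock() {
    # flock -w N FILE COMMAND...: wait up to N seconds for an exclusive
    # lock on FILE, then run COMMAND while holding it. flock exits
    # non-zero if the lock cannot be obtained in time.
    flock -w "$LOCK_TIMEOUT" "$LOCK_FILE" "$@"
}
```

A caller would then run something like `with_apt_lock apt-get install -y hello`. Any tool that bypasses the wrapper can still collide on the dpkg lock.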
--
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to apt in Ubuntu.
https://bugs.launchpad.net/bugs/1693361
Title:
cloud-init sometimes fails on dpkg lock due to concurrent apt-daily.service
execution
Status in APT:
Fix Released
Status in cloud-init:
Fix Released
Status in apt package in Ubuntu:
Invalid
Status in cloud-init package in Ubuntu:
Fix Released
Status in cloud-init source package in Xenial:
Fix Released
Status in cloud-init source package in Yakkety:
Won't Fix
Status in cloud-init source package in Zesty:
Fix Released
Status in cloud-init source package in Artful:
Fix Released
Bug description:
=== Begin SRU Template ===
[Impact]
A cloud-config that contains packages to install (see below) or
'package_upgrade' will run 'apt-get update'. That can sometimes fail as a
result of contention with the apt-daily.service that updates that information.
A cloud-config that shows the problem looks like this:
$ cat my.yaml
#cloud-config
packages: ['hello']
[Test Case]
lxc-proposed-snapshot is
https://git.launchpad.net/~smoser/cloud-init/+git/sru-info/tree/bin/lxc-proposed-snapshot
It publishes an image to lxd with proposed enabled and cloud-init upgraded.
a.) launch an instance with proposed version of cloud-init and some user-data.
This is platform independent. The test case demonstrates lxd.
$ printf "%s\n%s\n%s\n" "#cloud-config" "packages: ['hello']" \
"package_upgrade: true" > config.yaml
$ release=xenial
$ ref=proposed-$release
$ ./lxc-proposed-snapshot --proposed --publish $release $ref;
b.) start the instance
$ name=$release-1693361
$ lxc launch my-xenial $name "--config=user.user-data=$(cat config.yaml)"
$ sleep 1
$ lxc exec $name -- tail -f /var/log/cloud-init.log \
/var/log/cloud-init-output.log
# watch this boot.
c.) Look for evidence of systemd failure
journalctl -o short-precise | grep -i break
journalctl -o short-precise | grep -i order
[Regression Potential]
Regression chance here is low. It's possible that ordering loops
could occur. When that does happen, journalctl will mention it.
Unfortunately, in such cases systemd somewhat randomly picks a service
to kill, so behavior is somewhat undefined.
[Other Info]
Upstream commit at
https://git.launchpad.net/cloud-init/commit/?id=11121fe4
=== End SRU Template ===
apt-daily is now a systemd service rather than being invoked by
cron.daily. If one builds a custom AMI, it is possible that the
apt-daily.timer will fire during boot. This can happen while
cloud-init is running, and if cloud-init loses the race, its invocation
of apt (e.g. use of "packages:" in the config) will fail.
There is a lot of discussion online about this change to apt-daily
(e.g. unattended upgrades happening during business hours, delaying
boot, etc.) and discussion of potential systemd changes regarding
timers firing during boot (cf.
https://github.com/systemd/systemd/issues/5659).
While it would be better to solve this in apt itself, I suggest that
cloud-init be defensive when calling apt and implement some retry
mechanism.
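To make the suggestion concrete, a retry wrapper could look roughly like the sketch below. It is only an illustration of the idea: the helper name retry and its argument order are invented here, and this is not how cloud-init implements (or would have to implement) anything.

```shell
#!/bin/sh
# Hypothetical helper, not part of cloud-init or apt.
# Usage: retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
    max_attempts="$1"
    delay="$2"
    shift 2
    attempt=1
    while :; do
        rc=0
        "$@" || rc=$?            # run the command, remember its exit status
        [ "$rc" -eq 0 ] && return 0
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "retry: giving up after $attempt attempts (exit $rc)" >&2
            return "$rc"
        fi
        attempt=$((attempt + 1))
        sleep "$delay"           # fixed delay between attempts
    done
}
```

For example, `retry 10 6 apt-get update` would try for roughly a minute before giving up. A fixed delay keeps the sketch short; a real implementation might prefer backoff and might retry only on lock-related failures.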
Various instances of people running into this issue:
https://github.com/chef/bento/issues/609
https://clusterhq.atlassian.net/browse/FLOC-4486
https://github.com/boxcutter/ubuntu/issues/73
https://unix.stackexchange.com/questions/315502/how-to-disable-apt-daily-service-on-ubuntu-cloud-vm-image