[resending with the right systemd-devel address, sorry for that]

Here are some thoughts on offline updates resulting from testing the new dnf fedup plugin developed by Will Woods [https://github.com/wgwoods/dnf-plugin-fedup].

I ran an update using dnf fedup and it works (or would have worked, if stuff didn't happen), which is already great for something so simple, but it exposes some shortcomings in the Offline Update spec itself [http://www.freedesktop.org/wiki/Software/systemd/SystemUpdates/]. The main issues are:

- what happens when multiple offline mechanisms are present
- how failure is handled

On my test system, packagekit-offline-update.service was already present when I installed the plugin and fedup-system-upgrade.service. After running 'dnf fedup download ...' and 'dnf fedup reboot' I saw something like this:

Jul 20 21:54:55 fedora22 systemd[1]: ConditionPathExists=/system-update/.fedup-system-upgrade succeeded for fedup-system-up
Jul 20 21:54:55 fedora22 systemd[1]: About to execute: /usr/bin/dnf --releasever=${RELEASEVER} fedup upgrade
Jul 20 21:54:55 fedora22 systemd[1]: Forked /usr/bin/dnf as 655
Jul 20 21:54:55 fedora22 systemd[1]: fedup-system-upgrade.service changed dead -> start
Jul 20 21:54:55 fedora22 systemd[1]: Starting System Upgrade...
Jul 20 21:54:55 fedora22 systemd[655]: Executing: /usr/bin/dnf --releasever=rawhide fedup upgrade
Jul 20 21:54:55 fedora22 systemd[1]: About to execute: /usr/libexec/pk-offline-update
Jul 20 21:54:55 fedora22 systemd[1]: Forked /usr/libexec/pk-offline-update as 657
Jul 20 21:54:55 fedora22 systemd[1]: packagekit-offline-update.service changed dead -> running
Jul 20 21:54:55 fedora22 systemd[1]: Job packagekit-offline-update.service/start finished, result=done
Jul 20 21:54:55 fedora22 systemd[657]: Executing: /usr/libexec/pk-offline-update
Jul 20 21:54:55 fedora22 systemd[1]: Started Updates the operating system whilst offline.
Jul 20 21:54:55 fedora22 systemd[1]: Starting Updates the operating system whilst offline...

fedup-system-upgrade.service uses an additional flag file which is checked with ConditionPathExists, so it will not run if 'dnf fedup reboot' did not create the flag, even if we go into system-update.target.
packagekit-offline-update.service does not have anything like this, and is always run in system-update.target.

Running two upgrade mechanisms in parallel does not seem like a good idea. (Even if they use a lock file to prevent concurrent access to the rpm database, they are bound to interfere with one another: the first finishes and decides to reboot, or the first updates some packages and messes up the state for the second one...) It seems that *some* way to ensure that only one upgrade mechanism runs is wanted. The approach that dnf-plugin-fedup uses seems reasonable: it creates the flag file only when a reboot with 'dnf fedup reboot' is requested. As an alternative we could allow only one upgrade mechanism to be enabled. Dunno.

... continuing ...

Jul 20 21:55:00 fedora22 pk-offline-update[657]: percentage 14%
Jul 20 21:55:00 fedora22 pk-offline-update[657]: sent msg to plymouth 'Installing Updates - 14%'
Jul 20 21:55:00 fedora22 dnf[655]: babl x86_64 0.1.12-3.fc23 @commandline 235 k
Jul 20 21:55:00 fedora22 dnf[655]: baekmuk-bdf-fonts noarch 2.2-17.fc23 @commandline 6.9 M
Jul 20 21:55:00 fedora22 dnf[655]: baekmuk-ttf-batang-fonts noarch 2.2-39.fc23 @commandline 3.6 M
...
Jul 20 21:55:00 fedora22 pk-offline-update[657]: status download
Jul 20 21:55:00 fedora22 pk-offline-update[657]: package downloading gstreamer1-1.4.5-1.fc22.x86_64 (fedora)
Jul 20 21:55:00 fedora22 pk-offline-update[657]: status finished
Jul 20 21:55:00 fedora22 pk-offline-update[657]: writing failed results
Jul 20 21:55:00 fedora22 pk-offline-update[657]: failed to update system: cannot download Packages/g/gstreamer1-1.4.5-1.fc2
...
Jul 20 21:55:16 fedora22 systemd[1]: Trying to enqueue job reboot.target/start/replace
Jul 20 21:55:16 fedora22 systemd[1]: Job system-update.target/start finished, result=canceled
Jul 20 21:55:16 fedora22 systemd[1]: Installed new job system-update.target/stop as 762
...
Jul 20 21:55:16 fedora22 systemd[1]: Spawning new thread for sync
Jul 20 21:55:16 fedora22 systemd[1]: Installed new job time-sync.target/stop as 736
Jul 20 21:55:16 fedora22 systemd[1]: Installed new job lvm2-lvmetad.service/stop as 753
Jul 20 21:55:16 fedora22 systemd[1]: Job fedup-system-upgrade.service/start finished, result=canceled
Jul 20 21:55:16 fedora22 systemd[1]: Installed new job fedup-system-upgrade.service/stop as 769
Jul 20 21:55:16 fedora22 systemd[1]: Enqueued job reboot.target/start as 658
Jul 20 21:55:16 fedora22 systemd[1]: packagekit-offline-update.service failed.
...
Jul 20 21:55:11 fedora22 systemd[1]: packagekit-offline-update.service: main process exited, code=exited, status=1/FAILURE
Jul 20 21:55:11 fedora22 systemd[1]: packagekit-offline-update.service changed running -> failed
Jul 20 21:55:11 fedora22 systemd[1]: Unit packagekit-offline-update.service entered failed state.
Jul 20 21:55:11 fedora22 systemd[1]: Triggering OnFailure= dependencies of packagekit-offline-update.service.
Jul 20 21:55:16 fedora22 systemd[1]: Job system-update.target/stop finished, result=done
Jul 20 21:55:16 fedora22 systemd[1]: fedup-system-upgrade.service changed start -> stop-sigterm
...
Jul 20 21:55:29 fedora22 systemd-journal[514]: Suppressed 978 messages from /system.slice/fedup-system-upgrade.service
Jul 20 21:55:41 fedora22 dnf[655]: Upgrading : glibc-common-2.21.90-18.fc24.x86_64 29/3693
Jul 20 21:55:41 fedora22 systemd[1]: Serializing state to /run/systemd
Jul 20 21:55:41 fedora22 systemd[1]: Reexecuting.

... now systemd reexecutes multiple times while dnf is updating packages ...
... then things seem to go wrong ...

Jul 20 21:59:20 fedora22 systemd[1]: Looping too fast. Throttling execution a little.
Jul 20 21:59:22 fedora22 systemd[1]: Looping too fast. Throttling execution a little.
Jul 20 21:59:23 fedora22 systemd[1]: fedup-system-upgrade.service stop-sigterm timed out. Killing.
Jul 20 21:59:23 fedora22 systemd[1]: fedup-system-upgrade.service changed stop-sigterm -> stop-sigkill
Jul 20 21:59:23 fedora22 dnf[655]: Upgrading : pam-1.2.1-1.fc23.x86_64 263/3693
Jul 20 21:59:23 fedora22 systemd[1]: Child 655 (dnf) died (code=killed, status=9/KILL)
Jul 20 21:59:23 fedora22 systemd[1]: Child 655 belongs to fedup-system-upgrade.service
Jul 20 21:59:23 fedora22 systemd[1]: fedup-system-upgrade.service: main process exited, code=killed, status=9/KILL
Jul 20 21:59:23 fedora22 systemd[1]: fedup-system-upgrade.service changed stop-sigkill -> failed
Jul 20 21:59:23 fedora22 systemd[1]: Job fedup-system-upgrade.service/stop finished, result=done
Jul 20 21:59:23 fedora22 systemd[1]: Stopped System Upgrade.
Jul 20 21:59:23 fedora22 systemd[1]: Unit fedup-system-upgrade.service entered failed state.
Jul 20 21:59:23 fedora22 systemd[1]: fedup-system-upgrade.service failed.
Jul 20 21:59:23 fedora22 systemd[1]: Rebooting as result of failure.

... reboot seems to proceed normally.

Based on the sequence of operations here, it seems that pk-offline-update schedules a reboot on its own when it is unable to complete the download, but it also has OnFailure=reboot.target, so the reboot is started a second time when pk-offline-update exits (I should check the code, but I'm too lazy for that atm :)).

Also, which is a minor thing, but related: OnFailure=reboot.target seems inferior to FailureAction=reboot. IIRC, the second one uses an irreversible transaction and should be more robust. It is also a higher-level setting in some sense. OnFailure=reboot.target is taken directly from the spec, so it should be changed there first.

Also, another related issue: packagekit-offline-update.service has Type=simple. (In the log above it is "started" almost immediately, so system-update.target could be reached while it is still running.) This should be Type=oneshot.

It seems that failure handling is already shaky, but I think there are more failure modes.
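
Before going on: to make the unit-level points so far concrete (a per-mechanism condition, Type=oneshot, FailureAction=reboot), here is a rough sketch of what such a service could look like. It is not taken from either package. The unit name, the flag file and the ExecStart= path are made up for illustration, and the ordering and dependency lines that the spec asks for are left out.

```ini
# example-offline-update.service (hypothetical name, illustration only)
[Unit]
Description=Example offline update
# Run only if this mechanism was explicitly requested; the flag file
# name is an assumption, modelled on what fedup-system-upgrade.service does.
ConditionPathExists=/system-update/.example-update-requested

[Service]
# oneshot: the unit counts as started only after the process has exited,
# so system-update.target is not reached while the update is still running.
Type=oneshot
# Hypothetical path to the program that applies the update.
ExecStart=/usr/libexec/example-offline-update
# Reboot on failure instead of OnFailure=reboot.target. Depending on the
# systemd version this setting is documented under [Service] or [Unit].
FailureAction=reboot
```

With something like this, a second mechanism that was not requested would be skipped by its condition even though system-update.target is started.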

Let's say that 'dnf fedup upgrade' didn't work for some reason (missing ConditionPathExists file, dnf installation problem, whatever). Then nothing would remove the /system-update link, and we would reboot, and run system-update.target again, and reboot, and run system-update.target. In general, creating /system-update without a working update service is enough to enter an infinite reboot loop.

The spec says that the /system-update symlink should be removed by the service as early as possible, but it would be more robust to remove it even earlier. ExecStartPre=/bin/rm /system-update would be one option, but it is incompatible with Condition*s: the removal has to happen on every such boot, and a service whose condition fails is skipped together with its ExecStartPre=. I don't think it can be removed by the generator, because the fs might still be ro when it runs (?). So maybe a tmpfiles snippet should be used to remove the link. Such a change would mean that the update services should not depend on the symlink being present, and should instead look for their installation data in their own state directory.

To summarize, the following changes to the spec are proposed:

- use Condition* or similar to conditionalize whether a specific upgrade mechanism should run
- use FailureAction=reboot
- use Type=oneshot
- check that logind.Reboot() is not called on failure by the service
- services should not look for the /system-update symlink, and the symlink should be removed by tmpfiles before we even get to the upgrade.

Zbyszek