Hi Reiner,

> After having several unpleasant encounters using sysupgrade, I had a
> quick glance at the code, after more or less successfully implementing
> workarounds for incomplete sysupgrades, resulting in inconsistent systems.
> My questions are:
> - Is it safe, simply to kill running processes during sysupgrade ? As
> there might be services, restarted automatically (by procd ?).

Roughly, the sysupgrade process is as follows:

1) /sbin/sysupgrade (shell script)
   Parses arguments, sets defaults, assembles the conffiles to back up,
   runs the partial scripts in /lib/upgrade, checks the image, and ends
   with `ubus call system sysupgrade`. All fatal exit conditions (such as
   an invalid image) should be handled here.

2) ubus call system sysupgrade (procd ubus procedure)
   Invokes a procedure in procd that instructs procd to terminate itself
   and exec into /sbin/upgraded (which has been copied to a ramdisk at
   /tmp/root first), turning /tmp/root/sbin/upgraded into pid 1 and
   releasing pid 1's hold on /.

3) /tmp/root/sbin/upgraded (binary)
   Functions as a pid 1 placeholder to prevent the kernel from panicking.
   It does two things: it keeps serving the watchdog to prevent
   spontaneous resets, and it executes /lib/upgrade/stage2.

4) /lib/upgrade/stage2 (shell script)
   Assembles the backup tarball, writes the image, and appends the backup
   tarball to the just-written image. The exact procedure depends on the
   platform.

So yes, it is safe to simply kill processes, in the sense that procd is no
longer running at this point and therefore cannot relaunch them.

Merely killing processes instead of shutting them down through their
respective init scripts is not ideal though; that eventually needs rework.
Ideally, sysupgrade should try to cleanly stop as many services as possible
through their respective init scripts before invoking stage2, and then apply
the 'kill TERM; sleep 3; kill KILL' sequence only to processes that somehow
failed to stop initially (buggy init scripts, timeouts, ...).

> - What about a killed process, simply taking some time to shut down ?
> (example: squid closing lot of open files on block-device; having
> internal shutdown timer 30s by default)

Such services are not handled gracefully at the moment, see above.

> - What about open swap file on block-device ?

From a cursory look, it does not appear that sysupgrade currently performs
any swapoff at all. Adding a `swapoff -a` after the process termination would
certainly make sense.

> - What about mounted block-device for mass storage ?

Same as swap: there is no umount handling either, as far as I can see. I
think this should be added as well, along with the swapoff. Since sysupgrade
runs off a pivot_root'ed /tmp/root at this point, all filesystems should be
free to umount (it might still need two or three cycles due to layered
mounts).

> - What about (slow) wwan connection, managed by pppd. When killed by
> sysupgrade, will netifd restart pppd ?

It should not happen. Theoretically, pppd could be killed first while netifd
is still running; netifd would then try to restart pppd shortly before being
killed itself, but the second KILL loop three seconds later should catch
this rare circumstance. However, as discussed above, a graceful service
shutdown would be better.

> As a workaround, before calling sysupgrade I
> - explicitly use /etc/init.d/most_services stop
> - explicitly kill squid and wait for termination
> - explicitly disable swap
> - explicitly dismount mounted block-device
> - ifdown wwan

That certainly makes a lot of sense, and most of this should probably go
into sysupgrade (stage1 aka /sbin/sysupgrade) directly. A slight difficulty
I see is how to identify "most_services", but I guess a hardcoded whitelist
for things like "dropbear", "openssh" or "telnetd" will do.

As for awaiting squid termination: if it's not already the case, I think the
squid init script should be reworked so that /etc/init.d/squid stop does not
return (successfully) before squid has actually stopped.

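To make the workaround sequence a bit more concrete, here is a rough,
untested sketch of how it could look as a shell fragment. The helper names
(wait_for_exit, stop_pid, prepare_for_upgrade) are hypothetical and nothing
like them exists in sysupgrade today. The service list and the mount point
/mnt/storage are placeholders, and the 30 second grace period simply mirrors
the squid shutdown timer mentioned in the question.

```shell
#!/bin/sh
# Sketch only. wait_for_exit, stop_pid and prepare_for_upgrade are
# hypothetical helper names; none of this exists in sysupgrade today.

# Poll once per second until process $1 is gone; give up after $2 seconds.
# Returns 0 if the process has exited, 1 on timeout.
wait_for_exit() {
	wfe_pid="$1"
	wfe_left="$2"
	while kill -0 "$wfe_pid" 2>/dev/null; do
		[ "$wfe_left" -le 0 ] && return 1
		sleep 1
		wfe_left=$((wfe_left - 1))
	done
	return 0
}

# Send TERM to process $1, wait up to $2 seconds, then fall back to KILL.
stop_pid() {
	sp_pid="$1"
	sp_grace="$2"
	kill -TERM "$sp_pid" 2>/dev/null || return 0	# already gone
	wait_for_exit "$sp_pid" "$sp_grace" && return 0
	kill -KILL "$sp_pid" 2>/dev/null
	wait_for_exit "$sp_pid" 2
}

# Assumed pre-upgrade sequence; the service list, grace period and
# mount point are placeholders to adapt per device.
prepare_for_upgrade() {
	for svc in squid; do
		/etc/init.d/"$svc" stop
	done
	# squid may still be closing files after its init script returns
	for p in $(pidof squid); do
		stop_pid "$p" 30
	done
	ifdown wwan
	swapoff -a
	umount /mnt/storage
}
```

The fragment only defines functions; calling prepare_for_upgrade right
before sysupgrade would correspond to the manual steps listed above. Unlike
a fixed 'sleep 3', stop_pid waits only as long as the process actually needs
and resorts to KILL only once the grace period has run out.
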
> Before I had several cases, that
> sysupgrade -n -v -f /tmp/newfiles.tar.gz /tmp/new_fw.bin
> updated all files from /tmp/newfiles.tar.gz, but did not do the flash of
> new_fw.bin

This is quite strange, as the /tmp/newfiles.tar.gz archive is only appended
after /tmp/new_fw.bin has been written. I can only imagine that the image
write procedure itself somehow failed while appending the archive still
worked. How exactly this could fail depends on the platform. Can you provide
some more details about the device this issue occurred on?

~ Jo
_______________________________________________
openwrt-devel mailing list
openwrt-devel@lists.openwrt.org
https://lists.openwrt.org/mailman/listinfo/openwrt-devel