Hello,

TL;DR: criu works if you disable the journal and stop the .socket before restoring; criu appears to be incompatible with systemd-nspawn.
I've been "having fun" with systemd, -nspawn, and the latest criu tools. These are just my research notes; I wanted to share progress, and would love any feedback or pointers on places where this does not work.

Before we begin:

    localhost criu2 # systemd --version
    systemd 204
    +PAM -LIBWRAP -AUDIT -SELINUX +IMA -SYSVINIT -LIBCRYPTSETUP -GCRYPT -ACL -XZ
    localhost criu2 # criu -V
    Version: 0.6
    localhost criu2 # uname -a
    Linux localhost 3.10.0+ #4 SMP Mon Jul 1 13:36:25 PDT 2013 x86_64 Intel(R) Core(TM) i7-2677M CPU @ 1.80GHz GenuineIntel GNU/Linux

In the following gist:

1) I set up a socket-activated Go HTTP server (just plain, no nspawn)
2) start the process via socket activation
3) criu dump it
4) shut down the .socket
5) criu restore: works, yay

https://gist.github.com/polvi/310fad0a2a3b0859cfb1

What works:
- Application state is dumped and restored successfully.

What doesn't work:
- The .socket unit has to be disabled because the restore will open the socket. Not sure if there is a workaround for this, other than not using socket activation.
- The .service is now in a killed state. This is because the dump kills the process when it is done.
- Once the process is restored (with the same PID), systemd is confused and no longer monitors it. Maybe there is a way to get systemd to realize that the process is running again?
- I had to set StandardError=null and StandardOutput=null to keep systemd from connecting the service's stdout/stderr to the journal over a socket. Without this the service will not checkpoint, because criu cannot checkpoint a socket when only one end of it belongs to the dumped process tree (the other end is held by journald).

As for -nspawn, I'm pretty sure it is incompatible with criu. This took a bunch of fighting, and in the end it triggered a bug in criu. I was able to dump the container, but not restore it. In the following gist you'll see where I hit bugs, and the fixes:
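Back to the first (non-nspawn) experiment for a moment: below is a minimal sketch of the kind of socket-activated Go server being checkpointed there. It is not the code from the gist. It assumes systemd's standard LISTEN_PID/LISTEN_FDS protocol, where the first passed socket is fd 3 (SD_LISTEN_FDS_START in sd-daemon.h); the helper name activationFDCount and the response text are hypothetical. The point it illustrates is that the service never binds the address itself: it inherits a socket that systemd already bound and put into listen state. A restored process has to recreate that same bound socket, which is presumably why the .socket unit must be stopped first (otherwise systemd is likely still holding the address).

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"net"
	"net/http"
	"os"
	"strconv"
)

// listenFDsStart is the first file descriptor systemd passes to a
// socket-activated service (SD_LISTEN_FDS_START in sd-daemon.h).
const listenFDsStart = 3

// activationFDCount (hypothetical helper) validates the LISTEN_PID/LISTEN_FDS
// pair that systemd sets for socket-activated services and returns how many
// sockets were passed, starting at fd 3.
func activationFDCount(pid int, listenPID, listenFDs string) (int, error) {
	if listenPID == "" || listenFDs == "" {
		return 0, errors.New("not socket activated: LISTEN_PID/LISTEN_FDS unset")
	}
	p, err := strconv.Atoi(listenPID)
	if err != nil {
		return 0, fmt.Errorf("bad LISTEN_PID %q: %w", listenPID, err)
	}
	if p != pid {
		// The variables were meant for a different process.
		return 0, fmt.Errorf("LISTEN_PID %d is not our pid %d", p, pid)
	}
	n, err := strconv.Atoi(listenFDs)
	if err != nil || n < 1 {
		return 0, fmt.Errorf("bad LISTEN_FDS %q", listenFDs)
	}
	return n, nil
}

func main() {
	n, err := activationFDCount(os.Getpid(), os.Getenv("LISTEN_PID"), os.Getenv("LISTEN_FDS"))
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("received %d socket(s) from systemd; serving on fd %d", n, listenFDsStart)

	// Wrap the inherited listening socket; systemd already did bind()/listen().
	f := os.NewFile(uintptr(listenFDsStart), "systemd-socket")
	ln, err := net.FileListener(f)
	if err != nil {
		log.Fatal(err)
	}
	f.Close() // net.FileListener works on a duplicate, so the original fd can be closed

	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "hello from a socket-activated server")
	})
	log.Fatal(http.Serve(ln, nil))
}
```

Paired with a hypothetical hello.socket (ListenStream=8080) and a matching hello.service, `systemctl start hello.socket` would make systemd start the server on the first connection to port 8080.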
1) I start a busybox while loop in a container using systemd-nspawn
2) nsenter the container and umount /proc/sys/kernel/random/boot_id and /proc/kmsg, because they are in a "(deleted)" state and criu does not really care for that. I guess -nspawn is setting these up.
3) Update my copy of iproute2, because criu requires the "ip addr save" functionality
4) Now it gets really bad... criu restores mount namespaces using pivot_root, while systemd-nspawn uses MS_MOVE plus chroot. Initially I was hitting a bug where pivot_root was not working because my container was on a different filesystem. After bind mounting the container filesystem onto my running root, the restore triggers a bug.
5) I give up.

https://gist.github.com/polvi/d883043343e4db8e16cb

What works:
- It dumped the state of something!

What does not work:
- Restoring
- Using the container's mount namespace (because of criu and pivot_root)

In summary, to make this actually work, I think we'd need to implement checkpoint/restart in systemd itself. That would let us get around the journal issue, and maybe even make socket activation work. Containers seem to be their own beast.

-Alex

_______________________________________________
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel