Re: [systemd-devel] Regression in v209: SIGKILL sent immediately after SIGTERM
On 24.10.2014 01:51, Lennart Poettering wrote:
> On Fri, 12.09.14 11:57, Stef Walter (st...@redhat.com) wrote:
>
>> This commit breaks cockpit orderly shutdown:
>>
>> commit 743970d2ea6d08aa7c7bff8220f6b7702f2b1db7
>> Author: Lennart Poettering <lenn...@poettering.net>
>> Date:   Fri Feb 7 16:12:09 2014 +0100
>>
>>     core: one step back again, for nspawn we actually can't wait
>>     for cgroups running empty since systemd will get exactly zero
>>     notifications about it
>>
>> The children of a cockpit login session all get SIGKILL immediately
>> after SIGTERM (less than a tenth of a second apart). cockpit-agent
>> and cockpit-session take more than a tenth of a second to shut down
>> cleanly.
>>
>> The easiest way to reproduce this here is a system shutdown. Even
>> the 'reboot' that started the system shutdown (executed via ssh)
>> gets a SIGKILL before it can exit().
>>
>> Here's some output from a simple systemtap probe which demonstrates
>> this:
>>
>> https://github.com/cockpit-project/cockpit/issues/1155#issuecomment-55374240
>>
>> Here you can see a cockpit unit, its login session scope, unit file,
>> and unit properties:
>>
>> https://github.com/cockpit-project/cockpit/issues/1155#issuecomment-55381385
>>
>> This commit was introduced in v209, so (for example) the problem is
>> present in Fedora 21. Reverting the commit resolves the problem.
>
> Well, this is a whack-a-mole game: there's currently no reliable way
> to get notifications when scopes run empty. In some situations we get
> them, in others we don't, hence we'd better not wait for them.
>
> I am not entirely sure though why this is a problem for cockpit?

Because our entire user session gets a SIGKILL immediately, which
obviously could lead to data loss.

> Cockpit opens its own PAM sessions, has its own PAM session client
> code? What's the current logic for ending such a session? Do you
> properly invoke the PAM session end hooks? Can you elaborate on the
> way cockpit currently uses PAM?

Here's our call to pam_close_session():

https://github.com/cockpit-project/cockpit/blob/master/src/ws/session.c#L974

For local sessions, we use a process called cockpit-session to do our
PAM stack and switch to the right user. The cockpit-session process
starts, calls pam_open_session(), and then forks cockpit-bridge, which
in turn forks other user processes. When cockpit-bridge exits or
terminates on a signal, the cockpit-session process calls
pam_close_session(). Nothing fancy.

In fact the exact same issue happens when sshd is opening/closing the
session and launching cockpit-bridge. So it's unlikely this has
anything to do with our PAM code.

With systemd v209 and later, everything in the user session and all
its children gets SIGKILL immediately after SIGTERM. In fact the two
signals come so fast after each other that they sometimes seem to race
(well, at least the logging of the events does ... hard to tell).

Stef

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel
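The session lifecycle Stef describes (open the session, fork the bridge, close the session once the bridge exits or dies on a signal) can be sketched roughly as below. This is only an illustration in Python, not cockpit's C code: `run_session`, `open_session` and `close_session` are hypothetical names, with the two callables standing in for pam_open_session() and pam_close_session().

```python
import subprocess

def run_session(argv, open_session, close_session):
    """Open a session, run argv as the session child, then close the session.

    open_session and close_session are hypothetical stand-ins for
    pam_open_session() and pam_close_session(); they are not real PAM calls.
    """
    open_session()
    try:
        proc = subprocess.Popen(argv)
        # wait() returns the exit code, or a negative signal number if the
        # child was terminated by a signal.
        status = proc.wait()
    finally:
        # Runs whether the child exited normally or died on a signal.
        close_session()
    return status
```

Note that the cleanup only happens if the parent itself survives long enough to run it. A parent that receives SIGKILL cannot catch it, so neither the `finally` block here nor a real pam_close_session() call would run, which is the situation reported in this thread.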
Re: [systemd-devel] Regression in v209: SIGKILL sent immediately after SIGTERM
On Fri, 12.09.14 11:57, Stef Walter (st...@redhat.com) wrote:

> This commit breaks cockpit orderly shutdown:
>
> commit 743970d2ea6d08aa7c7bff8220f6b7702f2b1db7
> Author: Lennart Poettering <lenn...@poettering.net>
> Date:   Fri Feb 7 16:12:09 2014 +0100
>
>     core: one step back again, for nspawn we actually can't wait
>     for cgroups running empty since systemd will get exactly zero
>     notifications about it
>
> The children of a cockpit login session all get SIGKILL immediately
> after SIGTERM (less than a tenth of a second apart). cockpit-agent
> and cockpit-session take more than a tenth of a second to shut down
> cleanly.
>
> The easiest way to reproduce this here is a system shutdown. Even
> the 'reboot' that started the system shutdown (executed via ssh)
> gets a SIGKILL before it can exit().
>
> Here's some output from a simple systemtap probe which demonstrates
> this:
>
> https://github.com/cockpit-project/cockpit/issues/1155#issuecomment-55374240
>
> Here you can see a cockpit unit, its login session scope, unit file,
> and unit properties:
>
> https://github.com/cockpit-project/cockpit/issues/1155#issuecomment-55381385
>
> This commit was introduced in v209, so (for example) the problem is
> present in Fedora 21. Reverting the commit resolves the problem.

Well, this is a whack-a-mole game: there's currently no reliable way
to get notifications when scopes run empty. In some situations we get
them, in others we don't, hence we'd better not wait for them.

I am not entirely sure though why this is a problem for cockpit?

Cockpit opens its own PAM sessions, has its own PAM session client
code? What's the current logic for ending such a session? Do you
properly invoke the PAM session end hooks? Can you elaborate on the
way cockpit currently uses PAM?

Lennart

--
Lennart Poettering, Red Hat
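To make "a scope running empty" concrete: a cgroup directory lists its member PIDs in a `cgroup.procs` file, so emptiness can be checked by polling that file. The sketch below does exactly that and nothing more. It is not how systemd tracks scopes, it is not a notification mechanism, and the function name `cgroup_is_empty` is made up for this example.

```python
import os

def cgroup_is_empty(cgroup_dir):
    """Return True if cgroup_dir/cgroup.procs lists no PIDs.

    Assumption: only this one cgroup matters. Processes in child cgroups
    are not listed in this file and are therefore not seen here.
    """
    with open(os.path.join(cgroup_dir, "cgroup.procs")) as f:
        return not f.read().split()
```

Polling like this is inherently racy (a process can fork between two reads), which is part of why a reliable empty notification matters for deciding when it is safe to stop waiting.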
[systemd-devel] Regression in v209: SIGKILL sent immediately after SIGTERM
This commit breaks cockpit orderly shutdown:

commit 743970d2ea6d08aa7c7bff8220f6b7702f2b1db7
Author: Lennart Poettering <lenn...@poettering.net>
Date:   Fri Feb 7 16:12:09 2014 +0100

    core: one step back again, for nspawn we actually can't wait
    for cgroups running empty since systemd will get exactly zero
    notifications about it

The children of a cockpit login session all get SIGKILL immediately
after SIGTERM (less than a tenth of a second apart). cockpit-agent
and cockpit-session take more than a tenth of a second to shut down
cleanly.

The easiest way to reproduce this here is a system shutdown. Even the
'reboot' that started the system shutdown (executed via ssh) gets a
SIGKILL before it can exit().

Here's some output from a simple systemtap probe which demonstrates
this:

https://github.com/cockpit-project/cockpit/issues/1155#issuecomment-55374240

Here you can see a cockpit unit, its login session scope, unit file,
and unit properties:

https://github.com/cockpit-project/cockpit/issues/1155#issuecomment-55381385

This commit was introduced in v209, so (for example) the problem is
present in Fedora 21. Reverting the commit resolves the problem.

Cheers,

Stef

[1] Downstream bug: https://bugzilla.redhat.com/show_bug.cgi?id=1141137
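The effect reported above can be reproduced in miniature without systemd. The following is a small sketch, assuming a Unix system: a hypothetical child traps SIGTERM and needs about 0.3 s to clean up, and the parent sends SIGTERM and then SIGKILL after a configurable grace period. The helper `stop_child`, the child script and the timings are all invented for illustration.

```python
import signal
import subprocess
import sys

# Hypothetical child: traps SIGTERM and needs ~0.3 s of cleanup before exit(0).
CHILD = r"""
import signal, sys, time

def on_term(signum, frame):
    time.sleep(0.3)   # stand-in for flushing state, closing sessions, etc.
    sys.exit(0)

signal.signal(signal.SIGTERM, on_term)
print("ready", flush=True)
while True:
    signal.pause()
"""

def stop_child(grace_seconds):
    """Send SIGTERM, wait up to grace_seconds, then SIGKILL.

    Returns the child's exit status: 0 for a clean shutdown, or a negative
    signal number if it had to be killed.
    """
    with subprocess.Popen([sys.executable, "-c", CHILD],
                          stdout=subprocess.PIPE) as proc:
        proc.stdout.readline()            # wait until the handler is installed
        proc.send_signal(signal.SIGTERM)
        try:
            return proc.wait(timeout=grace_seconds)
        except subprocess.TimeoutExpired:
            proc.kill()                   # SIGKILL cannot be caught
            return proc.wait()
```

With a grace period shorter than the child's cleanup time (for example `stop_child(0.05)`), the child is killed mid-cleanup and the status is the negative of `signal.SIGKILL`. With a generous grace period (for example `stop_child(2.0)`), it exits with status 0.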