Re: [systemd-devel] Regression in v209: SIGKILL sent immediately after SIGTERM

2014-10-24 Thread Stef Walter
On 24.10.2014 01:51, Lennart Poettering wrote:
 On Fri, 12.09.14 11:57, Stef Walter (st...@redhat.com) wrote:
 
 This commit breaks cockpit orderly shutdown:
 
 commit 743970d2ea6d08aa7c7bff8220f6b7702f2b1db7
 Author: Lennart Poettering lenn...@poettering.net
 Date:   Fri Feb 7 16:12:09 2014 +0100
 
 core: one step back again, for nspawn we actually can't wait
 for cgroups running empty since systemd will get exactly zero 
 notifications about it
 
 The children of a cockpit login session all get SIGKILL
 immediately after SIGTERM (less than a tenth of a second apart).
 cockpit-agent and cockpit-session take more than a tenth of a
 second to shut down cleanly.
 
 The easiest way to reproduce this here, is a system shutdown.
 Even the 'reboot' that started the system shutdown (executed via
 ssh) gets a SIGKILL before it can exit().
 
 Here's some output from a simple systemtap probe which
 demonstrates this:
 
 https://github.com/cockpit-project/cockpit/issues/1155#issuecomment-55374240
 
 Here you can see a cockpit unit, its login session scope, unit file,
 and unit properties:
 
 https://github.com/cockpit-project/cockpit/issues/1155#issuecomment-55381385
 
 This commit was introduced in v209, so (for example) the problem is
 present in Fedora 21. Reverting the commit resolves the problem.
 
 Well, this is a whack-a-mole game: there's currently no reliable
 way to get notifications when scopes run empty. In some situations
 we get them, in others we don't; hence we had better not wait for
 them.
 
 I am not entirely sure though why this is a problem for cockpit?

Because our entire user session gets a SIGKILL immediately, which
obviously could lead to data loss.

 Cockpit opens its own PAM sessions, has its own PAM session client 
 code? What's the current logic for ending such a session? Do you 
 properly invoke the PAM session end hooks? Can you elaborate on
 the way cockpit currently uses PAM?

Here's our call to pam_close_session().

https://github.com/cockpit-project/cockpit/blob/master/src/ws/session.c#L974

For local sessions, we use a process called cockpit-session to do
our PAM stack and switch to the right user.

The cockpit-session process calls pam_open_session() and then
forks cockpit-bridge, which in turn forks other user processes. When
cockpit-bridge exits or terminates on a signal, the cockpit-session
process calls pam_close_session(). Nothing fancy.
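
The open / fork / wait / close flow described above can be sketched
roughly as follows. This is only an illustration in Python, not the
cockpit-session source (which is C). The open_session and close_session
callables are hypothetical stand-ins for pam_open_session() and
pam_close_session(), and bridge_argv stands in for the cockpit-bridge
command line.

```python
import subprocess
import sys


def run_session(open_session, close_session, bridge_argv):
    """Open a session, run the bridge, and close the session afterwards.

    open_session / close_session are hypothetical stand-ins for the
    pam_open_session() / pam_close_session() calls; they are assumptions
    made for this sketch, not a real PAM binding.
    """
    open_session()
    try:
        # Popen forks and execs the bridge. wait() returns its exit code,
        # or a negative number if the bridge was terminated by a signal.
        bridge = subprocess.Popen(bridge_argv)
        return bridge.wait()
    finally:
        # Runs whether the bridge exited normally or died on a signal.
        close_session()


if __name__ == "__main__":
    code = run_session(
        lambda: print("session opened"),
        lambda: print("session closed"),
        [sys.executable, "-c", "pass"],
    )
    print("bridge returned", code)
```

Note that the close step only runs if the supervising process itself
survives long enough: SIGKILL cannot be caught, so a supervisor that is
killed right after SIGTERM never reaches it.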

In fact the exact same issue happens when sshd opens and closes the
session and launches cockpit-bridge. So it's unlikely this has
anything to do with our PAM code.

With systemd v209 and later, everything in the user session and all its
children get SIGKILL immediately after SIGTERM. In fact the two
signals come so fast after each other that they sometimes seem to race
(well, at least the logging of the events does ... hard to tell).
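
To make the timing problem concrete, here is a minimal sketch (Python,
POSIX only) of a child that needs some time to clean up after SIGTERM and
a parent that follows up with SIGKILL after a grace period. It is not
systemd's code. The function name, the cleanup_seconds and grace_seconds
parameters, and the "clean"/"killed" return values are all made up for
this illustration.

```python
import os
import signal
import time


def terminate(cleanup_seconds, grace_seconds):
    """Fork a child, send SIGTERM, wait grace_seconds, then send SIGKILL.

    Returns "clean" if the child exited on its own, "killed" if SIGKILL
    ended it. Both parameters are hypothetical knobs for this sketch.
    """
    ready_r, ready_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: cleanup takes cleanup_seconds once SIGTERM arrives.
        try:
            os.close(ready_r)

            def on_term(signum, frame):
                time.sleep(cleanup_seconds)  # stands in for real cleanup
                os._exit(0)

            signal.signal(signal.SIGTERM, on_term)
            os.write(ready_w, b"x")  # tell the parent the handler is set
            while True:
                signal.pause()
        finally:
            os._exit(1)

    # Parent: wait until the child has installed its handler.
    os.close(ready_w)
    os.read(ready_r, 1)
    os.close(ready_r)

    os.kill(pid, signal.SIGTERM)
    deadline = time.monotonic() + grace_seconds
    while time.monotonic() < deadline:
        done, _status = os.waitpid(pid, os.WNOHANG)
        if done == pid:
            return "clean"
        time.sleep(0.01)

    # Grace period is over: SIGKILL cannot be caught or delayed.
    os.kill(pid, signal.SIGKILL)
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status) and os.WTERMSIG(status) == signal.SIGKILL:
        return "killed"
    return "clean"
```

With a grace period shorter than the cleanup time, the child never gets
to finish, which is the situation described for the cockpit processes.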

Stef
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Regression in v209: SIGKILL sent immediately after SIGTERM

2014-10-23 Thread Lennart Poettering
On Fri, 12.09.14 11:57, Stef Walter (st...@redhat.com) wrote:

 This commit breaks cockpit orderly shutdown:
 
  commit 743970d2ea6d08aa7c7bff8220f6b7702f2b1db7
  Author: Lennart Poettering lenn...@poettering.net
  Date:   Fri Feb 7 16:12:09 2014 +0100
 
  core: one step back again, for nspawn we actually can't wait for
  cgroups running empty since systemd will get exactly zero
  notifications about it
 
 The children of a cockpit login session all get SIGKILL immediately
 after SIGTERM (less than a tenth of a second apart). cockpit-agent and
 cockpit-session take more than a tenth of a second to shut down cleanly.
 
 The easiest way to reproduce this here, is a system shutdown. Even the
 'reboot' that started the system shutdown (executed via ssh) gets a
 SIGKILL before it can exit().
 
 Here's some output from a simple systemtap probe which demonstrates this:
 
 https://github.com/cockpit-project/cockpit/issues/1155#issuecomment-55374240
 
 Here you can see a cockpit unit, its login session scope, unit file,
 and unit properties:
 
 https://github.com/cockpit-project/cockpit/issues/1155#issuecomment-55381385
 
 This commit was introduced in v209, so (for example) the problem is
 present in Fedora 21. Reverting the commit resolves the problem.

Well, this is a whack-a-mole game: there's currently no reliable way
to get notifications when scopes run empty. In some situations we get
them, in others we don't; hence we had better not wait for them.
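
For illustration only, one way to avoid depending on such notifications
is to poll a cgroup's process list with a timeout, so that a missing
"empty" event cannot block forever. The sketch below (Python) is an
assumption-laden example, not what systemd does. procs_path is a
hypothetical path to a cgroup.procs-style file (one PID per line), and
the function names are made up.

```python
import time


def cgroup_is_empty(procs_path):
    """Return True if the cgroup.procs-style file lists no PIDs."""
    with open(procs_path) as f:
        return f.read().strip() == ""


def wait_until_empty(procs_path, timeout, interval=0.01):
    """Poll until the cgroup is empty or timeout seconds have passed.

    Returns True if it ran empty in time, False if the caller should stop
    waiting (and, for example, escalate to SIGKILL). procs_path, timeout
    and interval are hypothetical parameters for this sketch.
    """
    deadline = time.monotonic() + timeout
    while True:
        if cgroup_is_empty(procs_path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

The timeout bounds the wait regardless of whether a notification ever
arrives, at the cost of polling.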

I am not entirely sure though why this is a problem for cockpit? 

Cockpit opens its own PAM sessions, has its own PAM session client
code? What's the current logic for ending such a session? Do you
properly invoke the PAM session end hooks? Can you elaborate on the
way cockpit currently uses PAM?

Lennart

-- 
Lennart Poettering, Red Hat


[systemd-devel] Regression in v209: SIGKILL sent immediately after SIGTERM

2014-09-12 Thread Stef Walter
This commit breaks cockpit orderly shutdown:

 commit 743970d2ea6d08aa7c7bff8220f6b7702f2b1db7
 Author: Lennart Poettering lenn...@poettering.net
 Date:   Fri Feb 7 16:12:09 2014 +0100

 core: one step back again, for nspawn we actually can't wait for
 cgroups running empty since systemd will get exactly zero
 notifications about it

The children of a cockpit login session all get SIGKILL immediately
after SIGTERM (less than a tenth of a second apart). cockpit-agent and
cockpit-session take more than a tenth of a second to shut down cleanly.

The easiest way to reproduce this here is a system shutdown. Even the
'reboot' that started the system shutdown (executed via ssh) gets a
SIGKILL before it can exit().

Here's some output from a simple systemtap probe which demonstrates this:

https://github.com/cockpit-project/cockpit/issues/1155#issuecomment-55374240

Here you can see a cockpit unit, its login session scope, unit file,
and unit properties:

https://github.com/cockpit-project/cockpit/issues/1155#issuecomment-55381385

This commit was introduced in v209, so (for example) the problem is
present in Fedora 21. Reverting the commit resolves the problem.

Cheers,

Stef

[1] Downstream bug: https://bugzilla.redhat.com/show_bug.cgi?id=1141137