(2014/08/28 4:46), Lennart Poettering wrote:
On Wed, 27.08.14 09:47, HATAYAMA, Daisuke (d.hatay...@jp.fujitsu.com) wrote:

Sounds like the right option here... I have now added a slightly
different patch (1dedb74a2e1d840b531b76b01a76979f3b57456b) that does
this.


Thanks! But this could still hang in very rare case becuase the reset is
done in a child process after fork(). Please consider the case where the
child process continue sleeping after fork() before resetting signal
handlers until it receives SIGTERM. For such reason, my patch resets
SIGTERM signal handler in the parent systemctl side.

Hmm, there's indeed a race here. I add a commit now that will block all
signals before forking, and unblocks them afterwards. The new process
will hence be created with all signals blocked, and we will hence not
lose them until we after we reset the signal handlers...

Hope this makes sense?

(Blocking the signals temporarily, instead of resetting the signal
handlers has the benefit, that we signal masks are per-thread, and not
per-process, like signal handlers are. The code hence stays generic
enough to not break should the call ever get invoked from a threaded
process...)


Thanks! It's smarter than my patch.

Also, I verified your fix by artifically creating the problematic
case by making prctl() sleep 5 seconds using kprobes, and I think this
now works well.

For example, systemctl with the first patch applied only hangs like this

]# strace -ff -F -e 
trace=clone,execve,rt_sigprocmask,signalfd4,poll,kill,waitid systemctl stop 
kdump
execve("/usr/bin/systemctl", ["systemctl", "stop", "kdump"], [/* 29 vars */]) = 0
rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
poll([{fd=4, events=POLLOUT}], 1, 90000) = 1 ([{fd=4, revents=POLLOUT}])
poll([{fd=4, events=POLLIN}], 1, 90000) = 1 ([{fd=4, revents=POLLIN}])
poll([{fd=4, events=POLLOUT}], 1, 90000) = 1 ([{fd=4, revents=POLLOUT}])
poll([{fd=4, events=POLLIN}], 1, 90000) = 1 ([{fd=4, revents=POLLIN}])
poll([{fd=4, events=POLLOUT}], 1, 90000) = 1 ([{fd=4, revents=POLLOUT}])
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, 
child_tidptr=0x7f41511c7b50) = 9723
poll([{fd=4, events=POLLIN}], 1, 25000) = 1 ([{fd=4, revents=POLLIN}])
poll([{fd=4, events=POLLIN}], 1, 25000) = 1 ([{fd=4, revents=POLLIN}])
poll([{fd=4, events=POLLIN}], 1, 25000) = 1 ([{fd=4, revents=POLLIN}])
poll([{fd=4, events=POLLIN}], 1, 25000) = 1 ([{fd=4, revents=POLLIN}])
poll([{fd=4, events=POLLIN}], 1, 25000) = 1 ([{fd=4, revents=POLLIN}])
kill(9723, SIGTERM)                     = 0
kill(9723, SIGCONT)                     = 0
waitid(P_PID, 9723, Process 9723 attached
 <unfinished ...>
[pid  9723] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=9722, 
si_uid=0} ---
[pid  9723] --- SIGCONT {si_signo=SIGCONT, si_code=SI_USER, si_pid=9722, 
si_uid=0} ---
[pid  9723] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
[pid  9723] execve("/usr/bin/systemd-tty-ask-password-agent", 
["/usr/bin/systemd-tty-ask-passwor"..., "--watch"], [/* 29 vars */]) = 0
[pid  9723] rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
[pid  9723] rt_sigprocmask(SIG_SETMASK, [INT TERM], NULL, 8) = 0
[pid  9723] signalfd4(-1, [INT TERM], 8, O_NONBLOCK|O_CLOEXEC) = 5
[pid  9723] poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}], 2, 4294967295

while systemctl with the second patch doesn't hang like this

]# strace -ff -F -e 
trace=clone,execve,rt_sigprocmask,signalfd4,poll,kill,waitid systemctl stop 
kdump
execve("/usr/bin/systemctl", ["systemctl", "stop", "kdump"], [/* 29 vars */]) = 0
rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
poll([{fd=4, events=POLLOUT}], 1, 90000) = 1 ([{fd=4, revents=POLLOUT}])
poll([{fd=4, events=POLLIN}], 1, 90000) = 1 ([{fd=4, revents=POLLIN}])
poll([{fd=4, events=POLLOUT}], 1, 90000) = 1 ([{fd=4, revents=POLLOUT}])
poll([{fd=4, events=POLLIN}], 1, 90000) = 1 ([{fd=4, revents=POLLIN}])
poll([{fd=4, events=POLLOUT}], 1, 90000) = 1 ([{fd=4, revents=POLLOUT}])
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [], 8) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, 
child_tidptr=0x7fc2780dcb50) = 9794
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
poll([{fd=4, events=POLLIN}], 1, 25000) = 1 ([{fd=4, revents=POLLIN}])
poll([{fd=4, events=POLLIN}], 1, 25000) = 1 ([{fd=4, revents=POLLIN}])
poll([{fd=4, events=POLLIN}], 1, 25000) = 1 ([{fd=4, revents=POLLIN}])
poll([{fd=4, events=POLLIN}], 1, 25000) = 1 ([{fd=4, revents=POLLIN}])
poll([{fd=4, events=POLLIN}], 1, 25000) = 1 ([{fd=4, revents=POLLIN}])
kill(9794, SIGTERM)                     = 0
kill(9794, SIGCONT)                     = 0
waitid(P_PID, 9794, Process 9794 attached
 <unfinished ...>
[pid  9794] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
[pid  9794] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=9793, 
si_uid=0} ---
[pid  9794] +++ killed by SIGTERM +++
<... waitid resumed> {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=9794, 
si_status=SIGTERM, si_utime=0, si_stime=0}, WEXITED, NULL) = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=9794, 
si_status=SIGTERM, si_utime=0, si_stime=0} ---
+++ exited with 0 +++

--
Thanks.
HATAYAMA, Daisuke

_______________________________________________
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel

Reply via email to