Public bug reported:
For the purpose of the repro, my lxc init process is node.js v0.11.0
(built from source) with a single line:
process.exit(0);
When running it in lxc, sometimes lxc doesn't exit. lxc-start remains a
parent of a defunct node process without reaping it or exiting.
I've made a custom build of lxc 0.9.0 to extract more information about
this, adding only an INFO line, as follows:
start.c:
    if (ret != sizeof(siginfo)) {
        ERROR("unexpected siginfo size");
        return -1;
    }
+   INFO("got signal %d from pid %d while expecting SIGCHLD(17) from pid %d | uid = %d, status = %d", siginfo.ssi_signo, siginfo.ssi_pid, *pid, siginfo.ssi_uid, siginfo.ssi_status);
    if (siginfo.ssi_signo != SIGCHLD) {
        kill(*pid, siginfo.ssi_signo);
        INFO("forwarded signal %d to pid %d", siginfo.ssi_signo, *pid);
        return 0;
    }
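The extra INFO line prints ssi_pid and ssi_status but not ssi_code, which says whether a SIGCHLD reports an exit, a kill, a stop or a continue. A minimal sketch of a helper that formats those fields together is below. The names cld_code_name and describe_sigchld are hypothetical and are not part of lxc; only the signalfd_siginfo fields and the CLD_* constants come from the system headers.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/signalfd.h>

/* Hypothetical helper (not lxc code): name the si_code of a SIGCHLD. */
static const char *cld_code_name(int code)
{
    switch (code) {
    case CLD_EXITED:    return "CLD_EXITED";    /* child called exit() */
    case CLD_KILLED:    return "CLD_KILLED";    /* child killed by a signal */
    case CLD_DUMPED:    return "CLD_DUMPED";    /* killed and dumped core */
    case CLD_TRAPPED:   return "CLD_TRAPPED";   /* traced child trapped */
    case CLD_STOPPED:   return "CLD_STOPPED";   /* child stopped */
    case CLD_CONTINUED: return "CLD_CONTINUED"; /* stopped child resumed */
    default:            return "unknown";
    }
}

/* Hypothetical helper: write "pid P code C status S" into buf.
 * Returns the value of snprintf(). */
static int describe_sigchld(const struct signalfd_siginfo *si,
                            char *buf, size_t len)
{
    assert(si != NULL && buf != NULL);
    return snprintf(buf, len, "pid %u code %s status %d",
                    (unsigned)si->ssi_pid,
                    cld_code_name(si->ssi_code),
                    (int)si->ssi_status);
}
```

Logging the result of describe_sigchld next to the existing fields would show what kind of state change the SIGCHLD from the unexpected pid describes.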
I've tried this with 3 official kernels. The output differs in one
respect.
Kernels 3.7.9, 3.8.6:
Successful case:
lxc-start 1365724008.446 NOTICE lxc_start - '/usr/local/bin/node' started with pid '19458'
lxc-start 1365724008.446 INFO lxc_console - no console will be used
lxc-start 1365724008.446 INFO lxc_start - got signal 17 from pid 18165 while expecting SIGCHLD(17) from pid 19458 | uid = 0, status = 1
lxc-start 1365724008.446 WARN lxc_start - invalid pid for SIGCHLD
lxc-start 1365724038.306 INFO lxc_start - got signal 17 from pid 19458 while expecting SIGCHLD(17) from pid 19458 | uid = 0, status = 0
lxc-start 1365724038.306 DEBUG lxc_start - container init process exited
Hanging case:
lxc-start 1365795195.358 NOTICE lxc_start - '/usr/local/bin/node' started with pid '8650'
lxc-start 1365795195.358 INFO lxc_console - no console will be used
lxc-start 1365795195.358 INFO lxc_start - got signal 17 from pid 8626 while expecting SIGCHLD(17) from pid 8650 | uid = 0, status = 1
lxc-start 1365795195.358 WARN lxc_start - invalid pid for SIGCHLD
lxc-start 1365795333.347 INFO lxc_start - got signal 2 from pid 0 while expecting SIGCHLD(17) from pid 8650 | uid = 0, status = 0
lxc-start 1365795333.347 INFO lxc_start - forwarded signal 2 to pid 8650
Kernel 3.9.0-rc6:
Successful case is the same, but the hanging case changes to just:
lxc-start 1365794343.870 NOTICE lxc_start - '/usr/local/bin/node' started with pid '3432'
lxc-start 1365794343.870 INFO lxc_console - no console will be used
lxc-start 1365794343.870 INFO lxc_start - got signal 17 from pid 2851 while expecting SIGCHLD(17) from pid 3432 | uid = 0, status = 1
lxc-start 1365794343.870 WARN lxc_start - invalid pid for SIGCHLD
... without forwarding signal 2 (SIGINT).
Notes:
- I'm on Mint 14 Nadia with raring packages, if that helps.
- In all cases, there is signal 17 (SIGCHLD) coming in to lxc-start, but it
comes from a different pid and is ignored by lxc. Any idea what this could be?
This process seems to have been cleaned up and no longer appears in ps aux.
- The lxc-start process should be getting notified with a SIGCHLD from the
child's pid when the child (init process) exits.
- This could be a kernel bug, but it's probably something unique that lxc is
doing to trigger it.
- I've tried other init processes (node.js without the process.exit, and a
custom C++ app that writes to stdout and exits with 0). Both greatly reduce
the frequency of the hang.
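
One way to avoid depending on which pid a SIGCHLD reports is to treat the signal purely as a wake-up and then ask the kernel about the init process directly with waitpid(WNOHANG). Standard signals such as SIGCHLD are not queued, so two child state changes that occur while the signal is blocked can be reported through a single siginfo. Whether that is what happens in the logs above is not established. The sketch below only illustrates the pattern, assuming the init process is a direct child of the caller. open_sigchld_fd and wait_for_init are hypothetical helpers, not lxc functions.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/signalfd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: block SIGCHLD and return a signalfd for it,
 * or -1 on error. Call this before forking the child. */
static int open_sigchld_fd(void)
{
    sigset_t mask;

    sigemptyset(&mask);
    sigaddset(&mask, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &mask, NULL) < 0)
        return -1;
    return signalfd(-1, &mask, SFD_CLOEXEC);
}

/* Hypothetical helper: return 0 once init_pid has exited, storing its
 * wait status in *status, or -1 on error. ssi_pid is deliberately not
 * inspected: every SIGCHLD is only a hint to poll init_pid again.
 * Other children are not reaped here. */
static int wait_for_init(int sfd, pid_t init_pid, int *status)
{
    struct signalfd_siginfo si;

    assert(status != NULL);
    for (;;) {
        /* Poll first: the child may have exited before the first read. */
        pid_t r = waitpid(init_pid, status, WNOHANG);
        if (r == init_pid)
            return 0;
        if (r < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }

        /* r == 0: still running, so sleep until the next SIGCHLD. */
        ssize_t n = read(sfd, &si, sizeof(si));
        if (n < 0 && errno == EINTR)
            continue;
        if (n != (ssize_t)sizeof(si))
            return -1;
    }
}
```

A caller would create the descriptor with open_sigchld_fd() before starting the init process, so that SIGCHLD is already blocked when the child can exit, and then call wait_for_init(sfd, pid, &status). Because the poll happens after every wake-up, a SIGCHLD attributed to another pid cannot hide the exit of init_pid.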
** Affects: lxc (Ubuntu)
Importance: Undecided
Status: New
https://bugs.launchpad.net/bugs/1168526
Title:
race condition causing lxc to not detect container init process exit