[Bug 1168526] Re: race condition causing lxc to not detect container init process exit

2013-09-12 Thread Launchpad Bug Tracker
This bug was fixed in the package lxc - 1.0.0~alpha1-0ubuntu2 --- lxc (1.0.0~alpha1-0ubuntu2) saucy; urgency=low * Add allow-stderr to autopkgtest restrictions as the Ubuntu template uses policy-rc.d to disable some daemons and that causes a message to be printed on stderr

[Bug 1168526] Re: race condition causing lxc to not detect container init process exit

2013-08-29 Thread Serge Hallyn
The following kernel patches fix it for me; will send to lkml:

diff --git a/debian.master/changelog b/debian.master/changelog
index f8f7a35a..081e666 100644
--- a/debian.master/changelog
+++ b/debian.master/changelog
@@ -1,3 +1,9 @@
+linux (3.11.0-4.9debug1) saucy; urgency=low
+
+  * debug 1
+

[Bug 1168526] Re: race condition causing lxc to not detect container init process exit

2013-08-29 Thread Serge Hallyn
Separately, a patch has been committed to upstream lxc to eliminate any chance of a race stopping the lxc monitor from seeing the container init exit. Note that this doesn't stop the kernel bug from happening. ** Changed in: linux (Ubuntu) Status: Incomplete => Confirmed ** Changed in:

[Bug 1168526] Re: race condition causing lxc to not detect container init process exit

2013-07-10 Thread Serge Hallyn
This bug was introduced between v3.7 and v3.8, by commit af4b8a83add95ef40716401395b44a1b579965f4 ("pidns: Wait in zap_pid_ns_processes until pid_ns->nr_hashed == 1"). ** Also affects: linux (Ubuntu) Importance: Undecided Status: New ** Tags added: bot-stop-nagging -- You received this

[Bug 1168526] Re: race condition causing lxc to not detect container init process exit

2013-07-08 Thread Serge Hallyn
Ok, for the record I cannot reproduce this on precise. I've tried with both an 8-core fast precise host and a 4-core precise kvm guest, both using the same lxc version from the daily ubuntu-lxc ppa. So it appears to be a kernel regression between 3.2 and 3.8.

[Bug 1168526] Re: race condition causing lxc to not detect container init process exit

2013-07-03 Thread Serge Hallyn
There is at least one fundamental bug in start.c's signal_handler, which the patch below should fix. However, this alone did not fix it for me, so more is wrong. There is a minimal testcase at http://people.canonical.com/~serge/signalfd.c which originally reproduced this bug, then was fixed by

[Bug 1168526] Re: race condition causing lxc to not detect container init process exit

2013-07-02 Thread Serge Hallyn
Ok - thanks - I was able to reproduce this in a raring VM with 4 cores. I thought this might be a tiny race window between us blocking signals and creating the signalfd, so I reversed those (a patch which I may yet send upstream) - but that didn't solve the issue.

[Bug 1168526] Re: race condition causing lxc to not detect container init process exit

2013-07-02 Thread Serge Hallyn
It does seem like this must be a kernel bug in epoll+signalfd (or a hard to spot misuse thereof in lxc). When I instrument the signal_handler which is executed when epoll_wait returns a signalfd event, I do get a sigchld for the very first task which is spawned (a test to see if kernel supports

[Bug 1168526] Re: race condition causing lxc to not detect container init process exit

2013-06-28 Thread Pavel Bennett
Hey Serge, let me know if that repro worked for you or when you're planning to give it a try. I'm keeping the VM image around in case you need it. What's odd is that I can't even reproduce it with the daily ppa build, which doesn't have the workaround which is in the ubuntu package. Did you

Re: [Bug 1168526] Re: race condition causing lxc to not detect container init process exit

2013-06-21 Thread Serge Hallyn
Quoting Pavel Bennett (launch...@pavelbennett.com):
> Hey Serge, were you able to get a reliable repro for this? I have a

No, I wasn't (assuming you mean with our workaround in place in the ubuntu package). In comment #8 you said node.js was able to reproduce this. Regarding that, 1. Does that

[Bug 1168526] Re: race condition causing lxc to not detect container init process exit

2013-06-21 Thread Pavel Bennett
I can't try Saucy right now, but the repro instructions with kernel versions are in the original post and in #2. We've tried node v0.11.2 as well on Raring and got the repro. Repro summary: Install any of the above kernels, such as the one with the Raring installer, then install lxc from apt.

Re: [Bug 1168526] Re: race condition causing lxc to not detect container init process exit

2013-06-21 Thread Serge Hallyn
Quoting Pavel Bennett (launch...@pavelbennett.com):
> I can't try Saucy right now, but the repro instructions with kernel versions are in the original post and in #2. We've tried node v0.11.2 as well on Raring and got the repro. Repro summary: Install any of the above kernels, such as the

[Bug 1168526] Re: race condition causing lxc to not detect container init process exit

2013-06-21 Thread Pavel Bennett
Sure, run these inside the container:

git clone https://github.com/joyent/node.git --depth 1
cd node
./configure
make -j9
sudo make install

Then the binary will be at /usr/local/bin/node. It's v0.11.3-pre, but should still repro.

Re: [Bug 1168526] Re: race condition causing lxc to not detect container init process exit

2013-06-21 Thread Serge Hallyn
Quoting Pavel Bennett (launch...@pavelbennett.com):
> Sure, run these inside the container:
>
> git clone https://github.com/joyent/node.git --depth 1
> cd node
> ./configure
> make -j9
> sudo make install
>
> Then the binary will be at /usr/local/bin/node. It's v0.11.3-pre, but should still repro.

[Bug 1168526] Re: race condition causing lxc to not detect container init process exit

2013-06-21 Thread Pavel Bennett
I created a VM with Ubuntu Server 13.04 just for this bug. At first, I was able to run the steps outlined above 50 times with no issues. What was I missing? Concurrency! I rebooted the VM after adding 1 more core, and... bingo! Zombies on the 3rd try. The VM disk image I have here should be

[Bug 1168526] Re: race condition causing lxc to not detect container init process exit

2013-06-20 Thread Pavel Bennett
Hey Serge, were you able to get a reliable repro for this? I have a reason to upgrade to Raring, and this seems to be the only blocker. We've reproduced the issue with the stock Linux Mint 15.

[Bug 1168526] Re: race condition causing lxc to not detect container init process exit

2013-06-06 Thread Serge Hallyn
** Changed in: lxc (Ubuntu) Status: Incomplete => Confirmed

[Bug 1168526] Re: race condition causing lxc to not detect container init process exit

2013-04-20 Thread Pavel Bennett
I've also tried it with a C++ app very similar to yours and was unable to repro. There is something about having node.js as the init process, running a JS file whose only statement is process.exit(0). The init process (node v0.11.0) does exit, as ps faux shows it as a zombie and a child of lxc-start. I went back to kernel
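A zombie is a process that has exited but has not yet been reaped by its parent with wait(); ps shows it in state Z. So a zombie init under lxc-start means the exit did happen, but lxc-start never reaped it. As a sketch, the same check can be done programmatically by reading the state field of /proc/<pid>/stat; proc_state is a hypothetical helper, not part of lxc.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: return the one-letter state of pid from
 * /proc/<pid>/stat ('R', 'S', 'Z', ...), or '?' if it cannot be read. */
static char proc_state(pid_t pid)
{
    char path[64], buf[512];
    snprintf(path, sizeof(path), "/proc/%d/stat", (int)pid);

    FILE *f = fopen(path, "r");
    if (!f)
        return '?';
    size_t n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* Format is "pid (comm) state ...". comm may contain spaces or ')',
     * so locate the last ')' and take the character after the space. */
    char *p = strrchr(buf, ')');
    if (!p || p[1] != ' ' || p[2] == '\0')
        return '?';
    return p[2];
}
```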

[Bug 1168526] Re: race condition causing lxc to not detect container init process exit

2013-04-18 Thread Serge Hallyn
The newest kernel I've tested on is 3.8.0-17-generic. I'll need to set up a system with the daily upstream build and re-test. ** Changed in: lxc (Ubuntu) Importance: Undecided => Medium ** Changed in: lxc (Ubuntu) Assignee: (unassigned) => Serge Hallyn (serge-hallyn)

[Bug 1168526] Re: race condition causing lxc to not detect container init process exit

2013-04-18 Thread Serge Hallyn
No, I cannot reproduce this with the latest upstream kernel build (3.9.0-030900rc7-generic #201304171402). What I did:

sudo lxc-create -t ubuntu -n r1
cat > exit0.c << EOF
#include <stdlib.h>
int main() { exit(0); }
EOF
make exit0
sudo cp exit0 /var/lib/lxc/r1/rootfs/bin/
sudo lxc-start -n r1 --

Re: [Bug 1168526] Re: race condition causing lxc to not detect container init process exit

2013-04-16 Thread Serge Hallyn
Quoting Pavel Bennett (launch...@pavelbennett.com):
> Btw, that queueing mode would simply mean not calling epoll_wait until the pid is available. This shouldn't require managing a queue ourselves. Can you think of anything that this would break?

No - I had wanted to do this originally, but

[Bug 1168526] Re: race condition causing lxc to not detect container init process exit

2013-04-15 Thread Pavel Bennett
Btw, that queueing mode would simply mean not calling epoll_wait until the pid is available. This shouldn't require managing a queue ourselves. Can you think of anything that this would break? Or we could go with the patch you've written, although I haven't looked into why the problem appears to

[Bug 1168526] Re: race condition causing lxc to not detect container init process exit

2013-04-13 Thread Pavel Bennett
I should add that these "forwarded signal 2" lines are due to me pressing Ctrl+C and are not actually relevant. Have you been able to repro this bug on kernel 3.8.6? I'm thinking about how to fix this, since lxc_spawn is what gets the pid that lxc_poll needs in order to listen for SIGCHLD from the correct

[Bug 1168526] Re: race condition causing lxc to not detect container init process exit

2013-04-12 Thread Pavel Bennett
Precisely which version of lxc were you using? I just put back version 0.9.0-0ubuntu2 (as opposed to the 0.9.0 I built from source) while on kernel 3.7.9-030709-generic and haven't yet run into this issue (I assume that's the patch you mentioned). However, when I update to kernel