Re: init's children list is long and slows reaping children.

2007-04-07 Thread Oleg Nesterov
On 04/06, Oleg Nesterov wrote: Perhaps, --- t/kernel/exit.c~ 2007-04-06 23:31:31.0 +0400 +++ t/kernel/exit.c 2007-04-06 23:31:57.0 +0400 @@ -275,10 +275,7 @@ static void reparent_to_init(void) remove_parent(current); current->parent = child_reaper(current);

Re: init's children list is long and slows reaping children.

2007-04-07 Thread Eric W. Biederman
Oleg Nesterov <[EMAIL PROTECTED]> writes: On 04/06, Oleg Nesterov wrote: Perhaps, --- t/kernel/exit.c~ 2007-04-06 23:31:31.0 +0400 +++ t/kernel/exit.c 2007-04-06 23:31:57.0 +0400 @@ -275,10 +275,7 @@ static void reparent_to_init(void) remove_parent(current);

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Jeff Garzik
Linus Torvalds wrote: On Fri, 6 Apr 2007, Jeff Garzik wrote: I would rather change the implementation under the hood to start per-CPU threads on demand, similar to a thread-pool implementation. Boxes with $BigNum CPUs probably won't ever use half of those threads. The counter-argument is

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Davide Libenzi
On Fri, 6 Apr 2007, Linus Torvalds wrote: > > > On Fri, 6 Apr 2007, Davide Libenzi wrote: > > > On Fri, 6 Apr 2007, Linus Torvalds wrote: > > > > > > I don't really see the point. It's not even *true*. A "process" includes > > > more than the shared signal-handling - it would include files

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Linus Torvalds
On Fri, 6 Apr 2007, Jeff Garzik wrote: > > I would rather change the implementation under the hood to start per-CPU > threads on demand, similar to a thread-pool implementation. > > Boxes with $BigNum CPUs probably won't ever use half of those threads. The counter-argument is that boxes with

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Linus Torvalds
On Fri, 6 Apr 2007, Davide Libenzi wrote: > On Fri, 6 Apr 2007, Linus Torvalds wrote: > > > > I don't really see the point. It's not even *true*. A "process" includes > > more than the shared signal-handling - it would include files and fs etc > > too. > > > > So it's actually *more*

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Jeff Garzik
Robin Holt wrote: We have been testing a new larger configuration and we are seeing a very large scan time of init's tsk->children list. In the cases we are seeing, there are numerous kernel processes created for each cpu (ie: events/0 ... events/<big number>, xfslogd/0 ... xfslogd/<big number>). These are all on the

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Davide Libenzi
On Fri, 6 Apr 2007, Linus Torvalds wrote: > On Fri, 6 Apr 2007, Davide Libenzi wrote: > > > > > > or lets just face it and name it what it is: process_struct ;-) > > > > That'd be fine too! Wonder if Linus would swallow a rename patch like > > that... > > I don't really see the point. It's

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Linus Torvalds
On Fri, 6 Apr 2007, Davide Libenzi wrote: > > > > or lets just face it and name it what it is: process_struct ;-) > > That'd be fine too! Wonder if Linus would swallow a rename patch like > that... I don't really see the point. It's not even *true*. A "process" includes more than the shared

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Davide Libenzi
On Fri, 6 Apr 2007, Ingo Molnar wrote: > > * Davide Libenzi wrote: > > > > > Ohhh, the "signal" struct! Funny name for something that nowadays > > > > has probably no more than a 5% affinity with signal-related tasks > > > > :/ > > > > > > Hmm. I wonder if we should just rename it the

Re: init's children list is long and slows reaping children.

2007-04-06 Thread H. Peter Anvin
Jeremy Fitzhardinge wrote: Eric W. Biederman wrote: I'm guessing the issue is nash just calls wait and doesn't check the returned pid value, assuming it is the only child it forked returning. Which is valid except when you are running as pid == 1. Hm, that's always a bug; a process can

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Jeremy Fitzhardinge
Eric W. Biederman wrote: > I'm guessing the issue is nash just calls wait and doesn't check the > returned pid value, assuming it is the only child it forked returning. > Which is valid except when you are running as pid == 1. > Hm, that's always a bug; a process can always have children it
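
The point made in this sub-thread (check the pid that wait() returns instead of assuming it belongs to the one child you forked) can be sketched in user space roughly as follows. This is only an illustration, not nash's code: the helper name wait_for_expected_child() is hypothetical, and it assumes the caller is happy to silently reap any unrelated child that exits in the meantime.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not taken from nash or the kernel): block until the
 * child `expected` has exited. Any other child that exits first is reaped
 * and ignored, which is what a process running as pid 1 has to tolerate.
 * Returns 0 and stores the wait status in *status, or -1 on error.
 */
int wait_for_expected_child(pid_t expected, int *status)
{
    for (;;) {
        int st;
        pid_t pid = wait(&st);

        if (pid < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal, retry */
            return -1;          /* e.g. ECHILD: no children left */
        }
        if (pid == expected) {
            *status = st;       /* this is the child we were waiting for */
            return 0;
        }
        /* Some other child exited; it is already reaped, keep waiting. */
    }
}
```

A caller would fork(), remember the returned pid, and pass it to the helper; waitpid() with an explicit pid is the other obvious option when unrelated children do not need to be reaped by the same loop.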

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Ingo Molnar
* Oleg Nesterov <[EMAIL PROTECTED]> wrote: > Probably it is I who missed something :) > > But why can't we do both changes? I think it is just ugly to use init > to reap the kernel thread. Ok, wait4() can find zombie quickly if we > do the ->children split. But /sbin/init could be swapped

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Oleg Nesterov
On 04/06, Ingo Molnar wrote: > > * Oleg Nesterov <[EMAIL PROTECTED]> wrote: > > > > I'd almost prefer to just not add kernel threads to any parent > > > process list *at*all*. > > > > Yes sure, I didn't argue with that. However, "->exit_state = -1" does > > matter, we can't detach process

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Eric W. Biederman
Oleg Nesterov <[EMAIL PROTECTED]> writes: > On 04/06, Oleg Nesterov wrote: >> >> --- t/kernel/exit.c~ 2007-04-06 23:31:31.0 +0400 >> +++ t/kernel/exit.c 2007-04-06 23:31:57.0 +0400 >> @@ -275,10 +275,7 @@ static void reparent_to_init(void) >> remove_parent(current); >>

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Oleg Nesterov
On 04/06, Oleg Nesterov wrote: > > --- t/kernel/exit.c~ 2007-04-06 23:31:31.0 +0400 > +++ t/kernel/exit.c 2007-04-06 23:31:57.0 +0400 > @@ -275,10 +275,7 @@ static void reparent_to_init(void) > remove_parent(current); > current->parent = child_reaper(current); >

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Ingo Molnar
* Oleg Nesterov <[EMAIL PROTECTED]> wrote: > > I'd almost prefer to just not add kernel threads to any parent > > process list *at*all*. > > Yes sure, I didn't argue with that. However, "->exit_state = -1" does > matter, we can't detach process unless we make it auto-reap. > Of course, we

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Eric W. Biederman
Christoph Hellwig <[EMAIL PROTECTED]> writes: > As all kernel thread (1) should be converted to kthread anyway for > proper containers support and general "let's get rid of a crappy API" > cleanups I think that's enough. It would be nice to have SGI helping > to convert more drivers over to the

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Oleg Nesterov
On 04/06, Linus Torvalds wrote: > > On Fri, 6 Apr 2007, Oleg Nesterov wrote: > > > > Oops. I misread stop_machine(), it does kernel_thread(), not > > kthread_create(). > > So "stopmachine" threads are all re-parented to init when the caller exits. > > I think it makes sense to set ->exit_state

Re: init's children list is long and slows reaping children.

2007-04-06 Thread H. Peter Anvin
Eric W. Biederman wrote: Are you saying waitpid() (wait4) *with a pid specified* can return another pid? That definitely sounds like a bug. No. For the full context look back a couple of messages. I'm guessing the issue is nash just calls wait and doesn't check the returned pid value,

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Ingo Molnar
* Davide Libenzi wrote: > > > Ohhh, the "signal" struct! Funny name for something that nowadays > > > has probably no more than a 5% affinity with signal-related tasks > > > :/ > > > > Hmm. I wonder if we should just rename it the struct thread_group, > > or struct task_group. Those seem

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Davide Libenzi
On Fri, 6 Apr 2007, Eric W. Biederman wrote: > Davide Libenzi writes: > > > On Fri, 6 Apr 2007, Oleg Nesterov wrote: > > > >> Sure. It would be nice to move ->children into signal_struct at first. > >> Except this change breaks (in fact fixes) ->pdeath_signal behaviour. > > > > Ohhh, the

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Eric W. Biederman
"H. Peter Anvin" <[EMAIL PROTECTED]> writes: > Eric W. Biederman wrote: >> >> Oleg is coming from a different case where it was found that exiting kernel >> threads were causing problems for nash when nash was run as init in an >> initramfs. While I think that case is likely a user space bug

Re: init's children list is long and slows reaping children.

2007-04-06 Thread H. Peter Anvin
Eric W. Biederman wrote: Oleg is coming from a different case where it was found that exiting kernel threads were causing problems for nash when nash was run as init in an initramfs. While I think that case is likely a user space bug because nash should check the pid from waitpid before

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Eric W. Biederman
Davide Libenzi writes: > On Fri, 6 Apr 2007, Oleg Nesterov wrote: > >> Sure. It would be nice to move ->children into signal_struct at first. >> Except this change breaks (in fact fixes) ->pdeath_signal behaviour. > > Ohhh, the "signal" struct! Funny name for something that nowadays has >

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Eric W. Biederman
Ingo Molnar <[EMAIL PROTECTED]> writes: > no. Two _completely separate_ lists. > > i.e. a to-be-reaped task will still be on the main list _too_. The main > list is for all the PID semantics rules. The reap-list is just for > wait4() processing. The two would be completely separate. And what

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Davide Libenzi
On Fri, 6 Apr 2007, Oleg Nesterov wrote: > Sure. It would be nice to move ->children into signal_struct at first. > Except this change breaks (in fact fixes) ->pdeath_signal behaviour. Ohhh, the "signal" struct! Funny name for something that nowadays has probably no more than a 5% affinity with

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Eric W. Biederman
Roland Dreier <[EMAIL PROTECTED]> writes: > > no. Two _completely separate_ lists. > > > > i.e. a to-be-reaped task will still be on the main list _too_. The main > > list is for all the PID semantics rules. The reap-list is just for > > wait4() processing. The two would be completely

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Christoph Hellwig
On Thu, Apr 05, 2007 at 06:29:16PM -0700, Linus Torvalds wrote: > > The support angel on my shoulder says we should just put all the kernel > > threads under a kthread subtree to shorten init's child list and minimize > > impact. > > A number are already there, of course, since they use the

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Eric W. Biederman
Oleg Nesterov <[EMAIL PROTECTED]> writes: > On 04/06, Eric W. Biederman wrote: >> >> Thinking about it I do agree with Linus that two lists sounds like the >> right solution because it ensures we always have O(1) time when >> waiting for a zombie. > > Well. I bet this will be painful, and will

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Roland Dreier
> no. Two _completely separate_ lists. > > i.e. a to-be-reaped task will still be on the main list _too_. The main > list is for all the PID semantics rules. The reap-list is just for > wait4() processing. The two would be completely separate. I guess this means we add another list head
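
The "two completely separate lists" idea quoted above can be pictured with a small user-space toy model. Everything in it is an assumption made for illustration: the toy_* names and struct layout are hypothetical and are not the kernel's task_struct or list_head, and only the "wait for any child" case is modelled (no pid matching, ptrace, or locking).

```c
#include <stddef.h>

/* Toy model only; all names here are hypothetical, not kernel structures. */
struct toy_task {
    int pid;
    int is_zombie;
    struct toy_task *child_prev, *child_next; /* links on the main list */
    struct toy_task *reap_next;               /* link on the reap list */
};

struct toy_parent {
    struct toy_task *children; /* main list: every child, live or zombie */
    struct toy_task *reap;     /* reap list: zombies only */
};

/* A new child goes on the main list only. */
void toy_add_child(struct toy_parent *p, struct toy_task *t)
{
    t->is_zombie = 0;
    t->reap_next = NULL;
    t->child_prev = NULL;
    t->child_next = p->children;
    if (p->children)
        p->children->child_prev = t;
    p->children = t;
}

/* On exit the task stays on the main list and is ALSO put on the reap list. */
void toy_exit(struct toy_parent *p, struct toy_task *t)
{
    t->is_zombie = 1;
    t->reap_next = p->reap;
    p->reap = t;
}

/* Reap any zombie without scanning live children. Returns its pid or -1. */
int toy_wait_any(struct toy_parent *p)
{
    struct toy_task *t = p->reap;

    if (!t)
        return -1;                      /* nothing to reap */
    p->reap = t->reap_next;             /* pop from the reap list */
    if (t->child_prev)                  /* unlink from the main list */
        t->child_prev->child_next = t->child_next;
    else
        p->children = t->child_next;
    if (t->child_next)
        t->child_next->child_prev = t->child_prev;
    t->child_prev = t->child_next = t->reap_next = NULL;
    return t->pid;
}
```

In this model the cost of toy_wait_any() does not depend on how many live children the parent has, which is the property the two-list proposal is after; the extra link per task is the price Roland's reply asks about.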

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Eric W. Biederman
Linus Torvalds <[EMAIL PROTECTED]> writes: > On Fri, 6 Apr 2007, Oleg Nesterov wrote: >> >> Oops. I misread stop_machine(), it does kernel_thread(), not >> kthread_create(). >> So "stopmachine" threads are all re-parented to init when the caller exits. >> I think it makes sense to set

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Ingo Molnar
* Oleg Nesterov <[EMAIL PROTECTED]> wrote: > > Thinking about it I do agree with Linus that two lists sounds like > > the right solution because it ensures we always have O(1) time when > > waiting for a zombie. > > Well. I bet this will be painful, and will uglify the code even more. > >

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Ingo Molnar
* Ingo Molnar <[EMAIL PROTECTED]> wrote: > putting the freshly reaped tasks at the 'head' of the list is just a > fancy (and incomplete) way of splitting the list up into two lists, and > i'd advocate a clean split. Just like we have split the ptrace_list

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Ingo Molnar
* Linus Torvalds <[EMAIL PROTECTED]> wrote: > I'd almost prefer to just not add kernel threads to any parent process > list *at*all*. i think part of the problem is the legacy that the list is artificially unified: tasks that 'will possibly exit' are on the same list as tasks that 'have

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Oleg Nesterov
On 04/06, Eric W. Biederman wrote: > > Thinking about it I do agree with Linus that two lists sounds like the > right solution because it ensures we always have O(1) time when > waiting for a zombie. Well. I bet this will be painful, and will uglify the code even more. do_wait() has to iterate

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Linus Torvalds
On Fri, 6 Apr 2007, Oleg Nesterov wrote: > > Oops. I misread stop_machine(), it does kernel_thread(), not kthread_create(). > So "stopmachine" threads are all re-parented to init when the caller exits. > I think it makes sense to set ->exit_state = -1 in stopmachine(), regardless > of any other

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Oleg Nesterov
On 04/06, Eric W. Biederman wrote: > > Oleg Nesterov <[EMAIL PROTECTED]> writes: > > >> At first glance your patch looks reasonable. > >> > >> Unfortunately it only applies to the rare thread that calls daemonize, > >> and not also to kernel/kthread/kthread() which means it will miss many of >

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Robin Holt
On Fri, Apr 06, 2007 at 09:38:24AM -0600, Eric W. Biederman wrote: > How hard is tasklist_lock hit on these systems? The major hold-off we are seeing is from tasks reaping children, especially tasks with very large children lists. > How hard is the pid hash hit on these systems? In the little

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Eric W. Biederman
Oleg Nesterov <[EMAIL PROTECTED]> writes: >> At first glance your patch looks reasonable. >> >> Unfortunately it only applies to the rare thread that calls daemonize, >> and not also to kernel/kthread/kthread() which means it will miss many of >> our current kernel threads. > > Note that a

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Eric W. Biederman
Robin Holt <[EMAIL PROTECTED]> writes: >> So I think we have some options once we get the kernel threads out >> of the way. Getting the kernel threads out of the way would seem >> to be the first priority. > > I think both avenues would probably be the right way to proceed. > Getting kthreads

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Robin Holt
> So I think we have some options once we get the kernel threads out > of the way. Getting the kernel threads out of the way would seem > to be the first priority. I think both avenues would probably be the right way to proceed. Getting kthreads to not be parented by init would be an

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Oleg Nesterov
On 04/06, Eric W. Biederman wrote: > > Oleg Nesterov <[EMAIL PROTECTED]> writes: > > > Robin Holt wrote: > >> > >> wait_task_zombie() is taking many seconds to get through the list. > >> For the case of a modprobe, stop_machine creates one thread per cpu > >> (remember big number). All are

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Eric W. Biederman
Oleg Nesterov <[EMAIL PROTECTED]> writes: > Robin Holt wrote: >> >> wait_task_zombie() is taking many seconds to get through the list. >> For the case of a modprobe, stop_machine creates one thread per cpu >> (remember big number). All are parented to init and their exit will >> cause

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Oleg Nesterov
Robin Holt wrote: > > wait_task_zombie() is taking many seconds to get through the list. > For the case of a modprobe, stop_machine creates one thread per cpu > (remember big number). All are parented to init and their exit will > cause wait_task_zombie to scan multiple times most of the way

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Oleg Nesterov
Robin Holt wrote: wait_task_zombie() is taking many seconds to get through the list. For the case of a modprobe, stop_machine creates one thread per cpu (remember big number). All are parented to init and their exit will cause wait_task_zombie to scan multiple times most of the way through

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Eric W. Biederman
Oleg Nesterov [EMAIL PROTECTED] writes: Robin Holt wrote: wait_task_zombie() is taking many seconds to get through the list. For the case of a modprobe, stop_machine creates one thread per cpu (remember big number). All are parented to init and their exit will cause wait_task_zombie to scan

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Oleg Nesterov
On 04/06, Eric W. Biederman wrote: Oleg Nesterov [EMAIL PROTECTED] writes: Robin Holt wrote: wait_task_zombie() is taking many seconds to get through the list. For the case of a modprobe, stop_machine creates one thread per cpu (remember big number). All are parented to init and

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Robin Holt
So I think we have some options once we get the kernel threads out of the way. Getting the kernel threads out of the way would seem to be the first priority. I think both avenues would probably be the right way to proceeed. Getting kthreads to not be parented by init would be an opportunity

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Eric W. Biederman
Robin Holt [EMAIL PROTECTED] writes: So I think we have some options once we get the kernel threads out of the way. Getting the kernel threads out of the way would seem to be the first priority. I think both avenues would probably be the right way to proceeed. Getting kthreads to not be

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Eric W. Biederman
Oleg Nesterov [EMAIL PROTECTED] writes: At first glance your patch looks reasonable. Unfortunately it only applies to the rare thread that calls daemonize, and not also to kernel/kthread/kthread() which means it will miss many of our current kernel threads. Note that a thread created by

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Robin Holt
On Fri, Apr 06, 2007 at 09:38:24AM -0600, Eric W. Biederman wrote: How hard is tasklist_lock hit on these systems? The major hold-off we are seeing is from tasks reaping children, especially tasks with very large children lists. How hard is the pid hash hit on these systems? In the little bit

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Oleg Nesterov
On 04/06, Eric W. Biederman wrote: Oleg Nesterov [EMAIL PROTECTED] writes: At first glance your patch looks reasonable. Unfortunately it only applies to the rare thread that calls daemonize, and not also to kernel/kthread/kthread() which means it will miss many of our current kernel

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Linus Torvalds
On Fri, 6 Apr 2007, Oleg Nesterov wrote: Oops. I misread stop_machine(), it does kernel_thread(), not kthread_create(). So stopmachine threads are all re-parented to init when the caller exits. I think it makes sense to set -exit_state = -1 in stopmachine(), regadless of any other changes.

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Oleg Nesterov
On 04/06, Eric W. Biederman wrote: Thinking about it I do agree with Linus that two lists sounds like the right solution because it ensures we always have O(1) time when waiting for a zombie. Well. I bet this will be painful, and will uglify the code even more. do_wait() has to iterate over

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Ingo Molnar
* Linus Torvalds [EMAIL PROTECTED] wrote: I'd almost prefer to just not add kernel threads to any parent process list *at*all*. i think part of the problem is the legacy that the list is artificially unified: tasks that 'will possibly exit' are on the same list as tasks that 'have already

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Ingo Molnar
* Ingo Molnar [EMAIL PROTECTED] wrote: putting the freshly reaped tasks at the 'head' of the list is just a fancy (and incomplete) way of splitting the list up into two lists, and i'd advocate a clean split. Just like have have split the ptrace_list

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Ingo Molnar
* Oleg Nesterov [EMAIL PROTECTED] wrote: Thinking about it I do agree with Linus that two lists sounds like the right solution because it ensures we always have O(1) time when waiting for a zombie. Well. I bet this will be painful, and will uglify the code even more. do_wait() has

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Eric W. Biederman
Linus Torvalds [EMAIL PROTECTED] writes: On Fri, 6 Apr 2007, Oleg Nesterov wrote: Oops. I misread stop_machine(), it does kernel_thread(), not kthread_create(). So stopmachine threads are all re-parented to init when the caller exits. I think it makes sense to set -exit_state = -1 in

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Roland Dreier
no. Two _completely separate_ lists. i.e. a to-be-reaped task will still be on the main list _too_. The main list is for all the PID semantics rules. The reap-list is just for wait4() processing. The two would be completely separate. I guess this means we add another list head to

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Eric W. Biederman
Oleg Nesterov [EMAIL PROTECTED] writes: On 04/06, Eric W. Biederman wrote: Thinking about it I do agree with Linus that two lists sounds like the right solution because it ensures we always have O(1) time when waiting for a zombie. Well. I bet this will be painful, and will uglify the code

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Eric W. Biederman
Roland Dreier [EMAIL PROTECTED] writes: no. Two _completely separate_ lists. i.e. a to-be-reaped task will still be on the main list _too_. The main list is for all the PID semantics rules. The reap-list is just for wait4() processing. The two would be completely separate. I

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Christoph Hellwig
On Thu, Apr 05, 2007 at 06:29:16PM -0700, Linus Torvalds wrote: The support angel on my shoulder says we should just put all the kernel threads under a kthread subtree to shorten init's child list and minimize impact. A number are already there, of course, since they use the kthread

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Davide Libenzi
On Fri, 6 Apr 2007, Oleg Nesterov wrote: Sure. It would be nice to move -children into signal_struct at first. Except this change breaks (in fact fixes) -pdeath_signal behaviour. Ohhh, the signal struct! Funny name for something that nowadays has probably no more than a 5% affinity with

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Eric W. Biederman
Ingo Molnar [EMAIL PROTECTED] writes: no. Two _completely separate_ lists. i.e. a to-be-reaped task will still be on the main list _too_. The main list is for all the PID semantics rules. The reap-list is just for wait4() processing. The two would be completely separate. And what pray

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Eric W. Biederman
Davide Libenzi davidel@xmailserver.org writes: On Fri, 6 Apr 2007, Oleg Nesterov wrote: Sure. It would be nice to move -children into signal_struct at first. Except this change breaks (in fact fixes) -pdeath_signal behaviour. Ohhh, the signal struct! Funny name for something that nowadays

Re: init's children list is long and slows reaping children.

2007-04-06 Thread H. Peter Anvin
Eric W. Biederman wrote: Oleg is coming from a different case where it was found that exiting kernel threads were causing problems for nash when nash was run as init in an initramfs. While I think that case is likely a user space bug because nash should check the pid from waidpid before

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Eric W. Biederman
H. Peter Anvin [EMAIL PROTECTED] writes: Eric W. Biederman wrote: Oleg is coming from a different case where it was found that exiting kernel threads were causing problems for nash when nash was run as init in an initramfs. While I think that case is likely a user space bug because nash

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Davide Libenzi
On Fri, 6 Apr 2007, Eric W. Biederman wrote: Davide Libenzi davidel@xmailserver.org writes: On Fri, 6 Apr 2007, Oleg Nesterov wrote: Sure. It would be nice to move -children into signal_struct at first. Except this change breaks (in fact fixes) -pdeath_signal behaviour. Ohhh, the

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Ingo Molnar
* Davide Libenzi davidel@xmailserver.org wrote: Ohhh, the signal struct! Funny name for something that nowadays has probably no more than a 5% affinity with signal-related tasks :/ Hmm. I wonder if we should just rename it the struct thread_group, or struct task_group. Those

Re: init's children list is long and slows reaping children.

2007-04-06 Thread H. Peter Anvin
Eric W. Biederman wrote: Are you saying waitpid() (wait4) *with a pid specified* can return another pid? That definitely sounds like a bug. No. For the full context look back a couple of messages. I'm guessing the issue is nash just calls wait and doesn't check the returned pid value,

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Oleg Nesterov
On 04/06, Linus Torvalds wrote: On Fri, 6 Apr 2007, Oleg Nesterov wrote: Oops. I misread stop_machine(), it does kernel_thread(), not kthread_create(). So stopmachine threads are all re-parented to init when the caller exits. I think it makes sense to set -exit_state = -1 in

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Eric W. Biederman
Christoph Hellwig [EMAIL PROTECTED] writes: As all kernel thread (1) should be converted to kthread anyway for proper containers support and general let's get rid of a crappy API' cleanups I think that's enough. It would be nice to have SGI helping to convert more drivers over to the proper

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Ingo Molnar
* Oleg Nesterov [EMAIL PROTECTED] wrote: I'd almost prefer to just not add kernel threads to any parent process list *at*all*. Yes sure, I didn't argue with that. However, -exit_state = -1 does matter, we can't detach process unless we make it auto-reap. Off course, we also need to

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Oleg Nesterov
On 04/06, Oleg Nesterov wrote: --- t/kernel/exit.c~ 2007-04-06 23:31:31.0 +0400 +++ t/kernel/exit.c 2007-04-06 23:31:57.0 +0400 @@ -275,10 +275,7 @@ static void reparent_to_init(void) remove_parent(current); current-parent = child_reaper(current);

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Eric W. Biederman
Oleg Nesterov [EMAIL PROTECTED] writes: On 04/06, Oleg Nesterov wrote: --- t/kernel/exit.c~ 2007-04-06 23:31:31.0 +0400 +++ t/kernel/exit.c 2007-04-06 23:31:57.0 +0400 @@ -275,10 +275,7 @@ static void reparent_to_init(void) remove_parent(current); current-parent

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Oleg Nesterov
On 04/06, Ingo Molnar wrote: * Oleg Nesterov [EMAIL PROTECTED] wrote: I'd almost prefer to just not add kernel threads to any parent process list *at*all*. Yes sure, I didn't argue with that. However, -exit_state = -1 does matter, we can't detach process unless we make it

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Ingo Molnar
* Oleg Nesterov [EMAIL PROTECTED] wrote: Probably it is I who missed something :) But why can't we do both changes? I think it is just ugly to use init to reap the kernel thread. Ok, wait4() can find zombie quickly if we do the -children split. But /sbin/init could be swapped out, we

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Jeremy Fitzhardinge
Eric W. Biederman wrote: I'm guessing the issue is nash just calls wait and doesn't check the returned pid value, assuming it is the only child it forked returning. Which is valid except when you are running as pid == 1. Hm, that's always a bug; a process can always have children it

Re: init's children list is long and slows reaping children.

2007-04-06 Thread H. Peter Anvin
Jeremy Fitzhardinge wrote: Eric W. Biederman wrote: I'm guessing the issue is nash just calls wait and doesn't check the returned pid value, assuming it is the only child it forked returning. Which is valid except when you are running as pid == 1. Hm, that's always a bug; a process can

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Davide Libenzi
On Fri, 6 Apr 2007, Ingo Molnar wrote: * Davide Libenzi davidel@xmailserver.org wrote: Ohhh, the signal struct! Funny name for something that nowadays has probably no more than a 5% affinity with signal-related tasks :/ Hmm. I wonder if we should just rename it the

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Linus Torvalds
On Fri, 6 Apr 2007, Davide Libenzi wrote: or lets just face it and name it what it is: process_struct ;-) That'd be fine too! Wonder if Linus would swallow a rename patch like that... I don't really see the point. It's not even *true*. A process includes more than the shared

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Davide Libenzi
On Fri, 6 Apr 2007, Linus Torvalds wrote: On Fri, 6 Apr 2007, Davide Libenzi wrote: or lets just face it and name it what it is: process_struct ;-) That'd be fine too! Wonder if Linus would swallow a rename patch like that... I don't really see the point. It's not even *true*.

Re: init's children list is long and slows reaping children.

2007-04-06 Thread Jeff Garzik
Robin Holt wrote: We have been testing a new larger configuration and we are seeing a very large scan time of init's tsk-children list. In the cases we are seeing, there are numerous kernel processes created for each cpu (ie: events/0 ... events/big number, xfslogd/0 ... xfslogd/big number).

Re: init's children list is long and slows reaping children.

2007-04-05 Thread Eric W. Biederman
Linus Torvalds <[EMAIL PROTECTED]> writes:
> On Thu, 5 Apr 2007, Chris Snook wrote:
>
>> Linus Torvalds wrote:
>>
>> > Another thing we could do is to just make sure that kernel threads simply
>> > don't end up as children of init. That whole thing is silly, they're really
>> > not children of

Re: init's children list is long and slows reaping children.

2007-04-05 Thread Linus Torvalds
On Thu, 5 Apr 2007, Chris Snook wrote:
> Linus Torvalds wrote:
> >
> > Another thing we could do is to just make sure that kernel threads simply
> > don't end up as children of init. That whole thing is silly, they're really
> > not children of the user-space init anyway. Comments?
>
> Does

Re: init's children list is long and slows reaping children.

2007-04-05 Thread Chris Snook
Chris Snook wrote: Linus Torvalds wrote: On Thu, 5 Apr 2007, Robin Holt wrote: For testing, Jack Steiner create the following patch. All it does is moves tasks which are transitioning to the zombie state from where they are in the children list to the head of the list. In this way, they

Re: init's children list is long and slows reaping children.

2007-04-05 Thread Chris Snook
Linus Torvalds wrote: On Thu, 5 Apr 2007, Robin Holt wrote: For testing, Jack Steiner create the following patch. All it does is moves tasks which are transitioning to the zombie state from where they are in the children list to the head of the list. In this way, they will be the first found

Re: init's children list is long and slows reaping children.

2007-04-05 Thread Linus Torvalds
On Thu, 5 Apr 2007, Robin Holt wrote:
>
> For testing, Jack Steiner create the following patch. All it does
> is moves tasks which are transitioning to the zombie state from where
> they are in the children list to the head of the list. In this way,
> they will be the first found and reaping
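
The idea quoted above can be sketched in user space roughly as follows. This is not the patch under discussion (which is not reproduced here); all names (`struct node`, `struct task`, `mark_zombie`, `find_zombie`) are made up for illustration, and the list is a simple circular doubly linked list in the style of the kernel's list helpers:

```c
#include <stddef.h>

/* Circular doubly linked list node; the head is a bare node. */
struct node {
    struct node *prev, *next;
};

/* Illustrative stand-in for a task; `link` is first so a node
 * pointer can be cast back to its containing task. */
struct task {
    struct node link;
    int pid;
    int zombie;
};

static void list_init(struct node *head)
{
    head->prev = head->next = head;
}

static void list_add_tail(struct node *n, struct node *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

static void list_move_to_head(struct node *n, struct node *head)
{
    /* Unlink from the current position. */
    n->prev->next = n->next;
    n->next->prev = n->prev;
    /* Relink right after the head. */
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

/* Mark a task as a zombie and move it to the front of the list. */
static void mark_zombie(struct task *t, struct node *children)
{
    t->zombie = 1;
    list_move_to_head(&t->link, children);
}

/* Scan from the front; *steps reports how many entries were visited. */
static struct task *find_zombie(struct node *children, int *steps)
{
    *steps = 0;
    for (struct node *p = children->next; p != children; p = p->next) {
        struct task *t = (struct task *)p;

        ++*steps;
        if (t->zombie)
            return t;
    }
    return NULL;
}
```

With the zombie at the head, the scan finds it after visiting one entry no matter how many long-lived children sit on the list behind it.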

init's children list is long and slows reaping children.

2007-04-05 Thread Robin Holt
We have been testing a new larger configuration and we are seeing a very large scan time of init's tsk->children list. In the cases we are seeing, there are numerous kernel processes created for each cpu (ie: events/0 ... events/<big number>, xfslogd/0 ... xfslogd/<big number>). These are all on the list ahead of the
