[Devel] Re: Re: Hang with fair cgroup scheduler (reproducer is attached.)

2007-12-17 Thread Dmitry Adamushko
On 17/12/2007, Steven Rostedt [EMAIL PROTECTED] wrote:
Here's a little snippet of where things went wrong.
[94359.652019] cpu:3 (hackbench:1658) pick_next_task_fair:1036 nr_running=1
[94359.652020] cpu:3 (hackbench:1658) pick_next_entity:625 se=810009020800
[94359.652021] cpu:0

[Devel] Re: Re: Hang with fair cgroup scheduler (reproducer is attached.)

2007-12-17 Thread Dmitry Adamushko
[ trimmed the cc' list ] On 17/12/2007, Steven Rostedt [EMAIL PROTECTED] wrote: On Mon, 17 Dec 2007, Dmitry Adamushko wrote: It may be related, maybe not. One 'abnormal' thing (at least, it occurs only once in this log. Should be checked whether it happens when the system works fine)

[Devel] Re: Re: Hang with fair cgroup scheduler (reproducer is attached.)

2007-12-16 Thread Ingo Molnar
* Dmitry Adamushko [EMAIL PROTECTED] wrote:
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -7360,7 +7360,7 @@ void sched_move_task(struct task_struct *tsk)
 update_rq_clock(rq);
- running = task_running(rq, tsk);
+ running = (rq->curr == tsk);
 on_rq =

[Devel] Re: Re: Hang with fair cgroup scheduler (reproducer is attached.)

2007-12-16 Thread Dmitry Adamushko
On 16/12/2007, Ingo Molnar [EMAIL PROTECTED] wrote:
* Dmitry Adamushko [EMAIL PROTECTED] wrote:
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -7360,7 +7360,7 @@ void sched_move_task(struct task_struct *tsk)
 update_rq_clock(rq);
- running = task_running(rq, tsk);
+

[Devel] Re: Re: Hang with fair cgroup scheduler (reproducer is attached.)

2007-12-16 Thread Dmitry Adamushko
Ingo, what about the following patch instead? maybe task_is_current() would be a better name though. Steven, I guess, there is some analogue of UNLOCKED_CTXSW on -rt (to reduce contention for rq->lock). So there can be a race schedule() vs. rt_mutex_setprio() or sched_setscheduler() for some
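A minimal user-space sketch of why the two checks in the patch can disagree, assuming that with unlocked context switches task_running() tests a per-task "on CPU" flag rather than comparing against rq->curr. The struct layouts and the model_* helpers below are hypothetical stand-ins for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the kernel structures (assumption, not real layouts). */
struct task {
    int oncpu;              /* non-zero while the task still occupies a CPU */
};

struct runqueue {
    struct task *curr;      /* task the scheduler has picked as current */
};

/* Model of task_running() under unlocked context switches:
 * "is this task still executing on some CPU?" */
static bool model_task_running(const struct runqueue *rq, const struct task *p)
{
    (void)rq;
    return p->oncpu != 0;
}

/* Model of the check used in the patch: "is this task rq->curr?" */
static bool model_task_is_current(const struct runqueue *rq, const struct task *p)
{
    return rq->curr == p;
}

/* First half of a switch: next becomes rq->curr and is marked on-CPU,
 * while prev keeps its flag until the switch has completed. */
static void model_begin_switch(struct runqueue *rq, struct task *prev,
                               struct task *next)
{
    assert(rq->curr == prev);
    next->oncpu = 1;
    rq->curr = next;
}

/* Second half of a switch: prev finally drops its on-CPU flag. */
static void model_finish_switch(struct task *prev)
{
    prev->oncpu = 0;
}
```

Between model_begin_switch() and model_finish_switch() the outgoing task still reports as running but is no longer rq->curr. A caller that treats "running" as "current" inside that window would apply current-task bookkeeping to the wrong task, which is the kind of mismatch the patch appears to be aimed at.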

[Devel] Re: Re: Hang with fair cgroup scheduler (reproducer is attached.)

2007-12-16 Thread Steven Rostedt
On Fri, 14 Dec 2007, Dmitry Adamushko wrote: argh... it's a consequence of the 'current is not kept within the tree' indeed. Thanks Dmitry for tracking this down. Although I'm still not convinced we hit the same bug. But I'm going to go ahead and release 2.6.24-rc5-rt1 anyway. When you have

[Devel] Re: Re: Hang with fair cgroup scheduler (reproducer is attached.)

2007-12-16 Thread Steven Rostedt
On Sun, 16 Dec 2007, Dmitry Adamushko wrote: Steven, I guess, there is some analogue of UNLOCKED_CTXSW on -rt (to reduce contention for rq->lock). So there can be a race schedule() vs. rt_mutex_setprio() or sched_setscheduler() for some paths that might explain crashes you have been

[Devel] Re: Re: Hang with fair cgroup scheduler (reproducer is attached.)

2007-12-16 Thread Steven Rostedt
On Sun, 16 Dec 2007, Dmitry Adamushko wrote: Steven, I guess, there is some analogue of UNLOCKED_CTXSW on -rt (to reduce contention for rq->lock). So there can be a race schedule() vs. rt_mutex_setprio() or sched_setscheduler() for some paths that might explain crashes you have been

[Devel] Re: Re: Hang with fair cgroup scheduler (reproducer is attached.)

2007-12-15 Thread Dmitry Adamushko
On 14/12/2007, Steven Rostedt [EMAIL PROTECTED] wrote: On Fri, 14 Dec 2007, Dmitry Adamushko wrote: argh... it's a consequence of the 'current is not kept within the tree' indeed. Thanks Dmitry for tracking this down. My analysis was flawed (hmm... me was under control of Belgium

[Devel] Re: Re: Hang with fair cgroup scheduler (reproducer is attached.)

2007-12-15 Thread Dhaval Giani
On Sat, Dec 15, 2007 at 11:22:08AM +0100, Dmitry Adamushko wrote: On 14/12/2007, Steven Rostedt [EMAIL PROTECTED] wrote: On Fri, 14 Dec 2007, Dmitry Adamushko wrote: argh... it's a consequence of the 'current is not kept within the tree' indeed. Thanks Dmitry for tracking

[Devel] Re: Re: Hang with fair cgroup scheduler (reproducer is attached.)

2007-12-15 Thread Dmitry Adamushko
On 15/12/2007, Dhaval Giani [EMAIL PROTECTED] wrote: On Sat, Dec 15, 2007 at 11:22:08AM +0100, Dmitry Adamushko wrote: On 14/12/2007, Steven Rostedt [EMAIL PROTECTED] wrote: On Fri, 14 Dec 2007, Dmitry Adamushko wrote: argh... it's a consequence of the 'current is not kept

[Devel] Re: Re: Hang with fair cgroup scheduler (reproducer is attached.)

2007-12-15 Thread Dmitry Adamushko
Dhaval, so following the analysis in the previous mail... here is a test patch. Could you please give it a try? TIA,
(enclosed non white-space broken version)
---
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -7360,7 +7360,7 @@ void sched_move_task(struct task_struct *tsk)

[Devel] Re: Re: Hang with fair cgroup scheduler (reproducer is attached.)

2007-12-15 Thread Dhaval Giani
On Sun, Dec 16, 2007 at 01:00:07AM +0100, Dmitry Adamushko wrote:
> Dhaval, so following the analysis in the previous mail... here is a test patch. Could you please give it a try?
Yep, it works!
Tested-by: Dhaval Giani [EMAIL PROTECTED]
thanks,
--
regards, Dhaval

[Devel] Re: Re: Hang with fair cgroup scheduler (reproducer is attached.)

2007-12-14 Thread kamezawa . hiroyu
> > just to be sure SMP does matter here (most likely yes, I guess).
> NUMA? I am not able to reproduce it here locally on an x86 8 CPU box.
yes. I used NUMA. 2 Nodes/4CPU x 2
Hmm..
Thanks,
-Kame

[Devel] Re: Re: Hang with fair cgroup scheduler (reproducer is attached.)

2007-12-14 Thread Dhaval Giani
On Fri, Dec 14, 2007 at 09:06:07PM +0530, Dhaval Giani wrote: On Fri, Dec 14, 2007 at 11:24:28PM +0900, [EMAIL PROTECTED] wrote: just to be sure SMP does matter here (most likely yes, I guess). NUMA? I am not able to reproduce it here locally on an x86 8 CPU box. yes. I used

[Devel] Re: Re: Hang with fair cgroup scheduler (reproducer is attached.)

2007-12-14 Thread Dhaval Giani
On Fri, Dec 14, 2007 at 11:24:28PM +0900, [EMAIL PROTECTED] wrote: just to be sure SMP does matter here (most likely yes, I guess). NUMA? I am not able to reproduce it here locally on an x86 8 CPU box. yes. I used NUMA. 2 Nodes/4CPU x 2 OK, I got hold of an IA64 box, non numa and

[Devel] Re: Re: Hang with fair cgroup scheduler (reproducer is attached.)

2007-12-14 Thread kamezawa . hiroyu
> - have you tried :
> [EMAIL PROTECTED] testpro]#taskset 01 ./batech-test.sh
yes
> hang?
no.
> just to be sure SMP does matter here (most likely yes, I guess).
maybe. As far as I tested, there was no hang if the number of cpus is 1.
Regards,
-Kame

[Devel] Re: Re: Hang with fair cgroup scheduler (reproducer is attached.)

2007-12-14 Thread Dmitry Adamushko
On 14/12/2007, Dhaval Giani [EMAIL PROTECTED] wrote: Actually no, it's another bug. Thanks for the program! Humm... this crash is very likely to be caused by the same bug. It just reveals itself in a different place, but effectively the pattern looks similar. Anyway, the rb-tree gets corrupted...

[Devel] Re: Re: Hang with fair cgroup scheduler (reproducer is attached.)

2007-12-14 Thread Dmitry Adamushko
[ ... ]
[a001002e0480] rb_erase+0x300/0x7e0
[a00100076290] __dequeue_entity+0x70/0xa0
[a00100076300] set_next_entity+0x40/0xa0
[a001000763a0] set_curr_task_fair+0x40/0xa0
[a00100078d90] sched_move_task+0x2d0/0x340
[a00100078e20] cpu_cgroup_attach+0x20/0x40
[