On 17/12/2007, Steven Rostedt [EMAIL PROTECTED] wrote:
Here's a little snippet of where things went wrong.
[94359.652019] cpu:3 (hackbench:1658) pick_next_task_fair:1036 nr_running=1
[94359.652020] cpu:3 (hackbench:1658) pick_next_entity:625 se=810009020800
[94359.652021] cpu:0
[ trimmed the Cc list ]
On 17/12/2007, Steven Rostedt [EMAIL PROTECTED] wrote:
On Mon, 17 Dec 2007, Dmitry Adamushko wrote:
It may be related, maybe not. One 'abnormal' thing (at least, it
occurs only once in this log; it should be checked whether it also
happens when the system works fine)
* Dmitry Adamushko [EMAIL PROTECTED] wrote:
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -7360,7 +7360,7 @@ void sched_move_task(struct task_struct *tsk)
update_rq_clock(rq);
- running = task_running(rq, tsk);
+ running = (rq->curr == tsk);
on_rq =
On 16/12/2007, Ingo Molnar [EMAIL PROTECTED] wrote:
* Dmitry Adamushko [EMAIL PROTECTED] wrote:
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -7360,7 +7360,7 @@ void sched_move_task(struct task_struct *tsk)
update_rq_clock(rq);
- running = task_running(rq, tsk);
+
Ingo,
what about the following patch instead?
maybe task_is_current() would be a better name though.
Steven,
I guess, there is some analogue of UNLOCKED_CTXSW on -rt
(to reduce contention for rq->lock).
So there can be a race schedule() vs. rt_mutex_setprio() or sched_setscheduler()
for some
On Fri, 14 Dec 2007, Dmitry Adamushko wrote:
argh... it's a consequence of the 'current is not kept within the tree'
indeed.
Thanks Dmitry for tracking this down. Although I'm still not convinced we
hit the same bug. But I'm going to go ahead and release 2.6.24-rc5-rt1
anyway. When you have
On Sun, 16 Dec 2007, Dmitry Adamushko wrote:
Steven,
I guess, there is some analogue of UNLOCKED_CTXSW on -rt
(to reduce contention for rq->lock).
So there can be a race schedule() vs. rt_mutex_setprio() or
sched_setscheduler()
for some paths that might explain crashes you have been
On 14/12/2007, Steven Rostedt [EMAIL PROTECTED] wrote:
On Fri, 14 Dec 2007, Dmitry Adamushko wrote:
argh... it's a consequence of the 'current is not kept within the tree'
indeed.
Thanks Dmitry for tracking this down.
My analysis was flawed (hmm... me was under control of Belgium
On Sat, Dec 15, 2007 at 11:22:08AM +0100, Dmitry Adamushko wrote:
On 14/12/2007, Steven Rostedt [EMAIL PROTECTED] wrote:
On Fri, 14 Dec 2007, Dmitry Adamushko wrote:
argh... it's a consequence of the 'current is not kept within the tree'
indeed.
Thanks Dmitry for tracking
On 15/12/2007, Dhaval Giani [EMAIL PROTECTED] wrote:
On Sat, Dec 15, 2007 at 11:22:08AM +0100, Dmitry Adamushko wrote:
On 14/12/2007, Steven Rostedt [EMAIL PROTECTED] wrote:
On Fri, 14 Dec 2007, Dmitry Adamushko wrote:
argh... it's a consequence of the 'current is not kept
Dhaval,
so following the analysis in the previous mail... here is a test
patch. Could you please give it a try?
TIA,
(enclosed: a version without whitespace damage)
---
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -7360,7 +7360,7 @@ void sched_move_task(struct task_struct *tsk)
On Sun, Dec 16, 2007 at 01:00:07AM +0100, Dmitry Adamushko wrote:
Dhaval,
so following the analysis in the previous mail... here is a test
patch. Could you please give it a try?
Yep, it works!
Tested-by: Dhaval Giani [EMAIL PROTECTED]
thanks,
--
regards,
Dhaval
just to be sure SMP does matter here (most likely yes, I guess).
NUMA? I am not able to reproduce it here locally on an x86 8 CPU box.
yes. I used NUMA. 2 Nodes/4CPU x 2
Hmm..
Thanks,
-Kame
___
Containers mailing list
[EMAIL PROTECTED]
On Fri, Dec 14, 2007 at 09:06:07PM +0530, Dhaval Giani wrote:
On Fri, Dec 14, 2007 at 11:24:28PM +0900, [EMAIL PROTECTED] wrote:
just to be sure SMP does matter here (most likely yes, I guess).
NUMA? I am not able to reproduce it here locally on an x86 8 CPU box.
yes. I used
On Fri, Dec 14, 2007 at 11:24:28PM +0900, [EMAIL PROTECTED] wrote:
just to be sure SMP does matter here (most likely yes, I guess).
NUMA? I am not able to reproduce it here locally on an x86 8 CPU box.
yes. I used NUMA. 2 Nodes/4CPU x 2
OK, I got hold of an IA64 box, non numa and
-
have you tried :
[EMAIL PROTECTED] testpro]# taskset 01 ./batech-test.sh
yes
hang?
no.
just to be sure SMP does matter here (most likely yes, I guess).
maybe. As far as I tested, there was no hang if the number of cpus is 1.
Regards,
-Kame
On 14/12/2007, Dhaval Giani [EMAIL PROTECTED]
Actually no, its another bug. Thanks for the program!
Humm... this crash is very likely to be caused by the same bug. It
just reveals itself in a different place, but effectively the pattern
looks similar. Anyway, the rb-tree gets corrupted...
[ ... ]
[a001002e0480] rb_erase+0x300/0x7e0
[a00100076290] __dequeue_entity+0x70/0xa0
[a00100076300] set_next_entity+0x40/0xa0
[a001000763a0] set_curr_task_fair+0x40/0xa0
[a00100078d90] sched_move_task+0x2d0/0x340
[a00100078e20] cpu_cgroup_attach+0x20/0x40
[