Re: 2.6.20-git15 BUG: soft lockup detected on CPU#0! - timers?
On 25/02/07, Thomas Gleixner <[EMAIL PROTECTED]> wrote: Michal, On Sat, 2007-02-24 at 23:45 +0100, Michal Piotrowski wrote: > Here is a dmesg after over 25 hours of uptime > http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.21-rc1/git-dmesg3 > nothing new. can you please turn off CONFIG_SCHED_SMT ? It seems all reports have one thing in common: CPU == P4 and CONFIG_SCHED_SMT turned on. tglx Ok, I think that this is a SMT scheduler problem. System works fine for fifteen hours. Regards, Michal -- Michal K. K. Piotrowski LTG - Linux Testers Group (PL) (http://www.stardust.webpages.pl/ltg/) LTG - Linux Testers Group (EN) (http://www.stardust.webpages.pl/linux_testers_group_en/) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.20-git15 BUG: soft lockup detected on CPU#0! - timers?
On 25/02/07, Thomas Gleixner [EMAIL PROTECTED] wrote: Michal, On Sat, 2007-02-24 at 23:45 +0100, Michal Piotrowski wrote: Here is a dmesg after over 25 hours of uptime http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.21-rc1/git-dmesg3 nothing new. can you please turn off CONFIG_SCHED_SMT ? It seems all reports have one thing in common: CPU == P4 and CONFIG_SCHED_SMT turned on. tglx Ok, I think that this is a SMT scheduler problem. System works fine for fifteen hours. Regards, Michal -- Michal K. K. Piotrowski LTG - Linux Testers Group (PL) (http://www.stardust.webpages.pl/ltg/) LTG - Linux Testers Group (EN) (http://www.stardust.webpages.pl/linux_testers_group_en/) - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.20-git15 BUG: soft lockup detected on CPU#0! - timers?
Michal, On Sat, 2007-02-24 at 23:45 +0100, Michal Piotrowski wrote: > Here is a dmesg after over 25 hours of uptime > http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.21-rc1/git-dmesg3 > nothing new. can you please turn off CONFIG_SCHED_SMT ? It seems all reports have one thing in common: CPU == P4 and CONFIG_SCHED_SMT turned on. tglx - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.20-git15 BUG: soft lockup detected on CPU#0! - timers?
Michal, On Sat, 2007-02-24 at 23:45 +0100, Michal Piotrowski wrote: Here is a dmesg after over 25 hours of uptime http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.21-rc1/git-dmesg3 nothing new. can you please turn off CONFIG_SCHED_SMT ? It seems all reports have one thing in common: CPU == P4 and CONFIG_SCHED_SMT turned on. tglx - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.20-git15 BUG: soft lockup detected on CPU#0! - timers?
Hi Ingo, On 23/02/07, Ingo Molnar <[EMAIL PROTECTED]> wrote: Michal, * Michal Piotrowski <[EMAIL PROTECTED]> wrote: > Here is more > > hardirqs last enabled at (30787): [] syscall_exit_work+0x11/0x26 > hardirqs last disabled at (30788): [] ret_from_exception+0x9/0xc > softirqs last enabled at (30202): [] __do_softirq+0xe4/0xea > softirqs last disabled at (30193): [] do_softirq+0x64/0xd1 could you please try the patch below? This is pretty much the only condition under which we can silently 'leak' pending softirqs, and trigger the new warning: if something does cond_resched_softirq() in non-runnable state. (which is a no-no, but nothing enforced this, so it could in theory happen.) So the question is, with this patch applied, do you get these new warnings from sched.c? Here is a dmesg after over 25 hours of uptime http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.21-rc1/git-dmesg3 nothing new. Regards, Michal -- Michal K. K. Piotrowski LTG - Linux Testers Group (PL) (http://www.stardust.webpages.pl/ltg/) LTG - Linux Testers Group (EN) (http://www.stardust.webpages.pl/linux_testers_group_en/) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.20-git15 BUG: soft lockup detected on CPU#0! - timers?
Hi Ingo, On 23/02/07, Ingo Molnar [EMAIL PROTECTED] wrote: Michal, * Michal Piotrowski [EMAIL PROTECTED] wrote: Here is more hardirqs last enabled at (30787): [c0104249] syscall_exit_work+0x11/0x26 hardirqs last disabled at (30788): [c0103fc9] ret_from_exception+0x9/0xc softirqs last enabled at (30202): [c01265df] __do_softirq+0xe4/0xea softirqs last disabled at (30193): [c0106a75] do_softirq+0x64/0xd1 could you please try the patch below? This is pretty much the only condition under which we can silently 'leak' pending softirqs, and trigger the new warning: if something does cond_resched_softirq() in non-runnable state. (which is a no-no, but nothing enforced this, so it could in theory happen.) So the question is, with this patch applied, do you get these new warnings from sched.c? Here is a dmesg after over 25 hours of uptime http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.21-rc1/git-dmesg3 nothing new. Regards, Michal -- Michal K. K. Piotrowski LTG - Linux Testers Group (PL) (http://www.stardust.webpages.pl/ltg/) LTG - Linux Testers Group (EN) (http://www.stardust.webpages.pl/linux_testers_group_en/) - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.20-git15 BUG: soft lockup detected on CPU#0! - timers?
* Ingo Molnar <[EMAIL PROTECTED]> wrote: > could you please try the patch below? This is pretty much the only > condition under which we can silently 'leak' pending softirqs, and > trigger the new warning: if something does cond_resched_softirq() in > non-runnable state. (which is a no-no, but nothing enforced this, so > it could in theory happen.) So the question is, with this patch > applied, do you get these new warnings from sched.c? it just triggered on one of my boxes: BUG: at kernel/sched.c:4692 cond_resched_softirq() [] cond_resched_softirq+0x5f/0x7b [] release_sock+0x42/0x81 [] sk_wait_data+0x57/0x9d [] autoremove_wake_function+0x0/0x33 [] tcp_recvmsg+0x39c/0x973 [] sock_common_recvmsg+0x3e/0x54 [] sock_aio_read+0x106/0x112 [] do_sync_read+0xc8/0x105 [] autoremove_wake_function+0x0/0x33 [] vfs_read+0xc1/0x15a [] sys_read+0x41/0x67 [] syscall_call+0x7/0xb === so tcp_recvmsg() definitely gets into this condition. Ingo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.20-git15 BUG: soft lockup detected on CPU#0! - timers?
* Ingo Molnar [EMAIL PROTECTED] wrote: could you please try the patch below? This is pretty much the only condition under which we can silently 'leak' pending softirqs, and trigger the new warning: if something does cond_resched_softirq() in non-runnable state. (which is a no-no, but nothing enforced this, so it could in theory happen.) So the question is, with this patch applied, do you get these new warnings from sched.c? it just triggered on one of my boxes: BUG: at kernel/sched.c:4692 cond_resched_softirq() [c03ce128] cond_resched_softirq+0x5f/0x7b [c0369078] release_sock+0x42/0x81 [c03693bc] sk_wait_data+0x57/0x9d [c0129a00] autoremove_wake_function+0x0/0x33 [c03942ff] tcp_recvmsg+0x39c/0x973 [c0368e39] sock_common_recvmsg+0x3e/0x54 [c0367903] sock_aio_read+0x106/0x112 [c0159b0c] do_sync_read+0xc8/0x105 [c0129a00] autoremove_wake_function+0x0/0x33 [c0159e82] vfs_read+0xc1/0x15a [c015a7d2] sys_read+0x41/0x67 [c0103c10] syscall_call+0x7/0xb === so tcp_recvmsg() definitely gets into this condition. Ingo - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.20-git15 BUG: soft lockup detected on CPU#0! - timers?
On Fri, 2007-02-23 at 07:08 +0100, Ingo Molnar wrote: > Index: linux/kernel/sched.c > === > --- linux.orig/kernel/sched.c > +++ linux/kernel/sched.c > @@ -4689,6 +4689,7 @@ int __sched cond_resched_softirq(void) > BUG_ON(!in_softirq()); > > if (need_resched() && system_state == SYSTEM_RUNNING) { > + WARN_ON(current->state != TASK_RUNNING); > raw_local_irq_disable(); > _local_bh_enable(); > raw_local_irq_enable(); I'm getting NOHZ: local_softirq_pending 02 or 10 every few seconds if the box is doing anything, and the WARN_ON() has yet to trigger. -Mike - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.20-git15 BUG: soft lockup detected on CPU#0! - timers?
Michal, * Michal Piotrowski <[EMAIL PROTECTED]> wrote: > Here is more > > hardirqs last enabled at (30787): [] syscall_exit_work+0x11/0x26 > hardirqs last disabled at (30788): [] ret_from_exception+0x9/0xc > softirqs last enabled at (30202): [] __do_softirq+0xe4/0xea > softirqs last disabled at (30193): [] do_softirq+0x64/0xd1 could you please try the patch below? This is pretty much the only condition under which we can silently 'leak' pending softirqs, and trigger the new warning: if something does cond_resched_softirq() in non-runnable state. (which is a no-no, but nothing enforced this, so it could in theory happen.) So the question is, with this patch applied, do you get these new warnings from sched.c? Ingo --> Subject: [patch] add warning to cond_resched_softirq() From: Ingo Molnar <[EMAIL PROTECTED]> make sure that cond_resched_softirq() is always called with a runnable task - so that we do not leave softirq work pending indefinitely. Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]> --- kernel/sched.c |1 + 1 file changed, 1 insertion(+) Index: linux/kernel/sched.c === --- linux.orig/kernel/sched.c +++ linux/kernel/sched.c @@ -4689,6 +4689,7 @@ int __sched cond_resched_softirq(void) BUG_ON(!in_softirq()); if (need_resched() && system_state == SYSTEM_RUNNING) { + WARN_ON(current->state != TASK_RUNNING); raw_local_irq_disable(); _local_bh_enable(); raw_local_irq_enable(); - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.20-git15 BUG: soft lockup detected on CPU#0! - timers?
Michal Piotrowski napisał(a): > Thomas Gleixner napisał(a): >> Michal, >> >> On Wed, 2007-02-21 at 16:38 +0100, Michal Piotrowski wrote: But you still have those softirq pending messages, right ? >>> Yes >>> >>> (+ new NOHZ: local_softirq_pending 02) >> Yike, that's the timer softirq. >> >> Can you add the patch below, maybe it gives us some useful info. Please >> enable lockdep (your last config had it already) >> > > I hope this helps. > > irq event stamp: 103630856 > hardirqs last enabled at (103630855): [] _spin_unlock_irq+0x22/0x43 > hardirqs last disabled at (103630856): [] > tick_nohz_stop_sched_tick+0x13/0x1fd > softirqs last enabled at (103630824): [] __do_softirq+0xe4/0xea > softirqs last disabled at (103630819): [] do_softirq+0x64/0xd1 > NOHZ: local_softirq_pending 20, 0001 > > (gdb) l *0xc031356a > 0xc031356a is in _spin_unlock_irq (include2/asm/irqflags.h:48). > 43 __asm__ __volatile__("cli" : : : "memory"); > 44 } > 45 > 46 static inline void raw_local_irq_enable(void) > 47 { > 48 __asm__ __volatile__("sti" : : : "memory"); > 49 } > 50 > 51 /* > 52 * Used in the idle loop; sti takes one instruction cycle > (gdb) l *0xc013a00e > 0xc013a00e is in tick_nohz_stop_sched_tick > (/mnt/md0/devel/linux-git/kernel/time/tick-sched.c:158). > 153 ktime_t last_update, expires, now, delta; > 154 int cpu; > 155 > 156 local_irq_save(flags); > 157 > 158 cpu = smp_processor_id(); > 159 ts = _cpu(tick_cpu_sched, cpu); > 160 > 161 if (unlikely(ts->nohz_mode == NOHZ_MODE_INACTIVE)) > 162 goto end; > (gdb) l *0xc01265df > 0xc01265df is in __do_softirq (/mnt/md0/devel/linux-git/kernel/softirq.c:251). > 246 > 247 trace_softirq_exit(); > 248 > 249 account_system_vtime(current); > 250 _local_bh_enable(); > 251 } > 252 > 253 #ifndef __ARCH_HAS_DO_SOFTIRQ > 254 > 255 asmlinkage void do_softirq(void) > (gdb) l *0xc0106a75 > 0xc0106a75 is in do_softirq > (/mnt/md0/devel/linux-git/arch/i386/kernel/irq.c:222). > 217 irqctx->tinfo.previous_esp = current_stack_pointer; > 218 > 219 /* build the stack frame on the softirq stack */ > 220 isp = (u32*) ((char*)irqctx + sizeof(*irqctx)); > 221 > 222 asm volatile( > 223 " xchgl %%ebx,%%esp \n" > 224 " call__do_softirq\n" > 225 " movl%%ebx,%%esp \n" > 226 : "=b"(isp) > Here is more BUG: at /mnt/md0/devel/linux-git/kernel/lockdep.c:2427 check_flags() [] show_trace_log_lvl+0x1a/0x2f [] show_trace+0x12/0x14 [] dump_stack+0x16/0x18 [] check_flags+0x95/0x143 [] lock_acquire+0x29/0x82 [] down_write+0x3a/0x54 [] sys_munmap+0x23/0x3f [] syscall_call+0x7/0xb === irq event stamp: 30788 hardirqs last enabled at (30787): [] syscall_exit_work+0x11/0x26 hardirqs last disabled at (30788): [] ret_from_exception+0x9/0xc softirqs last enabled at (30202): [] __do_softirq+0xe4/0xea softirqs last disabled at (30193): [] do_softirq+0x64/0xd1 irq event stamp: 214661966 hardirqs last enabled at (214661965): [] restore_nocheck+0x12/0x15 hardirqs last disabled at (214661966): [] _spin_lock_irq+0xf/0x48 softirqs last enabled at (214661782): [] __do_softirq+0xe4/0xea softirqs last disabled at (214661755): [] do_softirq+0x64/0xd1 NOHZ: local_softirq_pending 24, 0001 irq event stamp: 137191105 hardirqs last enabled at (137191105): [] default_idle+0x3e/0x59 hardirqs last disabled at (137191104): [] cpu_idle+0xba/0xfb softirqs last enabled at (137190592): [] __do_softirq+0xe4/0xea softirqs last disabled at (137190587): [] do_softirq+0x64/0xd1 NOHZ: local_softirq_pending 20, 0001 irq event stamp: 214661966 hardirqs last enabled at (214661965): [] restore_nocheck+0x12/0x15 hardirqs last disabled at (214661966): [] _spin_lock_irq+0xf/0x48 softirqs last enabled at (214661782): [] __do_softirq+0xe4/0xea softirqs last disabled at (214661755): [] do_softirq+0x64/0xd1 NOHZ: local_softirq_pending 24, 0001 irq event stamp: 137191105 hardirqs last enabled at (137191105): [] default_idle+0x3e/0x59 hardirqs last disabled at (137191104): [] cpu_idle+0xba/0xfb softirqs last enabled at (137190592): [] __do_softirq+0xe4/0xea softirqs last disabled at (137190587): [] do_softirq+0x64/0xd1 NOHZ: local_softirq_pending 02, 0001 irq event stamp: 214661966 hardirqs last enabled at (214661965): [] restore_nocheck+0x12/0x15 hardirqs last disabled at (214661966): [] _spin_lock_irq+0xf/0x48 softirqs last enabled at (214661782): [] __do_softirq+0xe4/0xea softirqs last disabled at (214661755): [] do_softirq+0x64/0xd1 NOHZ: local_softirq_pending 06, 0001 irq event stamp: 137191105 hardirqs last enabled at (137191105): [] default_idle+0x3e/0x59
Re: 2.6.20-git15 BUG: soft lockup detected on CPU#0! - timers?
Michal Piotrowski napisał(a): Thomas Gleixner napisał(a): Michal, On Wed, 2007-02-21 at 16:38 +0100, Michal Piotrowski wrote: But you still have those softirq pending messages, right ? Yes (+ new NOHZ: local_softirq_pending 02) Yike, that's the timer softirq. Can you add the patch below, maybe it gives us some useful info. Please enable lockdep (your last config had it already) I hope this helps. irq event stamp: 103630856 hardirqs last enabled at (103630855): [c031356a] _spin_unlock_irq+0x22/0x43 hardirqs last disabled at (103630856): [c013a00e] tick_nohz_stop_sched_tick+0x13/0x1fd softirqs last enabled at (103630824): [c01265df] __do_softirq+0xe4/0xea softirqs last disabled at (103630819): [c0106a75] do_softirq+0x64/0xd1 NOHZ: local_softirq_pending 20, 0001 (gdb) l *0xc031356a 0xc031356a is in _spin_unlock_irq (include2/asm/irqflags.h:48). 43 __asm__ __volatile__(cli : : : memory); 44 } 45 46 static inline void raw_local_irq_enable(void) 47 { 48 __asm__ __volatile__(sti : : : memory); 49 } 50 51 /* 52 * Used in the idle loop; sti takes one instruction cycle (gdb) l *0xc013a00e 0xc013a00e is in tick_nohz_stop_sched_tick (/mnt/md0/devel/linux-git/kernel/time/tick-sched.c:158). 153 ktime_t last_update, expires, now, delta; 154 int cpu; 155 156 local_irq_save(flags); 157 158 cpu = smp_processor_id(); 159 ts = per_cpu(tick_cpu_sched, cpu); 160 161 if (unlikely(ts-nohz_mode == NOHZ_MODE_INACTIVE)) 162 goto end; (gdb) l *0xc01265df 0xc01265df is in __do_softirq (/mnt/md0/devel/linux-git/kernel/softirq.c:251). 246 247 trace_softirq_exit(); 248 249 account_system_vtime(current); 250 _local_bh_enable(); 251 } 252 253 #ifndef __ARCH_HAS_DO_SOFTIRQ 254 255 asmlinkage void do_softirq(void) (gdb) l *0xc0106a75 0xc0106a75 is in do_softirq (/mnt/md0/devel/linux-git/arch/i386/kernel/irq.c:222). 217 irqctx-tinfo.previous_esp = current_stack_pointer; 218 219 /* build the stack frame on the softirq stack */ 220 isp = (u32*) ((char*)irqctx + sizeof(*irqctx)); 221 222 asm volatile( 223xchgl %%ebx,%%esp \n 224call__do_softirq\n 225movl%%ebx,%%esp \n 226 : =b(isp) Here is more BUG: at /mnt/md0/devel/linux-git/kernel/lockdep.c:2427 check_flags() [c01050f9] show_trace_log_lvl+0x1a/0x2f [c01057e0] show_trace+0x12/0x14 [c0105892] dump_stack+0x16/0x18 [c013abf7] check_flags+0x95/0x143 [c013d978] lock_acquire+0x29/0x82 [c0136974] down_write+0x3a/0x54 [c01630f8] sys_munmap+0x23/0x3f [c0104120] syscall_call+0x7/0xb === irq event stamp: 30788 hardirqs last enabled at (30787): [c0104249] syscall_exit_work+0x11/0x26 hardirqs last disabled at (30788): [c0103fc9] ret_from_exception+0x9/0xc softirqs last enabled at (30202): [c01265df] __do_softirq+0xe4/0xea softirqs last disabled at (30193): [c0106a75] do_softirq+0x64/0xd1 irq event stamp: 214661966 hardirqs last enabled at (214661965): [c0104168] restore_nocheck+0x12/0x15 hardirqs last disabled at (214661966): [c0313174] _spin_lock_irq+0xf/0x48 softirqs last enabled at (214661782): [c01265df] __do_softirq+0xe4/0xea softirqs last disabled at (214661755): [c0106a75] do_softirq+0x64/0xd1 NOHZ: local_softirq_pending 24, 0001 irq event stamp: 137191105 hardirqs last enabled at (137191105): [c0102b80] default_idle+0x3e/0x59 hardirqs last disabled at (137191104): [c0102451] cpu_idle+0xba/0xfb softirqs last enabled at (137190592): [c01265df] __do_softirq+0xe4/0xea softirqs last disabled at (137190587): [c0106a75] do_softirq+0x64/0xd1 NOHZ: local_softirq_pending 20, 0001 irq event stamp: 214661966 hardirqs last enabled at (214661965): [c0104168] restore_nocheck+0x12/0x15 hardirqs last disabled at (214661966): [c0313174] _spin_lock_irq+0xf/0x48 softirqs last enabled at (214661782): [c01265df] __do_softirq+0xe4/0xea softirqs last disabled at (214661755): [c0106a75] do_softirq+0x64/0xd1 NOHZ: local_softirq_pending 24, 0001 irq event stamp: 137191105 hardirqs last enabled at (137191105): [c0102b80] default_idle+0x3e/0x59 hardirqs last disabled at (137191104): [c0102451] cpu_idle+0xba/0xfb softirqs last enabled at (137190592): [c01265df] __do_softirq+0xe4/0xea softirqs last disabled at (137190587): [c0106a75] do_softirq+0x64/0xd1 NOHZ: local_softirq_pending 02, 0001 irq event stamp: 214661966 hardirqs last enabled at (214661965): [c0104168] restore_nocheck+0x12/0x15 hardirqs last disabled at (214661966): [c0313174] _spin_lock_irq+0xf/0x48 softirqs last enabled at (214661782): [c01265df] __do_softirq+0xe4/0xea softirqs last disabled
Re: 2.6.20-git15 BUG: soft lockup detected on CPU#0! - timers?
Michal, * Michal Piotrowski [EMAIL PROTECTED] wrote: Here is more hardirqs last enabled at (30787): [c0104249] syscall_exit_work+0x11/0x26 hardirqs last disabled at (30788): [c0103fc9] ret_from_exception+0x9/0xc softirqs last enabled at (30202): [c01265df] __do_softirq+0xe4/0xea softirqs last disabled at (30193): [c0106a75] do_softirq+0x64/0xd1 could you please try the patch below? This is pretty much the only condition under which we can silently 'leak' pending softirqs, and trigger the new warning: if something does cond_resched_softirq() in non-runnable state. (which is a no-no, but nothing enforced this, so it could in theory happen.) So the question is, with this patch applied, do you get these new warnings from sched.c? Ingo -- Subject: [patch] add warning to cond_resched_softirq() From: Ingo Molnar [EMAIL PROTECTED] make sure that cond_resched_softirq() is always called with a runnable task - so that we do not leave softirq work pending indefinitely. Signed-off-by: Ingo Molnar [EMAIL PROTECTED] --- kernel/sched.c |1 + 1 file changed, 1 insertion(+) Index: linux/kernel/sched.c === --- linux.orig/kernel/sched.c +++ linux/kernel/sched.c @@ -4689,6 +4689,7 @@ int __sched cond_resched_softirq(void) BUG_ON(!in_softirq()); if (need_resched() system_state == SYSTEM_RUNNING) { + WARN_ON(current-state != TASK_RUNNING); raw_local_irq_disable(); _local_bh_enable(); raw_local_irq_enable(); - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.20-git15 BUG: soft lockup detected on CPU#0! - timers?
On Fri, 2007-02-23 at 07:08 +0100, Ingo Molnar wrote: Index: linux/kernel/sched.c === --- linux.orig/kernel/sched.c +++ linux/kernel/sched.c @@ -4689,6 +4689,7 @@ int __sched cond_resched_softirq(void) BUG_ON(!in_softirq()); if (need_resched() system_state == SYSTEM_RUNNING) { + WARN_ON(current-state != TASK_RUNNING); raw_local_irq_disable(); _local_bh_enable(); raw_local_irq_enable(); I'm getting NOHZ: local_softirq_pending 02 or 10 every few seconds if the box is doing anything, and the WARN_ON() has yet to trigger. -Mike - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.20-git15 BUG: soft lockup detected on CPU#0! - timers?
Thomas Gleixner napisał(a): > Michal, > > On Wed, 2007-02-21 at 16:38 +0100, Michal Piotrowski wrote: >>> But you still have those softirq pending messages, right ? >> Yes >> >> (+ new NOHZ: local_softirq_pending 02) > > Yike, that's the timer softirq. > > Can you add the patch below, maybe it gives us some useful info. Please > enable lockdep (your last config had it already) > I hope this helps. irq event stamp: 103630856 hardirqs last enabled at (103630855): [] _spin_unlock_irq+0x22/0x43 hardirqs last disabled at (103630856): [] tick_nohz_stop_sched_tick+0x13/0x1fd softirqs last enabled at (103630824): [] __do_softirq+0xe4/0xea softirqs last disabled at (103630819): [] do_softirq+0x64/0xd1 NOHZ: local_softirq_pending 20, 0001 (gdb) l *0xc031356a 0xc031356a is in _spin_unlock_irq (include2/asm/irqflags.h:48). 43 __asm__ __volatile__("cli" : : : "memory"); 44 } 45 46 static inline void raw_local_irq_enable(void) 47 { 48 __asm__ __volatile__("sti" : : : "memory"); 49 } 50 51 /* 52 * Used in the idle loop; sti takes one instruction cycle (gdb) l *0xc013a00e 0xc013a00e is in tick_nohz_stop_sched_tick (/mnt/md0/devel/linux-git/kernel/time/tick-sched.c:158). 153 ktime_t last_update, expires, now, delta; 154 int cpu; 155 156 local_irq_save(flags); 157 158 cpu = smp_processor_id(); 159 ts = _cpu(tick_cpu_sched, cpu); 160 161 if (unlikely(ts->nohz_mode == NOHZ_MODE_INACTIVE)) 162 goto end; (gdb) l *0xc01265df 0xc01265df is in __do_softirq (/mnt/md0/devel/linux-git/kernel/softirq.c:251). 246 247 trace_softirq_exit(); 248 249 account_system_vtime(current); 250 _local_bh_enable(); 251 } 252 253 #ifndef __ARCH_HAS_DO_SOFTIRQ 254 255 asmlinkage void do_softirq(void) (gdb) l *0xc0106a75 0xc0106a75 is in do_softirq (/mnt/md0/devel/linux-git/arch/i386/kernel/irq.c:222). 217 irqctx->tinfo.previous_esp = current_stack_pointer; 218 219 /* build the stack frame on the softirq stack */ 220 isp = (u32*) ((char*)irqctx + sizeof(*irqctx)); 221 222 asm volatile( 223 " xchgl %%ebx,%%esp \n" 224 " call__do_softirq\n" 225 " movl%%ebx,%%esp \n" 226 : "=b"(isp) http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.21-rc1/git-config http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.21-rc1/git-dmesg Regards, Michal -- Michal K. K. Piotrowski LTG - Linux Testers Group (PL) (http://www.stardust.webpages.pl/ltg/) LTG - Linux Testers Group (EN) (http://www.stardust.webpages.pl/linux_testers_group_en/) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.20-git15 BUG: soft lockup detected on CPU#0! - timers?
Michal, On Wed, 2007-02-21 at 16:38 +0100, Michal Piotrowski wrote: > > But you still have those softirq pending messages, right ? > > Yes > > (+ new NOHZ: local_softirq_pending 02) Yike, that's the timer softirq. Can you add the patch below, maybe it gives us some useful info. Please enable lockdep (your last config had it already) Thanks, tglx diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index 512a4a9..cc705c7 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -165,9 +165,11 @@ void tick_nohz_stop_sched_tick(void) goto end; cpu = smp_processor_id(); - if (unlikely(local_softirq_pending())) - printk(KERN_ERR "NOHZ: local_softirq_pending %02x\n", - local_softirq_pending()); + if (unlikely(local_softirq_pending())) { + print_irqtrace_events(current); + printk(KERN_ERR "NOHZ: local_softirq_pending %02x, %08x\n", + local_softirq_pending(), preempt_count()); + } now = ktime_get(); /* - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.20-git15 BUG: soft lockup detected on CPU#0! - timers?
On 21/02/07, Thomas Gleixner <[EMAIL PROTECTED]> wrote: On Tue, 2007-02-20 at 23:37 +0100, Michal Piotrowski wrote: > On 20/02/07, Thomas Gleixner <[EMAIL PROTECTED]> wrote: > > On Tue, 2007-02-20 at 19:54 +0100, Michal Piotrowski wrote: > > > > > > Might it be 6ba9b346e1e0eca65ec589d32de3a9fe32dc5de6 commit? > > > > I doubt that it is, but can you revert it ? > > I'm using the latest kernel without this patch since 3 hours. > > So far so good. But you still have those softirq pending messages, right ? Yes (+ new NOHZ: local_softirq_pending 02) http://www.ussg.iu.edu/hypermail/linux/kernel/0702.2/1944.html I think those are pointing to the root cause of this. Still no idea how to get hold of them. All my systems refuse to produce that. Hrmpf. tglx Regards, Michal -- Michal K. K. Piotrowski LTG - Linux Testers Group (PL) (http://www.stardust.webpages.pl/ltg/) LTG - Linux Testers Group (EN) (http://www.stardust.webpages.pl/linux_testers_group_en/) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.20-git15 BUG: soft lockup detected on CPU#0! - timers?
On Tue, 2007-02-20 at 23:37 +0100, Michal Piotrowski wrote: > On 20/02/07, Thomas Gleixner <[EMAIL PROTECTED]> wrote: > > On Tue, 2007-02-20 at 19:54 +0100, Michal Piotrowski wrote: > > > > > > Might it be 6ba9b346e1e0eca65ec589d32de3a9fe32dc5de6 commit? > > > > I doubt that it is, but can you revert it ? > > I'm using the latest kernel without this patch since 3 hours. > > So far so good. But you still have those softirq pending messages, right ? I think those are pointing to the root cause of this. Still no idea how to get hold of them. All my systems refuse to produce that. Hrmpf. tglx - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.20-git15 BUG: soft lockup detected on CPU#0! - timers?
On Tue, 2007-02-20 at 23:37 +0100, Michal Piotrowski wrote: On 20/02/07, Thomas Gleixner [EMAIL PROTECTED] wrote: On Tue, 2007-02-20 at 19:54 +0100, Michal Piotrowski wrote: Might it be 6ba9b346e1e0eca65ec589d32de3a9fe32dc5de6 commit? I doubt that it is, but can you revert it ? I'm using the latest kernel without this patch since 3 hours. So far so good. But you still have those softirq pending messages, right ? I think those are pointing to the root cause of this. Still no idea how to get hold of them. All my systems refuse to produce that. Hrmpf. tglx - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.20-git15 BUG: soft lockup detected on CPU#0! - timers?
On 21/02/07, Thomas Gleixner [EMAIL PROTECTED] wrote: On Tue, 2007-02-20 at 23:37 +0100, Michal Piotrowski wrote: On 20/02/07, Thomas Gleixner [EMAIL PROTECTED] wrote: On Tue, 2007-02-20 at 19:54 +0100, Michal Piotrowski wrote: Might it be 6ba9b346e1e0eca65ec589d32de3a9fe32dc5de6 commit? I doubt that it is, but can you revert it ? I'm using the latest kernel without this patch since 3 hours. So far so good. But you still have those softirq pending messages, right ? Yes (+ new NOHZ: local_softirq_pending 02) http://www.ussg.iu.edu/hypermail/linux/kernel/0702.2/1944.html I think those are pointing to the root cause of this. Still no idea how to get hold of them. All my systems refuse to produce that. Hrmpf. tglx Regards, Michal -- Michal K. K. Piotrowski LTG - Linux Testers Group (PL) (http://www.stardust.webpages.pl/ltg/) LTG - Linux Testers Group (EN) (http://www.stardust.webpages.pl/linux_testers_group_en/) - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.20-git15 BUG: soft lockup detected on CPU#0! - timers?
Michal, On Wed, 2007-02-21 at 16:38 +0100, Michal Piotrowski wrote: But you still have those softirq pending messages, right ? Yes (+ new NOHZ: local_softirq_pending 02) Yike, that's the timer softirq. Can you add the patch below, maybe it gives us some useful info. Please enable lockdep (your last config had it already) Thanks, tglx diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index 512a4a9..cc705c7 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -165,9 +165,11 @@ void tick_nohz_stop_sched_tick(void) goto end; cpu = smp_processor_id(); - if (unlikely(local_softirq_pending())) - printk(KERN_ERR NOHZ: local_softirq_pending %02x\n, - local_softirq_pending()); + if (unlikely(local_softirq_pending())) { + print_irqtrace_events(current); + printk(KERN_ERR NOHZ: local_softirq_pending %02x, %08x\n, + local_softirq_pending(), preempt_count()); + } now = ktime_get(); /* - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.20-git15 BUG: soft lockup detected on CPU#0! - timers?
Thomas Gleixner napisał(a): Michal, On Wed, 2007-02-21 at 16:38 +0100, Michal Piotrowski wrote: But you still have those softirq pending messages, right ? Yes (+ new NOHZ: local_softirq_pending 02) Yike, that's the timer softirq. Can you add the patch below, maybe it gives us some useful info. Please enable lockdep (your last config had it already) I hope this helps. irq event stamp: 103630856 hardirqs last enabled at (103630855): [c031356a] _spin_unlock_irq+0x22/0x43 hardirqs last disabled at (103630856): [c013a00e] tick_nohz_stop_sched_tick+0x13/0x1fd softirqs last enabled at (103630824): [c01265df] __do_softirq+0xe4/0xea softirqs last disabled at (103630819): [c0106a75] do_softirq+0x64/0xd1 NOHZ: local_softirq_pending 20, 0001 (gdb) l *0xc031356a 0xc031356a is in _spin_unlock_irq (include2/asm/irqflags.h:48). 43 __asm__ __volatile__(cli : : : memory); 44 } 45 46 static inline void raw_local_irq_enable(void) 47 { 48 __asm__ __volatile__(sti : : : memory); 49 } 50 51 /* 52 * Used in the idle loop; sti takes one instruction cycle (gdb) l *0xc013a00e 0xc013a00e is in tick_nohz_stop_sched_tick (/mnt/md0/devel/linux-git/kernel/time/tick-sched.c:158). 153 ktime_t last_update, expires, now, delta; 154 int cpu; 155 156 local_irq_save(flags); 157 158 cpu = smp_processor_id(); 159 ts = per_cpu(tick_cpu_sched, cpu); 160 161 if (unlikely(ts-nohz_mode == NOHZ_MODE_INACTIVE)) 162 goto end; (gdb) l *0xc01265df 0xc01265df is in __do_softirq (/mnt/md0/devel/linux-git/kernel/softirq.c:251). 246 247 trace_softirq_exit(); 248 249 account_system_vtime(current); 250 _local_bh_enable(); 251 } 252 253 #ifndef __ARCH_HAS_DO_SOFTIRQ 254 255 asmlinkage void do_softirq(void) (gdb) l *0xc0106a75 0xc0106a75 is in do_softirq (/mnt/md0/devel/linux-git/arch/i386/kernel/irq.c:222). 217 irqctx-tinfo.previous_esp = current_stack_pointer; 218 219 /* build the stack frame on the softirq stack */ 220 isp = (u32*) ((char*)irqctx + sizeof(*irqctx)); 221 222 asm volatile( 223xchgl %%ebx,%%esp \n 224call__do_softirq\n 225movl%%ebx,%%esp \n 226 : =b(isp) http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.21-rc1/git-config http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.21-rc1/git-dmesg Regards, Michal -- Michal K. K. Piotrowski LTG - Linux Testers Group (PL) (http://www.stardust.webpages.pl/ltg/) LTG - Linux Testers Group (EN) (http://www.stardust.webpages.pl/linux_testers_group_en/) - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.20-git15 BUG: soft lockup detected on CPU#0! - timers?
On 20/02/07, Thomas Gleixner <[EMAIL PROTECTED]> wrote: On Tue, 2007-02-20 at 19:54 +0100, Michal Piotrowski wrote: > > Might it be 6ba9b346e1e0eca65ec589d32de3a9fe32dc5de6 commit? I doubt that it is, but can you revert it ? I'm using the latest kernel without this patch since 3 hours. So far so good. Regards, Michal -- Michal K. K. Piotrowski LTG - Linux Testers Group (PL) (http://www.stardust.webpages.pl/ltg/) LTG - Linux Testers Group (EN) (http://www.stardust.webpages.pl/linux_testers_group_en/) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.20-git15 BUG: soft lockup detected on CPU#0! - timers?
On Tue, 2007-02-20 at 19:54 +0100, Michal Piotrowski wrote: > > Might it be 6ba9b346e1e0eca65ec589d32de3a9fe32dc5de6 commit? I doubt that it is, but can you revert it ? tglx - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
2.6.20-git15 BUG: soft lockup detected on CPU#0! - timers?
Hi, I have a problem with timers. NOHZ: local_softirq_pending 20 NOHZ: local_softirq_pending 20 BUG: soft lockup detected on CPU#0! [] show_trace_log_lvl+0x1a/0x2f [] show_trace+0x12/0x14 [] dump_stack+0x16/0x18 [] softlockup_tick+0xa7/0xb9 [] run_local_timers+0x12/0x14 [] update_process_times+0x3e/0x63 [] tick_sched_timer+0x66/0xaa [] hrtimer_interrupt+0x113/0x19d [] smp_apic_timer_interrupt+0x77/0x89 [] apic_timer_interrupt+0x33/0x38 [] scsi_dispatch_cmd+0x1a8/0x1e5 [] scsi_request_fn+0x224/0x2ec [] __generic_unplug_device+0x20/0x23 [] generic_unplug_device+0x18/0x26 [] blk_backing_dev_unplug+0x76/0x7e [] sync_buffer+0x2e/0x37 [] __wait_on_bit+0x36/0x5d [] out_of_line_wait_on_bit+0x5b/0x63 [] __wait_on_buffer+0x27/0x2b [] sync_dirty_buffer+0xa0/0xd6 [] journal_commit_transaction+0xca4/0x11d3 [] kjournald+0xab/0x1e8 [] kthread+0xb5/0xe0 [] kernel_thread_helper+0x7/0x10 === Clocksource tsc unstable (delta = 18743203260 ns) Time: acpi_pm clocksource has been installed. http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.20-git15/git-dmesg http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.20-git15/git-config Might it be 6ba9b346e1e0eca65ec589d32de3a9fe32dc5de6 commit? Regards, Michal -- Michal K. K. Piotrowski LTG - Linux Testers Group (PL) (http://www.stardust.webpages.pl/ltg/) LTG - Linux Testers Group (EN) (http://www.stardust.webpages.pl/linux_testers_group_en/) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
2.6.20-git15 BUG: soft lockup detected on CPU#0! - timers?
Hi, I have a problem with timers. NOHZ: local_softirq_pending 20 NOHZ: local_softirq_pending 20 BUG: soft lockup detected on CPU#0! [c01050f9] show_trace_log_lvl+0x1a/0x2f [c01057e0] show_trace+0x12/0x14 [c0105892] dump_stack+0x16/0x18 [c0151214] softlockup_tick+0xa7/0xb9 [c012a18f] run_local_timers+0x12/0x14 [c012a54b] update_process_times+0x3e/0x63 [c0139e34] tick_sched_timer+0x66/0xaa [c01366d2] hrtimer_interrupt+0x113/0x19d [c0114c45] smp_apic_timer_interrupt+0x77/0x89 [c0104beb] apic_timer_interrupt+0x33/0x38 [c026b945] scsi_dispatch_cmd+0x1a8/0x1e5 [c026ff86] scsi_request_fn+0x224/0x2ec [c01e1779] __generic_unplug_device+0x20/0x23 [c01e1b49] generic_unplug_device+0x18/0x26 [c01e2c48] blk_backing_dev_unplug+0x76/0x7e [c018e289] sync_buffer+0x2e/0x37 [c0311041] __wait_on_bit+0x36/0x5d [c03110c3] out_of_line_wait_on_bit+0x5b/0x63 [c018e1f9] __wait_on_buffer+0x27/0x2b [c018ec74] sync_dirty_buffer+0xa0/0xd6 [c01bc5c1] journal_commit_transaction+0xca4/0x11d3 [c01bff86] kjournald+0xab/0x1e8 [c0133639] kthread+0xb5/0xe0 [c0104d8f] kernel_thread_helper+0x7/0x10 === Clocksource tsc unstable (delta = 18743203260 ns) Time: acpi_pm clocksource has been installed. http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.20-git15/git-dmesg http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.20-git15/git-config Might it be 6ba9b346e1e0eca65ec589d32de3a9fe32dc5de6 commit? Regards, Michal -- Michal K. K. Piotrowski LTG - Linux Testers Group (PL) (http://www.stardust.webpages.pl/ltg/) LTG - Linux Testers Group (EN) (http://www.stardust.webpages.pl/linux_testers_group_en/) - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.20-git15 BUG: soft lockup detected on CPU#0! - timers?
On Tue, 2007-02-20 at 19:54 +0100, Michal Piotrowski wrote: Might it be 6ba9b346e1e0eca65ec589d32de3a9fe32dc5de6 commit? I doubt that it is, but can you revert it ? tglx - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.20-git15 BUG: soft lockup detected on CPU#0! - timers?
On 20/02/07, Thomas Gleixner [EMAIL PROTECTED] wrote: On Tue, 2007-02-20 at 19:54 +0100, Michal Piotrowski wrote: Might it be 6ba9b346e1e0eca65ec589d32de3a9fe32dc5de6 commit? I doubt that it is, but can you revert it ? I'm using the latest kernel without this patch since 3 hours. So far so good. Regards, Michal -- Michal K. K. Piotrowski LTG - Linux Testers Group (PL) (http://www.stardust.webpages.pl/ltg/) LTG - Linux Testers Group (EN) (http://www.stardust.webpages.pl/linux_testers_group_en/) - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/