Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]
On Thu, 10 Jan 2008, Parag Warudkar wrote: > On Jan 9, 2008 6:56 AM, Thomas Gleixner <[EMAIL PROTECTED]> wrote: > > > Can you apply the patch below + the debug patch which prints the timer > > stats on softlockup and provide the output of this. > > Applied to today's git and running for 21 hours - no recurrence yet > even with 1.2 wakeups per second. > > I will re-open this thread if it bites again. Would be interesting which commit fixed that problem. I'm a bit worried about self healing bugs. tglx -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]
On Thu, 10 Jan 2008, Parag Warudkar wrote: On Jan 9, 2008 6:56 AM, Thomas Gleixner [EMAIL PROTECTED] wrote: Can you apply the patch below + the debug patch which prints the timer stats on softlockup and provide the output of this. Applied to today's git and running for 21 hours - no recurrence yet even with 1.2 wakeups per second. I will re-open this thread if it bites again. Would be interesting which commit fixed that problem. I'm a bit worried about self healing bugs. tglx -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]
On Jan 9, 2008 6:56 AM, Thomas Gleixner <[EMAIL PROTECTED]> wrote: > Can you apply the patch below + the debug patch which prints the timer > stats on softlockup and provide the output of this. Applied to today's git and running for 21 hours - no recurrence yet even with 1.2 wakeups per second. I will re-open this thread if it bites again. Thanks Parag -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]
On Jan 9, 2008 6:56 AM, Thomas Gleixner [EMAIL PROTECTED] wrote: Can you apply the patch below + the debug patch which prints the timer stats on softlockup and provide the output of this. Applied to today's git and running for 21 hours - no recurrence yet even with 1.2 wakeups per second. I will re-open this thread if it bites again. Thanks Parag -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]
On Mon, 17 Dec 2007, Thomas Gleixner wrote: > I try to come up with some more debug patches tomorrow. Sorry took a bit longer than a day :( Can you apply the patch below + the debug patch which prints the timer stats on softlockup and provide the output of this. Thanks, tglx diff --git a/include/linux/tick.h b/include/linux/tick.h index f4a1395..22d921f 100644 --- a/include/linux/tick.h +++ b/include/linux/tick.h @@ -39,6 +39,8 @@ enum tick_nohz_mode { * @idle_calls:Total number of idle calls * @idle_sleeps: Number of idle calls, where the sched tick was stopped * @idle_entrytime:Time when the idle call was entered + * @idle_waketime: Time when the idle was interrupted + * @idle_exittime: Time when the idle state was left * @idle_sleeptime:Sum of the time slept in idle with sched tick stopped * @sleep_length: Duration of the current idle sleep */ @@ -52,6 +54,8 @@ struct tick_sched { unsigned long idle_calls; unsigned long idle_sleeps; ktime_t idle_entrytime; + ktime_t idle_waketime; + ktime_t idle_exittime; ktime_t idle_sleeptime; ktime_t sleep_length; unsigned long last_jiffies; diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index cb89fa8..206a436 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -133,14 +133,15 @@ void tick_nohz_update_jiffies(void) if (!ts->tick_stopped) return; - touch_softlockup_watchdog(); - cpu_clear(cpu, nohz_cpu_mask); now = ktime_get(); local_irq_save(flags); tick_do_update_jiffies64(now); local_irq_restore(flags); + + ts->idle_waketime = now; + touch_softlockup_watchdog(); } /** @@ -369,6 +370,7 @@ void tick_nohz_restart_sched_tick(void) * Cancel the scheduled timer and restore the tick */ ts->tick_stopped = 0; + ts->idle_exittime = now; hrtimer_cancel(>sched_timer); ts->sched_timer.expires = ts->idle_tick; diff --git a/kernel/time/timer_list.c b/kernel/time/timer_list.c index 12c5f4c..d3d94c1 100644 --- a/kernel/time/timer_list.c +++ b/kernel/time/timer_list.c @@ -166,6 +166,8 @@ static void print_cpu(struct seq_file *m, int cpu, u64 now) P(idle_calls); P(idle_sleeps); P_ns(idle_entrytime); + P_ns(idle_waketime); + P_ns(idle_exittime); P_ns(idle_sleeptime); P(last_jiffies); P(next_jiffies); -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]
On Mon, 17 Dec 2007, Thomas Gleixner wrote: I try to come up with some more debug patches tomorrow. Sorry took a bit longer than a day :( Can you apply the patch below + the debug patch which prints the timer stats on softlockup and provide the output of this. Thanks, tglx diff --git a/include/linux/tick.h b/include/linux/tick.h index f4a1395..22d921f 100644 --- a/include/linux/tick.h +++ b/include/linux/tick.h @@ -39,6 +39,8 @@ enum tick_nohz_mode { * @idle_calls:Total number of idle calls * @idle_sleeps: Number of idle calls, where the sched tick was stopped * @idle_entrytime:Time when the idle call was entered + * @idle_waketime: Time when the idle was interrupted + * @idle_exittime: Time when the idle state was left * @idle_sleeptime:Sum of the time slept in idle with sched tick stopped * @sleep_length: Duration of the current idle sleep */ @@ -52,6 +54,8 @@ struct tick_sched { unsigned long idle_calls; unsigned long idle_sleeps; ktime_t idle_entrytime; + ktime_t idle_waketime; + ktime_t idle_exittime; ktime_t idle_sleeptime; ktime_t sleep_length; unsigned long last_jiffies; diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index cb89fa8..206a436 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -133,14 +133,15 @@ void tick_nohz_update_jiffies(void) if (!ts-tick_stopped) return; - touch_softlockup_watchdog(); - cpu_clear(cpu, nohz_cpu_mask); now = ktime_get(); local_irq_save(flags); tick_do_update_jiffies64(now); local_irq_restore(flags); + + ts-idle_waketime = now; + touch_softlockup_watchdog(); } /** @@ -369,6 +370,7 @@ void tick_nohz_restart_sched_tick(void) * Cancel the scheduled timer and restore the tick */ ts-tick_stopped = 0; + ts-idle_exittime = now; hrtimer_cancel(ts-sched_timer); ts-sched_timer.expires = ts-idle_tick; diff --git a/kernel/time/timer_list.c b/kernel/time/timer_list.c index 12c5f4c..d3d94c1 100644 --- a/kernel/time/timer_list.c +++ b/kernel/time/timer_list.c @@ -166,6 +166,8 @@ static void print_cpu(struct seq_file *m, int cpu, u64 now) P(idle_calls); P(idle_sleeps); P_ns(idle_entrytime); + P_ns(idle_waketime); + P_ns(idle_exittime); P_ns(idle_sleeptime); P(last_jiffies); P(next_jiffies); -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]
On Mon, 17 Dec 2007, Parag Warudkar wrote: > On Dec 17, 2007 3:05 AM, Thomas Gleixner <[EMAIL PROTECTED]> wrote: > > Sigh. You did not have the debug patch applied anymore, which printks > > the timer_list data ? Can you apply it again and provide the output > > please ? > > > > This keeps getting more and more weird - This time I was running with > CONFIG_CPU_IDLE=N and I have ton of soft lockups after 14hr uptime. It's more than weird. > now at 39080147128907 nsecs > .idle_entrytime : 39080384013047 nsecs The time, when we entered idle on CPU#0 is in the future. > .tick_stopped : 0 But the tick is not stopped, that means CPU#0 has work to do On CPU#1 idle entry just happened: > .idle_entrytime : 39079996603653 nsecs > now at 40490254040892 nsecs > .idle_entrytime : 40490492012833 nsecs Again, idle_entry on CPU#0 is in the future. On CPU#1 the idle entry was at: > .idle_entrytime : 40489996578090 nsecs which means: 0.257462802 sec. ago > now at 40700144217096 nsecs Aarg. On CPU#0 this is consistently in the future: > .idle_entrytime : 40700372012887 nsecs I'm really confused. > .idle_entrytime : 4066620694 nsecs > .idle_sleeptime : 40540536046589 nsecs > .last_jiffies : 1010 > .next_jiffies : 10100467 > .idle_expires : 4070186400 nsecs > jiffies: 10100158 > > > Tick Device: mode: 1 > Clock Event Device: hpet > max_delta_ns: 2147483647 > min_delta_ns: 3352 > mult: 61496110 > shift: 32 > mode: 3 > next_event: 4070069200 nsecs > set_next_event: hpet_legacy_next_event > set_mode: hpet_legacy_set_mode > event_handler: tick_handle_oneshot_broadcast > tick_broadcast_mask: 0003 Here is the next inconsistent data: > tick_broadcast_oneshot_mask: 0003 CPU#1 just woke up. That means the broadcast oneshot mask must be cleared for CPU#1. Some real strange thing is going on in your box. I try to come up with some more debug patches tomorrow. tglx -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]
On Dec 17, 2007 3:05 AM, Thomas Gleixner <[EMAIL PROTECTED]> wrote: > > On Sun, 16 Dec 2007, Parag Warudkar wrote: > > > On Dec 16, 2007 12:15 AM, Parag Warudkar <[EMAIL PROTECTED]> wrote: > > > On Sat, 15 Dec 2007, Parag Warudkar wrote: > > > > > > >> I will run it for a little longer just to be sure - but I don't think > > > >> it > > > >> will be a problem. > > > > > > No problems for last 10 hours - I consider this fixed. > > > > > > > Arghh - spoke 8 hours too soon. I left it running overnight and > > morning I see a bunch of softlockups - so NO NOT FIXED. > > > > Parag > > > > BUG: soft lockup - CPU#1 stuck for 13s! [swapper:0] > > > > Pid: 0, comm: swapper Not tainted (2.6.24-rc5 #21) > > EIP: 0060:[] EFLAGS: 0206 CPU: 1 > > EIP is at acpi_idle_enter_simple+0x166/0x1d0 > > EAX: f7829f88 EBX: 0dab ECX: 0266 EDX: > > ESI: EDI: 00c056e5 EBP: 00c0493a ESP: f7829f88 > > DS: 007b ES: 007b FS: 00d8 GS: SS: 0068 > > CR0: 8005003b CR2: 080f6c78 CR3: 00718000 CR4: 06d0 > > DR0: DR1: DR2: DR3: > > DR6: 0ff0 DR7: 0400 > > Sigh. You did not have the debug patch applied anymore, which printks > the timer_list data ? Can you apply it again and provide the output > please ? > This keeps getting more and more weird - This time I was running with CONFIG_CPU_IDLE=N and I have ton of soft lockups after 14hr uptime. The timer_list data for a few lockups is below. If you need the full thing grab it from http://warudkars.net/messages.bz2 . BUG: soft lockup - CPU#1 stuck for 12s! [kjournald:424] Pid: 424, comm: kjournald Not tainted (2.6.24-rc5 #24) EIP: 0060:[] EFLAGS: 0206 CPU: 1 EIP is at __journal_remove_journal_head+0x18/0xe8 EAX: dc75e1c0 EBX: dc75e1c0 ECX: f6d539f0 EDX: dc75e1f8 ESI: dc75e1f8 EDI: 0009 EBP: f712a000 ESP: f7157f28 DS: 007b ES: 007b FS: 00d8 GS: SS: 0068 CR0: 8005003b CR2: b7d18eb4 CR3: 37352000 CR4: 06d0 DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400 [] __journal_unfile_buffer+0x8/0x11 [] journal_put_journal_head+0x3d/0x4b [] journal_commit_transaction+0x75d/0xbc9 [] lock_timer_base+0x19/0x35 [] kjournald+0xa2/0x1d0 [] autoremove_wake_function+0x0/0x35 [] kjournald+0x0/0x1d0 [] kthread+0x38/0x5d [] kthread+0x0/0x5d [] kernel_thread_helper+0x7/0x10 === Timer List Version: v0.3 HRTIMER_MAX_CLOCK_BASES: 2 now at 39080147128907 nsecs cpu: 0 clock 0: .index: 0 .resolution: 1 nsecs .get_time: ktime_get_real .offset: 1197846065352886518 nsecs active timers: clock 1: .index: 1 .resolution: 1 nsecs .get_time: ktime_get .offset: 0 nsecs active timers: #0: , tick_sched_timer, S:01, tick_nohz_stop_sched_tick, swapper/0 # expires at 3908027200 nsecs [in 124871093 nsecs] #1: , it_real_fn, S:01, do_setitimer, syslogd/2522 # expires at 3908647554 nsecs [in 6328414426 nsecs] #2: , hrtimer_wakeup, S:01, do_nanosleep, atd/3039 # expires at 39332442305024 nsecs [in 252295176117 nsecs] .expires_next : 3908032000 nsecs .hres_active: 1 .nr_events : 81214 .nohz_mode : 2 .idle_tick : 390780 nsecs .tick_stopped : 0 .idle_jiffies : 9694501 .idle_calls : 171123 .idle_sleeps: 106664 .idle_entrytime : 39080384013047 nsecs .idle_sleeptime : 38923743508612 nsecs .last_jiffies : 9695101 .next_jiffies : 9695250 .idle_expires : 3907999600 nsecs jiffies: 9695106 cpu: 1 clock 0: .index: 0 .resolution: 1 nsecs .get_time: ktime_get_real .offset: 1197846065352886518 nsecs active timers: clock 1: .index: 1 .resolution: 1 nsecs .get_time: ktime_get .offset: 0 nsecs active timers: #0: , hrtimer_wakeup, S:01, do_nanosleep, crond/2946 # expires at 39115730211364 nsecs [in 35583082457 nsecs] #1: , hrtimer_wakeup, S:01, do_nanosleep, smartd/3480 # expires at 39641838017337 nsecs [in 561690888430 nsecs] .expires_next : 3907999700 nsecs .hres_active: 1 .nr_events : 48789 .nohz_mode : 2 .idle_tick : 3906686500 nsecs .tick_stopped : 0 .idle_jiffies : 9691717 .idle_calls : 125065 .idle_sleeps: 79815 .idle_entrytime : 39079996603653 nsecs .idle_sleeptime : 38923914717954 nsecs .last_jiffies : 9695000 .next_jiffies : 9695467 .idle_expires : 3908186400 nsecs jiffies: 9695162 Tick Device: mode: 1 Clock Event Device: hpet max_delta_ns: 2147483647 min_delta_ns: 3352 mult: 61496110 shift: 32 mode: 3 next_event: 3908070400 nsecs set_next_event: hpet_legacy_next_event set_mode: hpet_legacy_set_mode event_handler: tick_handle_oneshot_broadcast tick_broadcast_mask: 0003 tick_broadcast_oneshot_mask: 0001 Tick Device: mode: 1 Clock Event Device: lapic max_delta_ns: 805378776 min_delta_ns: 1440 mult: 44735215 shift: 32 mode:
Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]
On Sun, 16 Dec 2007, Parag Warudkar wrote: > On Dec 16, 2007 12:15 AM, Parag Warudkar <[EMAIL PROTECTED]> wrote: > > On Sat, 15 Dec 2007, Parag Warudkar wrote: > > > > >> I will run it for a little longer just to be sure - but I don't think it > > >> will be a problem. > > > > No problems for last 10 hours - I consider this fixed. > > > > Arghh - spoke 8 hours too soon. I left it running overnight and > morning I see a bunch of softlockups - so NO NOT FIXED. > > Parag > > BUG: soft lockup - CPU#1 stuck for 13s! [swapper:0] > > Pid: 0, comm: swapper Not tainted (2.6.24-rc5 #21) > EIP: 0060:[] EFLAGS: 0206 CPU: 1 > EIP is at acpi_idle_enter_simple+0x166/0x1d0 > EAX: f7829f88 EBX: 0dab ECX: 0266 EDX: > ESI: EDI: 00c056e5 EBP: 00c0493a ESP: f7829f88 > DS: 007b ES: 007b FS: 00d8 GS: SS: 0068 > CR0: 8005003b CR2: 080f6c78 CR3: 00718000 CR4: 06d0 > DR0: DR1: DR2: DR3: > DR6: 0ff0 DR7: 0400 Sigh. You did not have the debug patch applied anymore, which printks the timer_list data ? Can you apply it again and provide the output please ? Thanks, tglx -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]
On Sun, 16 Dec 2007, Parag Warudkar wrote: On Dec 16, 2007 12:15 AM, Parag Warudkar [EMAIL PROTECTED] wrote: On Sat, 15 Dec 2007, Parag Warudkar wrote: I will run it for a little longer just to be sure - but I don't think it will be a problem. No problems for last 10 hours - I consider this fixed. Arghh - spoke 8 hours too soon. I left it running overnight and morning I see a bunch of softlockups - so NO NOT FIXED. Parag BUG: soft lockup - CPU#1 stuck for 13s! [swapper:0] Pid: 0, comm: swapper Not tainted (2.6.24-rc5 #21) EIP: 0060:[c0533ca6] EFLAGS: 0206 CPU: 1 EIP is at acpi_idle_enter_simple+0x166/0x1d0 EAX: f7829f88 EBX: 0dab ECX: 0266 EDX: ESI: EDI: 00c056e5 EBP: 00c0493a ESP: f7829f88 DS: 007b ES: 007b FS: 00d8 GS: SS: 0068 CR0: 8005003b CR2: 080f6c78 CR3: 00718000 CR4: 06d0 DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400 Sigh. You did not have the debug patch applied anymore, which printks the timer_list data ? Can you apply it again and provide the output please ? Thanks, tglx -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]
On Dec 17, 2007 3:05 AM, Thomas Gleixner [EMAIL PROTECTED] wrote: On Sun, 16 Dec 2007, Parag Warudkar wrote: On Dec 16, 2007 12:15 AM, Parag Warudkar [EMAIL PROTECTED] wrote: On Sat, 15 Dec 2007, Parag Warudkar wrote: I will run it for a little longer just to be sure - but I don't think it will be a problem. No problems for last 10 hours - I consider this fixed. Arghh - spoke 8 hours too soon. I left it running overnight and morning I see a bunch of softlockups - so NO NOT FIXED. Parag BUG: soft lockup - CPU#1 stuck for 13s! [swapper:0] Pid: 0, comm: swapper Not tainted (2.6.24-rc5 #21) EIP: 0060:[c0533ca6] EFLAGS: 0206 CPU: 1 EIP is at acpi_idle_enter_simple+0x166/0x1d0 EAX: f7829f88 EBX: 0dab ECX: 0266 EDX: ESI: EDI: 00c056e5 EBP: 00c0493a ESP: f7829f88 DS: 007b ES: 007b FS: 00d8 GS: SS: 0068 CR0: 8005003b CR2: 080f6c78 CR3: 00718000 CR4: 06d0 DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400 Sigh. You did not have the debug patch applied anymore, which printks the timer_list data ? Can you apply it again and provide the output please ? This keeps getting more and more weird - This time I was running with CONFIG_CPU_IDLE=N and I have ton of soft lockups after 14hr uptime. The timer_list data for a few lockups is below. If you need the full thing grab it from http://warudkars.net/messages.bz2 . BUG: soft lockup - CPU#1 stuck for 12s! [kjournald:424] Pid: 424, comm: kjournald Not tainted (2.6.24-rc5 #24) EIP: 0060:[c04bbd57] EFLAGS: 0206 CPU: 1 EIP is at __journal_remove_journal_head+0x18/0xe8 EAX: dc75e1c0 EBX: dc75e1c0 ECX: f6d539f0 EDX: dc75e1f8 ESI: dc75e1f8 EDI: 0009 EBP: f712a000 ESP: f7157f28 DS: 007b ES: 007b FS: 00d8 GS: SS: 0068 CR0: 8005003b CR2: b7d18eb4 CR3: 37352000 CR4: 06d0 DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400 [c04b8680] __journal_unfile_buffer+0x8/0x11 [c04bbe64] journal_put_journal_head+0x3d/0x4b [c04ba254] journal_commit_transaction+0x75d/0xbc9 [c0428246] lock_timer_base+0x19/0x35 [c04bcda0] kjournald+0xa2/0x1d0 [c0430e29] autoremove_wake_function+0x0/0x35 [c04bccfe] kjournald+0x0/0x1d0 [c0430d63] kthread+0x38/0x5d [c0430d2b] kthread+0x0/0x5d [c0404a0f] kernel_thread_helper+0x7/0x10 === Timer List Version: v0.3 HRTIMER_MAX_CLOCK_BASES: 2 now at 39080147128907 nsecs cpu: 0 clock 0: .index: 0 .resolution: 1 nsecs .get_time: ktime_get_real .offset: 1197846065352886518 nsecs active timers: clock 1: .index: 1 .resolution: 1 nsecs .get_time: ktime_get .offset: 0 nsecs active timers: #0: f7157dd4, tick_sched_timer, S:01, tick_nohz_stop_sched_tick, swapper/0 # expires at 3908027200 nsecs [in 124871093 nsecs] #1: f7157dd4, it_real_fn, S:01, do_setitimer, syslogd/2522 # expires at 3908647554 nsecs [in 6328414426 nsecs] #2: f7157dd4, hrtimer_wakeup, S:01, do_nanosleep, atd/3039 # expires at 39332442305024 nsecs [in 252295176117 nsecs] .expires_next : 3908032000 nsecs .hres_active: 1 .nr_events : 81214 .nohz_mode : 2 .idle_tick : 390780 nsecs .tick_stopped : 0 .idle_jiffies : 9694501 .idle_calls : 171123 .idle_sleeps: 106664 .idle_entrytime : 39080384013047 nsecs .idle_sleeptime : 38923743508612 nsecs .last_jiffies : 9695101 .next_jiffies : 9695250 .idle_expires : 3907999600 nsecs jiffies: 9695106 cpu: 1 clock 0: .index: 0 .resolution: 1 nsecs .get_time: ktime_get_real .offset: 1197846065352886518 nsecs active timers: clock 1: .index: 1 .resolution: 1 nsecs .get_time: ktime_get .offset: 0 nsecs active timers: #0: f7157dd4, hrtimer_wakeup, S:01, do_nanosleep, crond/2946 # expires at 39115730211364 nsecs [in 35583082457 nsecs] #1: f7157dd4, hrtimer_wakeup, S:01, do_nanosleep, smartd/3480 # expires at 39641838017337 nsecs [in 561690888430 nsecs] .expires_next : 3907999700 nsecs .hres_active: 1 .nr_events : 48789 .nohz_mode : 2 .idle_tick : 3906686500 nsecs .tick_stopped : 0 .idle_jiffies : 9691717 .idle_calls : 125065 .idle_sleeps: 79815 .idle_entrytime : 39079996603653 nsecs .idle_sleeptime : 38923914717954 nsecs .last_jiffies : 9695000 .next_jiffies : 9695467 .idle_expires : 3908186400 nsecs jiffies: 9695162 Tick Device: mode: 1 Clock Event Device: hpet max_delta_ns: 2147483647 min_delta_ns: 3352 mult: 61496110 shift: 32 mode: 3 next_event: 3908070400 nsecs set_next_event: hpet_legacy_next_event set_mode: hpet_legacy_set_mode event_handler: tick_handle_oneshot_broadcast tick_broadcast_mask: 0003 tick_broadcast_oneshot_mask: 0001 Tick Device: mode: 1 Clock Event Device: lapic max_delta_ns: 805378776 min_delta_ns: 1440
Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]
On Mon, 17 Dec 2007, Parag Warudkar wrote: On Dec 17, 2007 3:05 AM, Thomas Gleixner [EMAIL PROTECTED] wrote: Sigh. You did not have the debug patch applied anymore, which printks the timer_list data ? Can you apply it again and provide the output please ? This keeps getting more and more weird - This time I was running with CONFIG_CPU_IDLE=N and I have ton of soft lockups after 14hr uptime. It's more than weird. now at 39080147128907 nsecs .idle_entrytime : 39080384013047 nsecs The time, when we entered idle on CPU#0 is in the future. .tick_stopped : 0 But the tick is not stopped, that means CPU#0 has work to do On CPU#1 idle entry just happened: .idle_entrytime : 39079996603653 nsecs now at 40490254040892 nsecs .idle_entrytime : 40490492012833 nsecs Again, idle_entry on CPU#0 is in the future. On CPU#1 the idle entry was at: .idle_entrytime : 40489996578090 nsecs which means: 0.257462802 sec. ago now at 40700144217096 nsecs Aarg. On CPU#0 this is consistently in the future: .idle_entrytime : 40700372012887 nsecs I'm really confused. .idle_entrytime : 4066620694 nsecs .idle_sleeptime : 40540536046589 nsecs .last_jiffies : 1010 .next_jiffies : 10100467 .idle_expires : 4070186400 nsecs jiffies: 10100158 Tick Device: mode: 1 Clock Event Device: hpet max_delta_ns: 2147483647 min_delta_ns: 3352 mult: 61496110 shift: 32 mode: 3 next_event: 4070069200 nsecs set_next_event: hpet_legacy_next_event set_mode: hpet_legacy_set_mode event_handler: tick_handle_oneshot_broadcast tick_broadcast_mask: 0003 Here is the next inconsistent data: tick_broadcast_oneshot_mask: 0003 CPU#1 just woke up. That means the broadcast oneshot mask must be cleared for CPU#1. Some real strange thing is going on in your box. I try to come up with some more debug patches tomorrow. tglx -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]
On Dec 16, 2007 12:15 AM, Parag Warudkar <[EMAIL PROTECTED]> wrote: > On Sat, 15 Dec 2007, Parag Warudkar wrote: > > >> I will run it for a little longer just to be sure - but I don't think it > >> will be a problem. > > No problems for last 10 hours - I consider this fixed. > Arghh - spoke 8 hours too soon. I left it running overnight and morning I see a bunch of softlockups - so NO NOT FIXED. Parag BUG: soft lockup - CPU#1 stuck for 13s! [swapper:0] Pid: 0, comm: swapper Not tainted (2.6.24-rc5 #21) EIP: 0060:[] EFLAGS: 0206 CPU: 1 EIP is at acpi_idle_enter_simple+0x166/0x1d0 EAX: f7829f88 EBX: 0dab ECX: 0266 EDX: ESI: EDI: 00c056e5 EBP: 00c0493a ESP: f7829f88 DS: 007b ES: 007b FS: 00d8 GS: SS: 0068 CR0: 8005003b CR2: 080f6c78 CR3: 00718000 CR4: 06d0 DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]
On Dec 16, 2007 12:15 AM, Parag Warudkar [EMAIL PROTECTED] wrote: On Sat, 15 Dec 2007, Parag Warudkar wrote: I will run it for a little longer just to be sure - but I don't think it will be a problem. No problems for last 10 hours - I consider this fixed. Arghh - spoke 8 hours too soon. I left it running overnight and morning I see a bunch of softlockups - so NO NOT FIXED. Parag BUG: soft lockup - CPU#1 stuck for 13s! [swapper:0] Pid: 0, comm: swapper Not tainted (2.6.24-rc5 #21) EIP: 0060:[c0533ca6] EFLAGS: 0206 CPU: 1 EIP is at acpi_idle_enter_simple+0x166/0x1d0 EAX: f7829f88 EBX: 0dab ECX: 0266 EDX: ESI: EDI: 00c056e5 EBP: 00c0493a ESP: f7829f88 DS: 007b ES: 007b FS: 00d8 GS: SS: 0068 CR0: 8005003b CR2: 080f6c78 CR3: 00718000 CR4: 06d0 DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]
On Sat, 15 Dec 2007, Parag Warudkar wrote: I will run it for a little longer just to be sure - but I don't think it will be a problem. No problems for last 10 hours - I consider this fixed. Parag -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]
On Sat, 15 Dec 2007, Thomas Gleixner wrote: I have a patch staged for Linus, which fixes a thinko in the broadcast code. It might be related to your problem. Can you give it a try ? Yep - this looks promising. No soft lockups for last half an hour with 4-5 Wakeups from idle. It is almost smooth - jerkiness gone down by 99%. There are still very infrequent and quick spells where the key typed on the keyboard takes effect a little later - but nothing like earlier. I will run it for a little longer just to be sure - but I don't think it will be a problem. Thanks Thomas. Parag -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]
On Fri, 14 Dec 2007, Parag Warudkar wrote: > On Dec 14, 2007 6:17 PM, Len Brown <[EMAIL PROTECTED]> wrote: > > does processor.max_cstate=1 make the failing configuration work? > > If yes, how about processor.max_cstate=2? > > Until now 2 things were necessary to reproduce the problem - > 1) CPU_IDLE=y and > 2) Wakeups from Idle = 5-7 Per second (== Longer/deeper C state residency) > > If I left the wakeups to high number (50-60) - there were no lockups > but it was very jerky over ssh. > (Typing keys had no effect for seconds etc.) > > CPU_IDLE=y > > With max_cstate=1 and CPU_IDLE=y things are pretty smooth - no lockups > for the last hour. (Soft lockups used to appear in minutes > previously.) > > With max_cstate=2 - old story repeats - it's very jerky and soft > lockups appear in under a minute after going to 3-5 wakeups from idle > per sec. I have a patch staged for Linus, which fixes a thinko in the broadcast code. It might be related to your problem. Can you give it a try ? Thanks, tglx --> Subject: clockevents: fix reprogramming decision in oneshot broadcast From: Thomas Gleixner <[EMAIL PROTECTED]> A previous version of the code did the reprogramming of the broadcast device in the return from idle code. This was removed, but the logic in tick_handle_oneshot_broadcast() was kept the same. When a broadcast interrupt happens we signal the expiry to all CPUs which have an expired event. If none of the CPUs has an expired event, which can happen in dyntick mode, then we reprogram the broadcast device. We do not reprogram otherwise, but this is only correct if all CPUs, which are in the idle broadcast state have been woken up. The code ignores, that there might be pending not yet expired events on other CPUs, which are in the idle broadcast state. So the delivery of those events can be delayed for quite a time. Change the tick_handle_oneshot_broadcast() function to check for CPUs, which are in broadcast state and are not woken up by the current event, and enforce the rearming of the broadcast device for those CPUs. Signed-off-by: Thomas Gleixner <[EMAIL PROTECTED]> Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]> --- kernel/time/tick-broadcast.c | 56 --- 1 file changed, 21 insertions(+), 35 deletions(-) Index: linux-x86.q/kernel/time/tick-broadcast.c === --- linux-x86.q.orig/kernel/time/tick-broadcast.c +++ linux-x86.q/kernel/time/tick-broadcast.c @@ -384,45 +384,19 @@ int tick_resume_broadcast_oneshot(struct } /* - * Reprogram the broadcast device: - * - * Called with tick_broadcast_lock held and interrupts disabled. - */ -static int tick_broadcast_reprogram(void) -{ - ktime_t expires = { .tv64 = KTIME_MAX }; - struct tick_device *td; - int cpu; - - /* -* Find the event which expires next: -*/ - for (cpu = first_cpu(tick_broadcast_oneshot_mask); cpu != NR_CPUS; -cpu = next_cpu(cpu, tick_broadcast_oneshot_mask)) { - td = _cpu(tick_cpu_device, cpu); - if (td->evtdev->next_event.tv64 < expires.tv64) - expires = td->evtdev->next_event; - } - - if (expires.tv64 == KTIME_MAX) - return 0; - - return tick_broadcast_set_event(expires, 0); -} - -/* * Handle oneshot mode broadcasting */ static void tick_handle_oneshot_broadcast(struct clock_event_device *dev) { struct tick_device *td; cpumask_t mask; - ktime_t now; + ktime_t now, next_event; int cpu; spin_lock(_broadcast_lock); again: dev->next_event.tv64 = KTIME_MAX; + next_event.tv64 = KTIME_MAX; mask = CPU_MASK_NONE; now = ktime_get(); /* Find all expired events */ @@ -431,19 +405,31 @@ again: td = _cpu(tick_cpu_device, cpu); if (td->evtdev->next_event.tv64 <= now.tv64) cpu_set(cpu, mask); + else if (td->evtdev->next_event.tv64 < next_event.tv64) + next_event.tv64 = td->evtdev->next_event.tv64; } /* -* Wakeup the cpus which have an expired event. The broadcast -* device is reprogrammed in the return from idle code. +* Wakeup the cpus which have an expired event. +*/ + tick_do_broadcast(mask); + + /* +* Two reasons for reprogram: +* +* - The global event did not expire any CPU local +* events. This happens in dyntick mode, as the maximum PIT +* delta is quite small. +* +* - There are pending events on sleeping CPUs which were not +* in the event mask */ - if (!tick_do_broadcast(mask)) { + if (next_event.tv64 != KTIME_MAX) { /* -* The global event did not expire any CPU local -* events. This happens in dyntick mode, as
Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]
On Fri, 14 Dec 2007, Parag Warudkar wrote: On Dec 14, 2007 6:17 PM, Len Brown [EMAIL PROTECTED] wrote: does processor.max_cstate=1 make the failing configuration work? If yes, how about processor.max_cstate=2? Until now 2 things were necessary to reproduce the problem - 1) CPU_IDLE=y and 2) Wakeups from Idle = 5-7 Per second (== Longer/deeper C state residency) If I left the wakeups to high number (50-60) - there were no lockups but it was very jerky over ssh. (Typing keys had no effect for seconds etc.) CPU_IDLE=y With max_cstate=1 and CPU_IDLE=y things are pretty smooth - no lockups for the last hour. (Soft lockups used to appear in minutes previously.) With max_cstate=2 - old story repeats - it's very jerky and soft lockups appear in under a minute after going to 3-5 wakeups from idle per sec. I have a patch staged for Linus, which fixes a thinko in the broadcast code. It might be related to your problem. Can you give it a try ? Thanks, tglx -- Subject: clockevents: fix reprogramming decision in oneshot broadcast From: Thomas Gleixner [EMAIL PROTECTED] A previous version of the code did the reprogramming of the broadcast device in the return from idle code. This was removed, but the logic in tick_handle_oneshot_broadcast() was kept the same. When a broadcast interrupt happens we signal the expiry to all CPUs which have an expired event. If none of the CPUs has an expired event, which can happen in dyntick mode, then we reprogram the broadcast device. We do not reprogram otherwise, but this is only correct if all CPUs, which are in the idle broadcast state have been woken up. The code ignores, that there might be pending not yet expired events on other CPUs, which are in the idle broadcast state. So the delivery of those events can be delayed for quite a time. Change the tick_handle_oneshot_broadcast() function to check for CPUs, which are in broadcast state and are not woken up by the current event, and enforce the rearming of the broadcast device for those CPUs. Signed-off-by: Thomas Gleixner [EMAIL PROTECTED] Signed-off-by: Ingo Molnar [EMAIL PROTECTED] --- kernel/time/tick-broadcast.c | 56 --- 1 file changed, 21 insertions(+), 35 deletions(-) Index: linux-x86.q/kernel/time/tick-broadcast.c === --- linux-x86.q.orig/kernel/time/tick-broadcast.c +++ linux-x86.q/kernel/time/tick-broadcast.c @@ -384,45 +384,19 @@ int tick_resume_broadcast_oneshot(struct } /* - * Reprogram the broadcast device: - * - * Called with tick_broadcast_lock held and interrupts disabled. - */ -static int tick_broadcast_reprogram(void) -{ - ktime_t expires = { .tv64 = KTIME_MAX }; - struct tick_device *td; - int cpu; - - /* -* Find the event which expires next: -*/ - for (cpu = first_cpu(tick_broadcast_oneshot_mask); cpu != NR_CPUS; -cpu = next_cpu(cpu, tick_broadcast_oneshot_mask)) { - td = per_cpu(tick_cpu_device, cpu); - if (td-evtdev-next_event.tv64 expires.tv64) - expires = td-evtdev-next_event; - } - - if (expires.tv64 == KTIME_MAX) - return 0; - - return tick_broadcast_set_event(expires, 0); -} - -/* * Handle oneshot mode broadcasting */ static void tick_handle_oneshot_broadcast(struct clock_event_device *dev) { struct tick_device *td; cpumask_t mask; - ktime_t now; + ktime_t now, next_event; int cpu; spin_lock(tick_broadcast_lock); again: dev-next_event.tv64 = KTIME_MAX; + next_event.tv64 = KTIME_MAX; mask = CPU_MASK_NONE; now = ktime_get(); /* Find all expired events */ @@ -431,19 +405,31 @@ again: td = per_cpu(tick_cpu_device, cpu); if (td-evtdev-next_event.tv64 = now.tv64) cpu_set(cpu, mask); + else if (td-evtdev-next_event.tv64 next_event.tv64) + next_event.tv64 = td-evtdev-next_event.tv64; } /* -* Wakeup the cpus which have an expired event. The broadcast -* device is reprogrammed in the return from idle code. +* Wakeup the cpus which have an expired event. +*/ + tick_do_broadcast(mask); + + /* +* Two reasons for reprogram: +* +* - The global event did not expire any CPU local +* events. This happens in dyntick mode, as the maximum PIT +* delta is quite small. +* +* - There are pending events on sleeping CPUs which were not +* in the event mask */ - if (!tick_do_broadcast(mask)) { + if (next_event.tv64 != KTIME_MAX) { /* -* The global event did not expire any CPU local -* events. This happens in dyntick mode, as the -* maximum PIT
Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]
On Sat, 15 Dec 2007, Thomas Gleixner wrote: I have a patch staged for Linus, which fixes a thinko in the broadcast code. It might be related to your problem. Can you give it a try ? Yep - this looks promising. No soft lockups for last half an hour with 4-5 Wakeups from idle. It is almost smooth - jerkiness gone down by 99%. There are still very infrequent and quick spells where the key typed on the keyboard takes effect a little later - but nothing like earlier. I will run it for a little longer just to be sure - but I don't think it will be a problem. Thanks Thomas. Parag -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]
On Sat, 15 Dec 2007, Parag Warudkar wrote: I will run it for a little longer just to be sure - but I don't think it will be a problem. No problems for last 10 hours - I consider this fixed. Parag -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]
On Dec 14, 2007 6:17 PM, Len Brown <[EMAIL PROTECTED]> wrote: > does processor.max_cstate=1 make the failing configuration work? > If yes, how about processor.max_cstate=2? Until now 2 things were necessary to reproduce the problem - 1) CPU_IDLE=y and 2) Wakeups from Idle = 5-7 Per second (== Longer/deeper C state residency) If I left the wakeups to high number (50-60) - there were no lockups but it was very jerky over ssh. (Typing keys had no effect for seconds etc.) CPU_IDLE=y With max_cstate=1 and CPU_IDLE=y things are pretty smooth - no lockups for the last hour. (Soft lockups used to appear in minutes previously.) With max_cstate=2 - old story repeats - it's very jerky and soft lockups appear in under a minute after going to 3-5 wakeups from idle per sec. > > what do you see in /proc/acpi/processor/*/power? > Normally (without any max_cstate= parameter) I see this - (I admit I never thought there was a C8 - C3 was all I ever heard.) [EMAIL PROTECTED] router]$ cat /proc/acpi/processor/CPU0/power active state:C2 max_cstate: C8 bus master activity: maximum allowed latency: 8000 usec states: C1: type[C1] promotion[C2] demotion[--] latency[001] usage[1010] duration[] *C2: type[C2] promotion[--] demotion[C1] latency[001] usage[07181700] duration[379397304978] Thanks Parag -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]
does processor.max_cstate=1 make the failing configuration work? If yes, how about processor.max_cstate=2? what do you see in /proc/acpi/processor/*/power? -Len -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]
does processor.max_cstate=1 make the failing configuration work? If yes, how about processor.max_cstate=2? what do you see in /proc/acpi/processor/*/power? -Len -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]
On Dec 14, 2007 6:17 PM, Len Brown [EMAIL PROTECTED] wrote: does processor.max_cstate=1 make the failing configuration work? If yes, how about processor.max_cstate=2? Until now 2 things were necessary to reproduce the problem - 1) CPU_IDLE=y and 2) Wakeups from Idle = 5-7 Per second (== Longer/deeper C state residency) If I left the wakeups to high number (50-60) - there were no lockups but it was very jerky over ssh. (Typing keys had no effect for seconds etc.) CPU_IDLE=y With max_cstate=1 and CPU_IDLE=y things are pretty smooth - no lockups for the last hour. (Soft lockups used to appear in minutes previously.) With max_cstate=2 - old story repeats - it's very jerky and soft lockups appear in under a minute after going to 3-5 wakeups from idle per sec. what do you see in /proc/acpi/processor/*/power? Normally (without any max_cstate= parameter) I see this - (I admit I never thought there was a C8 - C3 was all I ever heard.) [EMAIL PROTECTED] router]$ cat /proc/acpi/processor/CPU0/power active state:C2 max_cstate: C8 bus master activity: maximum allowed latency: 8000 usec states: C1: type[C1] promotion[C2] demotion[--] latency[001] usage[1010] duration[] *C2: type[C2] promotion[--] demotion[C1] latency[001] usage[07181700] duration[379397304978] Thanks Parag -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]
On Sun, 9 Dec 2007, Parag Warudkar wrote: > On Dec 8, 2007 6:12 PM, Parag Warudkar <[EMAIL PROTECTED]> wrote: > > No problems after disabling CONFIG_HIGHRES_TIMERS , CONFIG_CPU_IDLE > > and CONFIG_NO_HZ. > > > > I will try enabling them one by one - HRT, NOHZ and CPU_IDLE last - > > that way we can at least tell what is required to be hit with this > > problem. > > Looks like CPU_IDLE=y is necessary for the problem to show up. > With CPU_IDLE=n HRT+NO_HZ+TICK_ONESHOT does not give soft lockup problems. > (Actually with HIGH_RES_TIMERS=NO_HZ=TICK_ONESHOT=y I do see short > freezes on ssh - when I cannot type anything for may be a second even > under 100% idle. But Soft Lock up doesnt show up in dmesg with this > configuration.) Can you please apply the patch below ? It prints out the internal state of the clockevents/timer system when the softlockup is detected. Thanks, tglx diff --git a/kernel/softlockup.c b/kernel/softlockup.c index 11df812..82f1a05 100644 --- a/kernel/softlockup.c +++ b/kernel/softlockup.c @@ -118,6 +118,7 @@ void softlockup_tick(void) show_regs(regs); else dump_stack(); + sysrq_timer_list_show(); spin_unlock(_lock); } -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]
On Sun, 9 Dec 2007 16:57:38 -0500 "Parag Warudkar" <[EMAIL PROTECTED]> wrote: > On Dec 8, 2007 6:12 PM, Parag Warudkar <[EMAIL PROTECTED]> > wrote: > > No problems after disabling CONFIG_HIGHRES_TIMERS , CONFIG_CPU_IDLE > > and CONFIG_NO_HZ. > > > > I will try enabling them one by one - HRT, NOHZ and CPU_IDLE last - > > that way we can at least tell what is required to be hit with this > > problem. > > Looks like CPU_IDLE=y is necessary for the problem to show up. > With CPU_IDLE=n HRT+NO_HZ+TICK_ONESHOT does not give soft lockup > problems. (Actually with HIGH_RES_TIMERS=NO_HZ=TICK_ONESHOT=y I do > see short freezes on ssh - when I cannot type anything for may be a > second even under 100% idle. But Soft Lock up doesnt show up in dmesg > with this configuration.) starts to sound like something is turning off the clock during a somewhat deeper C-state... -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]
On Dec 8, 2007 6:12 PM, Parag Warudkar <[EMAIL PROTECTED]> wrote: > No problems after disabling CONFIG_HIGHRES_TIMERS , CONFIG_CPU_IDLE > and CONFIG_NO_HZ. > > I will try enabling them one by one - HRT, NOHZ and CPU_IDLE last - > that way we can at least tell what is required to be hit with this > problem. Looks like CPU_IDLE=y is necessary for the problem to show up. With CPU_IDLE=n HRT+NO_HZ+TICK_ONESHOT does not give soft lockup problems. (Actually with HIGH_RES_TIMERS=NO_HZ=TICK_ONESHOT=y I do see short freezes on ssh - when I cannot type anything for may be a second even under 100% idle. But Soft Lock up doesnt show up in dmesg with this configuration.) No Soft Lockup - CPU_IDLE=n 5+ Hours uptime --- HIGH_RES_TIMERS=NO_HZ=TICK_ONESHOT=n HIGH_RES_TIMERS=y NO_HZ=TICK_ONESHOT=n HIGH_RES_TIMERS=NO_HZ=y TICK_ONESHOT=n HIGH_RES_TIMERS=NO_HZ=TICK_ONESHOT=y - short freezes. Soft Lockup - CPU_IDLE=y 2 hours uptime -- HIGH_RES_TIMERS=NO_HZ=TICK_ONESHOT=y dmesg - BUG: soft lockup - CPU#1 stuck for 11s! [swapper:0] Pid: 0, comm: swapper Not tainted (2.6.24-rc4 #13) EIP: 0060:[] EFLAGS: 0202 CPU: 1 EIP is at _spin_lock_irqsave+0x16/0x27 EAX: c06b4110 EBX: 0001 ECX: f7873808 EDX: 0293 ESI: 0005 EDI: f7873808 EBP: ESP: f7829f10 DS: 007b ES: 007b FS: 00d8 GS: SS: 0068 CR0: 8005003b CR2: 800f5958 CR3: 37284000 CR4: 06d0 DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400 [] tick_broadcast_oneshot_control+0x10/0xda [] tick_notify+0x1d4/0x2eb [] get_next_timer_interrupt+0x143/0x1b4 [] notifier_call_chain+0x2a/0x47 [] raw_notifier_call_chain+0x17/0x1a [] clockevents_notify+0x19/0x4f [] acpi_idle_enter_simple+0x183/0x1d0 [] cpuidle_idle_call+0x53/0x78 [] cpuidle_idle_call+0x0/0x78 [] cpu_idle+0x97/0xb8 === Parag -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]
On Dec 8, 2007 6:12 PM, Parag Warudkar [EMAIL PROTECTED] wrote: No problems after disabling CONFIG_HIGHRES_TIMERS , CONFIG_CPU_IDLE and CONFIG_NO_HZ. I will try enabling them one by one - HRT, NOHZ and CPU_IDLE last - that way we can at least tell what is required to be hit with this problem. Looks like CPU_IDLE=y is necessary for the problem to show up. With CPU_IDLE=n HRT+NO_HZ+TICK_ONESHOT does not give soft lockup problems. (Actually with HIGH_RES_TIMERS=NO_HZ=TICK_ONESHOT=y I do see short freezes on ssh - when I cannot type anything for may be a second even under 100% idle. But Soft Lock up doesnt show up in dmesg with this configuration.) No Soft Lockup - CPU_IDLE=n 5+ Hours uptime --- HIGH_RES_TIMERS=NO_HZ=TICK_ONESHOT=n HIGH_RES_TIMERS=y NO_HZ=TICK_ONESHOT=n HIGH_RES_TIMERS=NO_HZ=y TICK_ONESHOT=n HIGH_RES_TIMERS=NO_HZ=TICK_ONESHOT=y - short freezes. Soft Lockup - CPU_IDLE=y 2 hours uptime -- HIGH_RES_TIMERS=NO_HZ=TICK_ONESHOT=y dmesg - BUG: soft lockup - CPU#1 stuck for 11s! [swapper:0] Pid: 0, comm: swapper Not tainted (2.6.24-rc4 #13) EIP: 0060:[c0603eaa] EFLAGS: 0202 CPU: 1 EIP is at _spin_lock_irqsave+0x16/0x27 EAX: c06b4110 EBX: 0001 ECX: f7873808 EDX: 0293 ESI: 0005 EDI: f7873808 EBP: ESP: f7829f10 DS: 007b ES: 007b FS: 00d8 GS: SS: 0068 CR0: 8005003b CR2: 800f5958 CR3: 37284000 CR4: 06d0 DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400 [c0438293] tick_broadcast_oneshot_control+0x10/0xda [c0437ce2] tick_notify+0x1d4/0x2eb [c04281bc] get_next_timer_interrupt+0x143/0x1b4 [c06058a1] notifier_call_chain+0x2a/0x47 [c04345c0] raw_notifier_call_chain+0x17/0x1a [c043781e] clockevents_notify+0x19/0x4f [c0533d23] acpi_idle_enter_simple+0x183/0x1d0 [c058cf03] cpuidle_idle_call+0x53/0x78 [c058ceb0] cpuidle_idle_call+0x0/0x78 [c0402575] cpu_idle+0x97/0xb8 === Parag -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]
On Sun, 9 Dec 2007 16:57:38 -0500 Parag Warudkar [EMAIL PROTECTED] wrote: On Dec 8, 2007 6:12 PM, Parag Warudkar [EMAIL PROTECTED] wrote: No problems after disabling CONFIG_HIGHRES_TIMERS , CONFIG_CPU_IDLE and CONFIG_NO_HZ. I will try enabling them one by one - HRT, NOHZ and CPU_IDLE last - that way we can at least tell what is required to be hit with this problem. Looks like CPU_IDLE=y is necessary for the problem to show up. With CPU_IDLE=n HRT+NO_HZ+TICK_ONESHOT does not give soft lockup problems. (Actually with HIGH_RES_TIMERS=NO_HZ=TICK_ONESHOT=y I do see short freezes on ssh - when I cannot type anything for may be a second even under 100% idle. But Soft Lock up doesnt show up in dmesg with this configuration.) starts to sound like something is turning off the clock during a somewhat deeper C-state... -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]
On Sun, 9 Dec 2007, Parag Warudkar wrote: On Dec 8, 2007 6:12 PM, Parag Warudkar [EMAIL PROTECTED] wrote: No problems after disabling CONFIG_HIGHRES_TIMERS , CONFIG_CPU_IDLE and CONFIG_NO_HZ. I will try enabling them one by one - HRT, NOHZ and CPU_IDLE last - that way we can at least tell what is required to be hit with this problem. Looks like CPU_IDLE=y is necessary for the problem to show up. With CPU_IDLE=n HRT+NO_HZ+TICK_ONESHOT does not give soft lockup problems. (Actually with HIGH_RES_TIMERS=NO_HZ=TICK_ONESHOT=y I do see short freezes on ssh - when I cannot type anything for may be a second even under 100% idle. But Soft Lock up doesnt show up in dmesg with this configuration.) Can you please apply the patch below ? It prints out the internal state of the clockevents/timer system when the softlockup is detected. Thanks, tglx diff --git a/kernel/softlockup.c b/kernel/softlockup.c index 11df812..82f1a05 100644 --- a/kernel/softlockup.c +++ b/kernel/softlockup.c @@ -118,6 +118,7 @@ void softlockup_tick(void) show_regs(regs); else dump_stack(); + sysrq_timer_list_show(); spin_unlock(print_lock); } -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]
No problems after disabling CONFIG_HIGHRES_TIMERS , CONFIG_CPU_IDLE and CONFIG_NO_HZ. I will try enabling them one by one - HRT, NOHZ and CPU_IDLE last - that way we can at least tell what is required to be hit with this problem. Parag -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]
On Dec 8, 2007 3:51 PM, Arjan van de Ven <[EMAIL PROTECTED]> wrote: > what chipset is this? > (I wonder if there's one where we shouldn't be force-enabling the hpet) It's an Intel Mac Mini - Intel 945GM chipset. I believe OSX requires HPET and so there shouldn't be a need to force enable it on this chipset? Also I don't have problems under Centos 5.1 kernel (2.6.18-53.1.4.el5) and hpet is enabled there. Parag -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]
On Sat, 8 Dec 2007 15:46:54 -0500 "Parag Warudkar" <[EMAIL PROTECTED]> wrote: > On Dec 8, 2007 3:11 PM, Ingo Molnar <[EMAIL PROTECTED]> wrote: > > > > * Parag Warudkar <[EMAIL PROTECTED]> wrote: > > > > > But there are still fluctuations under 100% idle - > > > > do you have CONFIG_HIGH_RES_TIMERS=y? > > Yes - NO_HZ=y and HIGH_RES_TIMERS=y. > My ssh connection still died with hpet=disable although this time I > did not see soft lockup message. > Also it randomly gets stuck - suddenly there is a freeze and I can't > type for seconds. > > > these fluctuations would still be OK if they are due to HZ > > granularity. > > > > So are the hpet ones ok too or do they seem off even with HRTIMERS=y? > Just trying to figure out if I should be testing further with hpet > disabled or enabling it is ok. > what chipset is this? (I wonder if there's one where we shouldn't be force-enabling the hpet) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]
On Dec 8, 2007 3:11 PM, Ingo Molnar <[EMAIL PROTECTED]> wrote: > > * Parag Warudkar <[EMAIL PROTECTED]> wrote: > > > But there are still fluctuations under 100% idle - > > do you have CONFIG_HIGH_RES_TIMERS=y? Yes - NO_HZ=y and HIGH_RES_TIMERS=y. My ssh connection still died with hpet=disable although this time I did not see soft lockup message. Also it randomly gets stuck - suddenly there is a freeze and I can't type for seconds. > these fluctuations would still be OK if they are due to HZ granularity. > So are the hpet ones ok too or do they seem off even with HRTIMERS=y? Just trying to figure out if I should be testing further with hpet disabled or enabling it is ok. Parag -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]
* Parag Warudkar <[EMAIL PROTECTED]> wrote: > But there are still fluctuations under 100% idle - do you have CONFIG_HIGH_RES_TIMERS=y? > IDLE > real0m1.112s > real0m1.131s > real0m1.112s > real0m1.112s > real0m1.139s these fluctuations would still be OK if they are due to HZ granularity. Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]
On Dec 8, 2007 2:42 PM, Ingo Molnar <[EMAIL PROTECTED]> wrote: > > * Parag Warudkar <[EMAIL PROTECTED]> wrote: > > > Even on 100% idle I get variations that are approx in the same range > > when not idle. Clocksource is hpet if that matters. Next I think I > > will disable CPU_IDLE, Tickless > > also try the hpet=disable boot option. With hpet=disable (clocksource is acpi_pm ) the usleeps are somewhat more closer to correct - under idle as well as under kernel compilation load. But there are still fluctuations under 100% idle - Kernel compile -- real0m1.112s user0m0.004s sys 0m0.000s real0m1.112s user0m0.000s sys 0m0.000s real0m1.115s user0m0.000s sys 0m0.000s real0m1.112s user0m0.000s sys 0m0.000s real0m1.112s user0m0.000s sys 0m0.000s -- IDLE real0m1.112s user0m0.000s sys 0m0.000s real0m1.131s user0m0.000s sys 0m0.000s real0m1.112s user0m0.000s sys 0m0.000s real0m1.112s user0m0.000s sys 0m0.000s real0m1.139s user0m0.000s sys 0m0.000s > > > My ssh connection just died - another lockup in dmesg - > > > > BUG: soft lockup - CPU#1 stuck for 14s! [sshd:509] > > this would suggest that the softlockup watchdog is correct and indeed > there was some lockup of sorts? Yep, I was about to say that there are indeed real soft-lockups going on as opposed to the psuedo ones that we thought may be due to timer inconsistencies. BTW this is no-preempt config and 2.6.22 and latest Centos-5.1 kernels ran fine with HPET. Parag -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]
* Parag Warudkar <[EMAIL PROTECTED]> wrote: > Even on 100% idle I get variations that are approx in the same range > when not idle. Clocksource is hpet if that matters. Next I think I > will disable CPU_IDLE, Tickless also try the hpet=disable boot option. > My ssh connection just died - another lockup in dmesg - > > BUG: soft lockup - CPU#1 stuck for 14s! [sshd:509] this would suggest that the softlockup watchdog is correct and indeed there was some lockup of sorts? > $> while :; do time usleep 111; done > real0m1.112s > real0m1.385s > real0m1.112s > real0m1.112s > real0m1.279s this, if done on an idle system, looks anomalous to me. Does hpet=disable help? Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]
On Dec 8, 2007 2:13 PM, Ingo Molnar <[EMAIL PROTECTED]> wrote: > > * Parag Warudkar <[EMAIL PROTECTED]> wrote: > > > >while :; do time usleep 111; done > > > > > > or do these sleeps fluctuate? > > > > They seem to fluctuate - not sure if that's supposed to be exact or if > > below variations are normal - This is when my compiles are running - > > it's normal for them to fluctuate a little when load goes up and the > task gets delayed. > > 'bad' fluctuations are ones that happen when the system is idle. > Even on 100% idle I get variations that are approx in the same range when not idle. Clocksource is hpet if that matters. Next I think I will disable CPU_IDLE, Tickless My ssh connection just died - another lockup in dmesg - BUG: soft lockup - CPU#1 stuck for 14s! [sshd:509] Pid: 509, comm: sshd Not tainted (2.6.24-rc4 #6) EIP: 0060:[] EFLAGS: 00200206 CPU: 1 EIP is at do_page_fault+0x117/0x583 EAX: ef5aaa80 EBX: 0048cfc0 ECX: 007b EDX: 0003 ESI: ef691fb8 EDI: b7ca1ab4 EBP: ef47f050 ESP: ef691f70 DS: 007b ES: 007b FS: 00d8 GS: SS: 0068 CR0: 8005003b CR2: b7ca1ab4 CR3: 2f555000 CR4: 06d0 DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400 [] do_page_fault+0x0/0x583 [] error_code+0x72/0x78 [] netlbl_domhsh_walk+0x94/0xba $> while :; do time usleep 111; done real0m1.112s user0m0.000s sys 0m0.000s real0m1.385s user0m0.000s sys 0m0.004s real0m1.112s user0m0.000s sys 0m0.000s real0m1.112s user0m0.000s sys 0m0.000s real0m1.279s user0m0.000s sys 0m0.000s -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]
* Parag Warudkar <[EMAIL PROTECTED]> wrote: > >while :; do time usleep 111; done > > > > or do these sleeps fluctuate? > > They seem to fluctuate - not sure if that's supposed to be exact or if > below variations are normal - This is when my compiles are running - it's normal for them to fluctuate a little when load goes up and the task gets delayed. 'bad' fluctuations are ones that happen when the system is idle. Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]
On Dec 8, 2007 10:47 AM, Ingo Molnar <[EMAIL PROTECTED]> wrote: > > does the patch below help? But the root cause is likely some timer > problems - do you get consistent results from: > Haven't yet tried the patch - will try a little later. >while :; do time usleep 111; done > > or do these sleeps fluctuate? > They seem to fluctuate - not sure if that's supposed to be exact or if below variations are normal - This is when my compiles are running - [EMAIL PROTECTED] ~]$ while :; do time usleep 111; done real0m1.116s user0m0.000s sys 0m0.000s real0m1.112s user0m0.000s sys 0m0.000s real0m1.130s user0m0.000s sys 0m0.004s real0m1.144s user0m0.000s sys 0m0.000s Thanks Parag -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]
* Parag Warudkar <[EMAIL PROTECTED]> wrote: > [] tick_broadcast_oneshot_control+0x10/0xda > [] tick_notify+0x1d4/0x2eb > [] get_next_timer_interrupt+0x143/0x1b4 > [] notifier_call_chain+0x2a/0x47 > [] raw_notifier_call_chain+0x17/0x1a > [] clockevents_notify+0x19/0x4f > [] acpi_idle_enter_simple+0x183/0x1d0 > [] cpuidle_idle_call+0x53/0x78 > [] cpuidle_idle_call+0x0/0x78 > [] cpu_idle+0x97/0xb8 > === > BUG: soft lockup - CPU#1 stuck for 11s! [vim:3736] does the patch below help? But the root cause is likely some timer problems - do you get consistent results from: while :; do time usleep 111; done or do these sleeps fluctuate? Ingo Index: linux/kernel/sched.c === --- linux.orig/kernel/sched.c +++ linux/kernel/sched.c @@ -774,6 +774,7 @@ void sched_clock_idle_wakeup_event(u64 d struct rq *rq = cpu_rq(smp_processor_id()); u64 now = sched_clock(); + touch_softlockup_watchdog(); rq->idle_clock += delta_ns; /* * Override the previous timestamp and ignore all -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]
On Dec 8, 2007 10:10 AM, Parag Warudkar <[EMAIL PROTECTED]> wrote: > On Dec 7, 2007 9:56 PM, Thomas Gleixner <[EMAIL PROTECTED]> wrote: > > > > This looks pretty much like the problem I was solving yesterday. > > > > Parag, can you please try Linus latest and please check whether there > > is a stack trace with clockevents_program_event on the top in your > > dmesg. > > > > Just booted with latest git and there is no clockevents_program_event > in dmesg anywhere. > > Any way I have enabled CPU_IDLE again and the machine's got to do lot > of compiles today. > I'll see if the problem reproduces. > Whoa whoa whoa - this happens left and right with latest Linus +CPU_IDLE. As soon as I start compiling, dmesg starts filling up with - Pid: 0, comm: swapper Not tainted (2.6.24-rc4 #5) EIP: 0060:[] EFLAGS: 0202 CPU: 1 EIP is at acpi_idle_enter_simple+0x166/0x1d0 EAX: f7829f88 EBX: 0da1 ECX: 0266 EDX: ESI: EDI: 00042070 EBP: 000412cf ESP: f7829f88 DS: 007b ES: 007b FS: 00d8 GS: SS: 0068 CR0: 8005003b CR2: 080ce178 CR3: 00715000 CR4: 06d0 DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400 [] cpuidle_idle_call+0x53/0x78 [] cpuidle_idle_call+0x0/0x78 [] cpu_idle+0x97/0xb8 === BUG: soft lockup - CPU#1 stuck for 11s! [swapper:0] Pid: 0, comm: swapper Not tainted (2.6.24-rc4 #5) EIP: 0060:[] EFLAGS: 0202 CPU: 1 EIP is at _spin_lock_irqsave+0x16/0x27 EAX: c06b4110 EBX: 0001 ECX: f7873808 EDX: 0293 ESI: 0005 EDI: f7873808 EBP: ESP: f7829f10 DS: 007b ES: 007b FS: 00d8 GS: SS: 0068 CR0: 8005003b CR2: 080ff4b0 CR3: 37cf5000 CR4: 06d0 DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400 [] tick_broadcast_oneshot_control+0x10/0xda [] tick_notify+0x1d4/0x2eb [] get_next_timer_interrupt+0x143/0x1b4 [] notifier_call_chain+0x2a/0x47 [] raw_notifier_call_chain+0x17/0x1a [] clockevents_notify+0x19/0x4f [] acpi_idle_enter_simple+0x183/0x1d0 [] cpuidle_idle_call+0x53/0x78 [] cpuidle_idle_call+0x0/0x78 [] cpu_idle+0x97/0xb8 === BUG: soft lockup - CPU#1 stuck for 11s! [vim:3736] Pid: 3736, comm: vim Not tainted (2.6.24-rc4 #5) EIP: 0060:[] EFLAGS: 0206 CPU: 1 EIP is at do_page_fault+0x117/0x583 EAX: f6ec1380 EBX: 005d0ff4 ECX: 007b EDX: 0003 ESI: f6dd1fb8 EDI: 004c2000 EBP: f149eae0 ESP: f6dd1f70 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 CR0: 8005003b CR2: 004c2000 CR3: 3736a000 CR4: 06d0 DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400 [] vfs_read+0x105/0x115 [] do_page_fault+0x0/0x583 [] error_code+0x72/0x78 [] netlbl_domhsh_walk+0x94/0xba === BUG: soft lockup - CPU#1 stuck for 11s! [swapper:0] Pid: 0, comm: swapper Not tainted (2.6.24-rc4 #5) EIP: 0060:[] EFLAGS: 0202 CPU: 1 EIP is at acpi_idle_enter_simple+0x166/0x1d0 EAX: f7829f88 EBX: 000a ECX: 0266 EDX: ESI: EDI: 008e8c88 EBP: 008e8c7e ESP: f7829f88 DS: 007b ES: 007b FS: 00d8 GS: SS: 0068 CR0: 8005003b CR2: 005da044 CR3: 37cf5000 CR4: 06d0 DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400 [] cpuidle_idle_call+0x53/0x78 [] cpuidle_idle_call+0x0/0x78 [] cpu_idle+0x97/0xb8 === BUG: soft lockup - CPU#1 stuck for 22s! [swapper:0] Pid: 0, comm: swapper Not tainted (2.6.24-rc4 #5) EIP: 0060:[] EFLAGS: 0206 CPU: 1 EIP is at acpi_idle_enter_simple+0x166/0x1d0 EAX: f7829f88 EBX: 0d9b ECX: 0266 EDX: ESI: EDI: 007d1802 EBP: 007d0a67 ESP: f7829f88 DS: 007b ES: 007b FS: 00d8 GS: SS: 0068 CR0: 8005003b CR2: 005da044 CR3: 37cf5000 CR4: 06d0 DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400 [] cpuidle_idle_call+0x53/0x78 [] cpuidle_idle_call+0x0/0x78 [] cpu_idle+0x97/0xb8 === BUG: soft lockup - CPU#1 stuck for 18s! [swapper:0] Pid: 0, comm: swapper Not tainted (2.6.24-rc4 #5) EIP: 0060:[] EFLAGS: 0206 CPU: 1 EIP is at acpi_idle_enter_simple+0x166/0x1d0 EAX: f7829f88 EBX: 0da2 ECX: 0266 EDX: ESI: EDI: 00b651f2 EBP: 00b64450 ESP: f7829f88 DS: 007b ES: 007b FS: 00d8 GS: SS: 0068 CR0: 8005003b CR2: 005da044 CR3: 37cf5000 CR4: 06d0 DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400 [] cpuidle_idle_call+0x53/0x78 [] cpuidle_idle_call+0x0/0x78 [] cpu_idle+0x97/0xb8 === BUG: soft lockup - CPU#1 stuck for 14s! [swapper:0] Pid: 0, comm: swapper Not tainted (2.6.24-rc4 #5) EIP: 0060:[] EFLAGS: 0202 CPU: 1 EIP is at _spin_lock_irqsave+0x16/0x27 EAX: c06b4110 EBX: 0001 ECX: f7873808 EDX: 0293 ESI: 0005 EDI: f7873808 EBP: ESP: f7829f10 DS: 007b ES: 007b FS: 00d8 GS: SS: 0068 CR0: 8005003b CR2: 080ff4b0 CR3: 37cf5000 CR4: 06d0 DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400
Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]
On Dec 7, 2007 9:56 PM, Thomas Gleixner <[EMAIL PROTECTED]> wrote: > > This looks pretty much like the problem I was solving yesterday. > > Parag, can you please try Linus latest and please check whether there > is a stack trace with clockevents_program_event on the top in your > dmesg. > Just booted with latest git and there is no clockevents_program_event in dmesg anywhere. Any way I have enabled CPU_IDLE again and the machine's got to do lot of compiles today. I'll see if the problem reproduces. Thanks Parag -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]
On Dec 8, 2007 10:47 AM, Ingo Molnar [EMAIL PROTECTED] wrote: does the patch below help? But the root cause is likely some timer problems - do you get consistent results from: Haven't yet tried the patch - will try a little later. while :; do time usleep 111; done or do these sleeps fluctuate? They seem to fluctuate - not sure if that's supposed to be exact or if below variations are normal - This is when my compiles are running - [EMAIL PROTECTED] ~]$ while :; do time usleep 111; done real0m1.116s user0m0.000s sys 0m0.000s real0m1.112s user0m0.000s sys 0m0.000s real0m1.130s user0m0.000s sys 0m0.004s real0m1.144s user0m0.000s sys 0m0.000s Thanks Parag -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]
* Parag Warudkar [EMAIL PROTECTED] wrote: [c0438293] tick_broadcast_oneshot_control+0x10/0xda [c0437ce2] tick_notify+0x1d4/0x2eb [c04281bc] get_next_timer_interrupt+0x143/0x1b4 [c06058a1] notifier_call_chain+0x2a/0x47 [c04345c0] raw_notifier_call_chain+0x17/0x1a [c043781e] clockevents_notify+0x19/0x4f [c0533d23] acpi_idle_enter_simple+0x183/0x1d0 [c058cf03] cpuidle_idle_call+0x53/0x78 [c058ceb0] cpuidle_idle_call+0x0/0x78 [c0402575] cpu_idle+0x97/0xb8 === BUG: soft lockup - CPU#1 stuck for 11s! [vim:3736] does the patch below help? But the root cause is likely some timer problems - do you get consistent results from: while :; do time usleep 111; done or do these sleeps fluctuate? Ingo Index: linux/kernel/sched.c === --- linux.orig/kernel/sched.c +++ linux/kernel/sched.c @@ -774,6 +774,7 @@ void sched_clock_idle_wakeup_event(u64 d struct rq *rq = cpu_rq(smp_processor_id()); u64 now = sched_clock(); + touch_softlockup_watchdog(); rq-idle_clock += delta_ns; /* * Override the previous timestamp and ignore all -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]
On Dec 7, 2007 9:56 PM, Thomas Gleixner [EMAIL PROTECTED] wrote: This looks pretty much like the problem I was solving yesterday. Parag, can you please try Linus latest and please check whether there is a stack trace with clockevents_program_event on the top in your dmesg. Just booted with latest git and there is no clockevents_program_event in dmesg anywhere. Any way I have enabled CPU_IDLE again and the machine's got to do lot of compiles today. I'll see if the problem reproduces. Thanks Parag -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]
On Dec 8, 2007 10:10 AM, Parag Warudkar [EMAIL PROTECTED] wrote: On Dec 7, 2007 9:56 PM, Thomas Gleixner [EMAIL PROTECTED] wrote: This looks pretty much like the problem I was solving yesterday. Parag, can you please try Linus latest and please check whether there is a stack trace with clockevents_program_event on the top in your dmesg. Just booted with latest git and there is no clockevents_program_event in dmesg anywhere. Any way I have enabled CPU_IDLE again and the machine's got to do lot of compiles today. I'll see if the problem reproduces. Whoa whoa whoa - this happens left and right with latest Linus +CPU_IDLE. As soon as I start compiling, dmesg starts filling up with - Pid: 0, comm: swapper Not tainted (2.6.24-rc4 #5) EIP: 0060:[c0533d06] EFLAGS: 0202 CPU: 1 EIP is at acpi_idle_enter_simple+0x166/0x1d0 EAX: f7829f88 EBX: 0da1 ECX: 0266 EDX: ESI: EDI: 00042070 EBP: 000412cf ESP: f7829f88 DS: 007b ES: 007b FS: 00d8 GS: SS: 0068 CR0: 8005003b CR2: 080ce178 CR3: 00715000 CR4: 06d0 DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400 [c058cf03] cpuidle_idle_call+0x53/0x78 [c058ceb0] cpuidle_idle_call+0x0/0x78 [c0402575] cpu_idle+0x97/0xb8 === BUG: soft lockup - CPU#1 stuck for 11s! [swapper:0] Pid: 0, comm: swapper Not tainted (2.6.24-rc4 #5) EIP: 0060:[c0603eaa] EFLAGS: 0202 CPU: 1 EIP is at _spin_lock_irqsave+0x16/0x27 EAX: c06b4110 EBX: 0001 ECX: f7873808 EDX: 0293 ESI: 0005 EDI: f7873808 EBP: ESP: f7829f10 DS: 007b ES: 007b FS: 00d8 GS: SS: 0068 CR0: 8005003b CR2: 080ff4b0 CR3: 37cf5000 CR4: 06d0 DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400 [c0438293] tick_broadcast_oneshot_control+0x10/0xda [c0437ce2] tick_notify+0x1d4/0x2eb [c04281bc] get_next_timer_interrupt+0x143/0x1b4 [c06058a1] notifier_call_chain+0x2a/0x47 [c04345c0] raw_notifier_call_chain+0x17/0x1a [c043781e] clockevents_notify+0x19/0x4f [c0533d23] acpi_idle_enter_simple+0x183/0x1d0 [c058cf03] cpuidle_idle_call+0x53/0x78 [c058ceb0] cpuidle_idle_call+0x0/0x78 [c0402575] cpu_idle+0x97/0xb8 === BUG: soft lockup - CPU#1 stuck for 11s! [vim:3736] Pid: 3736, comm: vim Not tainted (2.6.24-rc4 #5) EIP: 0060:[c060540b] EFLAGS: 0206 CPU: 1 EIP is at do_page_fault+0x117/0x583 EAX: f6ec1380 EBX: 005d0ff4 ECX: 007b EDX: 0003 ESI: f6dd1fb8 EDI: 004c2000 EBP: f149eae0 ESP: f6dd1f70 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 CR0: 8005003b CR2: 004c2000 CR3: 3736a000 CR4: 06d0 DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400 [c046d80d] vfs_read+0x105/0x115 [c06052f4] do_page_fault+0x0/0x583 [c06040aa] error_code+0x72/0x78 [c060] netlbl_domhsh_walk+0x94/0xba === BUG: soft lockup - CPU#1 stuck for 11s! [swapper:0] Pid: 0, comm: swapper Not tainted (2.6.24-rc4 #5) EIP: 0060:[c0533d06] EFLAGS: 0202 CPU: 1 EIP is at acpi_idle_enter_simple+0x166/0x1d0 EAX: f7829f88 EBX: 000a ECX: 0266 EDX: ESI: EDI: 008e8c88 EBP: 008e8c7e ESP: f7829f88 DS: 007b ES: 007b FS: 00d8 GS: SS: 0068 CR0: 8005003b CR2: 005da044 CR3: 37cf5000 CR4: 06d0 DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400 [c058cf03] cpuidle_idle_call+0x53/0x78 [c058ceb0] cpuidle_idle_call+0x0/0x78 [c0402575] cpu_idle+0x97/0xb8 === BUG: soft lockup - CPU#1 stuck for 22s! [swapper:0] Pid: 0, comm: swapper Not tainted (2.6.24-rc4 #5) EIP: 0060:[c0533d06] EFLAGS: 0206 CPU: 1 EIP is at acpi_idle_enter_simple+0x166/0x1d0 EAX: f7829f88 EBX: 0d9b ECX: 0266 EDX: ESI: EDI: 007d1802 EBP: 007d0a67 ESP: f7829f88 DS: 007b ES: 007b FS: 00d8 GS: SS: 0068 CR0: 8005003b CR2: 005da044 CR3: 37cf5000 CR4: 06d0 DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400 [c058cf03] cpuidle_idle_call+0x53/0x78 [c058ceb0] cpuidle_idle_call+0x0/0x78 [c0402575] cpu_idle+0x97/0xb8 === BUG: soft lockup - CPU#1 stuck for 18s! [swapper:0] Pid: 0, comm: swapper Not tainted (2.6.24-rc4 #5) EIP: 0060:[c0533d06] EFLAGS: 0206 CPU: 1 EIP is at acpi_idle_enter_simple+0x166/0x1d0 EAX: f7829f88 EBX: 0da2 ECX: 0266 EDX: ESI: EDI: 00b651f2 EBP: 00b64450 ESP: f7829f88 DS: 007b ES: 007b FS: 00d8 GS: SS: 0068 CR0: 8005003b CR2: 005da044 CR3: 37cf5000 CR4: 06d0 DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400 [c058cf03] cpuidle_idle_call+0x53/0x78 [c058ceb0] cpuidle_idle_call+0x0/0x78 [c0402575] cpu_idle+0x97/0xb8 === BUG: soft lockup - CPU#1 stuck for 14s! [swapper:0] Pid: 0, comm: swapper Not tainted (2.6.24-rc4 #5) EIP: 0060:[c0603eaa] EFLAGS: 0202 CPU: 1 EIP is at _spin_lock_irqsave+0x16/0x27 EAX: c06b4110 EBX: 0001 ECX: f7873808 EDX: 0293 ESI:
Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]
* Parag Warudkar [EMAIL PROTECTED] wrote: while :; do time usleep 111; done or do these sleeps fluctuate? They seem to fluctuate - not sure if that's supposed to be exact or if below variations are normal - This is when my compiles are running - it's normal for them to fluctuate a little when load goes up and the task gets delayed. 'bad' fluctuations are ones that happen when the system is idle. Ingo -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]
On Dec 8, 2007 2:13 PM, Ingo Molnar [EMAIL PROTECTED] wrote: * Parag Warudkar [EMAIL PROTECTED] wrote: while :; do time usleep 111; done or do these sleeps fluctuate? They seem to fluctuate - not sure if that's supposed to be exact or if below variations are normal - This is when my compiles are running - it's normal for them to fluctuate a little when load goes up and the task gets delayed. 'bad' fluctuations are ones that happen when the system is idle. Even on 100% idle I get variations that are approx in the same range when not idle. Clocksource is hpet if that matters. Next I think I will disable CPU_IDLE, Tickless My ssh connection just died - another lockup in dmesg - BUG: soft lockup - CPU#1 stuck for 14s! [sshd:509] Pid: 509, comm: sshd Not tainted (2.6.24-rc4 #6) EIP: 0060:[c060540b] EFLAGS: 00200206 CPU: 1 EIP is at do_page_fault+0x117/0x583 EAX: ef5aaa80 EBX: 0048cfc0 ECX: 007b EDX: 0003 ESI: ef691fb8 EDI: b7ca1ab4 EBP: ef47f050 ESP: ef691f70 DS: 007b ES: 007b FS: 00d8 GS: SS: 0068 CR0: 8005003b CR2: b7ca1ab4 CR3: 2f555000 CR4: 06d0 DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400 [c06052f4] do_page_fault+0x0/0x583 [c06040aa] error_code+0x72/0x78 [c060] netlbl_domhsh_walk+0x94/0xba $ while :; do time usleep 111; done real0m1.112s user0m0.000s sys 0m0.000s real0m1.385s user0m0.000s sys 0m0.004s real0m1.112s user0m0.000s sys 0m0.000s real0m1.112s user0m0.000s sys 0m0.000s real0m1.279s user0m0.000s sys 0m0.000s -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]
* Parag Warudkar [EMAIL PROTECTED] wrote: Even on 100% idle I get variations that are approx in the same range when not idle. Clocksource is hpet if that matters. Next I think I will disable CPU_IDLE, Tickless also try the hpet=disable boot option. My ssh connection just died - another lockup in dmesg - BUG: soft lockup - CPU#1 stuck for 14s! [sshd:509] this would suggest that the softlockup watchdog is correct and indeed there was some lockup of sorts? $ while :; do time usleep 111; done real0m1.112s real0m1.385s real0m1.112s real0m1.112s real0m1.279s this, if done on an idle system, looks anomalous to me. Does hpet=disable help? Ingo -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]
On Dec 8, 2007 2:42 PM, Ingo Molnar [EMAIL PROTECTED] wrote: * Parag Warudkar [EMAIL PROTECTED] wrote: Even on 100% idle I get variations that are approx in the same range when not idle. Clocksource is hpet if that matters. Next I think I will disable CPU_IDLE, Tickless also try the hpet=disable boot option. With hpet=disable (clocksource is acpi_pm ) the usleeps are somewhat more closer to correct - under idle as well as under kernel compilation load. But there are still fluctuations under 100% idle - Kernel compile -- real0m1.112s user0m0.004s sys 0m0.000s real0m1.112s user0m0.000s sys 0m0.000s real0m1.115s user0m0.000s sys 0m0.000s real0m1.112s user0m0.000s sys 0m0.000s real0m1.112s user0m0.000s sys 0m0.000s -- IDLE real0m1.112s user0m0.000s sys 0m0.000s real0m1.131s user0m0.000s sys 0m0.000s real0m1.112s user0m0.000s sys 0m0.000s real0m1.112s user0m0.000s sys 0m0.000s real0m1.139s user0m0.000s sys 0m0.000s My ssh connection just died - another lockup in dmesg - BUG: soft lockup - CPU#1 stuck for 14s! [sshd:509] this would suggest that the softlockup watchdog is correct and indeed there was some lockup of sorts? Yep, I was about to say that there are indeed real soft-lockups going on as opposed to the psuedo ones that we thought may be due to timer inconsistencies. BTW this is no-preempt config and 2.6.22 and latest Centos-5.1 kernels ran fine with HPET. Parag -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]
* Parag Warudkar [EMAIL PROTECTED] wrote: But there are still fluctuations under 100% idle - do you have CONFIG_HIGH_RES_TIMERS=y? IDLE real0m1.112s real0m1.131s real0m1.112s real0m1.112s real0m1.139s these fluctuations would still be OK if they are due to HZ granularity. Ingo -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]
On Dec 8, 2007 3:11 PM, Ingo Molnar [EMAIL PROTECTED] wrote: * Parag Warudkar [EMAIL PROTECTED] wrote: But there are still fluctuations under 100% idle - do you have CONFIG_HIGH_RES_TIMERS=y? Yes - NO_HZ=y and HIGH_RES_TIMERS=y. My ssh connection still died with hpet=disable although this time I did not see soft lockup message. Also it randomly gets stuck - suddenly there is a freeze and I can't type for seconds. these fluctuations would still be OK if they are due to HZ granularity. So are the hpet ones ok too or do they seem off even with HRTIMERS=y? Just trying to figure out if I should be testing further with hpet disabled or enabling it is ok. Parag -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]
On Sat, 8 Dec 2007 15:46:54 -0500 Parag Warudkar [EMAIL PROTECTED] wrote: On Dec 8, 2007 3:11 PM, Ingo Molnar [EMAIL PROTECTED] wrote: * Parag Warudkar [EMAIL PROTECTED] wrote: But there are still fluctuations under 100% idle - do you have CONFIG_HIGH_RES_TIMERS=y? Yes - NO_HZ=y and HIGH_RES_TIMERS=y. My ssh connection still died with hpet=disable although this time I did not see soft lockup message. Also it randomly gets stuck - suddenly there is a freeze and I can't type for seconds. these fluctuations would still be OK if they are due to HZ granularity. So are the hpet ones ok too or do they seem off even with HRTIMERS=y? Just trying to figure out if I should be testing further with hpet disabled or enabling it is ok. what chipset is this? (I wonder if there's one where we shouldn't be force-enabling the hpet) -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]
On Dec 8, 2007 3:51 PM, Arjan van de Ven [EMAIL PROTECTED] wrote: what chipset is this? (I wonder if there's one where we shouldn't be force-enabling the hpet) It's an Intel Mac Mini - Intel 945GM chipset. I believe OSX requires HPET and so there shouldn't be a need to force enable it on this chipset? Also I don't have problems under Centos 5.1 kernel (2.6.18-53.1.4.el5) and hpet is enabled there. Parag -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]
No problems after disabling CONFIG_HIGHRES_TIMERS , CONFIG_CPU_IDLE and CONFIG_NO_HZ. I will try enabling them one by one - HRT, NOHZ and CPU_IDLE last - that way we can at least tell what is required to be hit with this problem. Parag -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: soft lockup - CPU#1 stuck for 15s! [swapper:0]
On Fri, 7 Dec 2007, Pallipadi, Venkatesh wrote: > >-Original Message- > >From: Parag Warudkar [mailto:[EMAIL PROTECTED] > >Sent: Friday, December 07, 2007 2:54 PM > >To: LKML > >Cc: Andrew Morton; Pallipadi, Venkatesh; Linus Torvalds > >Subject: BUG: soft lockup - CPU#1 stuck for 15s! [swapper:0] > > > >Got this on today's git (2.6.24-rc4) while compiling stuff - Looks > >like it is related to CpuIdle stuff. > >I chose CONFIG_CPU_IDLE for the first time so I don't know when this > >was introduced. > > > >This is on x86_32, SMP. > > > >BUG: soft lockup - CPU#1 stuck for 15s! [swapper:0] > > > >Pid: 0, comm: swapper Not tainted (2.6.24-rc4 #3) > >EIP: 0060:[] EFLAGS: 0202 CPU: 1 > >EIP is at _spin_lock_irqsave+0x16/0x27 > >EAX: c06b4110 EBX: 0001 ECX: f7873808 EDX: 0293 > >ESI: 0005 EDI: f7873808 EBP: ESP: f7829f10 > > DS: 007b ES: 007b FS: 00d8 GS: SS: 0068 > >CR0: 8005003b CR2: 004f5960 CR3: 372c5000 CR4: 06d0 > >DR0: DR1: DR2: DR3: > >DR6: 0ff0 DR7: 0400 > > [] tick_broadcast_oneshot_control+0x10/0xda > > [] tick_notify+0x1d4/0x2eb > > [] get_next_timer_interrupt+0x143/0x1b4 > > [] notifier_call_chain+0x2a/0x47 > > [] raw_notifier_call_chain+0x17/0x1a > > [] clockevents_notify+0x19/0x4f > > [] acpi_idle_enter_simple+0x183/0x1d0 > > [] cpuidle_idle_call+0x53/0x78 > > [] cpuidle_idle_call+0x0/0x78 > > [] cpu_idle+0x97/0xb8 > > > > Looks like tick_broadcast_lock did not get freed in some path. > You do not see this when you CPU_IDLE is not configured? This looks pretty much like the problem I was solving yesterday. Parag, can you please try Linus latest and please check whether there is a stack trace with clockevents_program_event on the top in your dmesg. Thanks, tglx -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: BUG: soft lockup - CPU#1 stuck for 15s! [swapper:0]
On Saturday, 8 of December 2007, Andrew Morton wrote: > On Fri, 7 Dec 2007 17:53:47 -0500 > "Parag Warudkar" <[EMAIL PROTECTED]> wrote: > > > Got this on today's git (2.6.24-rc4) while compiling stuff - Looks > > like it is related to CpuIdle stuff. > > I chose CONFIG_CPU_IDLE for the first time so I don't know when this > > was introduced. > > > > This is on x86_32, SMP. > > > > BUG: soft lockup - CPU#1 stuck for 15s! [swapper:0] > > > > Pid: 0, comm: swapper Not tainted (2.6.24-rc4 #3) > > EIP: 0060:[] EFLAGS: 0202 CPU: 1 > > EIP is at _spin_lock_irqsave+0x16/0x27 > > EAX: c06b4110 EBX: 0001 ECX: f7873808 EDX: 0293 > > ESI: 0005 EDI: f7873808 EBP: ESP: f7829f10 > > DS: 007b ES: 007b FS: 00d8 GS: SS: 0068 > > CR0: 8005003b CR2: 004f5960 CR3: 372c5000 CR4: 06d0 > > DR0: DR1: DR2: DR3: > > DR6: 0ff0 DR7: 0400 > > [] tick_broadcast_oneshot_control+0x10/0xda > > [] tick_notify+0x1d4/0x2eb > > [] get_next_timer_interrupt+0x143/0x1b4 > > [] notifier_call_chain+0x2a/0x47 > > [] raw_notifier_call_chain+0x17/0x1a > > [] clockevents_notify+0x19/0x4f > > [] acpi_idle_enter_simple+0x183/0x1d0 > > [] cpuidle_idle_call+0x53/0x78 > > [] cpuidle_idle_call+0x0/0x78 > > [] cpu_idle+0x97/0xb8 > > OK, thanks. Another one for the regression list, please. Added, http://bugzilla.kernel.org/show_bug.cgi?id=9525, thanks. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]
On Dec 7, 2007 6:12 PM, Pallipadi, Venkatesh <[EMAIL PROTECTED]> wrote: > Looks like tick_broadcast_lock did not get freed in some path. > You do not see this when you CPU_IDLE is not configured? > > Thanks, > Venki > No, I did not see this prior to enabling CPU_IDLE. All previous kernels without CPU_IDLE also had soft lockup detection enabled. Parag -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: BUG: soft lockup - CPU#1 stuck for 15s! [swapper:0]
On Fri, 7 Dec 2007 17:53:47 -0500 "Parag Warudkar" <[EMAIL PROTECTED]> wrote: > Got this on today's git (2.6.24-rc4) while compiling stuff - Looks > like it is related to CpuIdle stuff. > I chose CONFIG_CPU_IDLE for the first time so I don't know when this > was introduced. > > This is on x86_32, SMP. > > BUG: soft lockup - CPU#1 stuck for 15s! [swapper:0] > > Pid: 0, comm: swapper Not tainted (2.6.24-rc4 #3) > EIP: 0060:[] EFLAGS: 0202 CPU: 1 > EIP is at _spin_lock_irqsave+0x16/0x27 > EAX: c06b4110 EBX: 0001 ECX: f7873808 EDX: 0293 > ESI: 0005 EDI: f7873808 EBP: ESP: f7829f10 > DS: 007b ES: 007b FS: 00d8 GS: SS: 0068 > CR0: 8005003b CR2: 004f5960 CR3: 372c5000 CR4: 06d0 > DR0: DR1: DR2: DR3: > DR6: 0ff0 DR7: 0400 > [] tick_broadcast_oneshot_control+0x10/0xda > [] tick_notify+0x1d4/0x2eb > [] get_next_timer_interrupt+0x143/0x1b4 > [] notifier_call_chain+0x2a/0x47 > [] raw_notifier_call_chain+0x17/0x1a > [] clockevents_notify+0x19/0x4f > [] acpi_idle_enter_simple+0x183/0x1d0 > [] cpuidle_idle_call+0x53/0x78 > [] cpuidle_idle_call+0x0/0x78 > [] cpu_idle+0x97/0xb8 OK, thanks. Another one for the regression list, please. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: soft lockup - CPU#1 stuck for 15s! [swapper:0]
>-Original Message- >From: Parag Warudkar [mailto:[EMAIL PROTECTED] >Sent: Friday, December 07, 2007 2:54 PM >To: LKML >Cc: Andrew Morton; Pallipadi, Venkatesh; Linus Torvalds >Subject: BUG: soft lockup - CPU#1 stuck for 15s! [swapper:0] > >Got this on today's git (2.6.24-rc4) while compiling stuff - Looks >like it is related to CpuIdle stuff. >I chose CONFIG_CPU_IDLE for the first time so I don't know when this >was introduced. > >This is on x86_32, SMP. > >BUG: soft lockup - CPU#1 stuck for 15s! [swapper:0] > >Pid: 0, comm: swapper Not tainted (2.6.24-rc4 #3) >EIP: 0060:[] EFLAGS: 0202 CPU: 1 >EIP is at _spin_lock_irqsave+0x16/0x27 >EAX: c06b4110 EBX: 0001 ECX: f7873808 EDX: 0293 >ESI: 0005 EDI: f7873808 EBP: ESP: f7829f10 > DS: 007b ES: 007b FS: 00d8 GS: SS: 0068 >CR0: 8005003b CR2: 004f5960 CR3: 372c5000 CR4: 06d0 >DR0: DR1: DR2: DR3: >DR6: 0ff0 DR7: 0400 > [] tick_broadcast_oneshot_control+0x10/0xda > [] tick_notify+0x1d4/0x2eb > [] get_next_timer_interrupt+0x143/0x1b4 > [] notifier_call_chain+0x2a/0x47 > [] raw_notifier_call_chain+0x17/0x1a > [] clockevents_notify+0x19/0x4f > [] acpi_idle_enter_simple+0x183/0x1d0 > [] cpuidle_idle_call+0x53/0x78 > [] cpuidle_idle_call+0x0/0x78 > [] cpu_idle+0x97/0xb8 > Looks like tick_broadcast_lock did not get freed in some path. You do not see this when you CPU_IDLE is not configured? Thanks, Venki -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
BUG: soft lockup - CPU#1 stuck for 15s! [swapper:0]
Got this on today's git (2.6.24-rc4) while compiling stuff - Looks like it is related to CpuIdle stuff. I chose CONFIG_CPU_IDLE for the first time so I don't know when this was introduced. This is on x86_32, SMP. BUG: soft lockup - CPU#1 stuck for 15s! [swapper:0] Pid: 0, comm: swapper Not tainted (2.6.24-rc4 #3) EIP: 0060:[] EFLAGS: 0202 CPU: 1 EIP is at _spin_lock_irqsave+0x16/0x27 EAX: c06b4110 EBX: 0001 ECX: f7873808 EDX: 0293 ESI: 0005 EDI: f7873808 EBP: ESP: f7829f10 DS: 007b ES: 007b FS: 00d8 GS: SS: 0068 CR0: 8005003b CR2: 004f5960 CR3: 372c5000 CR4: 06d0 DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400 [] tick_broadcast_oneshot_control+0x10/0xda [] tick_notify+0x1d4/0x2eb [] get_next_timer_interrupt+0x143/0x1b4 [] notifier_call_chain+0x2a/0x47 [] raw_notifier_call_chain+0x17/0x1a [] clockevents_notify+0x19/0x4f [] acpi_idle_enter_simple+0x183/0x1d0 [] cpuidle_idle_call+0x53/0x78 [] cpuidle_idle_call+0x0/0x78 [] cpu_idle+0x97/0xb8 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: BUG: soft lockup - CPU#1 stuck for 15s! [swapper:0]
On Fri, 7 Dec 2007 17:53:47 -0500 Parag Warudkar [EMAIL PROTECTED] wrote: Got this on today's git (2.6.24-rc4) while compiling stuff - Looks like it is related to CpuIdle stuff. I chose CONFIG_CPU_IDLE for the first time so I don't know when this was introduced. This is on x86_32, SMP. BUG: soft lockup - CPU#1 stuck for 15s! [swapper:0] Pid: 0, comm: swapper Not tainted (2.6.24-rc4 #3) EIP: 0060:[c0603e22] EFLAGS: 0202 CPU: 1 EIP is at _spin_lock_irqsave+0x16/0x27 EAX: c06b4110 EBX: 0001 ECX: f7873808 EDX: 0293 ESI: 0005 EDI: f7873808 EBP: ESP: f7829f10 DS: 007b ES: 007b FS: 00d8 GS: SS: 0068 CR0: 8005003b CR2: 004f5960 CR3: 372c5000 CR4: 06d0 DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400 [c0438233] tick_broadcast_oneshot_control+0x10/0xda [c0437c82] tick_notify+0x1d4/0x2eb [c04281b4] get_next_timer_interrupt+0x143/0x1b4 [c0605819] notifier_call_chain+0x2a/0x47 [c04345a0] raw_notifier_call_chain+0x17/0x1a [c04378b7] clockevents_notify+0x19/0x4f [c0533cc3] acpi_idle_enter_simple+0x183/0x1d0 [c058cea3] cpuidle_idle_call+0x53/0x78 [c058ce50] cpuidle_idle_call+0x0/0x78 [c0402575] cpu_idle+0x97/0xb8 OK, thanks. Another one for the regression list, please. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: soft lockup - CPU#1 stuck for 15s! [swapper:0]
-Original Message- From: Parag Warudkar [mailto:[EMAIL PROTECTED] Sent: Friday, December 07, 2007 2:54 PM To: LKML Cc: Andrew Morton; Pallipadi, Venkatesh; Linus Torvalds Subject: BUG: soft lockup - CPU#1 stuck for 15s! [swapper:0] Got this on today's git (2.6.24-rc4) while compiling stuff - Looks like it is related to CpuIdle stuff. I chose CONFIG_CPU_IDLE for the first time so I don't know when this was introduced. This is on x86_32, SMP. BUG: soft lockup - CPU#1 stuck for 15s! [swapper:0] Pid: 0, comm: swapper Not tainted (2.6.24-rc4 #3) EIP: 0060:[c0603e22] EFLAGS: 0202 CPU: 1 EIP is at _spin_lock_irqsave+0x16/0x27 EAX: c06b4110 EBX: 0001 ECX: f7873808 EDX: 0293 ESI: 0005 EDI: f7873808 EBP: ESP: f7829f10 DS: 007b ES: 007b FS: 00d8 GS: SS: 0068 CR0: 8005003b CR2: 004f5960 CR3: 372c5000 CR4: 06d0 DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400 [c0438233] tick_broadcast_oneshot_control+0x10/0xda [c0437c82] tick_notify+0x1d4/0x2eb [c04281b4] get_next_timer_interrupt+0x143/0x1b4 [c0605819] notifier_call_chain+0x2a/0x47 [c04345a0] raw_notifier_call_chain+0x17/0x1a [c04378b7] clockevents_notify+0x19/0x4f [c0533cc3] acpi_idle_enter_simple+0x183/0x1d0 [c058cea3] cpuidle_idle_call+0x53/0x78 [c058ce50] cpuidle_idle_call+0x0/0x78 [c0402575] cpu_idle+0x97/0xb8 Looks like tick_broadcast_lock did not get freed in some path. You do not see this when you CPU_IDLE is not configured? Thanks, Venki -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: soft lockup - CPU#1 stuck for 15s! [swapper:0]
On Fri, 7 Dec 2007, Pallipadi, Venkatesh wrote: -Original Message- From: Parag Warudkar [mailto:[EMAIL PROTECTED] Sent: Friday, December 07, 2007 2:54 PM To: LKML Cc: Andrew Morton; Pallipadi, Venkatesh; Linus Torvalds Subject: BUG: soft lockup - CPU#1 stuck for 15s! [swapper:0] Got this on today's git (2.6.24-rc4) while compiling stuff - Looks like it is related to CpuIdle stuff. I chose CONFIG_CPU_IDLE for the first time so I don't know when this was introduced. This is on x86_32, SMP. BUG: soft lockup - CPU#1 stuck for 15s! [swapper:0] Pid: 0, comm: swapper Not tainted (2.6.24-rc4 #3) EIP: 0060:[c0603e22] EFLAGS: 0202 CPU: 1 EIP is at _spin_lock_irqsave+0x16/0x27 EAX: c06b4110 EBX: 0001 ECX: f7873808 EDX: 0293 ESI: 0005 EDI: f7873808 EBP: ESP: f7829f10 DS: 007b ES: 007b FS: 00d8 GS: SS: 0068 CR0: 8005003b CR2: 004f5960 CR3: 372c5000 CR4: 06d0 DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400 [c0438233] tick_broadcast_oneshot_control+0x10/0xda [c0437c82] tick_notify+0x1d4/0x2eb [c04281b4] get_next_timer_interrupt+0x143/0x1b4 [c0605819] notifier_call_chain+0x2a/0x47 [c04345a0] raw_notifier_call_chain+0x17/0x1a [c04378b7] clockevents_notify+0x19/0x4f [c0533cc3] acpi_idle_enter_simple+0x183/0x1d0 [c058cea3] cpuidle_idle_call+0x53/0x78 [c058ce50] cpuidle_idle_call+0x0/0x78 [c0402575] cpu_idle+0x97/0xb8 Looks like tick_broadcast_lock did not get freed in some path. You do not see this when you CPU_IDLE is not configured? This looks pretty much like the problem I was solving yesterday. Parag, can you please try Linus latest and please check whether there is a stack trace with clockevents_program_event on the top in your dmesg. Thanks, tglx -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: BUG: soft lockup - CPU#1 stuck for 15s! [swapper:0]
On Saturday, 8 of December 2007, Andrew Morton wrote: On Fri, 7 Dec 2007 17:53:47 -0500 Parag Warudkar [EMAIL PROTECTED] wrote: Got this on today's git (2.6.24-rc4) while compiling stuff - Looks like it is related to CpuIdle stuff. I chose CONFIG_CPU_IDLE for the first time so I don't know when this was introduced. This is on x86_32, SMP. BUG: soft lockup - CPU#1 stuck for 15s! [swapper:0] Pid: 0, comm: swapper Not tainted (2.6.24-rc4 #3) EIP: 0060:[c0603e22] EFLAGS: 0202 CPU: 1 EIP is at _spin_lock_irqsave+0x16/0x27 EAX: c06b4110 EBX: 0001 ECX: f7873808 EDX: 0293 ESI: 0005 EDI: f7873808 EBP: ESP: f7829f10 DS: 007b ES: 007b FS: 00d8 GS: SS: 0068 CR0: 8005003b CR2: 004f5960 CR3: 372c5000 CR4: 06d0 DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400 [c0438233] tick_broadcast_oneshot_control+0x10/0xda [c0437c82] tick_notify+0x1d4/0x2eb [c04281b4] get_next_timer_interrupt+0x143/0x1b4 [c0605819] notifier_call_chain+0x2a/0x47 [c04345a0] raw_notifier_call_chain+0x17/0x1a [c04378b7] clockevents_notify+0x19/0x4f [c0533cc3] acpi_idle_enter_simple+0x183/0x1d0 [c058cea3] cpuidle_idle_call+0x53/0x78 [c058ce50] cpuidle_idle_call+0x0/0x78 [c0402575] cpu_idle+0x97/0xb8 OK, thanks. Another one for the regression list, please. Added, http://bugzilla.kernel.org/show_bug.cgi?id=9525, thanks. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]
On Dec 7, 2007 6:12 PM, Pallipadi, Venkatesh [EMAIL PROTECTED] wrote: Looks like tick_broadcast_lock did not get freed in some path. You do not see this when you CPU_IDLE is not configured? Thanks, Venki No, I did not see this prior to enabling CPU_IDLE. All previous kernels without CPU_IDLE also had soft lockup detection enabled. Parag -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
BUG: soft lockup - CPU#1 stuck for 15s! [swapper:0]
Got this on today's git (2.6.24-rc4) while compiling stuff - Looks like it is related to CpuIdle stuff. I chose CONFIG_CPU_IDLE for the first time so I don't know when this was introduced. This is on x86_32, SMP. BUG: soft lockup - CPU#1 stuck for 15s! [swapper:0] Pid: 0, comm: swapper Not tainted (2.6.24-rc4 #3) EIP: 0060:[c0603e22] EFLAGS: 0202 CPU: 1 EIP is at _spin_lock_irqsave+0x16/0x27 EAX: c06b4110 EBX: 0001 ECX: f7873808 EDX: 0293 ESI: 0005 EDI: f7873808 EBP: ESP: f7829f10 DS: 007b ES: 007b FS: 00d8 GS: SS: 0068 CR0: 8005003b CR2: 004f5960 CR3: 372c5000 CR4: 06d0 DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400 [c0438233] tick_broadcast_oneshot_control+0x10/0xda [c0437c82] tick_notify+0x1d4/0x2eb [c04281b4] get_next_timer_interrupt+0x143/0x1b4 [c0605819] notifier_call_chain+0x2a/0x47 [c04345a0] raw_notifier_call_chain+0x17/0x1a [c04378b7] clockevents_notify+0x19/0x4f [c0533cc3] acpi_idle_enter_simple+0x183/0x1d0 [c058cea3] cpuidle_idle_call+0x53/0x78 [c058ce50] cpuidle_idle_call+0x0/0x78 [c0402575] cpu_idle+0x97/0xb8 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/