Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2008-01-11 Thread Thomas Gleixner
On Thu, 10 Jan 2008, Parag Warudkar wrote:

> On Jan 9, 2008 6:56 AM, Thomas Gleixner <[EMAIL PROTECTED]> wrote:
> 
> > Can you apply the patch below + the debug patch which prints the timer
> > stats on softlockup and provide the output of this.
> 
> Applied to today's git and running for 21 hours - no recurrence yet
> even with 1.2 wakeups per second.
> 
> I will re-open this thread if it bites again.

Would be interesting which commit fixed that problem. I'm a bit
worried about self healing bugs.

tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2008-01-11 Thread Thomas Gleixner
On Thu, 10 Jan 2008, Parag Warudkar wrote:

 On Jan 9, 2008 6:56 AM, Thomas Gleixner [EMAIL PROTECTED] wrote:
 
  Can you apply the patch below + the debug patch which prints the timer
  stats on softlockup and provide the output of this.
 
 Applied to today's git and running for 21 hours - no recurrence yet
 even with 1.2 wakeups per second.
 
 I will re-open this thread if it bites again.

Would be interesting which commit fixed that problem. I'm a bit
worried about self healing bugs.

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2008-01-10 Thread Parag Warudkar
On Jan 9, 2008 6:56 AM, Thomas Gleixner <[EMAIL PROTECTED]> wrote:

> Can you apply the patch below + the debug patch which prints the timer
> stats on softlockup and provide the output of this.

Applied to today's git and running for 21 hours - no recurrence yet
even with 1.2 wakeups per second.

I will re-open this thread if it bites again.

Thanks

Parag
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2008-01-10 Thread Parag Warudkar
On Jan 9, 2008 6:56 AM, Thomas Gleixner [EMAIL PROTECTED] wrote:

 Can you apply the patch below + the debug patch which prints the timer
 stats on softlockup and provide the output of this.

Applied to today's git and running for 21 hours - no recurrence yet
even with 1.2 wakeups per second.

I will re-open this thread if it bites again.

Thanks

Parag
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2008-01-09 Thread Thomas Gleixner
On Mon, 17 Dec 2007, Thomas Gleixner wrote:
> I try to come up with some more debug patches tomorrow.

Sorry took a bit longer than a day :(

Can you apply the patch below + the debug patch which prints the timer
stats on softlockup and provide the output of this.

Thanks,

tglx

diff --git a/include/linux/tick.h b/include/linux/tick.h
index f4a1395..22d921f 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -39,6 +39,8 @@ enum tick_nohz_mode {
  * @idle_calls:Total number of idle calls
  * @idle_sleeps:   Number of idle calls, where the sched tick was stopped
  * @idle_entrytime:Time when the idle call was entered
+ * @idle_waketime: Time when the idle was interrupted
+ * @idle_exittime: Time when the idle state was left
  * @idle_sleeptime:Sum of the time slept in idle with sched tick stopped
  * @sleep_length:  Duration of the current idle sleep
  */
@@ -52,6 +54,8 @@ struct tick_sched {
unsigned long   idle_calls;
unsigned long   idle_sleeps;
ktime_t idle_entrytime;
+   ktime_t idle_waketime;
+   ktime_t idle_exittime;
ktime_t idle_sleeptime;
ktime_t sleep_length;
unsigned long   last_jiffies;
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index cb89fa8..206a436 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -133,14 +133,15 @@ void tick_nohz_update_jiffies(void)
if (!ts->tick_stopped)
return;
 
-   touch_softlockup_watchdog();
-
cpu_clear(cpu, nohz_cpu_mask);
now = ktime_get();
 
local_irq_save(flags);
tick_do_update_jiffies64(now);
local_irq_restore(flags);
+
+   ts->idle_waketime = now;
+   touch_softlockup_watchdog();
 }
 
 /**
@@ -369,6 +370,7 @@ void tick_nohz_restart_sched_tick(void)
 * Cancel the scheduled timer and restore the tick
 */
ts->tick_stopped  = 0;
+   ts->idle_exittime = now;
hrtimer_cancel(>sched_timer);
ts->sched_timer.expires = ts->idle_tick;
 
diff --git a/kernel/time/timer_list.c b/kernel/time/timer_list.c
index 12c5f4c..d3d94c1 100644
--- a/kernel/time/timer_list.c
+++ b/kernel/time/timer_list.c
@@ -166,6 +166,8 @@ static void print_cpu(struct seq_file *m, int cpu, u64 now)
P(idle_calls);
P(idle_sleeps);
P_ns(idle_entrytime);
+   P_ns(idle_waketime);
+   P_ns(idle_exittime);
P_ns(idle_sleeptime);
P(last_jiffies);
P(next_jiffies);


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2008-01-09 Thread Thomas Gleixner
On Mon, 17 Dec 2007, Thomas Gleixner wrote:
 I try to come up with some more debug patches tomorrow.

Sorry took a bit longer than a day :(

Can you apply the patch below + the debug patch which prints the timer
stats on softlockup and provide the output of this.

Thanks,

tglx

diff --git a/include/linux/tick.h b/include/linux/tick.h
index f4a1395..22d921f 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -39,6 +39,8 @@ enum tick_nohz_mode {
  * @idle_calls:Total number of idle calls
  * @idle_sleeps:   Number of idle calls, where the sched tick was stopped
  * @idle_entrytime:Time when the idle call was entered
+ * @idle_waketime: Time when the idle was interrupted
+ * @idle_exittime: Time when the idle state was left
  * @idle_sleeptime:Sum of the time slept in idle with sched tick stopped
  * @sleep_length:  Duration of the current idle sleep
  */
@@ -52,6 +54,8 @@ struct tick_sched {
unsigned long   idle_calls;
unsigned long   idle_sleeps;
ktime_t idle_entrytime;
+   ktime_t idle_waketime;
+   ktime_t idle_exittime;
ktime_t idle_sleeptime;
ktime_t sleep_length;
unsigned long   last_jiffies;
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index cb89fa8..206a436 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -133,14 +133,15 @@ void tick_nohz_update_jiffies(void)
if (!ts-tick_stopped)
return;
 
-   touch_softlockup_watchdog();
-
cpu_clear(cpu, nohz_cpu_mask);
now = ktime_get();
 
local_irq_save(flags);
tick_do_update_jiffies64(now);
local_irq_restore(flags);
+
+   ts-idle_waketime = now;
+   touch_softlockup_watchdog();
 }
 
 /**
@@ -369,6 +370,7 @@ void tick_nohz_restart_sched_tick(void)
 * Cancel the scheduled timer and restore the tick
 */
ts-tick_stopped  = 0;
+   ts-idle_exittime = now;
hrtimer_cancel(ts-sched_timer);
ts-sched_timer.expires = ts-idle_tick;
 
diff --git a/kernel/time/timer_list.c b/kernel/time/timer_list.c
index 12c5f4c..d3d94c1 100644
--- a/kernel/time/timer_list.c
+++ b/kernel/time/timer_list.c
@@ -166,6 +166,8 @@ static void print_cpu(struct seq_file *m, int cpu, u64 now)
P(idle_calls);
P(idle_sleeps);
P_ns(idle_entrytime);
+   P_ns(idle_waketime);
+   P_ns(idle_exittime);
P_ns(idle_sleeptime);
P(last_jiffies);
P(next_jiffies);


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2007-12-17 Thread Thomas Gleixner
On Mon, 17 Dec 2007, Parag Warudkar wrote:
> On Dec 17, 2007 3:05 AM, Thomas Gleixner <[EMAIL PROTECTED]> wrote:
> > Sigh. You did not have the debug patch applied anymore, which printks
> > the timer_list data ? Can you apply it again and provide the output
> > please ?
> >
> 
> This keeps getting more and more weird - This time I was running with
> CONFIG_CPU_IDLE=N and I have ton of soft lockups after 14hr uptime.

It's more than weird.
 
> now at 39080147128907 nsecs

>   .idle_entrytime : 39080384013047 nsecs

The time, when we entered idle on CPU#0 is in the future. 

>   .tick_stopped   : 0

But the tick is not stopped, that means CPU#0 has work to do

On CPU#1 idle entry just happened:

>   .idle_entrytime : 39079996603653 nsecs

> now at 40490254040892 nsecs

>   .idle_entrytime : 40490492012833 nsecs

Again, idle_entry on CPU#0 is in the future.

On CPU#1 the idle entry was at: 
>   .idle_entrytime : 40489996578090 nsecs

which means: 0.257462802 sec. ago

> now at 40700144217096 nsecs

Aarg. On CPU#0 this is consistently in the future:

>   .idle_entrytime : 40700372012887 nsecs

I'm really confused.

>   .idle_entrytime : 4066620694 nsecs
>   .idle_sleeptime : 40540536046589 nsecs
>   .last_jiffies   : 1010
>   .next_jiffies   : 10100467
>   .idle_expires   : 4070186400 nsecs
> jiffies: 10100158
> 
> 
> Tick Device: mode: 1
> Clock Event Device: hpet
>  max_delta_ns:   2147483647
>  min_delta_ns:   3352
>  mult:   61496110
>  shift:  32
>  mode:   3
>  next_event: 4070069200 nsecs
>  set_next_event: hpet_legacy_next_event
>  set_mode:   hpet_legacy_set_mode
>  event_handler:  tick_handle_oneshot_broadcast
> tick_broadcast_mask: 0003

Here is the next inconsistent data:

> tick_broadcast_oneshot_mask: 0003

CPU#1 just woke up. That means the broadcast oneshot mask must be
cleared for CPU#1.

Some real strange thing is going on in your box.

I try to come up with some more debug patches tomorrow.

  tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2007-12-17 Thread Parag Warudkar
On Dec 17, 2007 3:05 AM, Thomas Gleixner <[EMAIL PROTECTED]> wrote:
>
> On Sun, 16 Dec 2007, Parag Warudkar wrote:
>
> > On Dec 16, 2007 12:15 AM, Parag Warudkar <[EMAIL PROTECTED]> wrote:
> > > On Sat, 15 Dec 2007, Parag Warudkar wrote:
> > >
> > > >> I will run it for a little longer just to be sure - but I don't think 
> > > >> it
> > > >> will be a problem.
> > >
> > > No problems for last 10 hours - I consider this fixed.
> > >
> >
> > Arghh - spoke 8 hours too soon. I left it running overnight and
> > morning I see a bunch of softlockups - so NO NOT FIXED.
> >
> > Parag
> >
> > BUG: soft lockup - CPU#1 stuck for 13s! [swapper:0]
> >
> > Pid: 0, comm: swapper Not tainted (2.6.24-rc5 #21)
> > EIP: 0060:[] EFLAGS: 0206 CPU: 1
> > EIP is at acpi_idle_enter_simple+0x166/0x1d0
> > EAX: f7829f88 EBX: 0dab ECX: 0266 EDX: 
> > ESI:  EDI: 00c056e5 EBP: 00c0493a ESP: f7829f88
> >  DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
> > CR0: 8005003b CR2: 080f6c78 CR3: 00718000 CR4: 06d0
> > DR0:  DR1:  DR2:  DR3: 
> > DR6: 0ff0 DR7: 0400
>
> Sigh. You did not have the debug patch applied anymore, which printks
> the timer_list data ? Can you apply it again and provide the output
> please ?
>

This keeps getting more and more weird - This time I was running with
CONFIG_CPU_IDLE=N and I have ton of soft lockups after 14hr uptime.

The timer_list data for a few lockups is below. If you need the full
thing grab it from http://warudkars.net/messages.bz2 .

BUG: soft lockup - CPU#1 stuck for 12s! [kjournald:424]

Pid: 424, comm: kjournald Not tainted (2.6.24-rc5 #24)
EIP: 0060:[] EFLAGS: 0206 CPU: 1
EIP is at __journal_remove_journal_head+0x18/0xe8
EAX: dc75e1c0 EBX: dc75e1c0 ECX: f6d539f0 EDX: dc75e1f8
ESI: dc75e1f8 EDI: 0009 EBP: f712a000 ESP: f7157f28
 DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
CR0: 8005003b CR2: b7d18eb4 CR3: 37352000 CR4: 06d0
DR0:  DR1:  DR2:  DR3: 
DR6: 0ff0 DR7: 0400
 [] __journal_unfile_buffer+0x8/0x11
 [] journal_put_journal_head+0x3d/0x4b
 [] journal_commit_transaction+0x75d/0xbc9
 [] lock_timer_base+0x19/0x35
 [] kjournald+0xa2/0x1d0
 [] autoremove_wake_function+0x0/0x35
 [] kjournald+0x0/0x1d0
 [] kthread+0x38/0x5d
 [] kthread+0x0/0x5d
 [] kernel_thread_helper+0x7/0x10
 ===
Timer List Version: v0.3
HRTIMER_MAX_CLOCK_BASES: 2
now at 39080147128907 nsecs

cpu: 0
 clock 0:
  .index:  0
  .resolution: 1 nsecs
  .get_time:   ktime_get_real
  .offset: 1197846065352886518 nsecs
active timers:
 clock 1:
  .index:  1
  .resolution: 1 nsecs
  .get_time:   ktime_get
  .offset: 0 nsecs
active timers:
 #0: , tick_sched_timer, S:01, tick_nohz_stop_sched_tick, swapper/0
 # expires at 3908027200 nsecs [in 124871093 nsecs]
 #1: , it_real_fn, S:01, do_setitimer, syslogd/2522
 # expires at 3908647554 nsecs [in 6328414426 nsecs]
 #2: , hrtimer_wakeup, S:01, do_nanosleep, atd/3039
 # expires at 39332442305024 nsecs [in 252295176117 nsecs]
  .expires_next   : 3908032000 nsecs
  .hres_active: 1
  .nr_events  : 81214
  .nohz_mode  : 2
  .idle_tick  : 390780 nsecs
  .tick_stopped   : 0
  .idle_jiffies   : 9694501
  .idle_calls : 171123
  .idle_sleeps: 106664
  .idle_entrytime : 39080384013047 nsecs
  .idle_sleeptime : 38923743508612 nsecs
  .last_jiffies   : 9695101
  .next_jiffies   : 9695250
  .idle_expires   : 3907999600 nsecs
jiffies: 9695106

cpu: 1
 clock 0:
  .index:  0
  .resolution: 1 nsecs
  .get_time:   ktime_get_real
  .offset: 1197846065352886518 nsecs
active timers:
 clock 1:
  .index:  1
  .resolution: 1 nsecs
  .get_time:   ktime_get
  .offset: 0 nsecs
active timers:
 #0: , hrtimer_wakeup, S:01, do_nanosleep, crond/2946
 # expires at 39115730211364 nsecs [in 35583082457 nsecs]
 #1: , hrtimer_wakeup, S:01, do_nanosleep, smartd/3480
 # expires at 39641838017337 nsecs [in 561690888430 nsecs]
  .expires_next   : 3907999700 nsecs
  .hres_active: 1
  .nr_events  : 48789
  .nohz_mode  : 2
  .idle_tick  : 3906686500 nsecs
  .tick_stopped   : 0
  .idle_jiffies   : 9691717
  .idle_calls : 125065
  .idle_sleeps: 79815
  .idle_entrytime : 39079996603653 nsecs
  .idle_sleeptime : 38923914717954 nsecs
  .last_jiffies   : 9695000
  .next_jiffies   : 9695467
  .idle_expires   : 3908186400 nsecs
jiffies: 9695162


Tick Device: mode: 1
Clock Event Device: hpet
 max_delta_ns:   2147483647
 min_delta_ns:   3352
 mult:   61496110
 shift:  32
 mode:   3
 next_event: 3908070400 nsecs
 set_next_event: hpet_legacy_next_event
 set_mode:   hpet_legacy_set_mode
 event_handler:  tick_handle_oneshot_broadcast
tick_broadcast_mask: 0003
tick_broadcast_oneshot_mask: 0001


Tick Device: mode: 1
Clock Event Device: lapic
 max_delta_ns:   805378776
 min_delta_ns:   1440
 mult:   44735215
 shift:  32
 mode: 

Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2007-12-17 Thread Thomas Gleixner
On Sun, 16 Dec 2007, Parag Warudkar wrote:

> On Dec 16, 2007 12:15 AM, Parag Warudkar <[EMAIL PROTECTED]> wrote:
> > On Sat, 15 Dec 2007, Parag Warudkar wrote:
> >
> > >> I will run it for a little longer just to be sure - but I don't think it
> > >> will be a problem.
> >
> > No problems for last 10 hours - I consider this fixed.
> >
> 
> Arghh - spoke 8 hours too soon. I left it running overnight and
> morning I see a bunch of softlockups - so NO NOT FIXED.
> 
> Parag
> 
> BUG: soft lockup - CPU#1 stuck for 13s! [swapper:0]
> 
> Pid: 0, comm: swapper Not tainted (2.6.24-rc5 #21)
> EIP: 0060:[] EFLAGS: 0206 CPU: 1
> EIP is at acpi_idle_enter_simple+0x166/0x1d0
> EAX: f7829f88 EBX: 0dab ECX: 0266 EDX: 
> ESI:  EDI: 00c056e5 EBP: 00c0493a ESP: f7829f88
>  DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
> CR0: 8005003b CR2: 080f6c78 CR3: 00718000 CR4: 06d0
> DR0:  DR1:  DR2:  DR3: 
> DR6: 0ff0 DR7: 0400

Sigh. You did not have the debug patch applied anymore, which printks
the timer_list data ? Can you apply it again and provide the output
please ?

Thanks,

tglx


 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2007-12-17 Thread Thomas Gleixner
On Sun, 16 Dec 2007, Parag Warudkar wrote:

 On Dec 16, 2007 12:15 AM, Parag Warudkar [EMAIL PROTECTED] wrote:
  On Sat, 15 Dec 2007, Parag Warudkar wrote:
 
   I will run it for a little longer just to be sure - but I don't think it
   will be a problem.
 
  No problems for last 10 hours - I consider this fixed.
 
 
 Arghh - spoke 8 hours too soon. I left it running overnight and
 morning I see a bunch of softlockups - so NO NOT FIXED.
 
 Parag
 
 BUG: soft lockup - CPU#1 stuck for 13s! [swapper:0]
 
 Pid: 0, comm: swapper Not tainted (2.6.24-rc5 #21)
 EIP: 0060:[c0533ca6] EFLAGS: 0206 CPU: 1
 EIP is at acpi_idle_enter_simple+0x166/0x1d0
 EAX: f7829f88 EBX: 0dab ECX: 0266 EDX: 
 ESI:  EDI: 00c056e5 EBP: 00c0493a ESP: f7829f88
  DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
 CR0: 8005003b CR2: 080f6c78 CR3: 00718000 CR4: 06d0
 DR0:  DR1:  DR2:  DR3: 
 DR6: 0ff0 DR7: 0400

Sigh. You did not have the debug patch applied anymore, which printks
the timer_list data ? Can you apply it again and provide the output
please ?

Thanks,

tglx


 
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2007-12-17 Thread Parag Warudkar
On Dec 17, 2007 3:05 AM, Thomas Gleixner [EMAIL PROTECTED] wrote:

 On Sun, 16 Dec 2007, Parag Warudkar wrote:

  On Dec 16, 2007 12:15 AM, Parag Warudkar [EMAIL PROTECTED] wrote:
   On Sat, 15 Dec 2007, Parag Warudkar wrote:
  
I will run it for a little longer just to be sure - but I don't think 
it
will be a problem.
  
   No problems for last 10 hours - I consider this fixed.
  
 
  Arghh - spoke 8 hours too soon. I left it running overnight and
  morning I see a bunch of softlockups - so NO NOT FIXED.
 
  Parag
 
  BUG: soft lockup - CPU#1 stuck for 13s! [swapper:0]
 
  Pid: 0, comm: swapper Not tainted (2.6.24-rc5 #21)
  EIP: 0060:[c0533ca6] EFLAGS: 0206 CPU: 1
  EIP is at acpi_idle_enter_simple+0x166/0x1d0
  EAX: f7829f88 EBX: 0dab ECX: 0266 EDX: 
  ESI:  EDI: 00c056e5 EBP: 00c0493a ESP: f7829f88
   DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
  CR0: 8005003b CR2: 080f6c78 CR3: 00718000 CR4: 06d0
  DR0:  DR1:  DR2:  DR3: 
  DR6: 0ff0 DR7: 0400

 Sigh. You did not have the debug patch applied anymore, which printks
 the timer_list data ? Can you apply it again and provide the output
 please ?


This keeps getting more and more weird - This time I was running with
CONFIG_CPU_IDLE=N and I have ton of soft lockups after 14hr uptime.

The timer_list data for a few lockups is below. If you need the full
thing grab it from http://warudkars.net/messages.bz2 .

BUG: soft lockup - CPU#1 stuck for 12s! [kjournald:424]

Pid: 424, comm: kjournald Not tainted (2.6.24-rc5 #24)
EIP: 0060:[c04bbd57] EFLAGS: 0206 CPU: 1
EIP is at __journal_remove_journal_head+0x18/0xe8
EAX: dc75e1c0 EBX: dc75e1c0 ECX: f6d539f0 EDX: dc75e1f8
ESI: dc75e1f8 EDI: 0009 EBP: f712a000 ESP: f7157f28
 DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
CR0: 8005003b CR2: b7d18eb4 CR3: 37352000 CR4: 06d0
DR0:  DR1:  DR2:  DR3: 
DR6: 0ff0 DR7: 0400
 [c04b8680] __journal_unfile_buffer+0x8/0x11
 [c04bbe64] journal_put_journal_head+0x3d/0x4b
 [c04ba254] journal_commit_transaction+0x75d/0xbc9
 [c0428246] lock_timer_base+0x19/0x35
 [c04bcda0] kjournald+0xa2/0x1d0
 [c0430e29] autoremove_wake_function+0x0/0x35
 [c04bccfe] kjournald+0x0/0x1d0
 [c0430d63] kthread+0x38/0x5d
 [c0430d2b] kthread+0x0/0x5d
 [c0404a0f] kernel_thread_helper+0x7/0x10
 ===
Timer List Version: v0.3
HRTIMER_MAX_CLOCK_BASES: 2
now at 39080147128907 nsecs

cpu: 0
 clock 0:
  .index:  0
  .resolution: 1 nsecs
  .get_time:   ktime_get_real
  .offset: 1197846065352886518 nsecs
active timers:
 clock 1:
  .index:  1
  .resolution: 1 nsecs
  .get_time:   ktime_get
  .offset: 0 nsecs
active timers:
 #0: f7157dd4, tick_sched_timer, S:01, tick_nohz_stop_sched_tick, swapper/0
 # expires at 3908027200 nsecs [in 124871093 nsecs]
 #1: f7157dd4, it_real_fn, S:01, do_setitimer, syslogd/2522
 # expires at 3908647554 nsecs [in 6328414426 nsecs]
 #2: f7157dd4, hrtimer_wakeup, S:01, do_nanosleep, atd/3039
 # expires at 39332442305024 nsecs [in 252295176117 nsecs]
  .expires_next   : 3908032000 nsecs
  .hres_active: 1
  .nr_events  : 81214
  .nohz_mode  : 2
  .idle_tick  : 390780 nsecs
  .tick_stopped   : 0
  .idle_jiffies   : 9694501
  .idle_calls : 171123
  .idle_sleeps: 106664
  .idle_entrytime : 39080384013047 nsecs
  .idle_sleeptime : 38923743508612 nsecs
  .last_jiffies   : 9695101
  .next_jiffies   : 9695250
  .idle_expires   : 3907999600 nsecs
jiffies: 9695106

cpu: 1
 clock 0:
  .index:  0
  .resolution: 1 nsecs
  .get_time:   ktime_get_real
  .offset: 1197846065352886518 nsecs
active timers:
 clock 1:
  .index:  1
  .resolution: 1 nsecs
  .get_time:   ktime_get
  .offset: 0 nsecs
active timers:
 #0: f7157dd4, hrtimer_wakeup, S:01, do_nanosleep, crond/2946
 # expires at 39115730211364 nsecs [in 35583082457 nsecs]
 #1: f7157dd4, hrtimer_wakeup, S:01, do_nanosleep, smartd/3480
 # expires at 39641838017337 nsecs [in 561690888430 nsecs]
  .expires_next   : 3907999700 nsecs
  .hres_active: 1
  .nr_events  : 48789
  .nohz_mode  : 2
  .idle_tick  : 3906686500 nsecs
  .tick_stopped   : 0
  .idle_jiffies   : 9691717
  .idle_calls : 125065
  .idle_sleeps: 79815
  .idle_entrytime : 39079996603653 nsecs
  .idle_sleeptime : 38923914717954 nsecs
  .last_jiffies   : 9695000
  .next_jiffies   : 9695467
  .idle_expires   : 3908186400 nsecs
jiffies: 9695162


Tick Device: mode: 1
Clock Event Device: hpet
 max_delta_ns:   2147483647
 min_delta_ns:   3352
 mult:   61496110
 shift:  32
 mode:   3
 next_event: 3908070400 nsecs
 set_next_event: hpet_legacy_next_event
 set_mode:   hpet_legacy_set_mode
 event_handler:  tick_handle_oneshot_broadcast
tick_broadcast_mask: 0003
tick_broadcast_oneshot_mask: 0001


Tick Device: mode: 1
Clock Event Device: lapic
 max_delta_ns:   805378776
 min_delta_ns:   1440
 

Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2007-12-17 Thread Thomas Gleixner
On Mon, 17 Dec 2007, Parag Warudkar wrote:
 On Dec 17, 2007 3:05 AM, Thomas Gleixner [EMAIL PROTECTED] wrote:
  Sigh. You did not have the debug patch applied anymore, which printks
  the timer_list data ? Can you apply it again and provide the output
  please ?
 
 
 This keeps getting more and more weird - This time I was running with
 CONFIG_CPU_IDLE=N and I have ton of soft lockups after 14hr uptime.

It's more than weird.
 
 now at 39080147128907 nsecs

   .idle_entrytime : 39080384013047 nsecs

The time, when we entered idle on CPU#0 is in the future. 

   .tick_stopped   : 0

But the tick is not stopped, that means CPU#0 has work to do

On CPU#1 idle entry just happened:

   .idle_entrytime : 39079996603653 nsecs

 now at 40490254040892 nsecs

   .idle_entrytime : 40490492012833 nsecs

Again, idle_entry on CPU#0 is in the future.

On CPU#1 the idle entry was at: 
   .idle_entrytime : 40489996578090 nsecs

which means: 0.257462802 sec. ago

 now at 40700144217096 nsecs

Aarg. On CPU#0 this is consistently in the future:

   .idle_entrytime : 40700372012887 nsecs

I'm really confused.

   .idle_entrytime : 4066620694 nsecs
   .idle_sleeptime : 40540536046589 nsecs
   .last_jiffies   : 1010
   .next_jiffies   : 10100467
   .idle_expires   : 4070186400 nsecs
 jiffies: 10100158
 
 
 Tick Device: mode: 1
 Clock Event Device: hpet
  max_delta_ns:   2147483647
  min_delta_ns:   3352
  mult:   61496110
  shift:  32
  mode:   3
  next_event: 4070069200 nsecs
  set_next_event: hpet_legacy_next_event
  set_mode:   hpet_legacy_set_mode
  event_handler:  tick_handle_oneshot_broadcast
 tick_broadcast_mask: 0003

Here is the next inconsistent data:

 tick_broadcast_oneshot_mask: 0003

CPU#1 just woke up. That means the broadcast oneshot mask must be
cleared for CPU#1.

Some real strange thing is going on in your box.

I try to come up with some more debug patches tomorrow.

  tglx
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2007-12-16 Thread Parag Warudkar
On Dec 16, 2007 12:15 AM, Parag Warudkar <[EMAIL PROTECTED]> wrote:
> On Sat, 15 Dec 2007, Parag Warudkar wrote:
>
> >> I will run it for a little longer just to be sure - but I don't think it
> >> will be a problem.
>
> No problems for last 10 hours - I consider this fixed.
>

Arghh - spoke 8 hours too soon. I left it running overnight and
morning I see a bunch of softlockups - so NO NOT FIXED.

Parag

BUG: soft lockup - CPU#1 stuck for 13s! [swapper:0]

Pid: 0, comm: swapper Not tainted (2.6.24-rc5 #21)
EIP: 0060:[] EFLAGS: 0206 CPU: 1
EIP is at acpi_idle_enter_simple+0x166/0x1d0
EAX: f7829f88 EBX: 0dab ECX: 0266 EDX: 
ESI:  EDI: 00c056e5 EBP: 00c0493a ESP: f7829f88
 DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
CR0: 8005003b CR2: 080f6c78 CR3: 00718000 CR4: 06d0
DR0:  DR1:  DR2:  DR3: 
DR6: 0ff0 DR7: 0400
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2007-12-16 Thread Parag Warudkar
On Dec 16, 2007 12:15 AM, Parag Warudkar [EMAIL PROTECTED] wrote:
 On Sat, 15 Dec 2007, Parag Warudkar wrote:

  I will run it for a little longer just to be sure - but I don't think it
  will be a problem.

 No problems for last 10 hours - I consider this fixed.


Arghh - spoke 8 hours too soon. I left it running overnight and
morning I see a bunch of softlockups - so NO NOT FIXED.

Parag

BUG: soft lockup - CPU#1 stuck for 13s! [swapper:0]

Pid: 0, comm: swapper Not tainted (2.6.24-rc5 #21)
EIP: 0060:[c0533ca6] EFLAGS: 0206 CPU: 1
EIP is at acpi_idle_enter_simple+0x166/0x1d0
EAX: f7829f88 EBX: 0dab ECX: 0266 EDX: 
ESI:  EDI: 00c056e5 EBP: 00c0493a ESP: f7829f88
 DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
CR0: 8005003b CR2: 080f6c78 CR3: 00718000 CR4: 06d0
DR0:  DR1:  DR2:  DR3: 
DR6: 0ff0 DR7: 0400
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2007-12-15 Thread Parag Warudkar

On Sat, 15 Dec 2007, Parag Warudkar wrote:

I will run it for a little longer just to be sure - but I don't think it 
will be a problem.


No problems for last 10 hours - I consider this fixed.

Parag
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2007-12-15 Thread Parag Warudkar

On Sat, 15 Dec 2007, Thomas Gleixner wrote:


I have a patch staged for Linus, which fixes a thinko in the broadcast
code. It might be related to your problem. Can you give it a try ?


Yep - this looks promising. No soft lockups for last half an hour 
with 4-5 Wakeups from idle. It is almost smooth - jerkiness gone down by 
99%. There are still very infrequent and quick spells where the key typed 
on the keyboard takes effect a little later - but nothing like earlier.


I will run it for a little longer just to be sure - but I don't think it 
will be a problem.


Thanks Thomas.

Parag
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2007-12-15 Thread Thomas Gleixner
On Fri, 14 Dec 2007, Parag Warudkar wrote:

> On Dec 14, 2007 6:17 PM, Len Brown <[EMAIL PROTECTED]> wrote:
> > does processor.max_cstate=1 make the failing configuration work?
> > If yes, how about processor.max_cstate=2?
> 
> Until now 2 things were necessary to reproduce the problem -
> 1) CPU_IDLE=y and
> 2) Wakeups from Idle = 5-7 Per second (== Longer/deeper C state residency)
> 
> If I left the wakeups to high number (50-60) - there were no lockups
> but it was very jerky over ssh.
> (Typing keys had no effect for seconds etc.)
> 
> CPU_IDLE=y
> 
> With max_cstate=1 and CPU_IDLE=y things are pretty smooth - no lockups
> for the last hour. (Soft lockups used to appear in minutes
> previously.)
> 
> With max_cstate=2 - old story repeats - it's very jerky and soft
> lockups appear in under a minute after going to 3-5 wakeups from idle
> per sec.

I have a patch staged for Linus, which fixes a thinko in the broadcast
code. It might be related to your problem. Can you give it a try ?

Thanks,

tglx

-->

Subject: clockevents: fix reprogramming decision in oneshot broadcast
From: Thomas Gleixner <[EMAIL PROTECTED]>

A previous version of the code did the reprogramming of the broadcast
device in the return from idle code. This was removed, but the logic in
tick_handle_oneshot_broadcast() was kept the same.

When a broadcast interrupt happens we signal the expiry to all CPUs
which have an expired event. If none of the CPUs has an expired event,
which can happen in dyntick mode, then we reprogram the broadcast
device. We do not reprogram otherwise, but this is only correct if all
CPUs, which are in the idle broadcast state have been woken up.

The code ignores, that there might be pending not yet expired events on
other CPUs, which are in the idle broadcast state. So the delivery of
those events can be delayed for quite a time.

Change the tick_handle_oneshot_broadcast() function to check for CPUs,
which are in broadcast state and are not woken up by the current event,
and enforce the rearming of the broadcast device for those CPUs.

Signed-off-by: Thomas Gleixner <[EMAIL PROTECTED]>
Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 kernel/time/tick-broadcast.c |   56 ---
 1 file changed, 21 insertions(+), 35 deletions(-)

Index: linux-x86.q/kernel/time/tick-broadcast.c
===
--- linux-x86.q.orig/kernel/time/tick-broadcast.c
+++ linux-x86.q/kernel/time/tick-broadcast.c
@@ -384,45 +384,19 @@ int tick_resume_broadcast_oneshot(struct
 }
 
 /*
- * Reprogram the broadcast device:
- *
- * Called with tick_broadcast_lock held and interrupts disabled.
- */
-static int tick_broadcast_reprogram(void)
-{
-   ktime_t expires = { .tv64 = KTIME_MAX };
-   struct tick_device *td;
-   int cpu;
-
-   /*
-* Find the event which expires next:
-*/
-   for (cpu = first_cpu(tick_broadcast_oneshot_mask); cpu != NR_CPUS;
-cpu = next_cpu(cpu, tick_broadcast_oneshot_mask)) {
-   td = _cpu(tick_cpu_device, cpu);
-   if (td->evtdev->next_event.tv64 < expires.tv64)
-   expires = td->evtdev->next_event;
-   }
-
-   if (expires.tv64 == KTIME_MAX)
-   return 0;
-
-   return tick_broadcast_set_event(expires, 0);
-}
-
-/*
  * Handle oneshot mode broadcasting
  */
 static void tick_handle_oneshot_broadcast(struct clock_event_device *dev)
 {
struct tick_device *td;
cpumask_t mask;
-   ktime_t now;
+   ktime_t now, next_event;
int cpu;
 
spin_lock(_broadcast_lock);
 again:
dev->next_event.tv64 = KTIME_MAX;
+   next_event.tv64 = KTIME_MAX;
mask = CPU_MASK_NONE;
now = ktime_get();
/* Find all expired events */
@@ -431,19 +405,31 @@ again:
td = _cpu(tick_cpu_device, cpu);
if (td->evtdev->next_event.tv64 <= now.tv64)
cpu_set(cpu, mask);
+   else if (td->evtdev->next_event.tv64 < next_event.tv64)
+   next_event.tv64 = td->evtdev->next_event.tv64;
}
 
/*
-* Wakeup the cpus which have an expired event. The broadcast
-* device is reprogrammed in the return from idle code.
+* Wakeup the cpus which have an expired event.
+*/
+   tick_do_broadcast(mask);
+
+   /*
+* Two reasons for reprogram:
+*
+* - The global event did not expire any CPU local
+* events. This happens in dyntick mode, as the maximum PIT
+* delta is quite small.
+*
+* - There are pending events on sleeping CPUs which were not
+* in the event mask
 */
-   if (!tick_do_broadcast(mask)) {
+   if (next_event.tv64 != KTIME_MAX) {
/*
-* The global event did not expire any CPU local
-* events. This happens in dyntick mode, as 

Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2007-12-15 Thread Thomas Gleixner
On Fri, 14 Dec 2007, Parag Warudkar wrote:

 On Dec 14, 2007 6:17 PM, Len Brown [EMAIL PROTECTED] wrote:
  does processor.max_cstate=1 make the failing configuration work?
  If yes, how about processor.max_cstate=2?
 
 Until now 2 things were necessary to reproduce the problem -
 1) CPU_IDLE=y and
 2) Wakeups from Idle = 5-7 Per second (== Longer/deeper C state residency)
 
 If I left the wakeups to high number (50-60) - there were no lockups
 but it was very jerky over ssh.
 (Typing keys had no effect for seconds etc.)
 
 CPU_IDLE=y
 
 With max_cstate=1 and CPU_IDLE=y things are pretty smooth - no lockups
 for the last hour. (Soft lockups used to appear in minutes
 previously.)
 
 With max_cstate=2 - old story repeats - it's very jerky and soft
 lockups appear in under a minute after going to 3-5 wakeups from idle
 per sec.

I have a patch staged for Linus, which fixes a thinko in the broadcast
code. It might be related to your problem. Can you give it a try ?

Thanks,

tglx

--

Subject: clockevents: fix reprogramming decision in oneshot broadcast
From: Thomas Gleixner [EMAIL PROTECTED]

A previous version of the code did the reprogramming of the broadcast
device in the return from idle code. This was removed, but the logic in
tick_handle_oneshot_broadcast() was kept the same.

When a broadcast interrupt happens we signal the expiry to all CPUs
which have an expired event. If none of the CPUs has an expired event,
which can happen in dyntick mode, then we reprogram the broadcast
device. We do not reprogram otherwise, but this is only correct if all
CPUs, which are in the idle broadcast state have been woken up.

The code ignores, that there might be pending not yet expired events on
other CPUs, which are in the idle broadcast state. So the delivery of
those events can be delayed for quite a time.

Change the tick_handle_oneshot_broadcast() function to check for CPUs,
which are in broadcast state and are not woken up by the current event,
and enforce the rearming of the broadcast device for those CPUs.

Signed-off-by: Thomas Gleixner [EMAIL PROTECTED]
Signed-off-by: Ingo Molnar [EMAIL PROTECTED]
---
 kernel/time/tick-broadcast.c |   56 ---
 1 file changed, 21 insertions(+), 35 deletions(-)

Index: linux-x86.q/kernel/time/tick-broadcast.c
===
--- linux-x86.q.orig/kernel/time/tick-broadcast.c
+++ linux-x86.q/kernel/time/tick-broadcast.c
@@ -384,45 +384,19 @@ int tick_resume_broadcast_oneshot(struct
 }
 
 /*
- * Reprogram the broadcast device:
- *
- * Called with tick_broadcast_lock held and interrupts disabled.
- */
-static int tick_broadcast_reprogram(void)
-{
-   ktime_t expires = { .tv64 = KTIME_MAX };
-   struct tick_device *td;
-   int cpu;
-
-   /*
-* Find the event which expires next:
-*/
-   for (cpu = first_cpu(tick_broadcast_oneshot_mask); cpu != NR_CPUS;
-cpu = next_cpu(cpu, tick_broadcast_oneshot_mask)) {
-   td = per_cpu(tick_cpu_device, cpu);
-   if (td-evtdev-next_event.tv64  expires.tv64)
-   expires = td-evtdev-next_event;
-   }
-
-   if (expires.tv64 == KTIME_MAX)
-   return 0;
-
-   return tick_broadcast_set_event(expires, 0);
-}
-
-/*
  * Handle oneshot mode broadcasting
  */
 static void tick_handle_oneshot_broadcast(struct clock_event_device *dev)
 {
struct tick_device *td;
cpumask_t mask;
-   ktime_t now;
+   ktime_t now, next_event;
int cpu;
 
spin_lock(tick_broadcast_lock);
 again:
dev-next_event.tv64 = KTIME_MAX;
+   next_event.tv64 = KTIME_MAX;
mask = CPU_MASK_NONE;
now = ktime_get();
/* Find all expired events */
@@ -431,19 +405,31 @@ again:
td = per_cpu(tick_cpu_device, cpu);
if (td-evtdev-next_event.tv64 = now.tv64)
cpu_set(cpu, mask);
+   else if (td-evtdev-next_event.tv64  next_event.tv64)
+   next_event.tv64 = td-evtdev-next_event.tv64;
}
 
/*
-* Wakeup the cpus which have an expired event. The broadcast
-* device is reprogrammed in the return from idle code.
+* Wakeup the cpus which have an expired event.
+*/
+   tick_do_broadcast(mask);
+
+   /*
+* Two reasons for reprogram:
+*
+* - The global event did not expire any CPU local
+* events. This happens in dyntick mode, as the maximum PIT
+* delta is quite small.
+*
+* - There are pending events on sleeping CPUs which were not
+* in the event mask
 */
-   if (!tick_do_broadcast(mask)) {
+   if (next_event.tv64 != KTIME_MAX) {
/*
-* The global event did not expire any CPU local
-* events. This happens in dyntick mode, as the
-* maximum PIT 

Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2007-12-15 Thread Parag Warudkar

On Sat, 15 Dec 2007, Thomas Gleixner wrote:


I have a patch staged for Linus, which fixes a thinko in the broadcast
code. It might be related to your problem. Can you give it a try ?


Yep - this looks promising. No soft lockups for last half an hour 
with 4-5 Wakeups from idle. It is almost smooth - jerkiness gone down by 
99%. There are still very infrequent and quick spells where the key typed 
on the keyboard takes effect a little later - but nothing like earlier.


I will run it for a little longer just to be sure - but I don't think it 
will be a problem.


Thanks Thomas.

Parag
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2007-12-15 Thread Parag Warudkar

On Sat, 15 Dec 2007, Parag Warudkar wrote:

I will run it for a little longer just to be sure - but I don't think it 
will be a problem.


No problems for last 10 hours - I consider this fixed.

Parag
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2007-12-14 Thread Parag Warudkar
On Dec 14, 2007 6:17 PM, Len Brown <[EMAIL PROTECTED]> wrote:
> does processor.max_cstate=1 make the failing configuration work?
> If yes, how about processor.max_cstate=2?

Until now 2 things were necessary to reproduce the problem -
1) CPU_IDLE=y and
2) Wakeups from Idle = 5-7 Per second (== Longer/deeper C state residency)

If I left the wakeups to high number (50-60) - there were no lockups
but it was very jerky over ssh.
(Typing keys had no effect for seconds etc.)

CPU_IDLE=y

With max_cstate=1 and CPU_IDLE=y things are pretty smooth - no lockups
for the last hour. (Soft lockups used to appear in minutes
previously.)

With max_cstate=2 - old story repeats - it's very jerky and soft
lockups appear in under a minute after going to 3-5 wakeups from idle
per sec.

>
> what do you see in /proc/acpi/processor/*/power?
>

Normally (without any max_cstate= parameter) I see this -
(I admit I never thought there was a C8 - C3 was all I ever heard.)

[EMAIL PROTECTED] router]$ cat /proc/acpi/processor/CPU0/power
active state:C2
max_cstate:  C8
bus master activity: 
maximum allowed latency: 8000 usec
states:
C1:  type[C1] promotion[C2] demotion[--]
latency[001] usage[1010] duration[]
   *C2:  type[C2] promotion[--] demotion[C1]
latency[001] usage[07181700] duration[379397304978]


Thanks
Parag
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2007-12-14 Thread Len Brown
does processor.max_cstate=1 make the failing configuration work?
If yes, how about processor.max_cstate=2?

what do you see in /proc/acpi/processor/*/power?

-Len
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2007-12-14 Thread Len Brown
does processor.max_cstate=1 make the failing configuration work?
If yes, how about processor.max_cstate=2?

what do you see in /proc/acpi/processor/*/power?

-Len
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2007-12-14 Thread Parag Warudkar
On Dec 14, 2007 6:17 PM, Len Brown [EMAIL PROTECTED] wrote:
 does processor.max_cstate=1 make the failing configuration work?
 If yes, how about processor.max_cstate=2?

Until now 2 things were necessary to reproduce the problem -
1) CPU_IDLE=y and
2) Wakeups from Idle = 5-7 Per second (== Longer/deeper C state residency)

If I left the wakeups to high number (50-60) - there were no lockups
but it was very jerky over ssh.
(Typing keys had no effect for seconds etc.)

CPU_IDLE=y

With max_cstate=1 and CPU_IDLE=y things are pretty smooth - no lockups
for the last hour. (Soft lockups used to appear in minutes
previously.)

With max_cstate=2 - old story repeats - it's very jerky and soft
lockups appear in under a minute after going to 3-5 wakeups from idle
per sec.


 what do you see in /proc/acpi/processor/*/power?


Normally (without any max_cstate= parameter) I see this -
(I admit I never thought there was a C8 - C3 was all I ever heard.)

[EMAIL PROTECTED] router]$ cat /proc/acpi/processor/CPU0/power
active state:C2
max_cstate:  C8
bus master activity: 
maximum allowed latency: 8000 usec
states:
C1:  type[C1] promotion[C2] demotion[--]
latency[001] usage[1010] duration[]
   *C2:  type[C2] promotion[--] demotion[C1]
latency[001] usage[07181700] duration[379397304978]


Thanks
Parag
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2007-12-09 Thread Thomas Gleixner
On Sun, 9 Dec 2007, Parag Warudkar wrote:

> On Dec 8, 2007 6:12 PM, Parag Warudkar <[EMAIL PROTECTED]> wrote:
> > No problems after disabling CONFIG_HIGHRES_TIMERS , CONFIG_CPU_IDLE
> > and CONFIG_NO_HZ.
> >
> > I will try enabling them one by one - HRT, NOHZ and CPU_IDLE last -
> > that way we can at least tell what is required to be hit with this
> > problem.
> 
> Looks like CPU_IDLE=y is necessary for the problem to show up.
> With CPU_IDLE=n HRT+NO_HZ+TICK_ONESHOT does not give soft lockup problems.
> (Actually with HIGH_RES_TIMERS=NO_HZ=TICK_ONESHOT=y  I do see short
> freezes on ssh - when I cannot type anything for may be a second even
> under 100% idle. But Soft Lock up doesnt show up in dmesg with this
> configuration.)

Can you please apply the patch below ? It prints out the internal
state of the clockevents/timer system when the softlockup is detected.

Thanks,

tglx

diff --git a/kernel/softlockup.c b/kernel/softlockup.c
index 11df812..82f1a05 100644
--- a/kernel/softlockup.c
+++ b/kernel/softlockup.c
@@ -118,6 +118,7 @@ void softlockup_tick(void)
show_regs(regs);
else
dump_stack();
+   sysrq_timer_list_show();
spin_unlock(_lock);
 }
 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2007-12-09 Thread Arjan van de Ven
On Sun, 9 Dec 2007 16:57:38 -0500
"Parag Warudkar" <[EMAIL PROTECTED]> wrote:

> On Dec 8, 2007 6:12 PM, Parag Warudkar <[EMAIL PROTECTED]>
> wrote:
> > No problems after disabling CONFIG_HIGHRES_TIMERS , CONFIG_CPU_IDLE
> > and CONFIG_NO_HZ.
> >
> > I will try enabling them one by one - HRT, NOHZ and CPU_IDLE last -
> > that way we can at least tell what is required to be hit with this
> > problem.
> 
> Looks like CPU_IDLE=y is necessary for the problem to show up.
> With CPU_IDLE=n HRT+NO_HZ+TICK_ONESHOT does not give soft lockup
> problems. (Actually with HIGH_RES_TIMERS=NO_HZ=TICK_ONESHOT=y  I do
> see short freezes on ssh - when I cannot type anything for may be a
> second even under 100% idle. But Soft Lock up doesnt show up in dmesg
> with this configuration.)


starts to sound like something is turning off the clock during a somewhat 
deeper C-state...
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2007-12-09 Thread Parag Warudkar
On Dec 8, 2007 6:12 PM, Parag Warudkar <[EMAIL PROTECTED]> wrote:
> No problems after disabling CONFIG_HIGHRES_TIMERS , CONFIG_CPU_IDLE
> and CONFIG_NO_HZ.
>
> I will try enabling them one by one - HRT, NOHZ and CPU_IDLE last -
> that way we can at least tell what is required to be hit with this
> problem.

Looks like CPU_IDLE=y is necessary for the problem to show up.
With CPU_IDLE=n HRT+NO_HZ+TICK_ONESHOT does not give soft lockup problems.
(Actually with HIGH_RES_TIMERS=NO_HZ=TICK_ONESHOT=y  I do see short
freezes on ssh - when I cannot type anything for may be a second even
under 100% idle. But Soft Lock up doesnt show up in dmesg with this
configuration.)

No Soft Lockup - CPU_IDLE=n 5+ Hours uptime
---
HIGH_RES_TIMERS=NO_HZ=TICK_ONESHOT=n
HIGH_RES_TIMERS=y NO_HZ=TICK_ONESHOT=n
HIGH_RES_TIMERS=NO_HZ=y TICK_ONESHOT=n
HIGH_RES_TIMERS=NO_HZ=TICK_ONESHOT=y - short freezes.


Soft Lockup - CPU_IDLE=y 2 hours uptime
--
HIGH_RES_TIMERS=NO_HZ=TICK_ONESHOT=y

dmesg -

BUG: soft lockup - CPU#1 stuck for 11s! [swapper:0]

Pid: 0, comm: swapper Not tainted (2.6.24-rc4 #13)
EIP: 0060:[] EFLAGS: 0202 CPU: 1
EIP is at _spin_lock_irqsave+0x16/0x27
EAX: c06b4110 EBX: 0001 ECX: f7873808 EDX: 0293
ESI: 0005 EDI: f7873808 EBP:  ESP: f7829f10
 DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
CR0: 8005003b CR2: 800f5958 CR3: 37284000 CR4: 06d0
DR0:  DR1:  DR2:  DR3: 
DR6: 0ff0 DR7: 0400
 [] tick_broadcast_oneshot_control+0x10/0xda
 [] tick_notify+0x1d4/0x2eb
 [] get_next_timer_interrupt+0x143/0x1b4
 [] notifier_call_chain+0x2a/0x47
 [] raw_notifier_call_chain+0x17/0x1a
 [] clockevents_notify+0x19/0x4f
 [] acpi_idle_enter_simple+0x183/0x1d0
 [] cpuidle_idle_call+0x53/0x78
 [] cpuidle_idle_call+0x0/0x78
 [] cpu_idle+0x97/0xb8
 ===

Parag
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2007-12-09 Thread Parag Warudkar
On Dec 8, 2007 6:12 PM, Parag Warudkar [EMAIL PROTECTED] wrote:
 No problems after disabling CONFIG_HIGHRES_TIMERS , CONFIG_CPU_IDLE
 and CONFIG_NO_HZ.

 I will try enabling them one by one - HRT, NOHZ and CPU_IDLE last -
 that way we can at least tell what is required to be hit with this
 problem.

Looks like CPU_IDLE=y is necessary for the problem to show up.
With CPU_IDLE=n HRT+NO_HZ+TICK_ONESHOT does not give soft lockup problems.
(Actually with HIGH_RES_TIMERS=NO_HZ=TICK_ONESHOT=y  I do see short
freezes on ssh - when I cannot type anything for may be a second even
under 100% idle. But Soft Lock up doesnt show up in dmesg with this
configuration.)

No Soft Lockup - CPU_IDLE=n 5+ Hours uptime
---
HIGH_RES_TIMERS=NO_HZ=TICK_ONESHOT=n
HIGH_RES_TIMERS=y NO_HZ=TICK_ONESHOT=n
HIGH_RES_TIMERS=NO_HZ=y TICK_ONESHOT=n
HIGH_RES_TIMERS=NO_HZ=TICK_ONESHOT=y - short freezes.


Soft Lockup - CPU_IDLE=y 2 hours uptime
--
HIGH_RES_TIMERS=NO_HZ=TICK_ONESHOT=y

dmesg -

BUG: soft lockup - CPU#1 stuck for 11s! [swapper:0]

Pid: 0, comm: swapper Not tainted (2.6.24-rc4 #13)
EIP: 0060:[c0603eaa] EFLAGS: 0202 CPU: 1
EIP is at _spin_lock_irqsave+0x16/0x27
EAX: c06b4110 EBX: 0001 ECX: f7873808 EDX: 0293
ESI: 0005 EDI: f7873808 EBP:  ESP: f7829f10
 DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
CR0: 8005003b CR2: 800f5958 CR3: 37284000 CR4: 06d0
DR0:  DR1:  DR2:  DR3: 
DR6: 0ff0 DR7: 0400
 [c0438293] tick_broadcast_oneshot_control+0x10/0xda
 [c0437ce2] tick_notify+0x1d4/0x2eb
 [c04281bc] get_next_timer_interrupt+0x143/0x1b4
 [c06058a1] notifier_call_chain+0x2a/0x47
 [c04345c0] raw_notifier_call_chain+0x17/0x1a
 [c043781e] clockevents_notify+0x19/0x4f
 [c0533d23] acpi_idle_enter_simple+0x183/0x1d0
 [c058cf03] cpuidle_idle_call+0x53/0x78
 [c058ceb0] cpuidle_idle_call+0x0/0x78
 [c0402575] cpu_idle+0x97/0xb8
 ===

Parag
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2007-12-09 Thread Arjan van de Ven
On Sun, 9 Dec 2007 16:57:38 -0500
Parag Warudkar [EMAIL PROTECTED] wrote:

 On Dec 8, 2007 6:12 PM, Parag Warudkar [EMAIL PROTECTED]
 wrote:
  No problems after disabling CONFIG_HIGHRES_TIMERS , CONFIG_CPU_IDLE
  and CONFIG_NO_HZ.
 
  I will try enabling them one by one - HRT, NOHZ and CPU_IDLE last -
  that way we can at least tell what is required to be hit with this
  problem.
 
 Looks like CPU_IDLE=y is necessary for the problem to show up.
 With CPU_IDLE=n HRT+NO_HZ+TICK_ONESHOT does not give soft lockup
 problems. (Actually with HIGH_RES_TIMERS=NO_HZ=TICK_ONESHOT=y  I do
 see short freezes on ssh - when I cannot type anything for may be a
 second even under 100% idle. But Soft Lock up doesnt show up in dmesg
 with this configuration.)


starts to sound like something is turning off the clock during a somewhat 
deeper C-state...
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2007-12-09 Thread Thomas Gleixner
On Sun, 9 Dec 2007, Parag Warudkar wrote:

 On Dec 8, 2007 6:12 PM, Parag Warudkar [EMAIL PROTECTED] wrote:
  No problems after disabling CONFIG_HIGHRES_TIMERS , CONFIG_CPU_IDLE
  and CONFIG_NO_HZ.
 
  I will try enabling them one by one - HRT, NOHZ and CPU_IDLE last -
  that way we can at least tell what is required to be hit with this
  problem.
 
 Looks like CPU_IDLE=y is necessary for the problem to show up.
 With CPU_IDLE=n HRT+NO_HZ+TICK_ONESHOT does not give soft lockup problems.
 (Actually with HIGH_RES_TIMERS=NO_HZ=TICK_ONESHOT=y  I do see short
 freezes on ssh - when I cannot type anything for may be a second even
 under 100% idle. But Soft Lock up doesnt show up in dmesg with this
 configuration.)

Can you please apply the patch below ? It prints out the internal
state of the clockevents/timer system when the softlockup is detected.

Thanks,

tglx

diff --git a/kernel/softlockup.c b/kernel/softlockup.c
index 11df812..82f1a05 100644
--- a/kernel/softlockup.c
+++ b/kernel/softlockup.c
@@ -118,6 +118,7 @@ void softlockup_tick(void)
show_regs(regs);
else
dump_stack();
+   sysrq_timer_list_show();
spin_unlock(print_lock);
 }
 
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2007-12-08 Thread Parag Warudkar
No problems after disabling CONFIG_HIGHRES_TIMERS , CONFIG_CPU_IDLE
and CONFIG_NO_HZ.

I will try enabling them one by one - HRT, NOHZ and CPU_IDLE last -
that way we can at least tell what is required to be hit with this
problem.

Parag
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2007-12-08 Thread Parag Warudkar
On Dec 8, 2007 3:51 PM, Arjan van de Ven <[EMAIL PROTECTED]> wrote:
> what chipset is this?
> (I wonder if there's one where we shouldn't be force-enabling the hpet)

It's an Intel Mac Mini - Intel 945GM chipset.
I believe OSX requires HPET and so there shouldn't be a need to force
enable it on this chipset?

Also I don't have problems under Centos 5.1 kernel (2.6.18-53.1.4.el5)
and hpet is enabled there.

Parag
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2007-12-08 Thread Arjan van de Ven
On Sat, 8 Dec 2007 15:46:54 -0500
"Parag Warudkar" <[EMAIL PROTECTED]> wrote:

> On Dec 8, 2007 3:11 PM, Ingo Molnar <[EMAIL PROTECTED]> wrote:
> >
> > * Parag Warudkar <[EMAIL PROTECTED]> wrote:
> >
> > > But there are still fluctuations under 100% idle -
> >
> > do you have CONFIG_HIGH_RES_TIMERS=y?
> 
> Yes - NO_HZ=y  and HIGH_RES_TIMERS=y.
> My ssh connection still died with hpet=disable although this time I
> did not see soft lockup message.
> Also it randomly gets stuck - suddenly there is a freeze and I can't
> type for seconds.
> 
> > these fluctuations would still be OK if they are due to HZ
> > granularity.
> >
> 
> So are the hpet ones ok too or do they seem off even with HRTIMERS=y?
> Just trying to figure out if I should be testing further with hpet
> disabled or enabling it is ok.
> 

what chipset is this?
(I wonder if there's one where we shouldn't be force-enabling the hpet)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2007-12-08 Thread Parag Warudkar
On Dec 8, 2007 3:11 PM, Ingo Molnar <[EMAIL PROTECTED]> wrote:
>
> * Parag Warudkar <[EMAIL PROTECTED]> wrote:
>
> > But there are still fluctuations under 100% idle -
>
> do you have CONFIG_HIGH_RES_TIMERS=y?

Yes - NO_HZ=y  and HIGH_RES_TIMERS=y.
My ssh connection still died with hpet=disable although this time I
did not see soft lockup message.
Also it randomly gets stuck - suddenly there is a freeze and I can't
type for seconds.

> these fluctuations would still be OK if they are due to HZ granularity.
>

So are the hpet ones ok too or do they seem off even with HRTIMERS=y?
Just trying to figure out if I should be testing further with hpet
disabled or enabling it is ok.

Parag
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2007-12-08 Thread Ingo Molnar

* Parag Warudkar <[EMAIL PROTECTED]> wrote:

> But there are still fluctuations under 100% idle -

do you have CONFIG_HIGH_RES_TIMERS=y?

> IDLE
> real0m1.112s
> real0m1.131s
> real0m1.112s
> real0m1.112s
> real0m1.139s

these fluctuations would still be OK if they are due to HZ granularity.

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2007-12-08 Thread Parag Warudkar
On Dec 8, 2007 2:42 PM, Ingo Molnar <[EMAIL PROTECTED]> wrote:
>
> * Parag Warudkar <[EMAIL PROTECTED]> wrote:
>
> > Even on 100% idle I get variations that are approx in the same range
> > when not idle. Clocksource is hpet if that matters.  Next I think I
> > will disable CPU_IDLE, Tickless
>
> also try the hpet=disable boot option.

With hpet=disable (clocksource is acpi_pm ) the usleeps are somewhat
more closer to correct - under idle as well as under kernel
compilation load.
But there are still fluctuations under 100% idle -

Kernel compile
--
real0m1.112s
user0m0.004s
sys 0m0.000s

real0m1.112s
user0m0.000s
sys 0m0.000s

real0m1.115s
user0m0.000s
sys 0m0.000s

real0m1.112s
user0m0.000s
sys 0m0.000s

real0m1.112s
user0m0.000s
sys 0m0.000s
--
IDLE
real0m1.112s
user0m0.000s
sys 0m0.000s

real0m1.131s
user0m0.000s
sys 0m0.000s

real0m1.112s
user0m0.000s
sys 0m0.000s

real0m1.112s
user0m0.000s
sys 0m0.000s

real0m1.139s
user0m0.000s
sys 0m0.000s

>
> > My ssh connection just died - another lockup in dmesg -
> >
> > BUG: soft lockup - CPU#1 stuck for 14s! [sshd:509]
>
> this would suggest that the softlockup watchdog is correct and indeed
> there was some lockup of sorts?

Yep, I was about to say that there are indeed real soft-lockups going
on as opposed to the psuedo ones that we thought may be due to timer
inconsistencies.

BTW this is no-preempt config and 2.6.22  and latest Centos-5.1
kernels ran fine with HPET.

Parag
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2007-12-08 Thread Ingo Molnar

* Parag Warudkar <[EMAIL PROTECTED]> wrote:

> Even on 100% idle I get variations that are approx in the same range 
> when not idle. Clocksource is hpet if that matters.  Next I think I 
> will disable CPU_IDLE, Tickless

also try the hpet=disable boot option.

> My ssh connection just died - another lockup in dmesg -
> 
> BUG: soft lockup - CPU#1 stuck for 14s! [sshd:509]

this would suggest that the softlockup watchdog is correct and indeed 
there was some lockup of sorts?

> $> while :; do time usleep 111; done
> real0m1.112s
> real0m1.385s
> real0m1.112s
> real0m1.112s
> real0m1.279s

this, if done on an idle system, looks anomalous to me. Does 
hpet=disable help?

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2007-12-08 Thread Parag Warudkar
On Dec 8, 2007 2:13 PM, Ingo Molnar <[EMAIL PROTECTED]> wrote:
>
> * Parag Warudkar <[EMAIL PROTECTED]> wrote:
>
> > >while :; do time usleep 111; done
> > >
> > > or do these sleeps fluctuate?
> >
> > They seem to fluctuate - not sure if that's supposed to be exact or if
> > below variations are normal - This is when my compiles are running -
>
> it's normal for them to fluctuate a little when load goes up and the
> task gets delayed.
>
> 'bad' fluctuations are ones that happen when the system is idle.
>

Even on 100% idle I get variations that are approx in the same range
when not idle.
Clocksource is hpet if that matters.  Next I think I will disable
CPU_IDLE, Tickless

My ssh connection just died - another lockup in dmesg -

BUG: soft lockup - CPU#1 stuck for 14s! [sshd:509]

Pid: 509, comm: sshd Not tainted (2.6.24-rc4 #6)
EIP: 0060:[] EFLAGS: 00200206 CPU: 1
EIP is at do_page_fault+0x117/0x583
EAX: ef5aaa80 EBX: 0048cfc0 ECX: 007b EDX: 0003
ESI: ef691fb8 EDI: b7ca1ab4 EBP: ef47f050 ESP: ef691f70
 DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
CR0: 8005003b CR2: b7ca1ab4 CR3: 2f555000 CR4: 06d0
DR0:  DR1:  DR2:  DR3: 
DR6: 0ff0 DR7: 0400
 [] do_page_fault+0x0/0x583
 [] error_code+0x72/0x78
 [] netlbl_domhsh_walk+0x94/0xba


$> while :; do time usleep 111; done
real0m1.112s
user0m0.000s
sys 0m0.000s

real0m1.385s
user0m0.000s
sys 0m0.004s

real0m1.112s
user0m0.000s
sys 0m0.000s

real0m1.112s
user0m0.000s
sys 0m0.000s

real0m1.279s
user0m0.000s
sys 0m0.000s
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2007-12-08 Thread Ingo Molnar

* Parag Warudkar <[EMAIL PROTECTED]> wrote:

> >while :; do time usleep 111; done
> >
> > or do these sleeps fluctuate?
> 
> They seem to fluctuate - not sure if that's supposed to be exact or if 
> below variations are normal - This is when my compiles are running -

it's normal for them to fluctuate a little when load goes up and the 
task gets delayed.

'bad' fluctuations are ones that happen when the system is idle.

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2007-12-08 Thread Parag Warudkar
On Dec 8, 2007 10:47 AM, Ingo Molnar <[EMAIL PROTECTED]> wrote:
>
> does the patch below help? But the root cause is likely some timer
> problems - do you get consistent results from:
>

Haven't yet tried the patch - will try a little later.

>while :; do time usleep 111; done
>
> or do these sleeps fluctuate?
>

They seem to fluctuate - not sure if that's supposed to be exact or if
below variations are normal -
This is when my compiles are running -

[EMAIL PROTECTED] ~]$ while :; do time usleep 111; done

real0m1.116s
user0m0.000s
sys 0m0.000s

real0m1.112s
user0m0.000s
sys 0m0.000s

real0m1.130s
user0m0.000s
sys 0m0.004s

real0m1.144s
user0m0.000s
sys 0m0.000s


Thanks
Parag
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2007-12-08 Thread Ingo Molnar

* Parag Warudkar <[EMAIL PROTECTED]> wrote:

>  [] tick_broadcast_oneshot_control+0x10/0xda
>  [] tick_notify+0x1d4/0x2eb
>  [] get_next_timer_interrupt+0x143/0x1b4
>  [] notifier_call_chain+0x2a/0x47
>  [] raw_notifier_call_chain+0x17/0x1a
>  [] clockevents_notify+0x19/0x4f
>  [] acpi_idle_enter_simple+0x183/0x1d0
>  [] cpuidle_idle_call+0x53/0x78
>  [] cpuidle_idle_call+0x0/0x78
>  [] cpu_idle+0x97/0xb8
>  ===
> BUG: soft lockup - CPU#1 stuck for 11s! [vim:3736]

does the patch below help? But the root cause is likely some timer 
problems - do you get consistent results from:

   while :; do time usleep 111; done

or do these sleeps fluctuate?

Ingo

Index: linux/kernel/sched.c
===
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -774,6 +774,7 @@ void sched_clock_idle_wakeup_event(u64 d
struct rq *rq = cpu_rq(smp_processor_id());
u64 now = sched_clock();
 
+   touch_softlockup_watchdog();
rq->idle_clock += delta_ns;
/*
 * Override the previous timestamp and ignore all
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2007-12-08 Thread Parag Warudkar
On Dec 8, 2007 10:10 AM, Parag Warudkar <[EMAIL PROTECTED]> wrote:
> On Dec 7, 2007 9:56 PM, Thomas Gleixner <[EMAIL PROTECTED]> wrote:
> >
> > This looks pretty much like the problem I was solving yesterday.
> >
> > Parag, can you please try Linus latest and please check whether there
> > is a stack trace with clockevents_program_event on the top in your
> > dmesg.
> >
>
> Just booted with latest git and there is no  clockevents_program_event
> in dmesg anywhere.
>
> Any way I have enabled CPU_IDLE again and the machine's got to do lot
> of compiles today.
> I'll see if the problem reproduces.
>

Whoa whoa whoa - this happens left and right with latest Linus +CPU_IDLE.

As soon as I start compiling, dmesg starts filling up with -

Pid: 0, comm: swapper Not tainted (2.6.24-rc4 #5)
EIP: 0060:[] EFLAGS: 0202 CPU: 1
EIP is at acpi_idle_enter_simple+0x166/0x1d0
EAX: f7829f88 EBX: 0da1 ECX: 0266 EDX: 
ESI:  EDI: 00042070 EBP: 000412cf ESP: f7829f88
 DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
CR0: 8005003b CR2: 080ce178 CR3: 00715000 CR4: 06d0
DR0:  DR1:  DR2:  DR3: 
DR6: 0ff0 DR7: 0400
 [] cpuidle_idle_call+0x53/0x78
 [] cpuidle_idle_call+0x0/0x78
 [] cpu_idle+0x97/0xb8
 ===
BUG: soft lockup - CPU#1 stuck for 11s! [swapper:0]

Pid: 0, comm: swapper Not tainted (2.6.24-rc4 #5)
EIP: 0060:[] EFLAGS: 0202 CPU: 1
EIP is at _spin_lock_irqsave+0x16/0x27
EAX: c06b4110 EBX: 0001 ECX: f7873808 EDX: 0293
ESI: 0005 EDI: f7873808 EBP:  ESP: f7829f10
 DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
CR0: 8005003b CR2: 080ff4b0 CR3: 37cf5000 CR4: 06d0
DR0:  DR1:  DR2:  DR3: 
DR6: 0ff0 DR7: 0400
 [] tick_broadcast_oneshot_control+0x10/0xda
 [] tick_notify+0x1d4/0x2eb
 [] get_next_timer_interrupt+0x143/0x1b4
 [] notifier_call_chain+0x2a/0x47
 [] raw_notifier_call_chain+0x17/0x1a
 [] clockevents_notify+0x19/0x4f
 [] acpi_idle_enter_simple+0x183/0x1d0
 [] cpuidle_idle_call+0x53/0x78
 [] cpuidle_idle_call+0x0/0x78
 [] cpu_idle+0x97/0xb8
 ===
BUG: soft lockup - CPU#1 stuck for 11s! [vim:3736]

Pid: 3736, comm: vim Not tainted (2.6.24-rc4 #5)
EIP: 0060:[] EFLAGS: 0206 CPU: 1
EIP is at do_page_fault+0x117/0x583
EAX: f6ec1380 EBX: 005d0ff4 ECX: 007b EDX: 0003
ESI: f6dd1fb8 EDI: 004c2000 EBP: f149eae0 ESP: f6dd1f70
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
CR0: 8005003b CR2: 004c2000 CR3: 3736a000 CR4: 06d0
DR0:  DR1:  DR2:  DR3: 
DR6: 0ff0 DR7: 0400
 [] vfs_read+0x105/0x115
 [] do_page_fault+0x0/0x583
 [] error_code+0x72/0x78
 [] netlbl_domhsh_walk+0x94/0xba
 ===
BUG: soft lockup - CPU#1 stuck for 11s! [swapper:0]

Pid: 0, comm: swapper Not tainted (2.6.24-rc4 #5)
EIP: 0060:[] EFLAGS: 0202 CPU: 1
EIP is at acpi_idle_enter_simple+0x166/0x1d0
EAX: f7829f88 EBX: 000a ECX: 0266 EDX: 
ESI:  EDI: 008e8c88 EBP: 008e8c7e ESP: f7829f88
 DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
CR0: 8005003b CR2: 005da044 CR3: 37cf5000 CR4: 06d0
DR0:  DR1:  DR2:  DR3: 
DR6: 0ff0 DR7: 0400
 [] cpuidle_idle_call+0x53/0x78
 [] cpuidle_idle_call+0x0/0x78
 [] cpu_idle+0x97/0xb8
 ===
BUG: soft lockup - CPU#1 stuck for 22s! [swapper:0]

Pid: 0, comm: swapper Not tainted (2.6.24-rc4 #5)
EIP: 0060:[] EFLAGS: 0206 CPU: 1
EIP is at acpi_idle_enter_simple+0x166/0x1d0
EAX: f7829f88 EBX: 0d9b ECX: 0266 EDX: 
ESI:  EDI: 007d1802 EBP: 007d0a67 ESP: f7829f88
 DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
CR0: 8005003b CR2: 005da044 CR3: 37cf5000 CR4: 06d0
DR0:  DR1:  DR2:  DR3: 
DR6: 0ff0 DR7: 0400
 [] cpuidle_idle_call+0x53/0x78
 [] cpuidle_idle_call+0x0/0x78
 [] cpu_idle+0x97/0xb8
 ===
BUG: soft lockup - CPU#1 stuck for 18s! [swapper:0]

Pid: 0, comm: swapper Not tainted (2.6.24-rc4 #5)
EIP: 0060:[] EFLAGS: 0206 CPU: 1
EIP is at acpi_idle_enter_simple+0x166/0x1d0
EAX: f7829f88 EBX: 0da2 ECX: 0266 EDX: 
ESI:  EDI: 00b651f2 EBP: 00b64450 ESP: f7829f88
 DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
CR0: 8005003b CR2: 005da044 CR3: 37cf5000 CR4: 06d0
DR0:  DR1:  DR2:  DR3: 
DR6: 0ff0 DR7: 0400
 [] cpuidle_idle_call+0x53/0x78
 [] cpuidle_idle_call+0x0/0x78
 [] cpu_idle+0x97/0xb8
 ===
BUG: soft lockup - CPU#1 stuck for 14s! [swapper:0]

Pid: 0, comm: swapper Not tainted (2.6.24-rc4 #5)
EIP: 0060:[] EFLAGS: 0202 CPU: 1
EIP is at _spin_lock_irqsave+0x16/0x27
EAX: c06b4110 EBX: 0001 ECX: f7873808 EDX: 0293
ESI: 0005 EDI: f7873808 EBP:  ESP: f7829f10
 DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
CR0: 8005003b CR2: 080ff4b0 CR3: 37cf5000 CR4: 06d0
DR0:  DR1:  DR2:  DR3: 
DR6: 0ff0 DR7: 0400
 

Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2007-12-08 Thread Parag Warudkar
On Dec 7, 2007 9:56 PM, Thomas Gleixner <[EMAIL PROTECTED]> wrote:
>
> This looks pretty much like the problem I was solving yesterday.
>
> Parag, can you please try Linus latest and please check whether there
> is a stack trace with clockevents_program_event on the top in your
> dmesg.
>

Just booted with latest git and there is no  clockevents_program_event
in dmesg anywhere.

Any way I have enabled CPU_IDLE again and the machine's got to do lot
of compiles today.
I'll see if the problem reproduces.

Thanks

Parag
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2007-12-08 Thread Parag Warudkar
On Dec 8, 2007 10:47 AM, Ingo Molnar [EMAIL PROTECTED] wrote:

 does the patch below help? But the root cause is likely some timer
 problems - do you get consistent results from:


Haven't yet tried the patch - will try a little later.

while :; do time usleep 111; done

 or do these sleeps fluctuate?


They seem to fluctuate - not sure if that's supposed to be exact or if
below variations are normal -
This is when my compiles are running -

[EMAIL PROTECTED] ~]$ while :; do time usleep 111; done

real0m1.116s
user0m0.000s
sys 0m0.000s

real0m1.112s
user0m0.000s
sys 0m0.000s

real0m1.130s
user0m0.000s
sys 0m0.004s

real0m1.144s
user0m0.000s
sys 0m0.000s


Thanks
Parag
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2007-12-08 Thread Ingo Molnar

* Parag Warudkar [EMAIL PROTECTED] wrote:

  [c0438293] tick_broadcast_oneshot_control+0x10/0xda
  [c0437ce2] tick_notify+0x1d4/0x2eb
  [c04281bc] get_next_timer_interrupt+0x143/0x1b4
  [c06058a1] notifier_call_chain+0x2a/0x47
  [c04345c0] raw_notifier_call_chain+0x17/0x1a
  [c043781e] clockevents_notify+0x19/0x4f
  [c0533d23] acpi_idle_enter_simple+0x183/0x1d0
  [c058cf03] cpuidle_idle_call+0x53/0x78
  [c058ceb0] cpuidle_idle_call+0x0/0x78
  [c0402575] cpu_idle+0x97/0xb8
  ===
 BUG: soft lockup - CPU#1 stuck for 11s! [vim:3736]

does the patch below help? But the root cause is likely some timer 
problems - do you get consistent results from:

   while :; do time usleep 111; done

or do these sleeps fluctuate?

Ingo

Index: linux/kernel/sched.c
===
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -774,6 +774,7 @@ void sched_clock_idle_wakeup_event(u64 d
struct rq *rq = cpu_rq(smp_processor_id());
u64 now = sched_clock();
 
+   touch_softlockup_watchdog();
rq-idle_clock += delta_ns;
/*
 * Override the previous timestamp and ignore all
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2007-12-08 Thread Parag Warudkar
On Dec 7, 2007 9:56 PM, Thomas Gleixner [EMAIL PROTECTED] wrote:

 This looks pretty much like the problem I was solving yesterday.

 Parag, can you please try Linus latest and please check whether there
 is a stack trace with clockevents_program_event on the top in your
 dmesg.


Just booted with latest git and there is no  clockevents_program_event
in dmesg anywhere.

Any way I have enabled CPU_IDLE again and the machine's got to do lot
of compiles today.
I'll see if the problem reproduces.

Thanks

Parag
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2007-12-08 Thread Parag Warudkar
On Dec 8, 2007 10:10 AM, Parag Warudkar [EMAIL PROTECTED] wrote:
 On Dec 7, 2007 9:56 PM, Thomas Gleixner [EMAIL PROTECTED] wrote:
 
  This looks pretty much like the problem I was solving yesterday.
 
  Parag, can you please try Linus latest and please check whether there
  is a stack trace with clockevents_program_event on the top in your
  dmesg.
 

 Just booted with latest git and there is no  clockevents_program_event
 in dmesg anywhere.

 Any way I have enabled CPU_IDLE again and the machine's got to do lot
 of compiles today.
 I'll see if the problem reproduces.


Whoa whoa whoa - this happens left and right with latest Linus +CPU_IDLE.

As soon as I start compiling, dmesg starts filling up with -

Pid: 0, comm: swapper Not tainted (2.6.24-rc4 #5)
EIP: 0060:[c0533d06] EFLAGS: 0202 CPU: 1
EIP is at acpi_idle_enter_simple+0x166/0x1d0
EAX: f7829f88 EBX: 0da1 ECX: 0266 EDX: 
ESI:  EDI: 00042070 EBP: 000412cf ESP: f7829f88
 DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
CR0: 8005003b CR2: 080ce178 CR3: 00715000 CR4: 06d0
DR0:  DR1:  DR2:  DR3: 
DR6: 0ff0 DR7: 0400
 [c058cf03] cpuidle_idle_call+0x53/0x78
 [c058ceb0] cpuidle_idle_call+0x0/0x78
 [c0402575] cpu_idle+0x97/0xb8
 ===
BUG: soft lockup - CPU#1 stuck for 11s! [swapper:0]

Pid: 0, comm: swapper Not tainted (2.6.24-rc4 #5)
EIP: 0060:[c0603eaa] EFLAGS: 0202 CPU: 1
EIP is at _spin_lock_irqsave+0x16/0x27
EAX: c06b4110 EBX: 0001 ECX: f7873808 EDX: 0293
ESI: 0005 EDI: f7873808 EBP:  ESP: f7829f10
 DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
CR0: 8005003b CR2: 080ff4b0 CR3: 37cf5000 CR4: 06d0
DR0:  DR1:  DR2:  DR3: 
DR6: 0ff0 DR7: 0400
 [c0438293] tick_broadcast_oneshot_control+0x10/0xda
 [c0437ce2] tick_notify+0x1d4/0x2eb
 [c04281bc] get_next_timer_interrupt+0x143/0x1b4
 [c06058a1] notifier_call_chain+0x2a/0x47
 [c04345c0] raw_notifier_call_chain+0x17/0x1a
 [c043781e] clockevents_notify+0x19/0x4f
 [c0533d23] acpi_idle_enter_simple+0x183/0x1d0
 [c058cf03] cpuidle_idle_call+0x53/0x78
 [c058ceb0] cpuidle_idle_call+0x0/0x78
 [c0402575] cpu_idle+0x97/0xb8
 ===
BUG: soft lockup - CPU#1 stuck for 11s! [vim:3736]

Pid: 3736, comm: vim Not tainted (2.6.24-rc4 #5)
EIP: 0060:[c060540b] EFLAGS: 0206 CPU: 1
EIP is at do_page_fault+0x117/0x583
EAX: f6ec1380 EBX: 005d0ff4 ECX: 007b EDX: 0003
ESI: f6dd1fb8 EDI: 004c2000 EBP: f149eae0 ESP: f6dd1f70
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
CR0: 8005003b CR2: 004c2000 CR3: 3736a000 CR4: 06d0
DR0:  DR1:  DR2:  DR3: 
DR6: 0ff0 DR7: 0400
 [c046d80d] vfs_read+0x105/0x115
 [c06052f4] do_page_fault+0x0/0x583
 [c06040aa] error_code+0x72/0x78
 [c060] netlbl_domhsh_walk+0x94/0xba
 ===
BUG: soft lockup - CPU#1 stuck for 11s! [swapper:0]

Pid: 0, comm: swapper Not tainted (2.6.24-rc4 #5)
EIP: 0060:[c0533d06] EFLAGS: 0202 CPU: 1
EIP is at acpi_idle_enter_simple+0x166/0x1d0
EAX: f7829f88 EBX: 000a ECX: 0266 EDX: 
ESI:  EDI: 008e8c88 EBP: 008e8c7e ESP: f7829f88
 DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
CR0: 8005003b CR2: 005da044 CR3: 37cf5000 CR4: 06d0
DR0:  DR1:  DR2:  DR3: 
DR6: 0ff0 DR7: 0400
 [c058cf03] cpuidle_idle_call+0x53/0x78
 [c058ceb0] cpuidle_idle_call+0x0/0x78
 [c0402575] cpu_idle+0x97/0xb8
 ===
BUG: soft lockup - CPU#1 stuck for 22s! [swapper:0]

Pid: 0, comm: swapper Not tainted (2.6.24-rc4 #5)
EIP: 0060:[c0533d06] EFLAGS: 0206 CPU: 1
EIP is at acpi_idle_enter_simple+0x166/0x1d0
EAX: f7829f88 EBX: 0d9b ECX: 0266 EDX: 
ESI:  EDI: 007d1802 EBP: 007d0a67 ESP: f7829f88
 DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
CR0: 8005003b CR2: 005da044 CR3: 37cf5000 CR4: 06d0
DR0:  DR1:  DR2:  DR3: 
DR6: 0ff0 DR7: 0400
 [c058cf03] cpuidle_idle_call+0x53/0x78
 [c058ceb0] cpuidle_idle_call+0x0/0x78
 [c0402575] cpu_idle+0x97/0xb8
 ===
BUG: soft lockup - CPU#1 stuck for 18s! [swapper:0]

Pid: 0, comm: swapper Not tainted (2.6.24-rc4 #5)
EIP: 0060:[c0533d06] EFLAGS: 0206 CPU: 1
EIP is at acpi_idle_enter_simple+0x166/0x1d0
EAX: f7829f88 EBX: 0da2 ECX: 0266 EDX: 
ESI:  EDI: 00b651f2 EBP: 00b64450 ESP: f7829f88
 DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
CR0: 8005003b CR2: 005da044 CR3: 37cf5000 CR4: 06d0
DR0:  DR1:  DR2:  DR3: 
DR6: 0ff0 DR7: 0400
 [c058cf03] cpuidle_idle_call+0x53/0x78
 [c058ceb0] cpuidle_idle_call+0x0/0x78
 [c0402575] cpu_idle+0x97/0xb8
 ===
BUG: soft lockup - CPU#1 stuck for 14s! [swapper:0]

Pid: 0, comm: swapper Not tainted (2.6.24-rc4 #5)
EIP: 0060:[c0603eaa] EFLAGS: 0202 CPU: 1
EIP is at _spin_lock_irqsave+0x16/0x27
EAX: c06b4110 EBX: 0001 ECX: f7873808 EDX: 0293
ESI: 

Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2007-12-08 Thread Ingo Molnar

* Parag Warudkar [EMAIL PROTECTED] wrote:

 while :; do time usleep 111; done
 
  or do these sleeps fluctuate?
 
 They seem to fluctuate - not sure if that's supposed to be exact or if 
 below variations are normal - This is when my compiles are running -

it's normal for them to fluctuate a little when load goes up and the 
task gets delayed.

'bad' fluctuations are ones that happen when the system is idle.

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2007-12-08 Thread Parag Warudkar
On Dec 8, 2007 2:13 PM, Ingo Molnar [EMAIL PROTECTED] wrote:

 * Parag Warudkar [EMAIL PROTECTED] wrote:

  while :; do time usleep 111; done
  
   or do these sleeps fluctuate?
 
  They seem to fluctuate - not sure if that's supposed to be exact or if
  below variations are normal - This is when my compiles are running -

 it's normal for them to fluctuate a little when load goes up and the
 task gets delayed.

 'bad' fluctuations are ones that happen when the system is idle.


Even on 100% idle I get variations that are approx in the same range
when not idle.
Clocksource is hpet if that matters.  Next I think I will disable
CPU_IDLE, Tickless

My ssh connection just died - another lockup in dmesg -

BUG: soft lockup - CPU#1 stuck for 14s! [sshd:509]

Pid: 509, comm: sshd Not tainted (2.6.24-rc4 #6)
EIP: 0060:[c060540b] EFLAGS: 00200206 CPU: 1
EIP is at do_page_fault+0x117/0x583
EAX: ef5aaa80 EBX: 0048cfc0 ECX: 007b EDX: 0003
ESI: ef691fb8 EDI: b7ca1ab4 EBP: ef47f050 ESP: ef691f70
 DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
CR0: 8005003b CR2: b7ca1ab4 CR3: 2f555000 CR4: 06d0
DR0:  DR1:  DR2:  DR3: 
DR6: 0ff0 DR7: 0400
 [c06052f4] do_page_fault+0x0/0x583
 [c06040aa] error_code+0x72/0x78
 [c060] netlbl_domhsh_walk+0x94/0xba


$ while :; do time usleep 111; done
real0m1.112s
user0m0.000s
sys 0m0.000s

real0m1.385s
user0m0.000s
sys 0m0.004s

real0m1.112s
user0m0.000s
sys 0m0.000s

real0m1.112s
user0m0.000s
sys 0m0.000s

real0m1.279s
user0m0.000s
sys 0m0.000s
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2007-12-08 Thread Ingo Molnar

* Parag Warudkar [EMAIL PROTECTED] wrote:

 Even on 100% idle I get variations that are approx in the same range 
 when not idle. Clocksource is hpet if that matters.  Next I think I 
 will disable CPU_IDLE, Tickless

also try the hpet=disable boot option.

 My ssh connection just died - another lockup in dmesg -
 
 BUG: soft lockup - CPU#1 stuck for 14s! [sshd:509]

this would suggest that the softlockup watchdog is correct and indeed 
there was some lockup of sorts?

 $ while :; do time usleep 111; done
 real0m1.112s
 real0m1.385s
 real0m1.112s
 real0m1.112s
 real0m1.279s

this, if done on an idle system, looks anomalous to me. Does 
hpet=disable help?

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2007-12-08 Thread Parag Warudkar
On Dec 8, 2007 2:42 PM, Ingo Molnar [EMAIL PROTECTED] wrote:

 * Parag Warudkar [EMAIL PROTECTED] wrote:

  Even on 100% idle I get variations that are approx in the same range
  when not idle. Clocksource is hpet if that matters.  Next I think I
  will disable CPU_IDLE, Tickless

 also try the hpet=disable boot option.

With hpet=disable (clocksource is acpi_pm ) the usleeps are somewhat
more closer to correct - under idle as well as under kernel
compilation load.
But there are still fluctuations under 100% idle -

Kernel compile
--
real0m1.112s
user0m0.004s
sys 0m0.000s

real0m1.112s
user0m0.000s
sys 0m0.000s

real0m1.115s
user0m0.000s
sys 0m0.000s

real0m1.112s
user0m0.000s
sys 0m0.000s

real0m1.112s
user0m0.000s
sys 0m0.000s
--
IDLE
real0m1.112s
user0m0.000s
sys 0m0.000s

real0m1.131s
user0m0.000s
sys 0m0.000s

real0m1.112s
user0m0.000s
sys 0m0.000s

real0m1.112s
user0m0.000s
sys 0m0.000s

real0m1.139s
user0m0.000s
sys 0m0.000s


  My ssh connection just died - another lockup in dmesg -
 
  BUG: soft lockup - CPU#1 stuck for 14s! [sshd:509]

 this would suggest that the softlockup watchdog is correct and indeed
 there was some lockup of sorts?

Yep, I was about to say that there are indeed real soft-lockups going
on as opposed to the psuedo ones that we thought may be due to timer
inconsistencies.

BTW this is no-preempt config and 2.6.22  and latest Centos-5.1
kernels ran fine with HPET.

Parag
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2007-12-08 Thread Ingo Molnar

* Parag Warudkar [EMAIL PROTECTED] wrote:

 But there are still fluctuations under 100% idle -

do you have CONFIG_HIGH_RES_TIMERS=y?

 IDLE
 real0m1.112s
 real0m1.131s
 real0m1.112s
 real0m1.112s
 real0m1.139s

these fluctuations would still be OK if they are due to HZ granularity.

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2007-12-08 Thread Parag Warudkar
On Dec 8, 2007 3:11 PM, Ingo Molnar [EMAIL PROTECTED] wrote:

 * Parag Warudkar [EMAIL PROTECTED] wrote:

  But there are still fluctuations under 100% idle -

 do you have CONFIG_HIGH_RES_TIMERS=y?

Yes - NO_HZ=y  and HIGH_RES_TIMERS=y.
My ssh connection still died with hpet=disable although this time I
did not see soft lockup message.
Also it randomly gets stuck - suddenly there is a freeze and I can't
type for seconds.

 these fluctuations would still be OK if they are due to HZ granularity.


So are the hpet ones ok too or do they seem off even with HRTIMERS=y?
Just trying to figure out if I should be testing further with hpet
disabled or enabling it is ok.

Parag
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2007-12-08 Thread Arjan van de Ven
On Sat, 8 Dec 2007 15:46:54 -0500
Parag Warudkar [EMAIL PROTECTED] wrote:

 On Dec 8, 2007 3:11 PM, Ingo Molnar [EMAIL PROTECTED] wrote:
 
  * Parag Warudkar [EMAIL PROTECTED] wrote:
 
   But there are still fluctuations under 100% idle -
 
  do you have CONFIG_HIGH_RES_TIMERS=y?
 
 Yes - NO_HZ=y  and HIGH_RES_TIMERS=y.
 My ssh connection still died with hpet=disable although this time I
 did not see soft lockup message.
 Also it randomly gets stuck - suddenly there is a freeze and I can't
 type for seconds.
 
  these fluctuations would still be OK if they are due to HZ
  granularity.
 
 
 So are the hpet ones ok too or do they seem off even with HRTIMERS=y?
 Just trying to figure out if I should be testing further with hpet
 disabled or enabling it is ok.
 

what chipset is this?
(I wonder if there's one where we shouldn't be force-enabling the hpet)
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2007-12-08 Thread Parag Warudkar
On Dec 8, 2007 3:51 PM, Arjan van de Ven [EMAIL PROTECTED] wrote:
 what chipset is this?
 (I wonder if there's one where we shouldn't be force-enabling the hpet)

It's an Intel Mac Mini - Intel 945GM chipset.
I believe OSX requires HPET and so there shouldn't be a need to force
enable it on this chipset?

Also I don't have problems under Centos 5.1 kernel (2.6.18-53.1.4.el5)
and hpet is enabled there.

Parag
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2007-12-08 Thread Parag Warudkar
No problems after disabling CONFIG_HIGHRES_TIMERS , CONFIG_CPU_IDLE
and CONFIG_NO_HZ.

I will try enabling them one by one - HRT, NOHZ and CPU_IDLE last -
that way we can at least tell what is required to be hit with this
problem.

Parag
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2007-12-07 Thread Thomas Gleixner
On Fri, 7 Dec 2007, Pallipadi, Venkatesh wrote:
> >-Original Message-
> >From: Parag Warudkar [mailto:[EMAIL PROTECTED] 
> >Sent: Friday, December 07, 2007 2:54 PM
> >To: LKML
> >Cc: Andrew Morton; Pallipadi, Venkatesh; Linus Torvalds
> >Subject: BUG: soft lockup - CPU#1 stuck for 15s! [swapper:0]
> >
> >Got this on today's git (2.6.24-rc4) while compiling  stuff - Looks
> >like it is related to CpuIdle stuff.
> >I chose CONFIG_CPU_IDLE for the first time so I don't know when this
> >was introduced.
> >
> >This is on x86_32, SMP.
> >
> >BUG: soft lockup - CPU#1 stuck for 15s! [swapper:0]
> >
> >Pid: 0, comm: swapper Not tainted (2.6.24-rc4 #3)
> >EIP: 0060:[] EFLAGS: 0202 CPU: 1
> >EIP is at _spin_lock_irqsave+0x16/0x27
> >EAX: c06b4110 EBX: 0001 ECX: f7873808 EDX: 0293
> >ESI: 0005 EDI: f7873808 EBP:  ESP: f7829f10
> > DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
> >CR0: 8005003b CR2: 004f5960 CR3: 372c5000 CR4: 06d0
> >DR0:  DR1:  DR2:  DR3: 
> >DR6: 0ff0 DR7: 0400
> > [] tick_broadcast_oneshot_control+0x10/0xda
> > [] tick_notify+0x1d4/0x2eb
> > [] get_next_timer_interrupt+0x143/0x1b4
> > [] notifier_call_chain+0x2a/0x47
> > [] raw_notifier_call_chain+0x17/0x1a
> > [] clockevents_notify+0x19/0x4f
> > [] acpi_idle_enter_simple+0x183/0x1d0
> > [] cpuidle_idle_call+0x53/0x78
> > [] cpuidle_idle_call+0x0/0x78
> > [] cpu_idle+0x97/0xb8
> >
> 
> Looks like tick_broadcast_lock did not get freed in some path.
> You do not see this when you CPU_IDLE is not configured? 

This looks pretty much like the problem I was solving yesterday.

Parag, can you please try Linus latest and please check whether there
is a stack trace with clockevents_program_event on the top in your
dmesg.

Thanks,
tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: BUG: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2007-12-07 Thread Rafael J. Wysocki
On Saturday, 8 of December 2007, Andrew Morton wrote:
> On Fri, 7 Dec 2007 17:53:47 -0500
> "Parag Warudkar" <[EMAIL PROTECTED]> wrote:
> 
> > Got this on today's git (2.6.24-rc4) while compiling  stuff - Looks
> > like it is related to CpuIdle stuff.
> > I chose CONFIG_CPU_IDLE for the first time so I don't know when this
> > was introduced.
> > 
> > This is on x86_32, SMP.
> > 
> > BUG: soft lockup - CPU#1 stuck for 15s! [swapper:0]
> > 
> > Pid: 0, comm: swapper Not tainted (2.6.24-rc4 #3)
> > EIP: 0060:[] EFLAGS: 0202 CPU: 1
> > EIP is at _spin_lock_irqsave+0x16/0x27
> > EAX: c06b4110 EBX: 0001 ECX: f7873808 EDX: 0293
> > ESI: 0005 EDI: f7873808 EBP:  ESP: f7829f10
> >  DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
> > CR0: 8005003b CR2: 004f5960 CR3: 372c5000 CR4: 06d0
> > DR0:  DR1:  DR2:  DR3: 
> > DR6: 0ff0 DR7: 0400
> >  [] tick_broadcast_oneshot_control+0x10/0xda
> >  [] tick_notify+0x1d4/0x2eb
> >  [] get_next_timer_interrupt+0x143/0x1b4
> >  [] notifier_call_chain+0x2a/0x47
> >  [] raw_notifier_call_chain+0x17/0x1a
> >  [] clockevents_notify+0x19/0x4f
> >  [] acpi_idle_enter_simple+0x183/0x1d0
> >  [] cpuidle_idle_call+0x53/0x78
> >  [] cpuidle_idle_call+0x0/0x78
> >  [] cpu_idle+0x97/0xb8
> 
> OK, thanks.  Another one for the regression list, please.

Added, http://bugzilla.kernel.org/show_bug.cgi?id=9525, thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2007-12-07 Thread Parag Warudkar
On Dec 7, 2007 6:12 PM, Pallipadi, Venkatesh
<[EMAIL PROTECTED]> wrote:

> Looks like tick_broadcast_lock did not get freed in some path.
> You do not see this when you CPU_IDLE is not configured?
>
> Thanks,
> Venki
>

No, I did not see this prior to enabling CPU_IDLE.

All previous kernels without CPU_IDLE also had soft lockup detection enabled.

Parag
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: BUG: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2007-12-07 Thread Andrew Morton
On Fri, 7 Dec 2007 17:53:47 -0500
"Parag Warudkar" <[EMAIL PROTECTED]> wrote:

> Got this on today's git (2.6.24-rc4) while compiling  stuff - Looks
> like it is related to CpuIdle stuff.
> I chose CONFIG_CPU_IDLE for the first time so I don't know when this
> was introduced.
> 
> This is on x86_32, SMP.
> 
> BUG: soft lockup - CPU#1 stuck for 15s! [swapper:0]
> 
> Pid: 0, comm: swapper Not tainted (2.6.24-rc4 #3)
> EIP: 0060:[] EFLAGS: 0202 CPU: 1
> EIP is at _spin_lock_irqsave+0x16/0x27
> EAX: c06b4110 EBX: 0001 ECX: f7873808 EDX: 0293
> ESI: 0005 EDI: f7873808 EBP:  ESP: f7829f10
>  DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
> CR0: 8005003b CR2: 004f5960 CR3: 372c5000 CR4: 06d0
> DR0:  DR1:  DR2:  DR3: 
> DR6: 0ff0 DR7: 0400
>  [] tick_broadcast_oneshot_control+0x10/0xda
>  [] tick_notify+0x1d4/0x2eb
>  [] get_next_timer_interrupt+0x143/0x1b4
>  [] notifier_call_chain+0x2a/0x47
>  [] raw_notifier_call_chain+0x17/0x1a
>  [] clockevents_notify+0x19/0x4f
>  [] acpi_idle_enter_simple+0x183/0x1d0
>  [] cpuidle_idle_call+0x53/0x78
>  [] cpuidle_idle_call+0x0/0x78
>  [] cpu_idle+0x97/0xb8

OK, thanks.  Another one for the regression list, please.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2007-12-07 Thread Pallipadi, Venkatesh
 

>-Original Message-
>From: Parag Warudkar [mailto:[EMAIL PROTECTED] 
>Sent: Friday, December 07, 2007 2:54 PM
>To: LKML
>Cc: Andrew Morton; Pallipadi, Venkatesh; Linus Torvalds
>Subject: BUG: soft lockup - CPU#1 stuck for 15s! [swapper:0]
>
>Got this on today's git (2.6.24-rc4) while compiling  stuff - Looks
>like it is related to CpuIdle stuff.
>I chose CONFIG_CPU_IDLE for the first time so I don't know when this
>was introduced.
>
>This is on x86_32, SMP.
>
>BUG: soft lockup - CPU#1 stuck for 15s! [swapper:0]
>
>Pid: 0, comm: swapper Not tainted (2.6.24-rc4 #3)
>EIP: 0060:[] EFLAGS: 0202 CPU: 1
>EIP is at _spin_lock_irqsave+0x16/0x27
>EAX: c06b4110 EBX: 0001 ECX: f7873808 EDX: 0293
>ESI: 0005 EDI: f7873808 EBP:  ESP: f7829f10
> DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
>CR0: 8005003b CR2: 004f5960 CR3: 372c5000 CR4: 06d0
>DR0:  DR1:  DR2:  DR3: 
>DR6: 0ff0 DR7: 0400
> [] tick_broadcast_oneshot_control+0x10/0xda
> [] tick_notify+0x1d4/0x2eb
> [] get_next_timer_interrupt+0x143/0x1b4
> [] notifier_call_chain+0x2a/0x47
> [] raw_notifier_call_chain+0x17/0x1a
> [] clockevents_notify+0x19/0x4f
> [] acpi_idle_enter_simple+0x183/0x1d0
> [] cpuidle_idle_call+0x53/0x78
> [] cpuidle_idle_call+0x0/0x78
> [] cpu_idle+0x97/0xb8
>

Looks like tick_broadcast_lock did not get freed in some path.
You do not see this when you CPU_IDLE is not configured? 

Thanks,
Venki
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


BUG: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2007-12-07 Thread Parag Warudkar
Got this on today's git (2.6.24-rc4) while compiling  stuff - Looks
like it is related to CpuIdle stuff.
I chose CONFIG_CPU_IDLE for the first time so I don't know when this
was introduced.

This is on x86_32, SMP.

BUG: soft lockup - CPU#1 stuck for 15s! [swapper:0]

Pid: 0, comm: swapper Not tainted (2.6.24-rc4 #3)
EIP: 0060:[] EFLAGS: 0202 CPU: 1
EIP is at _spin_lock_irqsave+0x16/0x27
EAX: c06b4110 EBX: 0001 ECX: f7873808 EDX: 0293
ESI: 0005 EDI: f7873808 EBP:  ESP: f7829f10
 DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
CR0: 8005003b CR2: 004f5960 CR3: 372c5000 CR4: 06d0
DR0:  DR1:  DR2:  DR3: 
DR6: 0ff0 DR7: 0400
 [] tick_broadcast_oneshot_control+0x10/0xda
 [] tick_notify+0x1d4/0x2eb
 [] get_next_timer_interrupt+0x143/0x1b4
 [] notifier_call_chain+0x2a/0x47
 [] raw_notifier_call_chain+0x17/0x1a
 [] clockevents_notify+0x19/0x4f
 [] acpi_idle_enter_simple+0x183/0x1d0
 [] cpuidle_idle_call+0x53/0x78
 [] cpuidle_idle_call+0x0/0x78
 [] cpu_idle+0x97/0xb8
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: BUG: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2007-12-07 Thread Andrew Morton
On Fri, 7 Dec 2007 17:53:47 -0500
Parag Warudkar [EMAIL PROTECTED] wrote:

 Got this on today's git (2.6.24-rc4) while compiling  stuff - Looks
 like it is related to CpuIdle stuff.
 I chose CONFIG_CPU_IDLE for the first time so I don't know when this
 was introduced.
 
 This is on x86_32, SMP.
 
 BUG: soft lockup - CPU#1 stuck for 15s! [swapper:0]
 
 Pid: 0, comm: swapper Not tainted (2.6.24-rc4 #3)
 EIP: 0060:[c0603e22] EFLAGS: 0202 CPU: 1
 EIP is at _spin_lock_irqsave+0x16/0x27
 EAX: c06b4110 EBX: 0001 ECX: f7873808 EDX: 0293
 ESI: 0005 EDI: f7873808 EBP:  ESP: f7829f10
  DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
 CR0: 8005003b CR2: 004f5960 CR3: 372c5000 CR4: 06d0
 DR0:  DR1:  DR2:  DR3: 
 DR6: 0ff0 DR7: 0400
  [c0438233] tick_broadcast_oneshot_control+0x10/0xda
  [c0437c82] tick_notify+0x1d4/0x2eb
  [c04281b4] get_next_timer_interrupt+0x143/0x1b4
  [c0605819] notifier_call_chain+0x2a/0x47
  [c04345a0] raw_notifier_call_chain+0x17/0x1a
  [c04378b7] clockevents_notify+0x19/0x4f
  [c0533cc3] acpi_idle_enter_simple+0x183/0x1d0
  [c058cea3] cpuidle_idle_call+0x53/0x78
  [c058ce50] cpuidle_idle_call+0x0/0x78
  [c0402575] cpu_idle+0x97/0xb8

OK, thanks.  Another one for the regression list, please.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2007-12-07 Thread Pallipadi, Venkatesh
 

-Original Message-
From: Parag Warudkar [mailto:[EMAIL PROTECTED] 
Sent: Friday, December 07, 2007 2:54 PM
To: LKML
Cc: Andrew Morton; Pallipadi, Venkatesh; Linus Torvalds
Subject: BUG: soft lockup - CPU#1 stuck for 15s! [swapper:0]

Got this on today's git (2.6.24-rc4) while compiling  stuff - Looks
like it is related to CpuIdle stuff.
I chose CONFIG_CPU_IDLE for the first time so I don't know when this
was introduced.

This is on x86_32, SMP.

BUG: soft lockup - CPU#1 stuck for 15s! [swapper:0]

Pid: 0, comm: swapper Not tainted (2.6.24-rc4 #3)
EIP: 0060:[c0603e22] EFLAGS: 0202 CPU: 1
EIP is at _spin_lock_irqsave+0x16/0x27
EAX: c06b4110 EBX: 0001 ECX: f7873808 EDX: 0293
ESI: 0005 EDI: f7873808 EBP:  ESP: f7829f10
 DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
CR0: 8005003b CR2: 004f5960 CR3: 372c5000 CR4: 06d0
DR0:  DR1:  DR2:  DR3: 
DR6: 0ff0 DR7: 0400
 [c0438233] tick_broadcast_oneshot_control+0x10/0xda
 [c0437c82] tick_notify+0x1d4/0x2eb
 [c04281b4] get_next_timer_interrupt+0x143/0x1b4
 [c0605819] notifier_call_chain+0x2a/0x47
 [c04345a0] raw_notifier_call_chain+0x17/0x1a
 [c04378b7] clockevents_notify+0x19/0x4f
 [c0533cc3] acpi_idle_enter_simple+0x183/0x1d0
 [c058cea3] cpuidle_idle_call+0x53/0x78
 [c058ce50] cpuidle_idle_call+0x0/0x78
 [c0402575] cpu_idle+0x97/0xb8


Looks like tick_broadcast_lock did not get freed in some path.
You do not see this when you CPU_IDLE is not configured? 

Thanks,
Venki
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2007-12-07 Thread Thomas Gleixner
On Fri, 7 Dec 2007, Pallipadi, Venkatesh wrote:
 -Original Message-
 From: Parag Warudkar [mailto:[EMAIL PROTECTED] 
 Sent: Friday, December 07, 2007 2:54 PM
 To: LKML
 Cc: Andrew Morton; Pallipadi, Venkatesh; Linus Torvalds
 Subject: BUG: soft lockup - CPU#1 stuck for 15s! [swapper:0]
 
 Got this on today's git (2.6.24-rc4) while compiling  stuff - Looks
 like it is related to CpuIdle stuff.
 I chose CONFIG_CPU_IDLE for the first time so I don't know when this
 was introduced.
 
 This is on x86_32, SMP.
 
 BUG: soft lockup - CPU#1 stuck for 15s! [swapper:0]
 
 Pid: 0, comm: swapper Not tainted (2.6.24-rc4 #3)
 EIP: 0060:[c0603e22] EFLAGS: 0202 CPU: 1
 EIP is at _spin_lock_irqsave+0x16/0x27
 EAX: c06b4110 EBX: 0001 ECX: f7873808 EDX: 0293
 ESI: 0005 EDI: f7873808 EBP:  ESP: f7829f10
  DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
 CR0: 8005003b CR2: 004f5960 CR3: 372c5000 CR4: 06d0
 DR0:  DR1:  DR2:  DR3: 
 DR6: 0ff0 DR7: 0400
  [c0438233] tick_broadcast_oneshot_control+0x10/0xda
  [c0437c82] tick_notify+0x1d4/0x2eb
  [c04281b4] get_next_timer_interrupt+0x143/0x1b4
  [c0605819] notifier_call_chain+0x2a/0x47
  [c04345a0] raw_notifier_call_chain+0x17/0x1a
  [c04378b7] clockevents_notify+0x19/0x4f
  [c0533cc3] acpi_idle_enter_simple+0x183/0x1d0
  [c058cea3] cpuidle_idle_call+0x53/0x78
  [c058ce50] cpuidle_idle_call+0x0/0x78
  [c0402575] cpu_idle+0x97/0xb8
 
 
 Looks like tick_broadcast_lock did not get freed in some path.
 You do not see this when you CPU_IDLE is not configured? 

This looks pretty much like the problem I was solving yesterday.

Parag, can you please try Linus latest and please check whether there
is a stack trace with clockevents_program_event on the top in your
dmesg.

Thanks,
tglx
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: BUG: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2007-12-07 Thread Rafael J. Wysocki
On Saturday, 8 of December 2007, Andrew Morton wrote:
 On Fri, 7 Dec 2007 17:53:47 -0500
 Parag Warudkar [EMAIL PROTECTED] wrote:
 
  Got this on today's git (2.6.24-rc4) while compiling  stuff - Looks
  like it is related to CpuIdle stuff.
  I chose CONFIG_CPU_IDLE for the first time so I don't know when this
  was introduced.
  
  This is on x86_32, SMP.
  
  BUG: soft lockup - CPU#1 stuck for 15s! [swapper:0]
  
  Pid: 0, comm: swapper Not tainted (2.6.24-rc4 #3)
  EIP: 0060:[c0603e22] EFLAGS: 0202 CPU: 1
  EIP is at _spin_lock_irqsave+0x16/0x27
  EAX: c06b4110 EBX: 0001 ECX: f7873808 EDX: 0293
  ESI: 0005 EDI: f7873808 EBP:  ESP: f7829f10
   DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
  CR0: 8005003b CR2: 004f5960 CR3: 372c5000 CR4: 06d0
  DR0:  DR1:  DR2:  DR3: 
  DR6: 0ff0 DR7: 0400
   [c0438233] tick_broadcast_oneshot_control+0x10/0xda
   [c0437c82] tick_notify+0x1d4/0x2eb
   [c04281b4] get_next_timer_interrupt+0x143/0x1b4
   [c0605819] notifier_call_chain+0x2a/0x47
   [c04345a0] raw_notifier_call_chain+0x17/0x1a
   [c04378b7] clockevents_notify+0x19/0x4f
   [c0533cc3] acpi_idle_enter_simple+0x183/0x1d0
   [c058cea3] cpuidle_idle_call+0x53/0x78
   [c058ce50] cpuidle_idle_call+0x0/0x78
   [c0402575] cpu_idle+0x97/0xb8
 
 OK, thanks.  Another one for the regression list, please.

Added, http://bugzilla.kernel.org/show_bug.cgi?id=9525, thanks.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2007-12-07 Thread Parag Warudkar
On Dec 7, 2007 6:12 PM, Pallipadi, Venkatesh
[EMAIL PROTECTED] wrote:

 Looks like tick_broadcast_lock did not get freed in some path.
 You do not see this when you CPU_IDLE is not configured?

 Thanks,
 Venki


No, I did not see this prior to enabling CPU_IDLE.

All previous kernels without CPU_IDLE also had soft lockup detection enabled.

Parag
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


BUG: soft lockup - CPU#1 stuck for 15s! [swapper:0]

2007-12-07 Thread Parag Warudkar
Got this on today's git (2.6.24-rc4) while compiling  stuff - Looks
like it is related to CpuIdle stuff.
I chose CONFIG_CPU_IDLE for the first time so I don't know when this
was introduced.

This is on x86_32, SMP.

BUG: soft lockup - CPU#1 stuck for 15s! [swapper:0]

Pid: 0, comm: swapper Not tainted (2.6.24-rc4 #3)
EIP: 0060:[c0603e22] EFLAGS: 0202 CPU: 1
EIP is at _spin_lock_irqsave+0x16/0x27
EAX: c06b4110 EBX: 0001 ECX: f7873808 EDX: 0293
ESI: 0005 EDI: f7873808 EBP:  ESP: f7829f10
 DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
CR0: 8005003b CR2: 004f5960 CR3: 372c5000 CR4: 06d0
DR0:  DR1:  DR2:  DR3: 
DR6: 0ff0 DR7: 0400
 [c0438233] tick_broadcast_oneshot_control+0x10/0xda
 [c0437c82] tick_notify+0x1d4/0x2eb
 [c04281b4] get_next_timer_interrupt+0x143/0x1b4
 [c0605819] notifier_call_chain+0x2a/0x47
 [c04345a0] raw_notifier_call_chain+0x17/0x1a
 [c04378b7] clockevents_notify+0x19/0x4f
 [c0533cc3] acpi_idle_enter_simple+0x183/0x1d0
 [c058cea3] cpuidle_idle_call+0x53/0x78
 [c058ce50] cpuidle_idle_call+0x0/0x78
 [c0402575] cpu_idle+0x97/0xb8
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/