Re: 2.6.20-git13 kernel BUG at /mnt/md0/devel/linux-git/kernel/time/tick-sched.c:168

2007-02-21 Thread Michal Piotrowski
Michal Piotrowski napisał(a):
> On 17/02/07, Alex Riesen <[EMAIL PROTECTED]> wrote:
>> Thomas Gleixner, Sat, Feb 17, 2007 16:14:17 +0100:
>> > On Sat, 2007-02-17 at 15:47 +0100, Alex Riesen wrote:
>> > > > > 164 if (need_resched())
>> > > > > 165 goto end;
>> > > > > 166
>> > > > > 167 cpu = smp_processor_id();
>> > > > > 168 BUG_ON(local_softirq_pending());
>> > > >
>> > > > Hmm, the BUG_ON is inside of an interrupt disabled region, so we
>> should
>> > > > have bailed out early in the need_resched() check above (because
>> we are
>> > > > in the idle task context according to the stack trace).
>> > > >
>> > > > Is this reproducible ?
>> > >
>> > > Seen this too (Ubuntu, P4/ht-SMT, SATA, typed from screen):
>> >
>> > Can you please apply the patch below, so we can at least see, which
>> > softirq is pending. This should trigger independently of hrtimers and
>> > dynticks. You can keep it compiled in and disable it at the kernel
>> > commandline with "nohz=off" and / or "highres=off"
>>
>> It did, only one time:
>>
>> Idle: local softirq pending: 0020<6>USB Universal Host Controller
>> Interface driver v3.0
>>
> 
> sudo cat /var/log/messages | grep Idle
> Feb 17 17:35:34 bitis-gabonica kernel: Idle: local softirq pending:
> 0020<6>hdd: ATAPI 48X DVD-ROM DVD-R CD-R/RW drive, 2048kB Cache,
> UDMA(33)
> Feb 17 19:20:01 bitis-gabonica kernel: Idle: local softirq pending:
> 0020<3>Idle: local softirq pending: 0020<3>Idle: local softirq
> pending: 0020<7>PM: Removing info for No Bus:vcs7
> 
> cat /proc/interrupts
>   CPU0   CPU1
>  0: 232838  0   IO-APIC-edge  timer
>  1:401  0   IO-APIC-edge  i8042
>  7:  0  0   IO-APIC-edge  parport0
>  8:  1  0   IO-APIC-edge  rtc
>  9:  1  0   IO-APIC-fasteoi   acpi
>  12:  4  0   IO-APIC-edge  i8042
>  14:319  0   IO-APIC-edge  ide0
>  15:   1612  0   IO-APIC-edge  ide1
>  16:  16494  0   IO-APIC-fasteoi   uhci_hcd:usb3, libata
>  17:   4670  0   IO-APIC-fasteoi   uhci_hcd:usb1, uhci_hcd:usb4
>  18:  0  0   IO-APIC-fasteoi   uhci_hcd:usb2
>  19:  2  0   IO-APIC-fasteoi   ehci_hcd:usb5
>  20:   2822  0   IO-APIC-fasteoi   Intel ICH5
>  21:   2881  0   IO-APIC-fasteoi   eth1
>  22: 58  0   IO-APIC-fasteoi   eth0
> NMI:  0  0
> LOC: 232562 232846
> ERR:  0
> MIS:  0
> 
> I can confirm that this is ICH5 SATA controller problem.

Here is something interesting

cat /var/log/messages | tail -n 300 | grep NOHZ
Feb 20 20:09:39 bitis-gabonica kernel: NOHZ: local_softirq_pending 20
Feb 20 20:09:39 bitis-gabonica kernel: NOHZ: local_softirq_pending 20
Feb 20 21:10:01 bitis-gabonica kernel: NOHZ: local_softirq_pending 20
Feb 20 23:20:01 bitis-gabonica kernel: NOHZ: local_softirq_pending 20
Feb 21 05:10:28 bitis-gabonica kernel: NOHZ: local_softirq_pending 02
Feb 21 05:10:46 bitis-gabonica kernel: NOHZ: local_softirq_pending 02
^

Feb 21 05:10:57 bitis-gabonica kernel: NOHZ: local_softirq_pending 20
Feb 21 05:11:48 bitis-gabonica kernel: NOHZ: local_softirq_pending 02
Feb 21 05:12:02 bitis-gabonica kernel: NOHZ: local_softirq_pending 20
Feb 21 05:12:02 bitis-gabonica kernel: NOHZ: local_softirq_pending 20
Feb 21 05:12:23 bitis-gabonica kernel: NOHZ: local_softirq_pending 02
Feb 21 05:12:29 bitis-gabonica kernel: NOHZ: local_softirq_pending 20
Feb 21 05:13:27 bitis-gabonica kernel: NOHZ: local_softirq_pending 20
Feb 21 05:13:49 bitis-gabonica kernel: NOHZ: local_softirq_pending 02
Feb 21 05:13:54 bitis-gabonica kernel: NOHZ: local_softirq_pending 20
Feb 21 05:14:48 bitis-gabonica kernel: NOHZ: local_softirq_pending 02
Feb 21 05:15:11 bitis-gabonica kernel: NOHZ: local_softirq_pending 20
Feb 21 05:15:13 bitis-gabonica kernel: NOHZ: local_softirq_pending 20
Feb 21 05:15:16 bitis-gabonica kernel: NOHZ: local_softirq_pending 02
Feb 21 05:16:18 bitis-gabonica kernel: NOHZ: local_softirq_pending 20
Feb 21 05:16:31 bitis-gabonica kernel: NOHZ: local_softirq_pending 02
Feb 21 05:17:03 bitis-gabonica kernel: NOHZ: local_softirq_pending 02
Feb 21 05:17:06 bitis-gabonica kernel: NOHZ: local_softirq_pending 20
Feb 21 05:17:09 bitis-gabonica kernel: NOHZ: local_softirq_pending 20
Feb 21 05:17:12 bitis-gabonica kernel: NOHZ: local_softirq_pending 02
Feb 21 05:17:17 bitis-gabonica kernel: NOHZ: local_softirq_pending 20
Feb 21 05:17:18 bitis-gabonica kernel: NOHZ: local_softirq_pending 02
   
23 times in 7 minutes.


Regards,
Michal

--
Michal K. K. Piotrowski
LTG - Linux Testers Group (PL)
(http://www.stardust.webpages.pl/ltg/)
LTG - Linux Testers Group (EN)
(http://www.stardust.webpages.pl/linux_testers_group_en/)
-
To unsubscribe from this list: send the line "unsubscribe 

Re: 2.6.20-git13 kernel BUG at /mnt/md0/devel/linux-git/kernel/time/tick-sched.c:168

2007-02-21 Thread Michal Piotrowski
Michal Piotrowski napisał(a):
 On 17/02/07, Alex Riesen [EMAIL PROTECTED] wrote:
 Thomas Gleixner, Sat, Feb 17, 2007 16:14:17 +0100:
  On Sat, 2007-02-17 at 15:47 +0100, Alex Riesen wrote:
 164 if (need_resched())
 165 goto end;
 166
 167 cpu = smp_processor_id();
 168 BUG_ON(local_softirq_pending());
   
Hmm, the BUG_ON is inside of an interrupt disabled region, so we
 should
have bailed out early in the need_resched() check above (because
 we are
in the idle task context according to the stack trace).
   
Is this reproducible ?
  
   Seen this too (Ubuntu, P4/ht-SMT, SATA, typed from screen):
 
  Can you please apply the patch below, so we can at least see, which
  softirq is pending. This should trigger independently of hrtimers and
  dynticks. You can keep it compiled in and disable it at the kernel
  commandline with nohz=off and / or highres=off

 It did, only one time:

 Idle: local softirq pending: 00206USB Universal Host Controller
 Interface driver v3.0

 
 sudo cat /var/log/messages | grep Idle
 Feb 17 17:35:34 bitis-gabonica kernel: Idle: local softirq pending:
 00206hdd: ATAPI 48X DVD-ROM DVD-R CD-R/RW drive, 2048kB Cache,
 UDMA(33)
 Feb 17 19:20:01 bitis-gabonica kernel: Idle: local softirq pending:
 00203Idle: local softirq pending: 00203Idle: local softirq
 pending: 00207PM: Removing info for No Bus:vcs7
 
 cat /proc/interrupts
   CPU0   CPU1
  0: 232838  0   IO-APIC-edge  timer
  1:401  0   IO-APIC-edge  i8042
  7:  0  0   IO-APIC-edge  parport0
  8:  1  0   IO-APIC-edge  rtc
  9:  1  0   IO-APIC-fasteoi   acpi
  12:  4  0   IO-APIC-edge  i8042
  14:319  0   IO-APIC-edge  ide0
  15:   1612  0   IO-APIC-edge  ide1
  16:  16494  0   IO-APIC-fasteoi   uhci_hcd:usb3, libata
  17:   4670  0   IO-APIC-fasteoi   uhci_hcd:usb1, uhci_hcd:usb4
  18:  0  0   IO-APIC-fasteoi   uhci_hcd:usb2
  19:  2  0   IO-APIC-fasteoi   ehci_hcd:usb5
  20:   2822  0   IO-APIC-fasteoi   Intel ICH5
  21:   2881  0   IO-APIC-fasteoi   eth1
  22: 58  0   IO-APIC-fasteoi   eth0
 NMI:  0  0
 LOC: 232562 232846
 ERR:  0
 MIS:  0
 
 I can confirm that this is ICH5 SATA controller problem.

Here is something interesting

cat /var/log/messages | tail -n 300 | grep NOHZ
Feb 20 20:09:39 bitis-gabonica kernel: NOHZ: local_softirq_pending 20
Feb 20 20:09:39 bitis-gabonica kernel: NOHZ: local_softirq_pending 20
Feb 20 21:10:01 bitis-gabonica kernel: NOHZ: local_softirq_pending 20
Feb 20 23:20:01 bitis-gabonica kernel: NOHZ: local_softirq_pending 20
Feb 21 05:10:28 bitis-gabonica kernel: NOHZ: local_softirq_pending 02
Feb 21 05:10:46 bitis-gabonica kernel: NOHZ: local_softirq_pending 02
^

Feb 21 05:10:57 bitis-gabonica kernel: NOHZ: local_softirq_pending 20
Feb 21 05:11:48 bitis-gabonica kernel: NOHZ: local_softirq_pending 02
Feb 21 05:12:02 bitis-gabonica kernel: NOHZ: local_softirq_pending 20
Feb 21 05:12:02 bitis-gabonica kernel: NOHZ: local_softirq_pending 20
Feb 21 05:12:23 bitis-gabonica kernel: NOHZ: local_softirq_pending 02
Feb 21 05:12:29 bitis-gabonica kernel: NOHZ: local_softirq_pending 20
Feb 21 05:13:27 bitis-gabonica kernel: NOHZ: local_softirq_pending 20
Feb 21 05:13:49 bitis-gabonica kernel: NOHZ: local_softirq_pending 02
Feb 21 05:13:54 bitis-gabonica kernel: NOHZ: local_softirq_pending 20
Feb 21 05:14:48 bitis-gabonica kernel: NOHZ: local_softirq_pending 02
Feb 21 05:15:11 bitis-gabonica kernel: NOHZ: local_softirq_pending 20
Feb 21 05:15:13 bitis-gabonica kernel: NOHZ: local_softirq_pending 20
Feb 21 05:15:16 bitis-gabonica kernel: NOHZ: local_softirq_pending 02
Feb 21 05:16:18 bitis-gabonica kernel: NOHZ: local_softirq_pending 20
Feb 21 05:16:31 bitis-gabonica kernel: NOHZ: local_softirq_pending 02
Feb 21 05:17:03 bitis-gabonica kernel: NOHZ: local_softirq_pending 02
Feb 21 05:17:06 bitis-gabonica kernel: NOHZ: local_softirq_pending 20
Feb 21 05:17:09 bitis-gabonica kernel: NOHZ: local_softirq_pending 20
Feb 21 05:17:12 bitis-gabonica kernel: NOHZ: local_softirq_pending 02
Feb 21 05:17:17 bitis-gabonica kernel: NOHZ: local_softirq_pending 20
Feb 21 05:17:18 bitis-gabonica kernel: NOHZ: local_softirq_pending 02
   
23 times in 7 minutes.


Regards,
Michal

--
Michal K. K. Piotrowski
LTG - Linux Testers Group (PL)
(http://www.stardust.webpages.pl/ltg/)
LTG - Linux Testers Group (EN)
(http://www.stardust.webpages.pl/linux_testers_group_en/)
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  

Re: 2.6.20-git13 kernel BUG at /mnt/md0/devel/linux-git/kernel/time/tick-sched.c:168

2007-02-18 Thread Michal Piotrowski

On 18/02/07, Alex Riesen <[EMAIL PROTECTED]> wrote:

Thomas Gleixner, Sun, Feb 18, 2007 13:36:56 +0100:
> On Sun, 2007-02-18 at 10:50 +0100, Alex Riesen wrote:
> > > The arch/i386/kernel/process.c part of the patch should apply to 2.6.20
> > > as well. Can you check if the problem is there too ?
> >
> > It does not apply and does not look trivially hackable.
> > The code for cpu_idle was introduced in 2ff2d3d7 "i386: add idle notifier".
>
> Here you go.
>

Ok, I confirm the message in vanilla v2.6.20



dmesg | grep Idle
Idle: local softirq pending: 0020

uname -a
Linux bitis-gabonica.linuxtestersgroup.org 2.6.20 #1 SMP PREEMPT Sun
Feb 18 15:20:27 CET 2007 i686 i686 i386 GNU/Linux

Regards,
Michal

--
Michal K. K. Piotrowski
LTG - Linux Testers Group (PL)
(http://www.stardust.webpages.pl/ltg/)
LTG - Linux Testers Group (EN)
(http://www.stardust.webpages.pl/linux_testers_group_en/)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.20-git13 kernel BUG at /mnt/md0/devel/linux-git/kernel/time/tick-sched.c:168

2007-02-18 Thread Thomas Gleixner
On Sun, 2007-02-18 at 10:50 +0100, Alex Riesen wrote:
> > The arch/i386/kernel/process.c part of the patch should apply to 2.6.20
> > as well. Can you check if the problem is there too ?
> 
> It does not apply and does not look trivially hackable.
> The code for cpu_idle was introduced in 2ff2d3d7 "i386: add idle notifier".

Here you go.

tglx

Index: linux-2.6.20/arch/i386/kernel/process.c
===
--- linux-2.6.20.orig/arch/i386/kernel/process.c
+++ linux-2.6.20/arch/i386/kernel/process.c
@@ -189,6 +189,13 @@ void cpu_idle(void)
play_dead();
 
__get_cpu_var(irq_stat).idle_timestamp = jiffies;
+
+   local_irq_disable();
+   if (local_softirq_pending() && !need_resched())
+   printk(KERN_ERR
+  "Idle: local softirq pending: %04x\n",
+  local_softirq_pending());
+   local_irq_enable();
idle();
}
preempt_enable_no_resched();




-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.20-git13 kernel BUG at /mnt/md0/devel/linux-git/kernel/time/tick-sched.c:168

2007-02-18 Thread Thomas Gleixner
On Sat, 2007-02-17 at 23:41 +0100, Michal Piotrowski wrote:
> sudo cat /var/log/messages | grep Idle
> Feb 17 17:35:34 bitis-gabonica kernel: Idle: local softirq pending:
> 0020<6>hdd: ATAPI 48X DVD-ROM DVD-R CD-R/RW drive, 2048kB Cache,
> UDMA(33)
> Feb 17 19:20:01 bitis-gabonica kernel: Idle: local softirq pending:
> 0020<3>Idle: local softirq pending: 0020<3>Idle: local softirq
> pending: 0020<7>PM: Removing info for No Bus:vcs7>
>
> I can confirm that this is ICH5 SATA controller problem.

The arch/i386/kernel/process.c part of the patch should apply to 2.6.20
as well. Can you check if the problem is there too ?

tglx


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.20-git13 kernel BUG at /mnt/md0/devel/linux-git/kernel/time/tick-sched.c:168

2007-02-18 Thread Thomas Gleixner
On Sat, 2007-02-17 at 23:41 +0100, Michal Piotrowski wrote:
 sudo cat /var/log/messages | grep Idle
 Feb 17 17:35:34 bitis-gabonica kernel: Idle: local softirq pending:
 00206hdd: ATAPI 48X DVD-ROM DVD-R CD-R/RW drive, 2048kB Cache,
 UDMA(33)
 Feb 17 19:20:01 bitis-gabonica kernel: Idle: local softirq pending:
 00203Idle: local softirq pending: 00203Idle: local softirq
 pending: 00207PM: Removing info for No Bus:vcs7

 I can confirm that this is ICH5 SATA controller problem.

The arch/i386/kernel/process.c part of the patch should apply to 2.6.20
as well. Can you check if the problem is there too ?

tglx


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.20-git13 kernel BUG at /mnt/md0/devel/linux-git/kernel/time/tick-sched.c:168

2007-02-18 Thread Thomas Gleixner
On Sun, 2007-02-18 at 10:50 +0100, Alex Riesen wrote:
  The arch/i386/kernel/process.c part of the patch should apply to 2.6.20
  as well. Can you check if the problem is there too ?
 
 It does not apply and does not look trivially hackable.
 The code for cpu_idle was introduced in 2ff2d3d7 i386: add idle notifier.

Here you go.

tglx

Index: linux-2.6.20/arch/i386/kernel/process.c
===
--- linux-2.6.20.orig/arch/i386/kernel/process.c
+++ linux-2.6.20/arch/i386/kernel/process.c
@@ -189,6 +189,13 @@ void cpu_idle(void)
play_dead();
 
__get_cpu_var(irq_stat).idle_timestamp = jiffies;
+
+   local_irq_disable();
+   if (local_softirq_pending()  !need_resched())
+   printk(KERN_ERR
+  Idle: local softirq pending: %04x\n,
+  local_softirq_pending());
+   local_irq_enable();
idle();
}
preempt_enable_no_resched();




-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.20-git13 kernel BUG at /mnt/md0/devel/linux-git/kernel/time/tick-sched.c:168

2007-02-18 Thread Michal Piotrowski

On 18/02/07, Alex Riesen [EMAIL PROTECTED] wrote:

Thomas Gleixner, Sun, Feb 18, 2007 13:36:56 +0100:
 On Sun, 2007-02-18 at 10:50 +0100, Alex Riesen wrote:
   The arch/i386/kernel/process.c part of the patch should apply to 2.6.20
   as well. Can you check if the problem is there too ?
 
  It does not apply and does not look trivially hackable.
  The code for cpu_idle was introduced in 2ff2d3d7 i386: add idle notifier.

 Here you go.


Ok, I confirm the message in vanilla v2.6.20



dmesg | grep Idle
Idle: local softirq pending: 0020

uname -a
Linux bitis-gabonica.linuxtestersgroup.org 2.6.20 #1 SMP PREEMPT Sun
Feb 18 15:20:27 CET 2007 i686 i686 i386 GNU/Linux

Regards,
Michal

--
Michal K. K. Piotrowski
LTG - Linux Testers Group (PL)
(http://www.stardust.webpages.pl/ltg/)
LTG - Linux Testers Group (EN)
(http://www.stardust.webpages.pl/linux_testers_group_en/)
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.20-git13 kernel BUG at /mnt/md0/devel/linux-git/kernel/time/tick-sched.c:168

2007-02-17 Thread Michal Piotrowski

On 17/02/07, Alex Riesen <[EMAIL PROTECTED]> wrote:

Thomas Gleixner, Sat, Feb 17, 2007 16:14:17 +0100:
> On Sat, 2007-02-17 at 15:47 +0100, Alex Riesen wrote:
> > > > 164 if (need_resched())
> > > > 165 goto end;
> > > > 166
> > > > 167 cpu = smp_processor_id();
> > > > 168 BUG_ON(local_softirq_pending());
> > >
> > > Hmm, the BUG_ON is inside of an interrupt disabled region, so we should
> > > have bailed out early in the need_resched() check above (because we are
> > > in the idle task context according to the stack trace).
> > >
> > > Is this reproducible ?
> >
> > Seen this too (Ubuntu, P4/ht-SMT, SATA, typed from screen):
>
> Can you please apply the patch below, so we can at least see, which
> softirq is pending. This should trigger independently of hrtimers and
> dynticks. You can keep it compiled in and disable it at the kernel
> commandline with "nohz=off" and / or "highres=off"

It did, only one time:

Idle: local softirq pending: 0020<6>USB Universal Host Controller Interface 
driver v3.0



sudo cat /var/log/messages | grep Idle
Feb 17 17:35:34 bitis-gabonica kernel: Idle: local softirq pending:
0020<6>hdd: ATAPI 48X DVD-ROM DVD-R CD-R/RW drive, 2048kB Cache,
UDMA(33)
Feb 17 19:20:01 bitis-gabonica kernel: Idle: local softirq pending:
0020<3>Idle: local softirq pending: 0020<3>Idle: local softirq
pending: 0020<7>PM: Removing info for No Bus:vcs7

cat /proc/interrupts
  CPU0   CPU1
 0: 232838  0   IO-APIC-edge  timer
 1:401  0   IO-APIC-edge  i8042
 7:  0  0   IO-APIC-edge  parport0
 8:  1  0   IO-APIC-edge  rtc
 9:  1  0   IO-APIC-fasteoi   acpi
 12:  4  0   IO-APIC-edge  i8042
 14:319  0   IO-APIC-edge  ide0
 15:   1612  0   IO-APIC-edge  ide1
 16:  16494  0   IO-APIC-fasteoi   uhci_hcd:usb3, libata
 17:   4670  0   IO-APIC-fasteoi   uhci_hcd:usb1, uhci_hcd:usb4
 18:  0  0   IO-APIC-fasteoi   uhci_hcd:usb2
 19:  2  0   IO-APIC-fasteoi   ehci_hcd:usb5
 20:   2822  0   IO-APIC-fasteoi   Intel ICH5
 21:   2881  0   IO-APIC-fasteoi   eth1
 22: 58  0   IO-APIC-fasteoi   eth0
NMI:  0  0
LOC: 232562 232846
ERR:  0
MIS:  0

I can confirm that this is ICH5 SATA controller problem.

Regards,
Michal

--
Michal K. K. Piotrowski
LTG - Linux Testers Group (PL)
(http://www.stardust.webpages.pl/ltg/)
LTG - Linux Testers Group (EN)
(http://www.stardust.webpages.pl/linux_testers_group_en/)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.20-git13 kernel BUG at /mnt/md0/devel/linux-git/kernel/time/tick-sched.c:168

2007-02-17 Thread Thomas Gleixner
On Sat, 2007-02-17 at 17:46 +0100, Alex Riesen wrote:
> > Can you please apply the patch below, so we can at least see, which
> > softirq is pending. This should trigger independently of hrtimers and
> > dynticks. You can keep it compiled in and disable it at the kernel
> > commandline with "nohz=off" and / or "highres=off"
> 
> It did, only one time:
> 
> Idle: local softirq pending: 0020<6>USB Universal Host Controller Interface 
> driver v3.0

0x20 is the TASKLET_SOFTIRQ. I have no idea yet, how this can happen.

Can you please check, if this happens when you add "nohz=off" to the
kernel command line.

tglx





-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.20-git13 kernel BUG at /mnt/md0/devel/linux-git/kernel/time/tick-sched.c:168

2007-02-17 Thread Thomas Gleixner
On Sat, 2007-02-17 at 15:47 +0100, Alex Riesen wrote:
> > > 164 if (need_resched())
> > > 165 goto end;
> > > 166
> > > 167 cpu = smp_processor_id();
> > > 168 BUG_ON(local_softirq_pending());
> > 
> > Hmm, the BUG_ON is inside of an interrupt disabled region, so we should
> > have bailed out early in the need_resched() check above (because we are
> > in the idle task context according to the stack trace).
> > 
> > Is this reproducible ?
>
> Seen this too (Ubuntu, P4/ht-SMT, SATA, typed from screen):

Can you please apply the patch below, so we can at least see, which
softirq is pending. This should trigger independently of hrtimers and
dynticks. You can keep it compiled in and disable it at the kernel
commandline with "nohz=off" and / or "highres=off"

tglx

diff --git a/arch/i386/kernel/process.c b/arch/i386/kernel/process.c
index bea304d..deeb90e 100644
--- a/arch/i386/kernel/process.c
+++ b/arch/i386/kernel/process.c
@@ -236,6 +236,11 @@ void cpu_idle(void)
 * Otherwise, idle callbacks can misfire.
 */
local_irq_disable();
+   if (local_softirq_pending() && !need_resched())
+   printk(KERN_ERR
+  "Idle: local softirq pending: %04x",
+  local_softirq_pending());
+
enter_idle();
idle();
__exit_idle();
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 95e41f7..91d459f 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -165,7 +165,6 @@ void tick_nohz_stop_sched_tick(void)
goto end;
 
cpu = smp_processor_id();
-   BUG_ON(local_softirq_pending());
 
now = ktime_get();
/*


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.20-git13 kernel BUG at /mnt/md0/devel/linux-git/kernel/time/tick-sched.c:168

2007-02-17 Thread Thomas Gleixner
On Sat, 2007-02-17 at 15:47 +0100, Alex Riesen wrote:
   164 if (need_resched())
   165 goto end;
   166
   167 cpu = smp_processor_id();
   168 BUG_ON(local_softirq_pending());
  
  Hmm, the BUG_ON is inside of an interrupt disabled region, so we should
  have bailed out early in the need_resched() check above (because we are
  in the idle task context according to the stack trace).
  
  Is this reproducible ?

 Seen this too (Ubuntu, P4/ht-SMT, SATA, typed from screen):

Can you please apply the patch below, so we can at least see, which
softirq is pending. This should trigger independently of hrtimers and
dynticks. You can keep it compiled in and disable it at the kernel
commandline with nohz=off and / or highres=off

tglx

diff --git a/arch/i386/kernel/process.c b/arch/i386/kernel/process.c
index bea304d..deeb90e 100644
--- a/arch/i386/kernel/process.c
+++ b/arch/i386/kernel/process.c
@@ -236,6 +236,11 @@ void cpu_idle(void)
 * Otherwise, idle callbacks can misfire.
 */
local_irq_disable();
+   if (local_softirq_pending()  !need_resched())
+   printk(KERN_ERR
+  Idle: local softirq pending: %04x,
+  local_softirq_pending());
+
enter_idle();
idle();
__exit_idle();
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 95e41f7..91d459f 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -165,7 +165,6 @@ void tick_nohz_stop_sched_tick(void)
goto end;
 
cpu = smp_processor_id();
-   BUG_ON(local_softirq_pending());
 
now = ktime_get();
/*


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.20-git13 kernel BUG at /mnt/md0/devel/linux-git/kernel/time/tick-sched.c:168

2007-02-17 Thread Thomas Gleixner
On Sat, 2007-02-17 at 17:46 +0100, Alex Riesen wrote:
  Can you please apply the patch below, so we can at least see, which
  softirq is pending. This should trigger independently of hrtimers and
  dynticks. You can keep it compiled in and disable it at the kernel
  commandline with nohz=off and / or highres=off
 
 It did, only one time:
 
 Idle: local softirq pending: 00206USB Universal Host Controller Interface 
 driver v3.0

0x20 is the TASKLET_SOFTIRQ. I have no idea yet, how this can happen.

Can you please check, if this happens when you add nohz=off to the
kernel command line.

tglx





-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.20-git13 kernel BUG at /mnt/md0/devel/linux-git/kernel/time/tick-sched.c:168

2007-02-17 Thread Michal Piotrowski

On 17/02/07, Alex Riesen [EMAIL PROTECTED] wrote:

Thomas Gleixner, Sat, Feb 17, 2007 16:14:17 +0100:
 On Sat, 2007-02-17 at 15:47 +0100, Alex Riesen wrote:
164 if (need_resched())
165 goto end;
166
167 cpu = smp_processor_id();
168 BUG_ON(local_softirq_pending());
  
   Hmm, the BUG_ON is inside of an interrupt disabled region, so we should
   have bailed out early in the need_resched() check above (because we are
   in the idle task context according to the stack trace).
  
   Is this reproducible ?
 
  Seen this too (Ubuntu, P4/ht-SMT, SATA, typed from screen):

 Can you please apply the patch below, so we can at least see, which
 softirq is pending. This should trigger independently of hrtimers and
 dynticks. You can keep it compiled in and disable it at the kernel
 commandline with nohz=off and / or highres=off

It did, only one time:

Idle: local softirq pending: 00206USB Universal Host Controller Interface 
driver v3.0



sudo cat /var/log/messages | grep Idle
Feb 17 17:35:34 bitis-gabonica kernel: Idle: local softirq pending:
00206hdd: ATAPI 48X DVD-ROM DVD-R CD-R/RW drive, 2048kB Cache,
UDMA(33)
Feb 17 19:20:01 bitis-gabonica kernel: Idle: local softirq pending:
00203Idle: local softirq pending: 00203Idle: local softirq
pending: 00207PM: Removing info for No Bus:vcs7

cat /proc/interrupts
  CPU0   CPU1
 0: 232838  0   IO-APIC-edge  timer
 1:401  0   IO-APIC-edge  i8042
 7:  0  0   IO-APIC-edge  parport0
 8:  1  0   IO-APIC-edge  rtc
 9:  1  0   IO-APIC-fasteoi   acpi
 12:  4  0   IO-APIC-edge  i8042
 14:319  0   IO-APIC-edge  ide0
 15:   1612  0   IO-APIC-edge  ide1
 16:  16494  0   IO-APIC-fasteoi   uhci_hcd:usb3, libata
 17:   4670  0   IO-APIC-fasteoi   uhci_hcd:usb1, uhci_hcd:usb4
 18:  0  0   IO-APIC-fasteoi   uhci_hcd:usb2
 19:  2  0   IO-APIC-fasteoi   ehci_hcd:usb5
 20:   2822  0   IO-APIC-fasteoi   Intel ICH5
 21:   2881  0   IO-APIC-fasteoi   eth1
 22: 58  0   IO-APIC-fasteoi   eth0
NMI:  0  0
LOC: 232562 232846
ERR:  0
MIS:  0

I can confirm that this is ICH5 SATA controller problem.

Regards,
Michal

--
Michal K. K. Piotrowski
LTG - Linux Testers Group (PL)
(http://www.stardust.webpages.pl/ltg/)
LTG - Linux Testers Group (EN)
(http://www.stardust.webpages.pl/linux_testers_group_en/)
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.20-git13 kernel BUG at /mnt/md0/devel/linux-git/kernel/time/tick-sched.c:168

2007-02-16 Thread Michal Piotrowski

On 16/02/07, Thomas Gleixner <[EMAIL PROTECTED]> wrote:

On Fri, 2007-02-16 at 21:38 +0100, Michal Piotrowski wrote:
> Hi,
>
> This looks like a tickless stuff

Yup.

> 0xc0139ea0 is in tick_nohz_stop_sched_tick 
(/mnt/md0/devel/linux-git/kernel/time/tick-sched.c:168).
> 163
> 164 if (need_resched())
> 165 goto end;
> 166
> 167 cpu = smp_processor_id();
> 168 BUG_ON(local_softirq_pending());

Hmm, the BUG_ON is inside of an interrupt disabled region, so we should
have bailed out early in the need_resched() check above (because we are
in the idle task context according to the stack trace).

Is this reproducible ?


Yes.



tglx





Regards,
Michal

--
Michal K. K. Piotrowski
LTG - Linux Testers Group (PL)
(http://www.stardust.webpages.pl/ltg/)
LTG - Linux Testers Group (EN)
(http://www.stardust.webpages.pl/linux_testers_group_en/)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.20-git13 kernel BUG at /mnt/md0/devel/linux-git/kernel/time/tick-sched.c:168

2007-02-16 Thread Thomas Gleixner
On Fri, 2007-02-16 at 21:38 +0100, Michal Piotrowski wrote:
> Hi,
> 
> This looks like a tickless stuff

Yup.

> 0xc0139ea0 is in tick_nohz_stop_sched_tick 
> (/mnt/md0/devel/linux-git/kernel/time/tick-sched.c:168).
> 163
> 164 if (need_resched())
> 165 goto end;
> 166
> 167 cpu = smp_processor_id();
> 168 BUG_ON(local_softirq_pending());

Hmm, the BUG_ON is inside of an interrupt disabled region, so we should
have bailed out early in the need_resched() check above (because we are
in the idle task context according to the stack trace).

Is this reproducible ?

tglx


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


2.6.20-git13 kernel BUG at /mnt/md0/devel/linux-git/kernel/time/tick-sched.c:168

2007-02-16 Thread Michal Piotrowski
Hi,

This looks like a tickless stuff

[ cut here ]
kernel BUG at /mnt/md0/devel/linux-git/kernel/time/tick-sched.c:168!
invalid opcode:  [#1]
PREEMPT SMP
Modules linked in: rtc unix
CPU:0
EIP:0060:[]Not tainted VLI
EFLAGS: 00010002   (2.6.20 #53)
EIP is at tick_nohz_stop_sched_tick+0x5d/0x1d9
eax:    ebx: c6021100   ecx: 0001   edx: c03e74e0
esi: c6022c40   edi:    ebp: c041dfac   esp: c041df78
ds: 007b   es: 007b   fs: 00d8  gs:   ss: 0068
Process swapper (pid: 0, ti=c041d000 task=c03e74e0 task.ti=c041d000)
Stack:  c03e7608 c6021b20  c041dfac 0296 0001 c602007b
   c01000d8 c0451304 c6021100 c0102b42  c041dfc0 c01023b4 c602ee80
   c0453ae4 c602b864 c041dfc8 c010141d c041dff8 c0421a52 c038d4da c04211ed
Call Trace:
 [] show_trace_log_lvl+0x1a/0x2f
 [] show_stack_log_lvl+0x9d/0xa5
 [] show_registers+0x1ed/0x32c
 [] die+0x118/0x22f
 [] do_trap+0x79/0x91
 [] do_invalid_op+0x97/0xa1
 [] error_code+0x7c/0x84
 [] cpu_idle+0x1d/0xfb
 [] rest_init+0x37/0x3a
 [] start_kernel+0x459/0x461
 [<>] 0x0
 ===
Code: f0 ff ff 8b 40 08 a8 08 0f 85 74 01 00 00 e8 74 e7 0b 00 89 c7 bb 80 13 
45 c0 e8 68 e7 0b 00 03 1c 85 00 a6 41 c0 83 3b 00 74 04 <0f> 0b eb fe 8d 45 ec 
e8 bf c0 ff ff 8b 4d ec 8b 5d f0 83 7e 3c
EIP: [] tick_nohz_stop_sched_tick+0x5d/0x1d9 SS:ESP 0068:c041df78
Kernel panic - not syncing: Attempted to kill the idle task!
BUG: at /mnt/md0/devel/linux-git/arch/i386/kernel/smp.c:547 smp_call_function()
 [] show_trace_log_lvl+0x1a/0x2f
 [] show_trace+0x12/0x14
 [] dump_stack+0x16/0x18
 [] smp_call_function+0x61/0x102
 [] smp_send_stop+0x1e/0x31
 [] panic+0x5e/0xfb
 [] do_exit+0x7a/0x702
 [] die+0x209/0x22f
 [] do_trap+0x79/0x91
 [] do_invalid_op+0x97/0xa1
 [] error_code+0x7c/0x84
 [] cpu_idle+0x1d/0xfb
 [] rest_init+0x37/0x3a
 [] start_kernel+0x459/0x461
 [<>] 0x0
 ===

0xc0139ea0 is in tick_nohz_stop_sched_tick 
(/mnt/md0/devel/linux-git/kernel/time/tick-sched.c:168).
163
164 if (need_resched())
165 goto end;
166
167 cpu = smp_processor_id();
168 BUG_ON(local_softirq_pending());
169
170 now = ktime_get();
171 /*
172  * When called from irq_exit we need to account the idle sleep 
time


http://www.stardust.webpages.pl/files/tbf/euridica/2.6.20-git13/boot.log
http://www.stardust.webpages.pl/files/tbf/euridica/2.6.20-git13/git-config

Regards,
Michal

-- 
Michal K. K. Piotrowski
LTG - Linux Testers Group (PL)
(http://www.stardust.webpages.pl/ltg/)
LTG - Linux Testers Group (EN)
(http://www.stardust.webpages.pl/linux_testers_group_en/)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


2.6.20-git13 kernel BUG at /mnt/md0/devel/linux-git/kernel/time/tick-sched.c:168

2007-02-16 Thread Michal Piotrowski
Hi,

This looks like a tickless stuff

[ cut here ]
kernel BUG at /mnt/md0/devel/linux-git/kernel/time/tick-sched.c:168!
invalid opcode:  [#1]
PREEMPT SMP
Modules linked in: rtc unix
CPU:0
EIP:0060:[c0139ea0]Not tainted VLI
EFLAGS: 00010002   (2.6.20 #53)
EIP is at tick_nohz_stop_sched_tick+0x5d/0x1d9
eax:    ebx: c6021100   ecx: 0001   edx: c03e74e0
esi: c6022c40   edi:    ebp: c041dfac   esp: c041df78
ds: 007b   es: 007b   fs: 00d8  gs:   ss: 0068
Process swapper (pid: 0, ti=c041d000 task=c03e74e0 task.ti=c041d000)
Stack:  c03e7608 c6021b20  c041dfac 0296 0001 c602007b
   c01000d8 c0451304 c6021100 c0102b42  c041dfc0 c01023b4 c602ee80
   c0453ae4 c602b864 c041dfc8 c010141d c041dff8 c0421a52 c038d4da c04211ed
Call Trace:
 [c01050f9] show_trace_log_lvl+0x1a/0x2f
 [c01051ab] show_stack_log_lvl+0x9d/0xa5
 [c01053a0] show_registers+0x1ed/0x32c
 [c01055f7] die+0x118/0x22f
 [c0105787] do_trap+0x79/0x91
 [c0106017] do_invalid_op+0x97/0xa1
 [c0311abc] error_code+0x7c/0x84
 [c01023b4] cpu_idle+0x1d/0xfb
 [c010141d] rest_init+0x37/0x3a
 [c0421a52] start_kernel+0x459/0x461
 [] 0x0
 ===
Code: f0 ff ff 8b 40 08 a8 08 0f 85 74 01 00 00 e8 74 e7 0b 00 89 c7 bb 80 13 
45 c0 e8 68 e7 0b 00 03 1c 85 00 a6 41 c0 83 3b 00 74 04 0f 0b eb fe 8d 45 ec 
e8 bf c0 ff ff 8b 4d ec 8b 5d f0 83 7e 3c
EIP: [c0139ea0] tick_nohz_stop_sched_tick+0x5d/0x1d9 SS:ESP 0068:c041df78
Kernel panic - not syncing: Attempted to kill the idle task!
BUG: at /mnt/md0/devel/linux-git/arch/i386/kernel/smp.c:547 smp_call_function()
 [c01050f9] show_trace_log_lvl+0x1a/0x2f
 [c01057e0] show_trace+0x12/0x14
 [c0105892] dump_stack+0x16/0x18
 [c0112792] smp_call_function+0x61/0x102
 [c0112851] smp_send_stop+0x1e/0x31
 [c0121ae3] panic+0x5e/0xfb
 [c012463b] do_exit+0x7a/0x702
 [c01056e8] die+0x209/0x22f
 [c0105787] do_trap+0x79/0x91
 [c0106017] do_invalid_op+0x97/0xa1
 [c0311abc] error_code+0x7c/0x84
 [c01023b4] cpu_idle+0x1d/0xfb
 [c010141d] rest_init+0x37/0x3a
 [c0421a52] start_kernel+0x459/0x461
 [] 0x0
 ===

0xc0139ea0 is in tick_nohz_stop_sched_tick 
(/mnt/md0/devel/linux-git/kernel/time/tick-sched.c:168).
163
164 if (need_resched())
165 goto end;
166
167 cpu = smp_processor_id();
168 BUG_ON(local_softirq_pending());
169
170 now = ktime_get();
171 /*
172  * When called from irq_exit we need to account the idle sleep 
time


http://www.stardust.webpages.pl/files/tbf/euridica/2.6.20-git13/boot.log
http://www.stardust.webpages.pl/files/tbf/euridica/2.6.20-git13/git-config

Regards,
Michal

-- 
Michal K. K. Piotrowski
LTG - Linux Testers Group (PL)
(http://www.stardust.webpages.pl/ltg/)
LTG - Linux Testers Group (EN)
(http://www.stardust.webpages.pl/linux_testers_group_en/)
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.20-git13 kernel BUG at /mnt/md0/devel/linux-git/kernel/time/tick-sched.c:168

2007-02-16 Thread Thomas Gleixner
On Fri, 2007-02-16 at 21:38 +0100, Michal Piotrowski wrote:
 Hi,
 
 This looks like a tickless stuff

Yup.

 0xc0139ea0 is in tick_nohz_stop_sched_tick 
 (/mnt/md0/devel/linux-git/kernel/time/tick-sched.c:168).
 163
 164 if (need_resched())
 165 goto end;
 166
 167 cpu = smp_processor_id();
 168 BUG_ON(local_softirq_pending());

Hmm, the BUG_ON is inside of an interrupt disabled region, so we should
have bailed out early in the need_resched() check above (because we are
in the idle task context according to the stack trace).

Is this reproducible ?

tglx


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.20-git13 kernel BUG at /mnt/md0/devel/linux-git/kernel/time/tick-sched.c:168

2007-02-16 Thread Michal Piotrowski

On 16/02/07, Thomas Gleixner [EMAIL PROTECTED] wrote:

On Fri, 2007-02-16 at 21:38 +0100, Michal Piotrowski wrote:
 Hi,

 This looks like a tickless stuff

Yup.

 0xc0139ea0 is in tick_nohz_stop_sched_tick 
(/mnt/md0/devel/linux-git/kernel/time/tick-sched.c:168).
 163
 164 if (need_resched())
 165 goto end;
 166
 167 cpu = smp_processor_id();
 168 BUG_ON(local_softirq_pending());

Hmm, the BUG_ON is inside of an interrupt disabled region, so we should
have bailed out early in the need_resched() check above (because we are
in the idle task context according to the stack trace).

Is this reproducible ?


Yes.



tglx





Regards,
Michal

--
Michal K. K. Piotrowski
LTG - Linux Testers Group (PL)
(http://www.stardust.webpages.pl/ltg/)
LTG - Linux Testers Group (EN)
(http://www.stardust.webpages.pl/linux_testers_group_en/)
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/