Re: 2.6.20-git13 kernel BUG at /mnt/md0/devel/linux-git/kernel/time/tick-sched.c:168
Michal Piotrowski napisał(a): > On 17/02/07, Alex Riesen <[EMAIL PROTECTED]> wrote: >> Thomas Gleixner, Sat, Feb 17, 2007 16:14:17 +0100: >> > On Sat, 2007-02-17 at 15:47 +0100, Alex Riesen wrote: >> > > > > 164 if (need_resched()) >> > > > > 165 goto end; >> > > > > 166 >> > > > > 167 cpu = smp_processor_id(); >> > > > > 168 BUG_ON(local_softirq_pending()); >> > > > >> > > > Hmm, the BUG_ON is inside of an interrupt disabled region, so we >> should >> > > > have bailed out early in the need_resched() check above (because >> we are >> > > > in the idle task context according to the stack trace). >> > > > >> > > > Is this reproducible ? >> > > >> > > Seen this too (Ubuntu, P4/ht-SMT, SATA, typed from screen): >> > >> > Can you please apply the patch below, so we can at least see, which >> > softirq is pending. This should trigger independently of hrtimers and >> > dynticks. You can keep it compiled in and disable it at the kernel >> > commandline with "nohz=off" and / or "highres=off" >> >> It did, only one time: >> >> Idle: local softirq pending: 0020<6>USB Universal Host Controller >> Interface driver v3.0 >> > > sudo cat /var/log/messages | grep Idle > Feb 17 17:35:34 bitis-gabonica kernel: Idle: local softirq pending: > 0020<6>hdd: ATAPI 48X DVD-ROM DVD-R CD-R/RW drive, 2048kB Cache, > UDMA(33) > Feb 17 19:20:01 bitis-gabonica kernel: Idle: local softirq pending: > 0020<3>Idle: local softirq pending: 0020<3>Idle: local softirq > pending: 0020<7>PM: Removing info for No Bus:vcs7 > > cat /proc/interrupts > CPU0 CPU1 > 0: 232838 0 IO-APIC-edge timer > 1:401 0 IO-APIC-edge i8042 > 7: 0 0 IO-APIC-edge parport0 > 8: 1 0 IO-APIC-edge rtc > 9: 1 0 IO-APIC-fasteoi acpi > 12: 4 0 IO-APIC-edge i8042 > 14:319 0 IO-APIC-edge ide0 > 15: 1612 0 IO-APIC-edge ide1 > 16: 16494 0 IO-APIC-fasteoi uhci_hcd:usb3, libata > 17: 4670 0 IO-APIC-fasteoi uhci_hcd:usb1, uhci_hcd:usb4 > 18: 0 0 IO-APIC-fasteoi uhci_hcd:usb2 > 19: 2 0 IO-APIC-fasteoi ehci_hcd:usb5 > 20: 2822 0 IO-APIC-fasteoi Intel ICH5 > 21: 2881 0 IO-APIC-fasteoi eth1 > 22: 58 0 IO-APIC-fasteoi eth0 > NMI: 0 0 > LOC: 232562 232846 > ERR: 0 > MIS: 0 > > I can confirm that this is ICH5 SATA controller problem. Here is something interesting cat /var/log/messages | tail -n 300 | grep NOHZ Feb 20 20:09:39 bitis-gabonica kernel: NOHZ: local_softirq_pending 20 Feb 20 20:09:39 bitis-gabonica kernel: NOHZ: local_softirq_pending 20 Feb 20 21:10:01 bitis-gabonica kernel: NOHZ: local_softirq_pending 20 Feb 20 23:20:01 bitis-gabonica kernel: NOHZ: local_softirq_pending 20 Feb 21 05:10:28 bitis-gabonica kernel: NOHZ: local_softirq_pending 02 Feb 21 05:10:46 bitis-gabonica kernel: NOHZ: local_softirq_pending 02 ^ Feb 21 05:10:57 bitis-gabonica kernel: NOHZ: local_softirq_pending 20 Feb 21 05:11:48 bitis-gabonica kernel: NOHZ: local_softirq_pending 02 Feb 21 05:12:02 bitis-gabonica kernel: NOHZ: local_softirq_pending 20 Feb 21 05:12:02 bitis-gabonica kernel: NOHZ: local_softirq_pending 20 Feb 21 05:12:23 bitis-gabonica kernel: NOHZ: local_softirq_pending 02 Feb 21 05:12:29 bitis-gabonica kernel: NOHZ: local_softirq_pending 20 Feb 21 05:13:27 bitis-gabonica kernel: NOHZ: local_softirq_pending 20 Feb 21 05:13:49 bitis-gabonica kernel: NOHZ: local_softirq_pending 02 Feb 21 05:13:54 bitis-gabonica kernel: NOHZ: local_softirq_pending 20 Feb 21 05:14:48 bitis-gabonica kernel: NOHZ: local_softirq_pending 02 Feb 21 05:15:11 bitis-gabonica kernel: NOHZ: local_softirq_pending 20 Feb 21 05:15:13 bitis-gabonica kernel: NOHZ: local_softirq_pending 20 Feb 21 05:15:16 bitis-gabonica kernel: NOHZ: local_softirq_pending 02 Feb 21 05:16:18 bitis-gabonica kernel: NOHZ: local_softirq_pending 20 Feb 21 05:16:31 bitis-gabonica kernel: NOHZ: local_softirq_pending 02 Feb 21 05:17:03 bitis-gabonica kernel: NOHZ: local_softirq_pending 02 Feb 21 05:17:06 bitis-gabonica kernel: NOHZ: local_softirq_pending 20 Feb 21 05:17:09 bitis-gabonica kernel: NOHZ: local_softirq_pending 20 Feb 21 05:17:12 bitis-gabonica kernel: NOHZ: local_softirq_pending 02 Feb 21 05:17:17 bitis-gabonica kernel: NOHZ: local_softirq_pending 20 Feb 21 05:17:18 bitis-gabonica kernel: NOHZ: local_softirq_pending 02 23 times in 7 minutes. Regards, Michal -- Michal K. K. Piotrowski LTG - Linux Testers Group (PL) (http://www.stardust.webpages.pl/ltg/) LTG - Linux Testers Group (EN) (http://www.stardust.webpages.pl/linux_testers_group_en/) - To unsubscribe from this list: send the line "unsubscribe
Re: 2.6.20-git13 kernel BUG at /mnt/md0/devel/linux-git/kernel/time/tick-sched.c:168
Michal Piotrowski napisał(a): On 17/02/07, Alex Riesen [EMAIL PROTECTED] wrote: Thomas Gleixner, Sat, Feb 17, 2007 16:14:17 +0100: On Sat, 2007-02-17 at 15:47 +0100, Alex Riesen wrote: 164 if (need_resched()) 165 goto end; 166 167 cpu = smp_processor_id(); 168 BUG_ON(local_softirq_pending()); Hmm, the BUG_ON is inside of an interrupt disabled region, so we should have bailed out early in the need_resched() check above (because we are in the idle task context according to the stack trace). Is this reproducible ? Seen this too (Ubuntu, P4/ht-SMT, SATA, typed from screen): Can you please apply the patch below, so we can at least see, which softirq is pending. This should trigger independently of hrtimers and dynticks. You can keep it compiled in and disable it at the kernel commandline with nohz=off and / or highres=off It did, only one time: Idle: local softirq pending: 00206USB Universal Host Controller Interface driver v3.0 sudo cat /var/log/messages | grep Idle Feb 17 17:35:34 bitis-gabonica kernel: Idle: local softirq pending: 00206hdd: ATAPI 48X DVD-ROM DVD-R CD-R/RW drive, 2048kB Cache, UDMA(33) Feb 17 19:20:01 bitis-gabonica kernel: Idle: local softirq pending: 00203Idle: local softirq pending: 00203Idle: local softirq pending: 00207PM: Removing info for No Bus:vcs7 cat /proc/interrupts CPU0 CPU1 0: 232838 0 IO-APIC-edge timer 1:401 0 IO-APIC-edge i8042 7: 0 0 IO-APIC-edge parport0 8: 1 0 IO-APIC-edge rtc 9: 1 0 IO-APIC-fasteoi acpi 12: 4 0 IO-APIC-edge i8042 14:319 0 IO-APIC-edge ide0 15: 1612 0 IO-APIC-edge ide1 16: 16494 0 IO-APIC-fasteoi uhci_hcd:usb3, libata 17: 4670 0 IO-APIC-fasteoi uhci_hcd:usb1, uhci_hcd:usb4 18: 0 0 IO-APIC-fasteoi uhci_hcd:usb2 19: 2 0 IO-APIC-fasteoi ehci_hcd:usb5 20: 2822 0 IO-APIC-fasteoi Intel ICH5 21: 2881 0 IO-APIC-fasteoi eth1 22: 58 0 IO-APIC-fasteoi eth0 NMI: 0 0 LOC: 232562 232846 ERR: 0 MIS: 0 I can confirm that this is ICH5 SATA controller problem. Here is something interesting cat /var/log/messages | tail -n 300 | grep NOHZ Feb 20 20:09:39 bitis-gabonica kernel: NOHZ: local_softirq_pending 20 Feb 20 20:09:39 bitis-gabonica kernel: NOHZ: local_softirq_pending 20 Feb 20 21:10:01 bitis-gabonica kernel: NOHZ: local_softirq_pending 20 Feb 20 23:20:01 bitis-gabonica kernel: NOHZ: local_softirq_pending 20 Feb 21 05:10:28 bitis-gabonica kernel: NOHZ: local_softirq_pending 02 Feb 21 05:10:46 bitis-gabonica kernel: NOHZ: local_softirq_pending 02 ^ Feb 21 05:10:57 bitis-gabonica kernel: NOHZ: local_softirq_pending 20 Feb 21 05:11:48 bitis-gabonica kernel: NOHZ: local_softirq_pending 02 Feb 21 05:12:02 bitis-gabonica kernel: NOHZ: local_softirq_pending 20 Feb 21 05:12:02 bitis-gabonica kernel: NOHZ: local_softirq_pending 20 Feb 21 05:12:23 bitis-gabonica kernel: NOHZ: local_softirq_pending 02 Feb 21 05:12:29 bitis-gabonica kernel: NOHZ: local_softirq_pending 20 Feb 21 05:13:27 bitis-gabonica kernel: NOHZ: local_softirq_pending 20 Feb 21 05:13:49 bitis-gabonica kernel: NOHZ: local_softirq_pending 02 Feb 21 05:13:54 bitis-gabonica kernel: NOHZ: local_softirq_pending 20 Feb 21 05:14:48 bitis-gabonica kernel: NOHZ: local_softirq_pending 02 Feb 21 05:15:11 bitis-gabonica kernel: NOHZ: local_softirq_pending 20 Feb 21 05:15:13 bitis-gabonica kernel: NOHZ: local_softirq_pending 20 Feb 21 05:15:16 bitis-gabonica kernel: NOHZ: local_softirq_pending 02 Feb 21 05:16:18 bitis-gabonica kernel: NOHZ: local_softirq_pending 20 Feb 21 05:16:31 bitis-gabonica kernel: NOHZ: local_softirq_pending 02 Feb 21 05:17:03 bitis-gabonica kernel: NOHZ: local_softirq_pending 02 Feb 21 05:17:06 bitis-gabonica kernel: NOHZ: local_softirq_pending 20 Feb 21 05:17:09 bitis-gabonica kernel: NOHZ: local_softirq_pending 20 Feb 21 05:17:12 bitis-gabonica kernel: NOHZ: local_softirq_pending 02 Feb 21 05:17:17 bitis-gabonica kernel: NOHZ: local_softirq_pending 20 Feb 21 05:17:18 bitis-gabonica kernel: NOHZ: local_softirq_pending 02 23 times in 7 minutes. Regards, Michal -- Michal K. K. Piotrowski LTG - Linux Testers Group (PL) (http://www.stardust.webpages.pl/ltg/) LTG - Linux Testers Group (EN) (http://www.stardust.webpages.pl/linux_testers_group_en/) - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at
Re: 2.6.20-git13 kernel BUG at /mnt/md0/devel/linux-git/kernel/time/tick-sched.c:168
On 18/02/07, Alex Riesen <[EMAIL PROTECTED]> wrote: Thomas Gleixner, Sun, Feb 18, 2007 13:36:56 +0100: > On Sun, 2007-02-18 at 10:50 +0100, Alex Riesen wrote: > > > The arch/i386/kernel/process.c part of the patch should apply to 2.6.20 > > > as well. Can you check if the problem is there too ? > > > > It does not apply and does not look trivially hackable. > > The code for cpu_idle was introduced in 2ff2d3d7 "i386: add idle notifier". > > Here you go. > Ok, I confirm the message in vanilla v2.6.20 dmesg | grep Idle Idle: local softirq pending: 0020 uname -a Linux bitis-gabonica.linuxtestersgroup.org 2.6.20 #1 SMP PREEMPT Sun Feb 18 15:20:27 CET 2007 i686 i686 i386 GNU/Linux Regards, Michal -- Michal K. K. Piotrowski LTG - Linux Testers Group (PL) (http://www.stardust.webpages.pl/ltg/) LTG - Linux Testers Group (EN) (http://www.stardust.webpages.pl/linux_testers_group_en/) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.20-git13 kernel BUG at /mnt/md0/devel/linux-git/kernel/time/tick-sched.c:168
On Sun, 2007-02-18 at 10:50 +0100, Alex Riesen wrote: > > The arch/i386/kernel/process.c part of the patch should apply to 2.6.20 > > as well. Can you check if the problem is there too ? > > It does not apply and does not look trivially hackable. > The code for cpu_idle was introduced in 2ff2d3d7 "i386: add idle notifier". Here you go. tglx Index: linux-2.6.20/arch/i386/kernel/process.c === --- linux-2.6.20.orig/arch/i386/kernel/process.c +++ linux-2.6.20/arch/i386/kernel/process.c @@ -189,6 +189,13 @@ void cpu_idle(void) play_dead(); __get_cpu_var(irq_stat).idle_timestamp = jiffies; + + local_irq_disable(); + if (local_softirq_pending() && !need_resched()) + printk(KERN_ERR + "Idle: local softirq pending: %04x\n", + local_softirq_pending()); + local_irq_enable(); idle(); } preempt_enable_no_resched(); - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.20-git13 kernel BUG at /mnt/md0/devel/linux-git/kernel/time/tick-sched.c:168
On Sat, 2007-02-17 at 23:41 +0100, Michal Piotrowski wrote: > sudo cat /var/log/messages | grep Idle > Feb 17 17:35:34 bitis-gabonica kernel: Idle: local softirq pending: > 0020<6>hdd: ATAPI 48X DVD-ROM DVD-R CD-R/RW drive, 2048kB Cache, > UDMA(33) > Feb 17 19:20:01 bitis-gabonica kernel: Idle: local softirq pending: > 0020<3>Idle: local softirq pending: 0020<3>Idle: local softirq > pending: 0020<7>PM: Removing info for No Bus:vcs7> > > I can confirm that this is ICH5 SATA controller problem. The arch/i386/kernel/process.c part of the patch should apply to 2.6.20 as well. Can you check if the problem is there too ? tglx - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.20-git13 kernel BUG at /mnt/md0/devel/linux-git/kernel/time/tick-sched.c:168
On Sat, 2007-02-17 at 23:41 +0100, Michal Piotrowski wrote: sudo cat /var/log/messages | grep Idle Feb 17 17:35:34 bitis-gabonica kernel: Idle: local softirq pending: 00206hdd: ATAPI 48X DVD-ROM DVD-R CD-R/RW drive, 2048kB Cache, UDMA(33) Feb 17 19:20:01 bitis-gabonica kernel: Idle: local softirq pending: 00203Idle: local softirq pending: 00203Idle: local softirq pending: 00207PM: Removing info for No Bus:vcs7 I can confirm that this is ICH5 SATA controller problem. The arch/i386/kernel/process.c part of the patch should apply to 2.6.20 as well. Can you check if the problem is there too ? tglx - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.20-git13 kernel BUG at /mnt/md0/devel/linux-git/kernel/time/tick-sched.c:168
On Sun, 2007-02-18 at 10:50 +0100, Alex Riesen wrote: The arch/i386/kernel/process.c part of the patch should apply to 2.6.20 as well. Can you check if the problem is there too ? It does not apply and does not look trivially hackable. The code for cpu_idle was introduced in 2ff2d3d7 i386: add idle notifier. Here you go. tglx Index: linux-2.6.20/arch/i386/kernel/process.c === --- linux-2.6.20.orig/arch/i386/kernel/process.c +++ linux-2.6.20/arch/i386/kernel/process.c @@ -189,6 +189,13 @@ void cpu_idle(void) play_dead(); __get_cpu_var(irq_stat).idle_timestamp = jiffies; + + local_irq_disable(); + if (local_softirq_pending() !need_resched()) + printk(KERN_ERR + Idle: local softirq pending: %04x\n, + local_softirq_pending()); + local_irq_enable(); idle(); } preempt_enable_no_resched(); - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.20-git13 kernel BUG at /mnt/md0/devel/linux-git/kernel/time/tick-sched.c:168
On 18/02/07, Alex Riesen [EMAIL PROTECTED] wrote: Thomas Gleixner, Sun, Feb 18, 2007 13:36:56 +0100: On Sun, 2007-02-18 at 10:50 +0100, Alex Riesen wrote: The arch/i386/kernel/process.c part of the patch should apply to 2.6.20 as well. Can you check if the problem is there too ? It does not apply and does not look trivially hackable. The code for cpu_idle was introduced in 2ff2d3d7 i386: add idle notifier. Here you go. Ok, I confirm the message in vanilla v2.6.20 dmesg | grep Idle Idle: local softirq pending: 0020 uname -a Linux bitis-gabonica.linuxtestersgroup.org 2.6.20 #1 SMP PREEMPT Sun Feb 18 15:20:27 CET 2007 i686 i686 i386 GNU/Linux Regards, Michal -- Michal K. K. Piotrowski LTG - Linux Testers Group (PL) (http://www.stardust.webpages.pl/ltg/) LTG - Linux Testers Group (EN) (http://www.stardust.webpages.pl/linux_testers_group_en/) - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.20-git13 kernel BUG at /mnt/md0/devel/linux-git/kernel/time/tick-sched.c:168
On 17/02/07, Alex Riesen <[EMAIL PROTECTED]> wrote: Thomas Gleixner, Sat, Feb 17, 2007 16:14:17 +0100: > On Sat, 2007-02-17 at 15:47 +0100, Alex Riesen wrote: > > > > 164 if (need_resched()) > > > > 165 goto end; > > > > 166 > > > > 167 cpu = smp_processor_id(); > > > > 168 BUG_ON(local_softirq_pending()); > > > > > > Hmm, the BUG_ON is inside of an interrupt disabled region, so we should > > > have bailed out early in the need_resched() check above (because we are > > > in the idle task context according to the stack trace). > > > > > > Is this reproducible ? > > > > Seen this too (Ubuntu, P4/ht-SMT, SATA, typed from screen): > > Can you please apply the patch below, so we can at least see, which > softirq is pending. This should trigger independently of hrtimers and > dynticks. You can keep it compiled in and disable it at the kernel > commandline with "nohz=off" and / or "highres=off" It did, only one time: Idle: local softirq pending: 0020<6>USB Universal Host Controller Interface driver v3.0 sudo cat /var/log/messages | grep Idle Feb 17 17:35:34 bitis-gabonica kernel: Idle: local softirq pending: 0020<6>hdd: ATAPI 48X DVD-ROM DVD-R CD-R/RW drive, 2048kB Cache, UDMA(33) Feb 17 19:20:01 bitis-gabonica kernel: Idle: local softirq pending: 0020<3>Idle: local softirq pending: 0020<3>Idle: local softirq pending: 0020<7>PM: Removing info for No Bus:vcs7 cat /proc/interrupts CPU0 CPU1 0: 232838 0 IO-APIC-edge timer 1:401 0 IO-APIC-edge i8042 7: 0 0 IO-APIC-edge parport0 8: 1 0 IO-APIC-edge rtc 9: 1 0 IO-APIC-fasteoi acpi 12: 4 0 IO-APIC-edge i8042 14:319 0 IO-APIC-edge ide0 15: 1612 0 IO-APIC-edge ide1 16: 16494 0 IO-APIC-fasteoi uhci_hcd:usb3, libata 17: 4670 0 IO-APIC-fasteoi uhci_hcd:usb1, uhci_hcd:usb4 18: 0 0 IO-APIC-fasteoi uhci_hcd:usb2 19: 2 0 IO-APIC-fasteoi ehci_hcd:usb5 20: 2822 0 IO-APIC-fasteoi Intel ICH5 21: 2881 0 IO-APIC-fasteoi eth1 22: 58 0 IO-APIC-fasteoi eth0 NMI: 0 0 LOC: 232562 232846 ERR: 0 MIS: 0 I can confirm that this is ICH5 SATA controller problem. Regards, Michal -- Michal K. K. Piotrowski LTG - Linux Testers Group (PL) (http://www.stardust.webpages.pl/ltg/) LTG - Linux Testers Group (EN) (http://www.stardust.webpages.pl/linux_testers_group_en/) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.20-git13 kernel BUG at /mnt/md0/devel/linux-git/kernel/time/tick-sched.c:168
On Sat, 2007-02-17 at 17:46 +0100, Alex Riesen wrote: > > Can you please apply the patch below, so we can at least see, which > > softirq is pending. This should trigger independently of hrtimers and > > dynticks. You can keep it compiled in and disable it at the kernel > > commandline with "nohz=off" and / or "highres=off" > > It did, only one time: > > Idle: local softirq pending: 0020<6>USB Universal Host Controller Interface > driver v3.0 0x20 is the TASKLET_SOFTIRQ. I have no idea yet, how this can happen. Can you please check, if this happens when you add "nohz=off" to the kernel command line. tglx - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.20-git13 kernel BUG at /mnt/md0/devel/linux-git/kernel/time/tick-sched.c:168
On Sat, 2007-02-17 at 15:47 +0100, Alex Riesen wrote: > > > 164 if (need_resched()) > > > 165 goto end; > > > 166 > > > 167 cpu = smp_processor_id(); > > > 168 BUG_ON(local_softirq_pending()); > > > > Hmm, the BUG_ON is inside of an interrupt disabled region, so we should > > have bailed out early in the need_resched() check above (because we are > > in the idle task context according to the stack trace). > > > > Is this reproducible ? > > Seen this too (Ubuntu, P4/ht-SMT, SATA, typed from screen): Can you please apply the patch below, so we can at least see, which softirq is pending. This should trigger independently of hrtimers and dynticks. You can keep it compiled in and disable it at the kernel commandline with "nohz=off" and / or "highres=off" tglx diff --git a/arch/i386/kernel/process.c b/arch/i386/kernel/process.c index bea304d..deeb90e 100644 --- a/arch/i386/kernel/process.c +++ b/arch/i386/kernel/process.c @@ -236,6 +236,11 @@ void cpu_idle(void) * Otherwise, idle callbacks can misfire. */ local_irq_disable(); + if (local_softirq_pending() && !need_resched()) + printk(KERN_ERR + "Idle: local softirq pending: %04x", + local_softirq_pending()); + enter_idle(); idle(); __exit_idle(); diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index 95e41f7..91d459f 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -165,7 +165,6 @@ void tick_nohz_stop_sched_tick(void) goto end; cpu = smp_processor_id(); - BUG_ON(local_softirq_pending()); now = ktime_get(); /* - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.20-git13 kernel BUG at /mnt/md0/devel/linux-git/kernel/time/tick-sched.c:168
On Sat, 2007-02-17 at 15:47 +0100, Alex Riesen wrote: 164 if (need_resched()) 165 goto end; 166 167 cpu = smp_processor_id(); 168 BUG_ON(local_softirq_pending()); Hmm, the BUG_ON is inside of an interrupt disabled region, so we should have bailed out early in the need_resched() check above (because we are in the idle task context according to the stack trace). Is this reproducible ? Seen this too (Ubuntu, P4/ht-SMT, SATA, typed from screen): Can you please apply the patch below, so we can at least see, which softirq is pending. This should trigger independently of hrtimers and dynticks. You can keep it compiled in and disable it at the kernel commandline with nohz=off and / or highres=off tglx diff --git a/arch/i386/kernel/process.c b/arch/i386/kernel/process.c index bea304d..deeb90e 100644 --- a/arch/i386/kernel/process.c +++ b/arch/i386/kernel/process.c @@ -236,6 +236,11 @@ void cpu_idle(void) * Otherwise, idle callbacks can misfire. */ local_irq_disable(); + if (local_softirq_pending() !need_resched()) + printk(KERN_ERR + Idle: local softirq pending: %04x, + local_softirq_pending()); + enter_idle(); idle(); __exit_idle(); diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index 95e41f7..91d459f 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -165,7 +165,6 @@ void tick_nohz_stop_sched_tick(void) goto end; cpu = smp_processor_id(); - BUG_ON(local_softirq_pending()); now = ktime_get(); /* - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.20-git13 kernel BUG at /mnt/md0/devel/linux-git/kernel/time/tick-sched.c:168
On Sat, 2007-02-17 at 17:46 +0100, Alex Riesen wrote: Can you please apply the patch below, so we can at least see, which softirq is pending. This should trigger independently of hrtimers and dynticks. You can keep it compiled in and disable it at the kernel commandline with nohz=off and / or highres=off It did, only one time: Idle: local softirq pending: 00206USB Universal Host Controller Interface driver v3.0 0x20 is the TASKLET_SOFTIRQ. I have no idea yet, how this can happen. Can you please check, if this happens when you add nohz=off to the kernel command line. tglx - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.20-git13 kernel BUG at /mnt/md0/devel/linux-git/kernel/time/tick-sched.c:168
On 17/02/07, Alex Riesen [EMAIL PROTECTED] wrote: Thomas Gleixner, Sat, Feb 17, 2007 16:14:17 +0100: On Sat, 2007-02-17 at 15:47 +0100, Alex Riesen wrote: 164 if (need_resched()) 165 goto end; 166 167 cpu = smp_processor_id(); 168 BUG_ON(local_softirq_pending()); Hmm, the BUG_ON is inside of an interrupt disabled region, so we should have bailed out early in the need_resched() check above (because we are in the idle task context according to the stack trace). Is this reproducible ? Seen this too (Ubuntu, P4/ht-SMT, SATA, typed from screen): Can you please apply the patch below, so we can at least see, which softirq is pending. This should trigger independently of hrtimers and dynticks. You can keep it compiled in and disable it at the kernel commandline with nohz=off and / or highres=off It did, only one time: Idle: local softirq pending: 00206USB Universal Host Controller Interface driver v3.0 sudo cat /var/log/messages | grep Idle Feb 17 17:35:34 bitis-gabonica kernel: Idle: local softirq pending: 00206hdd: ATAPI 48X DVD-ROM DVD-R CD-R/RW drive, 2048kB Cache, UDMA(33) Feb 17 19:20:01 bitis-gabonica kernel: Idle: local softirq pending: 00203Idle: local softirq pending: 00203Idle: local softirq pending: 00207PM: Removing info for No Bus:vcs7 cat /proc/interrupts CPU0 CPU1 0: 232838 0 IO-APIC-edge timer 1:401 0 IO-APIC-edge i8042 7: 0 0 IO-APIC-edge parport0 8: 1 0 IO-APIC-edge rtc 9: 1 0 IO-APIC-fasteoi acpi 12: 4 0 IO-APIC-edge i8042 14:319 0 IO-APIC-edge ide0 15: 1612 0 IO-APIC-edge ide1 16: 16494 0 IO-APIC-fasteoi uhci_hcd:usb3, libata 17: 4670 0 IO-APIC-fasteoi uhci_hcd:usb1, uhci_hcd:usb4 18: 0 0 IO-APIC-fasteoi uhci_hcd:usb2 19: 2 0 IO-APIC-fasteoi ehci_hcd:usb5 20: 2822 0 IO-APIC-fasteoi Intel ICH5 21: 2881 0 IO-APIC-fasteoi eth1 22: 58 0 IO-APIC-fasteoi eth0 NMI: 0 0 LOC: 232562 232846 ERR: 0 MIS: 0 I can confirm that this is ICH5 SATA controller problem. Regards, Michal -- Michal K. K. Piotrowski LTG - Linux Testers Group (PL) (http://www.stardust.webpages.pl/ltg/) LTG - Linux Testers Group (EN) (http://www.stardust.webpages.pl/linux_testers_group_en/) - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.20-git13 kernel BUG at /mnt/md0/devel/linux-git/kernel/time/tick-sched.c:168
On 16/02/07, Thomas Gleixner <[EMAIL PROTECTED]> wrote: On Fri, 2007-02-16 at 21:38 +0100, Michal Piotrowski wrote: > Hi, > > This looks like a tickless stuff Yup. > 0xc0139ea0 is in tick_nohz_stop_sched_tick (/mnt/md0/devel/linux-git/kernel/time/tick-sched.c:168). > 163 > 164 if (need_resched()) > 165 goto end; > 166 > 167 cpu = smp_processor_id(); > 168 BUG_ON(local_softirq_pending()); Hmm, the BUG_ON is inside of an interrupt disabled region, so we should have bailed out early in the need_resched() check above (because we are in the idle task context according to the stack trace). Is this reproducible ? Yes. tglx Regards, Michal -- Michal K. K. Piotrowski LTG - Linux Testers Group (PL) (http://www.stardust.webpages.pl/ltg/) LTG - Linux Testers Group (EN) (http://www.stardust.webpages.pl/linux_testers_group_en/) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.20-git13 kernel BUG at /mnt/md0/devel/linux-git/kernel/time/tick-sched.c:168
On Fri, 2007-02-16 at 21:38 +0100, Michal Piotrowski wrote: > Hi, > > This looks like a tickless stuff Yup. > 0xc0139ea0 is in tick_nohz_stop_sched_tick > (/mnt/md0/devel/linux-git/kernel/time/tick-sched.c:168). > 163 > 164 if (need_resched()) > 165 goto end; > 166 > 167 cpu = smp_processor_id(); > 168 BUG_ON(local_softirq_pending()); Hmm, the BUG_ON is inside of an interrupt disabled region, so we should have bailed out early in the need_resched() check above (because we are in the idle task context according to the stack trace). Is this reproducible ? tglx - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
2.6.20-git13 kernel BUG at /mnt/md0/devel/linux-git/kernel/time/tick-sched.c:168
Hi, This looks like a tickless stuff [ cut here ] kernel BUG at /mnt/md0/devel/linux-git/kernel/time/tick-sched.c:168! invalid opcode: [#1] PREEMPT SMP Modules linked in: rtc unix CPU:0 EIP:0060:[]Not tainted VLI EFLAGS: 00010002 (2.6.20 #53) EIP is at tick_nohz_stop_sched_tick+0x5d/0x1d9 eax: ebx: c6021100 ecx: 0001 edx: c03e74e0 esi: c6022c40 edi: ebp: c041dfac esp: c041df78 ds: 007b es: 007b fs: 00d8 gs: ss: 0068 Process swapper (pid: 0, ti=c041d000 task=c03e74e0 task.ti=c041d000) Stack: c03e7608 c6021b20 c041dfac 0296 0001 c602007b c01000d8 c0451304 c6021100 c0102b42 c041dfc0 c01023b4 c602ee80 c0453ae4 c602b864 c041dfc8 c010141d c041dff8 c0421a52 c038d4da c04211ed Call Trace: [] show_trace_log_lvl+0x1a/0x2f [] show_stack_log_lvl+0x9d/0xa5 [] show_registers+0x1ed/0x32c [] die+0x118/0x22f [] do_trap+0x79/0x91 [] do_invalid_op+0x97/0xa1 [] error_code+0x7c/0x84 [] cpu_idle+0x1d/0xfb [] rest_init+0x37/0x3a [] start_kernel+0x459/0x461 [<>] 0x0 === Code: f0 ff ff 8b 40 08 a8 08 0f 85 74 01 00 00 e8 74 e7 0b 00 89 c7 bb 80 13 45 c0 e8 68 e7 0b 00 03 1c 85 00 a6 41 c0 83 3b 00 74 04 <0f> 0b eb fe 8d 45 ec e8 bf c0 ff ff 8b 4d ec 8b 5d f0 83 7e 3c EIP: [] tick_nohz_stop_sched_tick+0x5d/0x1d9 SS:ESP 0068:c041df78 Kernel panic - not syncing: Attempted to kill the idle task! BUG: at /mnt/md0/devel/linux-git/arch/i386/kernel/smp.c:547 smp_call_function() [] show_trace_log_lvl+0x1a/0x2f [] show_trace+0x12/0x14 [] dump_stack+0x16/0x18 [] smp_call_function+0x61/0x102 [] smp_send_stop+0x1e/0x31 [] panic+0x5e/0xfb [] do_exit+0x7a/0x702 [] die+0x209/0x22f [] do_trap+0x79/0x91 [] do_invalid_op+0x97/0xa1 [] error_code+0x7c/0x84 [] cpu_idle+0x1d/0xfb [] rest_init+0x37/0x3a [] start_kernel+0x459/0x461 [<>] 0x0 === 0xc0139ea0 is in tick_nohz_stop_sched_tick (/mnt/md0/devel/linux-git/kernel/time/tick-sched.c:168). 163 164 if (need_resched()) 165 goto end; 166 167 cpu = smp_processor_id(); 168 BUG_ON(local_softirq_pending()); 169 170 now = ktime_get(); 171 /* 172 * When called from irq_exit we need to account the idle sleep time http://www.stardust.webpages.pl/files/tbf/euridica/2.6.20-git13/boot.log http://www.stardust.webpages.pl/files/tbf/euridica/2.6.20-git13/git-config Regards, Michal -- Michal K. K. Piotrowski LTG - Linux Testers Group (PL) (http://www.stardust.webpages.pl/ltg/) LTG - Linux Testers Group (EN) (http://www.stardust.webpages.pl/linux_testers_group_en/) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
2.6.20-git13 kernel BUG at /mnt/md0/devel/linux-git/kernel/time/tick-sched.c:168
Hi, This looks like a tickless stuff [ cut here ] kernel BUG at /mnt/md0/devel/linux-git/kernel/time/tick-sched.c:168! invalid opcode: [#1] PREEMPT SMP Modules linked in: rtc unix CPU:0 EIP:0060:[c0139ea0]Not tainted VLI EFLAGS: 00010002 (2.6.20 #53) EIP is at tick_nohz_stop_sched_tick+0x5d/0x1d9 eax: ebx: c6021100 ecx: 0001 edx: c03e74e0 esi: c6022c40 edi: ebp: c041dfac esp: c041df78 ds: 007b es: 007b fs: 00d8 gs: ss: 0068 Process swapper (pid: 0, ti=c041d000 task=c03e74e0 task.ti=c041d000) Stack: c03e7608 c6021b20 c041dfac 0296 0001 c602007b c01000d8 c0451304 c6021100 c0102b42 c041dfc0 c01023b4 c602ee80 c0453ae4 c602b864 c041dfc8 c010141d c041dff8 c0421a52 c038d4da c04211ed Call Trace: [c01050f9] show_trace_log_lvl+0x1a/0x2f [c01051ab] show_stack_log_lvl+0x9d/0xa5 [c01053a0] show_registers+0x1ed/0x32c [c01055f7] die+0x118/0x22f [c0105787] do_trap+0x79/0x91 [c0106017] do_invalid_op+0x97/0xa1 [c0311abc] error_code+0x7c/0x84 [c01023b4] cpu_idle+0x1d/0xfb [c010141d] rest_init+0x37/0x3a [c0421a52] start_kernel+0x459/0x461 [] 0x0 === Code: f0 ff ff 8b 40 08 a8 08 0f 85 74 01 00 00 e8 74 e7 0b 00 89 c7 bb 80 13 45 c0 e8 68 e7 0b 00 03 1c 85 00 a6 41 c0 83 3b 00 74 04 0f 0b eb fe 8d 45 ec e8 bf c0 ff ff 8b 4d ec 8b 5d f0 83 7e 3c EIP: [c0139ea0] tick_nohz_stop_sched_tick+0x5d/0x1d9 SS:ESP 0068:c041df78 Kernel panic - not syncing: Attempted to kill the idle task! BUG: at /mnt/md0/devel/linux-git/arch/i386/kernel/smp.c:547 smp_call_function() [c01050f9] show_trace_log_lvl+0x1a/0x2f [c01057e0] show_trace+0x12/0x14 [c0105892] dump_stack+0x16/0x18 [c0112792] smp_call_function+0x61/0x102 [c0112851] smp_send_stop+0x1e/0x31 [c0121ae3] panic+0x5e/0xfb [c012463b] do_exit+0x7a/0x702 [c01056e8] die+0x209/0x22f [c0105787] do_trap+0x79/0x91 [c0106017] do_invalid_op+0x97/0xa1 [c0311abc] error_code+0x7c/0x84 [c01023b4] cpu_idle+0x1d/0xfb [c010141d] rest_init+0x37/0x3a [c0421a52] start_kernel+0x459/0x461 [] 0x0 === 0xc0139ea0 is in tick_nohz_stop_sched_tick (/mnt/md0/devel/linux-git/kernel/time/tick-sched.c:168). 163 164 if (need_resched()) 165 goto end; 166 167 cpu = smp_processor_id(); 168 BUG_ON(local_softirq_pending()); 169 170 now = ktime_get(); 171 /* 172 * When called from irq_exit we need to account the idle sleep time http://www.stardust.webpages.pl/files/tbf/euridica/2.6.20-git13/boot.log http://www.stardust.webpages.pl/files/tbf/euridica/2.6.20-git13/git-config Regards, Michal -- Michal K. K. Piotrowski LTG - Linux Testers Group (PL) (http://www.stardust.webpages.pl/ltg/) LTG - Linux Testers Group (EN) (http://www.stardust.webpages.pl/linux_testers_group_en/) - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.20-git13 kernel BUG at /mnt/md0/devel/linux-git/kernel/time/tick-sched.c:168
On Fri, 2007-02-16 at 21:38 +0100, Michal Piotrowski wrote: Hi, This looks like a tickless stuff Yup. 0xc0139ea0 is in tick_nohz_stop_sched_tick (/mnt/md0/devel/linux-git/kernel/time/tick-sched.c:168). 163 164 if (need_resched()) 165 goto end; 166 167 cpu = smp_processor_id(); 168 BUG_ON(local_softirq_pending()); Hmm, the BUG_ON is inside of an interrupt disabled region, so we should have bailed out early in the need_resched() check above (because we are in the idle task context according to the stack trace). Is this reproducible ? tglx - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.20-git13 kernel BUG at /mnt/md0/devel/linux-git/kernel/time/tick-sched.c:168
On 16/02/07, Thomas Gleixner [EMAIL PROTECTED] wrote: On Fri, 2007-02-16 at 21:38 +0100, Michal Piotrowski wrote: Hi, This looks like a tickless stuff Yup. 0xc0139ea0 is in tick_nohz_stop_sched_tick (/mnt/md0/devel/linux-git/kernel/time/tick-sched.c:168). 163 164 if (need_resched()) 165 goto end; 166 167 cpu = smp_processor_id(); 168 BUG_ON(local_softirq_pending()); Hmm, the BUG_ON is inside of an interrupt disabled region, so we should have bailed out early in the need_resched() check above (because we are in the idle task context according to the stack trace). Is this reproducible ? Yes. tglx Regards, Michal -- Michal K. K. Piotrowski LTG - Linux Testers Group (PL) (http://www.stardust.webpages.pl/ltg/) LTG - Linux Testers Group (EN) (http://www.stardust.webpages.pl/linux_testers_group_en/) - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/