Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)
On Tue, 1 May 2007 09:22:33 -0700 Randy Dunlap wrote:

> On Tue, 1 May 2007 08:22:58 +0200 Andi Kleen wrote:
>
> > On Mon, Apr 30, 2007 at 10:16:24PM -0700, Randy Dunlap wrote:
> > > On Tue, 1 May 2007 05:43:30 +0200 Andi Kleen wrote:
> > >
> > > > > Andi: uniprocessor x86_64 running rc7-mm2 is hanging early in boot at
> > > > > randomish times (presumably in the timer irq handler) when netconsole
> > > > > and printk-time are enabled.
> > > >
> > > > A backtrace would be good. Does nmi_watchdog=2 show anything
> > > > interesting or if not sysrq-t?
> > >
> > > I can't get anything from sysrq or nmi_watchdog.
> >
> > Hmm, ok when the console locks up those likely don't work.
> >
> > > > > I was hitting the same thing on i386 uniprocessor, but I thought it
> > > > > got fixed.
> > > >
> > > > Yes.
> > >
> > > Fixed where?  Merged into mainline or in your firstfloor patches?
> >
> > None of the sched-clock changes are in mainline yet.
> >
> > Can you perhaps test latest firstfloor alone (without rest of -mm)?
>
> OK.  so your 2.6.21-rc7-git5 patch, applied to 2.6.21-git4 or
> applied to 2.6.21-rc7-git5 ?

Applied cleanly to 2.6.21-rc7-git5, but it has build errors:

arch/x86_64/mm/built-in.o: In function `mark_rodata_ro':
(.text+0x180): undefined reference to `_stext'
arch/x86_64/mm/built-in.o: In function `mem_init':
(.init.text+0x2cf): undefined reference to `_stext'
arch/x86_64/mm/built-in.o: In function `do_page_fault':
(.kprobes.text+0x59c): undefined reference to `_stext'
arch/x86_64/vdso/built-in.o: In function `arch_setup_additional_pages':
(.text+0x40): undefined reference to `vdso_end'
arch/x86_64/vdso/built-in.o: In function `arch_setup_additional_pages':
(.text+0x58): undefined reference to `vdso_start'
arch/x86_64/vdso/built-in.o: In function `init_vdso_vars':
vma.c:(.init.text+0x1b): undefined reference to `vdso_end'
vma.c:(.init.text+0x26): undefined reference to `vdso_start'
vma.c:(.init.text+0x3c): undefined reference to `vdso_start'
kernel/built-in.o: In function `profile_hits':
(.text+0x9609): undefined reference to `_stext'
kernel/built-in.o: In function `core_kernel_text':
(.text+0x197c4): undefined reference to `_stext'
kernel/built-in.o: In function `is_ksym_addr':
kallsyms.c:(.text+0x27042): undefined reference to `_stext'
kernel/built-in.o: In function `profile_init':
(.init.text+0xc57): undefined reference to `_stext'
make: *** [.tmp_vmlinux1] Error 1

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)
On Tue, 1 May 2007 08:22:58 +0200 Andi Kleen wrote:

> On Mon, Apr 30, 2007 at 10:16:24PM -0700, Randy Dunlap wrote:
> > On Tue, 1 May 2007 05:43:30 +0200 Andi Kleen wrote:
> >
> > > > Andi: uniprocessor x86_64 running rc7-mm2 is hanging early in boot at
> > > > randomish times (presumably in the timer irq handler) when netconsole
> > > > and printk-time are enabled.
> > >
> > > A backtrace would be good. Does nmi_watchdog=2 show anything
> > > interesting or if not sysrq-t?
> >
> > I can't get anything from sysrq or nmi_watchdog.
>
> Hmm, ok when the console locks up those likely don't work.
>
> > > > I was hitting the same thing on i386 uniprocessor, but I thought it got
> > > > fixed.
> > >
> > > Yes.
> >
> > Fixed where?  Merged into mainline or in your firstfloor patches?
>
> None of the sched-clock changes are in mainline yet.
>
> Can you perhaps test latest firstfloor alone (without rest of -mm)?

OK.  so your 2.6.21-rc7-git5 patch, applied to 2.6.21-git4 or
applied to 2.6.21-rc7-git5 ?

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)
On Mon, 30 Apr 2007 22:38:59 -0700 Andrew Morton wrote:

> On Tue, 1 May 2007 08:24:56 +0200 Andi Kleen <[EMAIL PROTECTED]> wrote:
>
> > > The bug is in firstfloor only, and the fix (if present) will be there too.
> > >
> > > <checks>
> > >
> > > Nope,
> > >
> > > ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt/patches/sched-clock-share
> > >
> > > is identical to
> > >
> > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/broken-out/x86_64-mm-sched-clock-share.patch
> >
> > Or perhaps the deadlock is in the cpufrequency handler. Does it happen
> > without CONFIG_CPUFREQ too?
> >
> > [cpufreq handler calls ktime_get which might take xtime lock for reading]
>
> Sounds right.  That's what was happening to me for a while.
>
> Randy, it'd be interesting to try:
>
> --- a/arch/x86_64/kernel/tsc.c~a
> +++ a/arch/x86_64/kernel/tsc.c
> @@ -84,8 +84,8 @@ static int time_cpufreq_notifier(struct
>                          cpufreq_scale(loops_per_jiffy_ref, ref_freq, freq->new);
>
>                  tsc_khz = cpufreq_scale(tsc_khz_ref, ref_freq, freq->new);
> -                if (!(freq->flags & CPUFREQ_CONST_LOOPS))
> -                        mark_tsc_unstable("cpufreq changes");
> +//              if (!(freq->flags & CPUFREQ_CONST_LOOPS))
> +//                      mark_tsc_unstable("cpufreq changes");
>          }
>
>          return 0;
> _

I don't have CPU_FREQ enabled, so that didn't change anything.

> and if that "fixes" it, disable netconsole and do
>
> --- a/arch/x86_64/kernel/tsc.c~a
> +++ a/arch/x86_64/kernel/tsc.c
> @@ -85,7 +85,7 @@ static int time_cpufreq_notifier(struct
>
>                  tsc_khz = cpufreq_scale(tsc_khz_ref, ref_freq, freq->new);
>                  if (!(freq->flags & CPUFREQ_CONST_LOOPS))
> -                        mark_tsc_unstable("cpufreq changes");
> +                        dump_stack();
>          }
>
>          return 0;

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)
On Tue, 1 May 2007 08:24:56 +0200 Andi Kleen <[EMAIL PROTECTED]> wrote:

> > The bug is in firstfloor only, and the fix (if present) will be there too.
> >
> > <checks>
> >
> > Nope,
> >
> > ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt/patches/sched-clock-share
> >
> > is identical to
> >
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/broken-out/x86_64-mm-sched-clock-share.patch
>
> Or perhaps the deadlock is in the cpufrequency handler. Does it happen
> without CONFIG_CPUFREQ too?
>
> [cpufreq handler calls ktime_get which might take xtime lock for reading]

Sounds right.  That's what was happening to me for a while.

Randy, it'd be interesting to try:

--- a/arch/x86_64/kernel/tsc.c~a
+++ a/arch/x86_64/kernel/tsc.c
@@ -84,8 +84,8 @@ static int time_cpufreq_notifier(struct
                         cpufreq_scale(loops_per_jiffy_ref, ref_freq, freq->new);

                 tsc_khz = cpufreq_scale(tsc_khz_ref, ref_freq, freq->new);
-                if (!(freq->flags & CPUFREQ_CONST_LOOPS))
-                        mark_tsc_unstable("cpufreq changes");
+//              if (!(freq->flags & CPUFREQ_CONST_LOOPS))
+//                      mark_tsc_unstable("cpufreq changes");
         }

         return 0;
_

and if that "fixes" it, disable netconsole and do

--- a/arch/x86_64/kernel/tsc.c~a
+++ a/arch/x86_64/kernel/tsc.c
@@ -85,7 +85,7 @@ static int time_cpufreq_notifier(struct

                 tsc_khz = cpufreq_scale(tsc_khz_ref, ref_freq, freq->new);
                 if (!(freq->flags & CPUFREQ_CONST_LOOPS))
-                        mark_tsc_unstable("cpufreq changes");
+                        dump_stack();
         }

         return 0;
_
Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)
> The bug is in firstfloor only, and the fix (if present) will be there too.
>
> <checks>
>
> Nope,
>
> ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt/patches/sched-clock-share
>
> is identical to
>
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/broken-out/x86_64-mm-sched-clock-share.patch

Or perhaps the deadlock is in the cpufrequency handler. Does it happen
without CONFIG_CPUFREQ too?

[cpufreq handler calls ktime_get which might take xtime lock for reading]

-Andi
Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)
On Mon, 30 Apr 2007 22:16:24 -0700 Randy Dunlap <[EMAIL PROTECTED]> wrote:

> > > I was hitting the same thing on i386 uniprocessor, but I thought it got
> > > fixed.
> >
> > Yes.
>
> Fixed where?  Merged into mainline or in your firstfloor patches?

The bug is in firstfloor only, and the fix (if present) will be there too.

<checks>

Nope,

ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt/patches/sched-clock-share

is identical to

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/broken-out/x86_64-mm-sched-clock-share.patch
Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)
On Mon, Apr 30, 2007 at 10:16:24PM -0700, Randy Dunlap wrote:
> On Tue, 1 May 2007 05:43:30 +0200 Andi Kleen wrote:
>
> > > Andi: uniprocessor x86_64 running rc7-mm2 is hanging early in boot at
> > > randomish times (presumably in the timer irq handler) when netconsole and
> > > printk-time are enabled.
> >
> > A backtrace would be good. Does nmi_watchdog=2 show anything
> > interesting or if not sysrq-t?
>
> I can't get anything from sysrq or nmi_watchdog.

Hmm, ok when the console locks up those likely don't work.

> > > I was hitting the same thing on i386 uniprocessor, but I thought it got
> > > fixed.
> >
> > Yes.
>
> Fixed where?  Merged into mainline or in your firstfloor patches?

None of the sched-clock changes are in mainline yet.

Can you perhaps test latest firstfloor alone (without rest of -mm)?

-Andi
Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)
On Tue, 1 May 2007 05:43:30 +0200 Andi Kleen wrote:

> > Andi: uniprocessor x86_64 running rc7-mm2 is hanging early in boot at
> > randomish times (presumably in the timer irq handler) when netconsole and
> > printk-time are enabled.
>
> A backtrace would be good. Does nmi_watchdog=2 show anything
> interesting or if not sysrq-t?

I can't get anything from sysrq or nmi_watchdog.

> > I was hitting the same thing on i386 uniprocessor, but I thought it got
> > fixed.
>
> Yes.

Fixed where?  Merged into mainline or in your firstfloor patches?

> My current sched_clock does not take any locks anymore and it was removed
> from the cpufreq handler too.

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)
> Andi: uniprocessor x86_64 running rc7-mm2 is hanging early in boot at
> randomish times (presumably in the timer irq handler) when netconsole and
> printk-time are enabled.

A backtrace would be good. Does nmi_watchdog=2 show anything
interesting or if not sysrq-t?

> I was hitting the same thing on i386 uniprocessor, but I thought it got
> fixed.

Yes.

My current sched_clock does not take any locks anymore and it was removed
from the cpufreq handler too.

-Andi
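Andi's lock-free sched_clock itself is not shown in this thread. Purely as an illustration of why a timestamp source need not touch xtime_lock at all, here is a user-space sketch that turns a cycle count into nanoseconds with nothing but a precomputed scale factor and a shift. Every name here (`toy_cyc2ns_scale`, `toy_cycles_to_ns`, `TOY_SCALE_SHIFT`) is hypothetical, the shift of 10 is an arbitrary choice for the sketch, and a constant-rate cycle counter is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Fixed-point shift for the scale factor; arbitrary choice for this sketch. */
#define TOY_SCALE_SHIFT 10

/*
 * Precompute "nanoseconds per cycle" as a fixed-point number.
 * cpu_khz is the (assumed constant) counter frequency in kHz.
 * ns per cycle = 10^6 / cpu_khz, stored shifted left by TOY_SCALE_SHIFT.
 */
static uint64_t toy_cyc2ns_scale(uint64_t cpu_khz)
{
    assert(cpu_khz != 0);
    return (UINT64_C(1000000) << TOY_SCALE_SHIFT) / cpu_khz;
}

/*
 * Convert a raw cycle count to nanoseconds.
 * Pure arithmetic: no lock is taken, so it cannot deadlock against
 * a writer that holds a timekeeping lock on the same CPU.
 * (Overflow for very large cycle counts is ignored in this sketch.)
 */
static uint64_t toy_cycles_to_ns(uint64_t cycles, uint64_t scale)
{
    return (cycles * scale) >> TOY_SCALE_SHIFT;
}
```

With a 1 GHz counter (`cpu_khz = 1000000`) the scale is 1024, so 5000 cycles convert to 5000 ns; with 2 GHz the scale is 512 and 4000 cycles convert to 2000 ns.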
Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)
On Mon, 30 Apr 2007 17:45:55 -0700 Randy Dunlap <[EMAIL PROTECTED]> wrote:

> Andrew Morton wrote:
> > On Mon, 30 Apr 2007 16:51:01 -0700
> > Randy Dunlap <[EMAIL PROTECTED]> wrote:
> >
> > > On Mon, 30 Apr 2007 08:16:53 -0700 Randy Dunlap wrote:
> > >
> > > > On Sun, 29 Apr 2007 22:23:54 -0700 Andrew Morton wrote:
> > > >
> > > > > On Sun, 29 Apr 2007 22:01:32 -0700 Randy Dunlap <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > On Wed, 25 Apr 2007 22:57:16 -0700 Andrew Morton wrote:
> > > > > >
> > > > > > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/
> > > > > >
> > > > > > I'm getting a hang near the end of booting on x86_64 UP.
> > > > > > The last initcall_debug function varies.  E.g.:
> > > > > >
> > > > > > 1/
> > > > > > [0.140257] Calling initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f()
> > > > > > [0.140266] initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f() returned 0.
> > > > > > [0.140275] initcall 0x806f2fa8 ran for 0 msecs: init_misc_binfmt+0x0/0x3f()
> > > > > > [0.140284] Calling initcall 0x806f2fe7: init_script_binfmt+0x0/0x12()
> > > > > > [0.140293] initcall 0x806f2fe7: init_script_binfmt+0x0/0x12() returned 0.
> > > > > > [0.140302] initcall 0x806f2fe7 ran for 0 msecs: init_script_binfmt+0x0/0x12()
> > > > > > [0.140310] Calling initcall 0x806f2ff9: init_elf_binfmt+0x0/0x12()
> > > > > > [0.140317] initcall 0x806f2ff9: init_elf_binfmt+0x0/0x12() returned 0.
> > > > > > [0.140326] initcall 0x806f2ff9 ran for 0 msecs: init_elf_binfmt+0x0/0x12()
> > > > > > [0.140335] Calling initcall 0x806f3de9: debugfs_init+0x0/0x4a()
> > > > > > [0.140344] initcall 0x806f3de9: debugfs_init+0x0/0x4a() returned 0.
> > > > > > [0.140351] initcall 0x806f3de9 ran for 0 msecs: debugfs_init+0x0/0x4a()
> > > > > >
> > > > > > 2/
> > > > > > [0.140206] Calling initcall 0x806efeb1: ksysfs_init+0x0/0x29()
> > > > > > [0.140215] initcall 0x806efeb1: ksysfs_init+0x0/0x29() returned 0.
> > > > > > [0.140222] initcall 0x806efeb1 ran for 0 msecs: ksysfs_init+0x0/0x29()
> > > > > > [0.140230] Calling initcall 0x806f25be: filelock_init+0x0/0x31()
> > > > > > [0.140242] initcall 0x806f25be: filelock_init+0x0/0x31() returned 0.
> > > > > > [0.140249] initcall 0x806f25be ran for 0 msecs: filelock_init+0x0/0x31()
> > > > > > [0.140258] Calling initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f()
> > > > > > [0.140266] initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f() returned 0.
> > > > > > [0.140276] initcall 0x806f2fa8 ran for 0 msecs: init_misc_binfmt+0x0/0x3f()
> > > > > > [0.140284] Calling initcall 0x806f2fe7: init_script_binfmt+0x0/0x12()
> > > > > > [0.140293] initcall 0x806f2fe7: init_script_binfmt+0x0/0x12() returned 0.
> > > > >
> > > > > So perhaps it locks during a timer interrupt.
> > > > >
> > > > > > .config is attached.
> > > > > >
> > > > > > Any ideas/suggestions?
> > > > >
> > > > > Just the usual: nothing from sysrq or NMI watchdog?
> > > >
> > > > Nothing from either of those.  I'll jiggle some config options.
> > >
> > > config option changes didn't help, but removing
> > >	netconsole=<params>
> > > from the kernel command line makes it all happy.  :(
> >
> > argh.
> >
> > > Do we know of netconsole hang problems?  (anyone?)
> >
> > You have "time" as well?  I found on i386 uniproc that time+netconsole
> > caused hangs because the printk timestamping code was taking
> > xtime_lock for reading inside a write_seqlock.  But I thought that Andi
> > fixed that.  Perhaps i386 got fixed but x86_64 did not.
>
> Yes, I have CONFIG_PRINTK_TIME=y and disabling it allows it to boot.  Thanks.
>
> Maybe the patch isn't merged yet?

Could be.  I don't recall whether Andi's statement was before or after
2.6.21-rc7-mm2 actually.

> Now if I can just remember this until the next time that I hit it...

Andi: uniprocessor x86_64 running rc7-mm2 is hanging early in boot at
randomish times (presumably in the timer irq handler) when netconsole and
printk-time are enabled.

I was hitting the same thing on i386 uniprocessor, but I thought it got
fixed.

The problem was that the printable string which is newly passed to
mark_tsc_unstable() is printed out inside write_seqlock(xtime_lock) but
printk timestamping (and perhaps netconsole tx?) wants to take xtime_lock for
reading, which will hang.
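The hang described above can be sketched in user space, assuming a simplified seqlock in which a reader retries for as long as the sequence count is odd. A read attempted from inside the write section on the same CPU therefore never succeeds, because the writer cannot finish until the reader returns. All names here (`toy_seqlock`, `toy_write_begin`, `toy_read_value`, and so on) are hypothetical, and the `max_spins` cap exists only so the demonstration terminates; this is not the kernel's seqlock implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy, single-threaded model of a sequence lock. */
struct toy_seqlock {
    unsigned seq;   /* odd while a writer is inside its critical section */
};

static void toy_write_begin(struct toy_seqlock *sl) { sl->seq++; }
static void toy_write_end(struct toy_seqlock *sl)   { sl->seq++; }

/*
 * Reader side. A real reader would retry without limit; here the retry
 * count is capped at max_spins so the sketch can report the hang instead
 * of reproducing it. Returns true if a consistent snapshot was read.
 */
static bool toy_read_value(const struct toy_seqlock *sl, const long *src,
                           long *out, unsigned max_spins)
{
    unsigned i;

    for (i = 0; i < max_spins; i++) {
        unsigned start = sl->seq;
        long snapshot;

        if (start & 1u)
            continue;           /* writer active: retry */
        snapshot = *src;
        if (sl->seq == start) { /* no writer intervened */
            *out = snapshot;
            return true;
        }
    }
    return false;               /* would have spun forever */
}

/* Models a timestamped printk issued from inside the write section. */
static bool read_inside_write_section(void)
{
    struct toy_seqlock lock = { 0 };
    long xtime = 100, stamp = 0;
    bool ok;

    toy_write_begin(&lock);
    xtime = 101;
    ok = toy_read_value(&lock, &xtime, &stamp, 1000);
    toy_write_end(&lock);
    return ok;
}

/* Same read, but issued only after the writer has left the section. */
static bool read_after_write_section(void)
{
    struct toy_seqlock lock = { 0 };
    long xtime = 100, stamp = 0;

    toy_write_begin(&lock);
    xtime = 101;
    toy_write_end(&lock);
    assert((lock.seq & 1u) == 0);
    return toy_read_value(&lock, &xtime, &stamp, 1000) && stamp == 101;
}
```

`read_inside_write_section()` returns false (the capped reader gives up where an uncapped one would spin forever), while `read_after_write_section()` returns true. Moving the print outside the write section, or making the timestamp source lock-free, both avoid the nested read.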
Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)
Andrew Morton wrote:
> On Mon, 30 Apr 2007 16:51:01 -0700
> Randy Dunlap <[EMAIL PROTECTED]> wrote:
>
> > On Mon, 30 Apr 2007 08:16:53 -0700 Randy Dunlap wrote:
> >
> > > On Sun, 29 Apr 2007 22:23:54 -0700 Andrew Morton wrote:
> > >
> > > > On Sun, 29 Apr 2007 22:01:32 -0700 Randy Dunlap <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > On Wed, 25 Apr 2007 22:57:16 -0700 Andrew Morton wrote:
> > > > >
> > > > > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/
> > > > >
> > > > > I'm getting a hang near the end of booting on x86_64 UP.
> > > > > The last initcall_debug function varies.  E.g.:
> > > > >
> > > > > 1/
> > > > > [0.140257] Calling initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f()
> > > > > [0.140266] initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f() returned 0.
> > > > > [0.140275] initcall 0x806f2fa8 ran for 0 msecs: init_misc_binfmt+0x0/0x3f()
> > > > > [0.140284] Calling initcall 0x806f2fe7: init_script_binfmt+0x0/0x12()
> > > > > [0.140293] initcall 0x806f2fe7: init_script_binfmt+0x0/0x12() returned 0.
> > > > > [0.140302] initcall 0x806f2fe7 ran for 0 msecs: init_script_binfmt+0x0/0x12()
> > > > > [0.140310] Calling initcall 0x806f2ff9: init_elf_binfmt+0x0/0x12()
> > > > > [0.140317] initcall 0x806f2ff9: init_elf_binfmt+0x0/0x12() returned 0.
> > > > > [0.140326] initcall 0x806f2ff9 ran for 0 msecs: init_elf_binfmt+0x0/0x12()
> > > > > [0.140335] Calling initcall 0x806f3de9: debugfs_init+0x0/0x4a()
> > > > > [0.140344] initcall 0x806f3de9: debugfs_init+0x0/0x4a() returned 0.
> > > > > [0.140351] initcall 0x806f3de9 ran for 0 msecs: debugfs_init+0x0/0x4a()
> > > > >
> > > > > 2/
> > > > > [0.140206] Calling initcall 0x806efeb1: ksysfs_init+0x0/0x29()
> > > > > [0.140215] initcall 0x806efeb1: ksysfs_init+0x0/0x29() returned 0.
> > > > > [0.140222] initcall 0x806efeb1 ran for 0 msecs: ksysfs_init+0x0/0x29()
> > > > > [0.140230] Calling initcall 0x806f25be: filelock_init+0x0/0x31()
> > > > > [0.140242] initcall 0x806f25be: filelock_init+0x0/0x31() returned 0.
> > > > > [0.140249] initcall 0x806f25be ran for 0 msecs: filelock_init+0x0/0x31()
> > > > > [0.140258] Calling initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f()
> > > > > [0.140266] initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f() returned 0.
> > > > > [0.140276] initcall 0x806f2fa8 ran for 0 msecs: init_misc_binfmt+0x0/0x3f()
> > > > > [0.140284] Calling initcall 0x806f2fe7: init_script_binfmt+0x0/0x12()
> > > > > [0.140293] initcall 0x806f2fe7: init_script_binfmt+0x0/0x12() returned 0.
> > > >
> > > > So perhaps it locks during a timer interrupt.
> > > >
> > > > > .config is attached.
> > > > >
> > > > > Any ideas/suggestions?
> > > >
> > > > Just the usual: nothing from sysrq or NMI watchdog?
> > >
> > > Nothing from either of those.  I'll jiggle some config options.
> >
> > config option changes didn't help, but removing
> >	netconsole=<params>
> > from the kernel command line makes it all happy.  :(
>
> argh.
>
> > Do we know of netconsole hang problems?  (anyone?)
>
> You have "time" as well?  I found on i386 uniproc that time+netconsole
> caused hangs because the printk timestamping code was taking
> xtime_lock for reading inside a write_seqlock.  But I thought that Andi
> fixed that.  Perhaps i386 got fixed but x86_64 did not.

Yes, I have CONFIG_PRINTK_TIME=y and disabling it allows it to boot.  Thanks.

Maybe the patch isn't merged yet?

Now if I can just remember this until the next time that I hit it...

--
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)
On Mon, 30 Apr 2007 16:51:01 -0700 Randy Dunlap <[EMAIL PROTECTED]> wrote:

> On Mon, 30 Apr 2007 08:16:53 -0700 Randy Dunlap wrote:
>
> > On Sun, 29 Apr 2007 22:23:54 -0700 Andrew Morton wrote:
> >
> > > On Sun, 29 Apr 2007 22:01:32 -0700 Randy Dunlap <[EMAIL PROTECTED]> wrote:
> > >
> > > > On Wed, 25 Apr 2007 22:57:16 -0700 Andrew Morton wrote:
> > > >
> > > > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/
> > > >
> > > > I'm getting a hang near the end of booting on x86_64 UP.
> > > > The last initcall_debug function varies.  E.g.:
> > > >
> > > > 1/
> > > > [0.140257] Calling initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f()
> > > > [0.140266] initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f() returned 0.
> > > > [0.140275] initcall 0x806f2fa8 ran for 0 msecs: init_misc_binfmt+0x0/0x3f()
> > > > [0.140284] Calling initcall 0x806f2fe7: init_script_binfmt+0x0/0x12()
> > > > [0.140293] initcall 0x806f2fe7: init_script_binfmt+0x0/0x12() returned 0.
> > > > [0.140302] initcall 0x806f2fe7 ran for 0 msecs: init_script_binfmt+0x0/0x12()
> > > > [0.140310] Calling initcall 0x806f2ff9: init_elf_binfmt+0x0/0x12()
> > > > [0.140317] initcall 0x806f2ff9: init_elf_binfmt+0x0/0x12() returned 0.
> > > > [0.140326] initcall 0x806f2ff9 ran for 0 msecs: init_elf_binfmt+0x0/0x12()
> > > > [0.140335] Calling initcall 0x806f3de9: debugfs_init+0x0/0x4a()
> > > > [0.140344] initcall 0x806f3de9: debugfs_init+0x0/0x4a() returned 0.
> > > > [0.140351] initcall 0x806f3de9 ran for 0 msecs: debugfs_init+0x0/0x4a()
> > > >
> > > > 2/
> > > > [0.140206] Calling initcall 0x806efeb1: ksysfs_init+0x0/0x29()
> > > > [0.140215] initcall 0x806efeb1: ksysfs_init+0x0/0x29() returned 0.
> > > > [0.140222] initcall 0x806efeb1 ran for 0 msecs: ksysfs_init+0x0/0x29()
> > > > [0.140230] Calling initcall 0x806f25be: filelock_init+0x0/0x31()
> > > > [0.140242] initcall 0x806f25be: filelock_init+0x0/0x31() returned 0.
> > > > [0.140249] initcall 0x806f25be ran for 0 msecs: filelock_init+0x0/0x31()
> > > > [0.140258] Calling initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f()
> > > > [0.140266] initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f() returned 0.
> > > > [0.140276] initcall 0x806f2fa8 ran for 0 msecs: init_misc_binfmt+0x0/0x3f()
> > > > [0.140284] Calling initcall 0x806f2fe7: init_script_binfmt+0x0/0x12()
> > > > [0.140293] initcall 0x806f2fe7: init_script_binfmt+0x0/0x12() returned 0.
> > >
> > > So perhaps it locks during a timer interrupt.
> > >
> > > > .config is attached.
> > > >
> > > > Any ideas/suggestions?
> > >
> > > Just the usual: nothing from sysrq or NMI watchdog?
> >
> > Nothing from either of those.  I'll jiggle some config options.
>
> config option changes didn't help, but removing
>	netconsole=<params>
> from the kernel command line makes it all happy.  :(

argh.

> Do we know of netconsole hang problems?  (anyone?)

You have "time" as well?  I found on i386 uniproc that time+netconsole
caused hangs because the printk timestamping code was taking
xtime_lock for reading inside a write_seqlock.  But I thought that Andi
fixed that.  Perhaps i386 got fixed but x86_64 did not.
Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)
On Mon, 30 Apr 2007 08:16:53 -0700 Randy Dunlap wrote:

> On Sun, 29 Apr 2007 22:23:54 -0700 Andrew Morton wrote:
>
> > On Sun, 29 Apr 2007 22:01:32 -0700 Randy Dunlap <[EMAIL PROTECTED]> wrote:
> >
> > > On Wed, 25 Apr 2007 22:57:16 -0700 Andrew Morton wrote:
> > >
> > > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/
> > >
> > > I'm getting a hang near the end of booting on x86_64 UP.
> > > The last initcall_debug function varies.  E.g.:
> > >
> > > 1/
> > > [0.140257] Calling initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f()
> > > [0.140266] initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f() returned 0.
> > > [0.140275] initcall 0x806f2fa8 ran for 0 msecs: init_misc_binfmt+0x0/0x3f()
> > > [0.140284] Calling initcall 0x806f2fe7: init_script_binfmt+0x0/0x12()
> > > [0.140293] initcall 0x806f2fe7: init_script_binfmt+0x0/0x12() returned 0.
> > > [0.140302] initcall 0x806f2fe7 ran for 0 msecs: init_script_binfmt+0x0/0x12()
> > > [0.140310] Calling initcall 0x806f2ff9: init_elf_binfmt+0x0/0x12()
> > > [0.140317] initcall 0x806f2ff9: init_elf_binfmt+0x0/0x12() returned 0.
> > > [0.140326] initcall 0x806f2ff9 ran for 0 msecs: init_elf_binfmt+0x0/0x12()
> > > [0.140335] Calling initcall 0x806f3de9: debugfs_init+0x0/0x4a()
> > > [0.140344] initcall 0x806f3de9: debugfs_init+0x0/0x4a() returned 0.
> > > [0.140351] initcall 0x806f3de9 ran for 0 msecs: debugfs_init+0x0/0x4a()
> > >
> > > 2/
> > > [0.140206] Calling initcall 0x806efeb1: ksysfs_init+0x0/0x29()
> > > [0.140215] initcall 0x806efeb1: ksysfs_init+0x0/0x29() returned 0.
> > > [0.140222] initcall 0x806efeb1 ran for 0 msecs: ksysfs_init+0x0/0x29()
> > > [0.140230] Calling initcall 0x806f25be: filelock_init+0x0/0x31()
> > > [0.140242] initcall 0x806f25be: filelock_init+0x0/0x31() returned 0.
> > > [0.140249] initcall 0x806f25be ran for 0 msecs: filelock_init+0x0/0x31()
> > > [0.140258] Calling initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f()
> > > [0.140266] initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f() returned 0.
> > > [0.140276] initcall 0x806f2fa8 ran for 0 msecs: init_misc_binfmt+0x0/0x3f()
> > > [0.140284] Calling initcall 0x806f2fe7: init_script_binfmt+0x0/0x12()
> > > [0.140293] initcall 0x806f2fe7: init_script_binfmt+0x0/0x12() returned 0.
> >
> > So perhaps it locks during a timer interrupt.
> >
> > > .config is attached.
> > >
> > > Any ideas/suggestions?
> >
> > Just the usual: nothing from sysrq or NMI watchdog?
>
> Nothing from either of those.  I'll jiggle some config options.

config option changes didn't help, but removing
	netconsole=<params>
from the kernel command line makes it all happy.  :(

Do we know of netconsole hang problems?  (anyone?)

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)
On Mon, 30 Apr 2007 16:51:01 -0700 Randy Dunlap <[EMAIL PROTECTED]> wrote:

> [...]
> config option changes didn't help, but removing netconsole=<params> from the
> kernel command line makes it all happy. :(

argh.

> Do we know of netconsole hang problems? (anyone?)

You have time as well? I found on i386 uniproc that time+netconsole caused
hangs because the printk timestamping code was taking xtime_lock for reading
inside a write_seqlock. But I thought that Andi fixed that.

Perhaps i386 got fixed but x86_64 did not.
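The hang described above can be illustrated with a small userspace model. This is only a sketch: `toy_seqlock` and every `toy_*` function are hypothetical names invented for the illustration, not the kernel's `seqlock_t` API, and the retry bound exists only so that the demo terminates. It assumes the usual sequence-lock convention that the counter is odd while a writer is inside and that a reader retries whenever it sees an odd or changed counter. On a uniprocessor, a reader called from inside the write side can never see the writer finish, so an unbounded reader would spin forever.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy single-threaded model of a sequence lock; NOT the kernel's seqlock_t. */
struct toy_seqlock {
	unsigned int sequence;	/* odd while a writer is inside */
};

static void toy_write_lock(struct toy_seqlock *sl)
{
	sl->sequence++;		/* becomes odd: writer active */
}

static void toy_write_unlock(struct toy_seqlock *sl)
{
	assert(sl->sequence & 1);	/* must be held by a writer */
	sl->sequence++;		/* becomes even: writer done */
}

static unsigned int toy_read_begin(const struct toy_seqlock *sl)
{
	return sl->sequence;
}

/* True means "retry": a writer was active at begin, or one ran since. */
static bool toy_read_retry(const struct toy_seqlock *sl, unsigned int start)
{
	return (start & 1) || (sl->sequence != start);
}

/*
 * Read *value under the lock, giving up after max_tries attempts.
 * Returns true on a consistent read. A real reader has no such bound,
 * which is why the nested case hangs instead of failing.
 */
static bool toy_read_value(const struct toy_seqlock *sl, const long *value,
			   long *out, unsigned int max_tries)
{
	for (unsigned int i = 0; i < max_tries; i++) {
		unsigned int start = toy_read_begin(sl);
		long snapshot = *value;

		if (!toy_read_retry(sl, start)) {
			*out = snapshot;
			return true;
		}
	}
	return false;
}

/*
 * Hypothetical stand-in for a timer tick that prints a timestamped
 * message while still holding the write side. Returns whether the
 * nested timestamp read succeeded (it cannot).
 */
static bool toy_timer_tick_with_print(struct toy_seqlock *sl, long *time_ns)
{
	long stamp = 0;
	bool ok;

	toy_write_lock(sl);
	*time_ns += 1000;
	ok = toy_read_value(sl, time_ns, &stamp, 1000);	/* reader inside writer */
	toy_write_unlock(sl);
	return ok;
}
```

A read taken outside the write side succeeds on the first try, while the read nested inside `toy_timer_tick_with_print()` exhausts every retry. That matches the symptom in this thread: no oops and no output, just a silent spin with interrupts off.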
Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)
Andrew Morton wrote:
> On Mon, 30 Apr 2007 16:51:01 -0700 Randy Dunlap <[EMAIL PROTECTED]> wrote:
>
> > [...]
> > Do we know of netconsole hang problems? (anyone?)
>
> You have time as well? I found on i386 uniproc that time+netconsole caused
> hangs because the printk timestamping code was taking xtime_lock for reading
> inside a write_seqlock. But I thought that Andi fixed that.
>
> Perhaps i386 got fixed but x86_64 did not.

Yes, I have CONFIG_PRINTK_TIME=y and disabling it allows it to boot.
Thanks. Maybe the patch isn't merged yet?

Now if I can just remember this until the next time that I hit it...

--
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)
On Mon, 30 Apr 2007 17:45:55 -0700 Randy Dunlap <[EMAIL PROTECTED]> wrote:

> [...]
> Yes, I have CONFIG_PRINTK_TIME=y and disabling it allows it to boot.
> Thanks. Maybe the patch isn't merged yet?

Could be. I don't recall whether Andi's statement was before or after
2.6.21-rc7-mm2 actually.

> Now if I can just remember this until the next time that I hit it...

Andi: uniprocessor x86_64 running rc7-mm2 is hanging early in boot at
randomish times (presumably in the timer irq handler) when netconsole and
printk-time are enabled.

I was hitting the same thing on i386 uniprocessor, but I thought it got fixed.

The problem was that the printable string which is newly passed to
mark_tsc_unstable() is printed out inside write_seqlock(xtime_lock) but
printk timestamping (and perhaps netconsole tx?) want to take xtime_lock for
reading, which will hang.
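One generic way to avoid printing from inside a write-side critical section is to record the message while the lock is held and emit it only after the lock is dropped. The sketch below shows that pattern in userspace. It is an assumption made for illustration, not the fix discussed in this thread (Andi's stated change is a sched_clock that takes no locks), and `toy_clock`, `toy_mark_unstable_locked()` and `toy_tick()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of clock state guarded by a sequence counter; hypothetical. */
struct toy_clock {
	unsigned int sequence;		/* odd while the write side is held */
	const char *pending_msg;	/* message deferred until after unlock */
	bool tsc_unstable;
};

/* Called with the write side held: record the reason, do not print. */
static void toy_mark_unstable_locked(struct toy_clock *c, const char *reason)
{
	assert(c->sequence & 1);	/* caller must hold the write side */
	c->tsc_unstable = true;
	c->pending_msg = reason;
}

/*
 * One timer tick. Returns the message the caller may now print safely
 * (the write side has been released), or NULL if there is nothing to say.
 */
static const char *toy_tick(struct toy_clock *c, bool freq_changed)
{
	const char *msg;

	c->sequence++;			/* enter write side */
	if (freq_changed && !c->tsc_unstable)
		toy_mark_unstable_locked(c, "cpufreq changes");
	c->sequence++;			/* leave write side */

	msg = c->pending_msg;		/* hand the deferred message to the caller */
	c->pending_msg = NULL;
	return msg;
}
```

The caller prints the returned string only after `toy_tick()` returns, so any timestamp lookup done by the print path sees an even sequence count and completes on its first attempt.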
Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)
> Andi: uniprocessor x86_64 running rc7-mm2 is hanging early in boot at
> randomish times (presumably in the timer irq handler) when netconsole and
> printk-time are enabled.

A backtrace would be good. Does nmi_watchdog=2 show anything interesting
or, if not, sysrq-t?

> I was hitting the same thing on i386 uniprocessor, but I thought it got fixed.

Yes.

My current sched_clock does not take any locks anymore and it was
removed from the cpufreq handler too.

-Andi
Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)
On Tue, 1 May 2007 05:43:30 +0200 Andi Kleen wrote:

> > Andi: uniprocessor x86_64 running rc7-mm2 is hanging early in boot at
> > randomish times (presumably in the timer irq handler) when netconsole and
> > printk-time are enabled.
>
> A backtrace would be good. Does nmi_watchdog=2 show anything interesting
> or, if not, sysrq-t?

I can't get anything from sysrq or nmi_watchdog.

> > I was hitting the same thing on i386 uniprocessor, but I thought it got fixed.
>
> Yes.

Fixed where? Merged into mainline or in your firstfloor patches?

> My current sched_clock does not take any locks anymore and it was
> removed from the cpufreq handler too.

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)
On Mon, 30 Apr 2007 22:16:24 -0700 Randy Dunlap <[EMAIL PROTECTED]> wrote:

> > > I was hitting the same thing on i386 uniprocessor, but I thought it got fixed.
> >
> > Yes.
>
> Fixed where? Merged into mainline or in your firstfloor patches?

The bug is in firstfloor only, and the fix (if present) will be there too.

<checks>

Nope, ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt/patches/sched-clock-share
is identical to
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/broken-out/x86_64-mm-sched-clock-share.patch
Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)
On Mon, Apr 30, 2007 at 10:16:24PM -0700, Randy Dunlap wrote:
> On Tue, 1 May 2007 05:43:30 +0200 Andi Kleen wrote:
>
> > > Andi: uniprocessor x86_64 running rc7-mm2 is hanging early in boot at
> > > randomish times (presumably in the timer irq handler) when netconsole
> > > and printk-time are enabled.
> >
> > A backtrace would be good. Does nmi_watchdog=2 show anything interesting
> > or, if not, sysrq-t?
>
> I can't get anything from sysrq or nmi_watchdog.

Hmm, ok, when the console locks up those likely don't work.

> > > I was hitting the same thing on i386 uniprocessor, but I thought it got
> > > fixed.
> >
> > Yes.
>
> Fixed where? Merged into mainline or in your firstfloor patches?

None of the sched-clock changes are in mainline yet.

Can you perhaps test latest firstfloor alone (without rest of -mm)?

-Andi
Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)
> The bug is in firstfloor only, and the fix (if present) will be there too.
>
> <checks>
>
> Nope, ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt/patches/sched-clock-share
> is identical to
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/broken-out/x86_64-mm-sched-clock-share.patch

Or perhaps the deadlock is in the cpufreq handler. Does it happen without
CONFIG_CPU_FREQ too?

[cpufreq handler calls ktime_get which might take xtime lock for reading]

-Andi
Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)
On Tue, 1 May 2007 08:24:56 +0200 Andi Kleen <[EMAIL PROTECTED]> wrote:

> > The bug is in firstfloor only, and the fix (if present) will be there too.
> >
> > <checks>
> >
> > Nope, ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt/patches/sched-clock-share
> > is identical to
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/broken-out/x86_64-mm-sched-clock-share.patch
>
> Or perhaps the deadlock is in the cpufreq handler. Does it happen without
> CONFIG_CPU_FREQ too?
>
> [cpufreq handler calls ktime_get which might take xtime lock for reading]

Sounds right. That's what was happening to me for a while.

Randy, it'd be interesting to try:

--- a/arch/x86_64/kernel/tsc.c~a
+++ a/arch/x86_64/kernel/tsc.c
@@ -84,8 +84,8 @@ static int time_cpufreq_notifier(struct
 			cpufreq_scale(loops_per_jiffy_ref, ref_freq, freq->new);
 
 		tsc_khz = cpufreq_scale(tsc_khz_ref, ref_freq, freq->new);
-		if (!(freq->flags & CPUFREQ_CONST_LOOPS))
-			mark_tsc_unstable("cpufreq changes");
+//		if (!(freq->flags & CPUFREQ_CONST_LOOPS))
+//			mark_tsc_unstable("cpufreq changes");
 	}
 
 	return 0;
_

and if that fixes it, disable netconsole and do

--- a/arch/x86_64/kernel/tsc.c~a
+++ a/arch/x86_64/kernel/tsc.c
@@ -85,7 +85,7 @@ static int time_cpufreq_notifier(struct
 
 		tsc_khz = cpufreq_scale(tsc_khz_ref, ref_freq, freq->new);
 		if (!(freq->flags & CPUFREQ_CONST_LOOPS))
-			mark_tsc_unstable("cpufreq changes");
+			dump_stack();
 	}
 
 	return 0;
_