Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)

2007-05-01 Thread Randy Dunlap
On Tue, 1 May 2007 09:22:33 -0700 Randy Dunlap wrote:

> On Tue, 1 May 2007 08:22:58 +0200 Andi Kleen wrote:
> 
> > On Mon, Apr 30, 2007 at 10:16:24PM -0700, Randy Dunlap wrote:
> > > On Tue, 1 May 2007 05:43:30 +0200 Andi Kleen wrote:
> > > 
> > > > > Andi: uniprocessor x86_64 running rc7-mm2 is hanging early in boot at
> > > > > randomish times (presumably in the timer irq handler) when netconsole
> > > > > and printk-time are enabled.
> > > > 
> > > > A backtrace would be good. Does nmi_watchdog=2 show anything
> > > > interesting or if not sysrq-t?
> > > 
> > > I can't get anything from sysrq or nmi_watchdog.
> > 
> > Hmm, ok when the console locks up those likely don't work.
> > 
> > > 
> > > > > I was hitting the same thing on i386 uniprocessor, but I thought it
> > > > > got fixed.
> > > > 
> > > > Yes.
> > > 
> > > Fixed where?  Merged into mainline or in your firstfloor patches?
> > 
> > None of the sched-clock changes are in mainline yet.
> > 
> > Can you perhaps test latest firstfloor alone (without rest of -mm)?
> 
> OK.  so your 2.6.21-rc7-git5 patch, applied to 2.6.21-git4 or
> applied to 2.6.21-rc7-git5 ?

Applied cleanly to 2.6.21-rc7-git5, but it has build errors:


arch/x86_64/mm/built-in.o: In function `mark_rodata_ro':
(.text+0x180): undefined reference to `_stext'
arch/x86_64/mm/built-in.o: In function `mem_init':
(.init.text+0x2cf): undefined reference to `_stext'
arch/x86_64/mm/built-in.o: In function `do_page_fault':
(.kprobes.text+0x59c): undefined reference to `_stext'
arch/x86_64/vdso/built-in.o: In function `arch_setup_additional_pages':
(.text+0x40): undefined reference to `vdso_end'
arch/x86_64/vdso/built-in.o: In function `arch_setup_additional_pages':
(.text+0x58): undefined reference to `vdso_start'
arch/x86_64/vdso/built-in.o: In function `init_vdso_vars':
vma.c:(.init.text+0x1b): undefined reference to `vdso_end'
vma.c:(.init.text+0x26): undefined reference to `vdso_start'
vma.c:(.init.text+0x3c): undefined reference to `vdso_start'
kernel/built-in.o: In function `profile_hits':
(.text+0x9609): undefined reference to `_stext'
kernel/built-in.o: In function `core_kernel_text':
(.text+0x197c4): undefined reference to `_stext'
kernel/built-in.o: In function `is_ksym_addr':
kallsyms.c:(.text+0x27042): undefined reference to `_stext'
kernel/built-in.o: In function `profile_init':
(.init.text+0xc57): undefined reference to `_stext'
make: *** [.tmp_vmlinux1] Error 1

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)

2007-05-01 Thread Randy Dunlap
On Tue, 1 May 2007 08:22:58 +0200 Andi Kleen wrote:

> On Mon, Apr 30, 2007 at 10:16:24PM -0700, Randy Dunlap wrote:
> > On Tue, 1 May 2007 05:43:30 +0200 Andi Kleen wrote:
> > 
> > > > Andi: uniprocessor x86_64 running rc7-mm2 is hanging early in boot at
> > > > randomish times (presumably in the timer irq handler) when netconsole
> > > > and printk-time are enabled.
> > > 
> > > A backtrace would be good. Does nmi_watchdog=2 show anything
> > > interesting or if not sysrq-t?
> > 
> > I can't get anything from sysrq or nmi_watchdog.
> 
> Hmm, ok when the console locks up those likely don't work.
> 
> > 
> > > > I was hitting the same thing on i386 uniprocessor, but I thought it got
> > > > fixed.
> > > 
> > > Yes.
> > 
> > Fixed where?  Merged into mainline or in your firstfloor patches?
> 
> None of the sched-clock changes are in mainline yet.
> 
> Can you perhaps test latest firstfloor alone (without rest of -mm)?

OK.  so your 2.6.21-rc7-git5 patch, applied to 2.6.21-git4 or
applied to 2.6.21-rc7-git5 ?

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***


Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)

2007-05-01 Thread Randy Dunlap
On Mon, 30 Apr 2007 22:38:59 -0700 Andrew Morton wrote:

> On Tue, 1 May 2007 08:24:56 +0200 Andi Kleen <[EMAIL PROTECTED]> wrote:
> 
> > > The bug is in firstfloor only, and the fix (if present) will be there too.
> > > 
> > > <checks>
> > > 
> > > Nope,
> > > 
> > > ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt/patches/sched-clock-share
> > > 
> > > is identical to
> > > 
> > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/broken-out/x86_64-mm-sched-clock-share.patch
> > 
> > Or perhaps the deadlock is in the cpufrequency handler. Does it happen
> > without CONFIG_CPUFREQ too?
> > 
> > [cpufreq handler calls ktime_get which might take xtime lock for reading] 
> > 
> 
> Sounds right.  That's what was happening to me for a while.
> 
> Randy, it'd be interesting to try:
> 
> --- a/arch/x86_64/kernel/tsc.c~a
> +++ a/arch/x86_64/kernel/tsc.c
> @@ -84,8 +84,8 @@ static int time_cpufreq_notifier(struct 
>  		cpufreq_scale(loops_per_jiffy_ref, ref_freq, freq->new);
>  
>  		tsc_khz = cpufreq_scale(tsc_khz_ref, ref_freq, freq->new);
> -		if (!(freq->flags & CPUFREQ_CONST_LOOPS))
> -			mark_tsc_unstable("cpufreq changes");
> +//		if (!(freq->flags & CPUFREQ_CONST_LOOPS))
> +//			mark_tsc_unstable("cpufreq changes");
>  	}
>  
>  	return 0;
> _

I don't have CPU_FREQ enabled, so that didn't change anything.


> and if that "fixes" it, disable netconsole and do
> 
> --- a/arch/x86_64/kernel/tsc.c~a
> +++ a/arch/x86_64/kernel/tsc.c
> @@ -85,7 +85,7 @@ static int time_cpufreq_notifier(struct 
>  
>  		tsc_khz = cpufreq_scale(tsc_khz_ref, ref_freq, freq->new);
>  		if (!(freq->flags & CPUFREQ_CONST_LOOPS))
> -			mark_tsc_unstable("cpufreq changes");
> +			dump_stack();
>  	}
>  
>  	return 0;


---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***


Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)

2007-04-30 Thread Andrew Morton
On Tue, 1 May 2007 08:24:56 +0200 Andi Kleen <[EMAIL PROTECTED]> wrote:

> > The bug is in firstfloor only, and the fix (if present) will be there too.
> > 
> > <checks>
> > 
> > Nope,
> > 
> > ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt/patches/sched-clock-share
> > 
> > is identical to
> > 
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/broken-out/x86_64-mm-sched-clock-share.patch
> 
> Or perhaps the deadlock is in the cpufrequency handler. Does it happen
> without CONFIG_CPUFREQ too?
> 
> [cpufreq handler calls ktime_get which might take xtime lock for reading] 
> 

Sounds right.  That's what was happening to me for a while.

Randy, it'd be interesting to try:

--- a/arch/x86_64/kernel/tsc.c~a
+++ a/arch/x86_64/kernel/tsc.c
@@ -84,8 +84,8 @@ static int time_cpufreq_notifier(struct 
 		cpufreq_scale(loops_per_jiffy_ref, ref_freq, freq->new);
 
 		tsc_khz = cpufreq_scale(tsc_khz_ref, ref_freq, freq->new);
-		if (!(freq->flags & CPUFREQ_CONST_LOOPS))
-			mark_tsc_unstable("cpufreq changes");
+//		if (!(freq->flags & CPUFREQ_CONST_LOOPS))
+//			mark_tsc_unstable("cpufreq changes");
 	}
 
 	return 0;
_

and if that "fixes" it, disable netconsole and do

--- a/arch/x86_64/kernel/tsc.c~a
+++ a/arch/x86_64/kernel/tsc.c
@@ -85,7 +85,7 @@ static int time_cpufreq_notifier(struct 
 
 		tsc_khz = cpufreq_scale(tsc_khz_ref, ref_freq, freq->new);
 		if (!(freq->flags & CPUFREQ_CONST_LOOPS))
-			mark_tsc_unstable("cpufreq changes");
+			dump_stack();
 	}
 
 	return 0;
_
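
For anyone reading along: the tsc_khz line in both hunks is a proportional rescale of a reference value by the ratio of the new frequency to the reference frequency. A minimal userspace sketch of that arithmetic follows, assuming cpufreq_scale(old, div, mult) computes old * mult / div. toy_scale() is a hypothetical stand-in written for this note, not the kernel helper.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the rescale used in the notifier:
 * returns old * mult / div, using a 64-bit intermediate so the
 * multiplication does not overflow on 32-bit longs.
 */
static unsigned long toy_scale(unsigned long old, unsigned int div,
			       unsigned int mult)
{
	uint64_t result;

	assert(div != 0);	/* the reference frequency must be known */
	result = (uint64_t)old * mult;
	return (unsigned long)(result / div);
}
```

With a reference of 2000000 kHz and a new frequency of 1000000 kHz, toy_scale(2000000, 2000000, 1000000) halves the value, which is the sense in which tsc_khz follows freq->new in the hunks above.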



Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)

2007-04-30 Thread Andi Kleen
> The bug is in firstfloor only, and the fix (if present) will be there too.
> 
> <checks>
> 
> Nope,
> 
> ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt/patches/sched-clock-share
> 
> is identical to
> 
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/broken-out/x86_64-mm-sched-clock-share.patch

Or perhaps the deadlock is in the cpufrequency handler. Does it happen
without CONFIG_CPUFREQ too?

[cpufreq handler calls ktime_get which might take xtime lock for reading] 

-Andi


Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)

2007-04-30 Thread Andrew Morton
On Mon, 30 Apr 2007 22:16:24 -0700 Randy Dunlap <[EMAIL PROTECTED]> wrote:

> > > I was hitting the same thing on i386 uniprocessor, but I thought it got
> > > fixed.
> > 
> > Yes.
> 
> Fixed where?  Merged into mainline or in your firstfloor patches?

The bug is in firstfloor only, and the fix (if present) will be there too.

<checks>

Nope,

ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt/patches/sched-clock-share

is identical to

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/broken-out/x86_64-mm-sched-clock-share.patch




Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)

2007-04-30 Thread Andi Kleen
On Mon, Apr 30, 2007 at 10:16:24PM -0700, Randy Dunlap wrote:
> On Tue, 1 May 2007 05:43:30 +0200 Andi Kleen wrote:
> 
> > > Andi: uniprocessor x86_64 running rc7-mm2 is hanging early in boot at
> > > randomish times (presumably in the timer irq handler) when netconsole and
> > > printk-time are enabled.
> > 
> > A backtrace would be good. Does nmi_watchdog=2 show anything
> > interesting or if not sysrq-t?
> 
> I can't get anything from sysrq or nmi_watchdog.

Hmm, ok when the console locks up those likely don't work.

> 
> > > I was hitting the same thing on i386 uniprocessor, but I thought it got
> > > fixed.
> > 
> > Yes.
> 
> Fixed where?  Merged into mainline or in your firstfloor patches?

None of the sched-clock changes are in mainline yet.

Can you perhaps test latest firstfloor alone (without rest of -mm)?

-Andi


Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)

2007-04-30 Thread Randy Dunlap
On Tue, 1 May 2007 05:43:30 +0200 Andi Kleen wrote:

> > Andi: uniprocessor x86_64 running rc7-mm2 is hanging early in boot at
> > randomish times (presumably in the timer irq handler) when netconsole and
> > printk-time are enabled.
> 
> A backtrace would be good. Does nmi_watchdog=2 show anything
> interesting or if not sysrq-t?

I can't get anything from sysrq or nmi_watchdog.

> > I was hitting the same thing on i386 uniprocessor, but I thought it got
> > fixed.
> 
> Yes.

Fixed where?  Merged into mainline or in your firstfloor patches?

> My current sched_clock does not take any locks anymore and it was removed
> from the cpufreq handler too.


---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***


Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)

2007-04-30 Thread Andi Kleen
> Andi: uniprocessor x86_64 running rc7-mm2 is hanging early in boot at
> randomish times (presumably in the timer irq handler) when netconsole and
> printk-time are enabled.

A backtrace would be good. Does nmi_watchdog=2 show anything
interesting or if not sysrq-t?

> 
> I was hitting the same thing on i386 uniprocessor, but I thought it got
> fixed.

Yes.

My current sched_clock does not take any locks anymore and it was removed
from the cpufreq handler too.
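
A minimal sketch of what a lock-free timestamp path can look like, assuming the clock is derived from a cycle counter and a precomputed fixed-point scale. The toy_* names and the shift of 10 are assumptions made for illustration; this is not taken from the firstfloor patches.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_SCALE_SHIFT 10	/* assumed fixed-point shift for this sketch */

/*
 * Nanoseconds per cycle, scaled by 2^TOY_SCALE_SHIFT, for a counter
 * running at khz kilohertz (1 kHz = one cycle per 1000000 ns).
 * Computed once, whenever the frequency is known or changes.
 */
static uint64_t toy_ns_per_cycle_scaled(uint64_t khz)
{
	assert(khz != 0);
	return (1000000ULL << TOY_SCALE_SHIFT) / khz;
}

/*
 * Convert a cycle count to nanoseconds with one multiply and one shift.
 * No lock is taken, so this is safe to call from a path that already
 * holds a time-keeping lock. Overflow for very large cycle counts is
 * ignored in this sketch.
 */
static uint64_t toy_cycles_to_ns(uint64_t cycles, uint64_t scale)
{
	return (cycles * scale) >> TOY_SCALE_SHIFT;
}
```

The point of the sketch is only that the read side needs nothing but arithmetic on values it already has, so a timestamped printk cannot block on a lock held by its caller.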

-Andi


Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)

2007-04-30 Thread Andrew Morton
On Mon, 30 Apr 2007 17:45:55 -0700 Randy Dunlap <[EMAIL PROTECTED]> wrote:

> Andrew Morton wrote:
> > On Mon, 30 Apr 2007 16:51:01 -0700
> > Randy Dunlap <[EMAIL PROTECTED]> wrote:
> > 
> >> On Mon, 30 Apr 2007 08:16:53 -0700 Randy Dunlap wrote:
> >>
> >>> On Sun, 29 Apr 2007 22:23:54 -0700 Andrew Morton wrote:
> >>>
> >>>> On Sun, 29 Apr 2007 22:01:32 -0700 Randy Dunlap <[EMAIL PROTECTED]> wrote:
> >>>>
> >>>>> On Wed, 25 Apr 2007 22:57:16 -0700 Andrew Morton wrote:
> >>>>>
> >>>>>> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/
> >>>>> I'm getting a hang near the end of booting on x86_64 UP.
> >>>>> The last initcall_debug function varies.  E.g.:
> >>>>>
> >>>>> 1/
> >>>>> [0.140257] Calling initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f()
> >>>>> [0.140266] initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f() returned 0.
> >>>>> [0.140275] initcall 0x806f2fa8 ran for 0 msecs: init_misc_binfmt+0x0/0x3f()
> >>>>> [0.140284] Calling initcall 0x806f2fe7: init_script_binfmt+0x0/0x12()
> >>>>> [0.140293] initcall 0x806f2fe7: init_script_binfmt+0x0/0x12() returned 0.
> >>>>> [0.140302] initcall 0x806f2fe7 ran for 0 msecs: init_script_binfmt+0x0/0x12()
> >>>>> [0.140310] Calling initcall 0x806f2ff9: init_elf_binfmt+0x0/0x12()
> >>>>> [0.140317] initcall 0x806f2ff9: init_elf_binfmt+0x0/0x12() returned 0.
> >>>>> [0.140326] initcall 0x806f2ff9 ran for 0 msecs: init_elf_binfmt+0x0/0x12()
> >>>>> [0.140335] Calling initcall 0x806f3de9: debugfs_init+0x0/0x4a()
> >>>>> [0.140344] initcall 0x806f3de9: debugfs_init+0x0/0x4a() returned 0.
> >>>>> [0.140351] initcall 0x806f3de9 ran for 0 msecs: debugfs_init+0x0/0x4a()
> >>>>>
> >>>>> 2/
> >>>>> [0.140206] Calling initcall 0x806efeb1: ksysfs_init+0x0/0x29()
> >>>>> [0.140215] initcall 0x806efeb1: ksysfs_init+0x0/0x29() returned 0.
> >>>>> [0.140222] initcall 0x806efeb1 ran for 0 msecs: ksysfs_init+0x0/0x29()
> >>>>> [0.140230] Calling initcall 0x806f25be: filelock_init+0x0/0x31()
> >>>>> [0.140242] initcall 0x806f25be: filelock_init+0x0/0x31() returned 0.
> >>>>> [0.140249] initcall 0x806f25be ran for 0 msecs: filelock_init+0x0/0x31()
> >>>>> [0.140258] Calling initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f()
> >>>>> [0.140266] initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f() returned 0.
> >>>>> [0.140276] initcall 0x806f2fa8 ran for 0 msecs: init_misc_binfmt+0x0/0x3f()
> >>>>> [0.140284] Calling initcall 0x806f2fe7: init_script_binfmt+0x0/0x12()
> >>>>> [0.140293] initcall 0x806f2fe7: init_script_binfmt+0x0/0x12() returned 0.
> >>>>>
> >>>> So perhaps it locks during a timer interrupt.
> >>>>
> >>>>> .config is attached.
> >>>>>
> >>>>> Any ideas/suggestions?
> >>>> Just the usual: nothing from sysrq or NMI watchdog?
> >>> Nothing from either of those.  I'll jiggle some config options.
> >> config option changes didn't help, but removing
> >>   netconsole=
> >> from the kernel command line makes it all happy.  :(
> > 
> > argh.
> > 
> >> Do we know of netconsole hang problems?  (anyone?)
> > 
> > You have "time" as well?  I found on i386 uniproc that time+netconsole
> > caused hangs because the printk timestamping code was taking
> > xtime_lock for reading inside a write_seqlock.  But I thought that Andi
> > fixed that.  Perhaps i386 got fixed but x86_64 did not.
> 
> Yes, I have CONFIG_PRINTK_TIME=y and disabling it allows it to boot.  Thanks.
> 
> Maybe the patch isn't merged yet?

Could be.  I don't recall whether Andi's statement was before or after
2.6.21-rc7-mm2 actually.

> Now if I can just remember this until the next time that I hit it...

Andi: uniprocessor x86_64 running rc7-mm2 is hanging early in boot at
randomish times (presumably in the timer irq handler) when netconsole and
printk-time are enabled.

I was hitting the same thing on i386 uniprocessor, but I thought it got
fixed.

The problem was that the printable string which is newly passed to
mark_tsc_unstable() is printed out inside write_seqlock(xtime_lock), but
printk timestamping (and perhaps netconsole tx?) wants to take xtime_lock
for reading, which will hang.
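
As a rough illustration of that hang, here is a toy single-threaded model of a sequence lock in userspace C. It is a sketch under stated assumptions, not kernel code: every toy_* name is hypothetical, the spinlock half of a real seqlock is left out (as on a uniprocessor build), and the reader's retry loop is capped so the model can report the hang instead of spinning forever.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a seqlock; the sequence is odd while a writer is inside. */
struct toy_seqlock {
	unsigned sequence;
};

static void toy_write_begin(struct toy_seqlock *sl) { sl->sequence++; }
static void toy_write_end(struct toy_seqlock *sl)   { sl->sequence++; }

/*
 * Reader side: retry until an even, unchanged sequence is observed.
 * A real reader has no retry cap; the cap exists only so this model
 * terminates and can return false where the kernel would hang.
 */
static bool toy_read_succeeds(const struct toy_seqlock *sl, int max_retries)
{
	assert(max_retries > 0);
	for (int i = 0; i < max_retries; i++) {
		unsigned start = sl->sequence;
		/* ... a reader would copy the protected time values here ... */
		if (!(start & 1) && start == sl->sequence)
			return true;
	}
	return false;	/* the writer never finished */
}

/* Hypothetical stand-in for printk() with timestamping enabled. */
static bool toy_timestamped_printk(const struct toy_seqlock *xtime)
{
	return toy_read_succeeds(xtime, 1000);
}

/* Hypothetical stand-in for a timer path that prints while writing. */
static bool toy_timer_tick_prints(struct toy_seqlock *xtime)
{
	bool ok;

	toy_write_begin(xtime);
	ok = toy_timestamped_printk(xtime);	/* reader nested in writer */
	toy_write_end(xtime);
	return ok;
}
```

In this model toy_timestamped_printk() succeeds on an idle lock, while toy_timer_tick_prints() reports failure: the nested reader only ever sees an odd sequence, and the writer that would make it even again is the reader's own caller.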



Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)

2007-04-30 Thread Randy Dunlap

Andrew Morton wrote:
> On Mon, 30 Apr 2007 16:51:01 -0700
> Randy Dunlap <[EMAIL PROTECTED]> wrote:
>
>> On Mon, 30 Apr 2007 08:16:53 -0700 Randy Dunlap wrote:
>>
>>> On Sun, 29 Apr 2007 22:23:54 -0700 Andrew Morton wrote:
>>>
>>>> On Sun, 29 Apr 2007 22:01:32 -0700 Randy Dunlap <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> On Wed, 25 Apr 2007 22:57:16 -0700 Andrew Morton wrote:
>>>>>
>>>>>> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/
>>>>> I'm getting a hang near the end of booting on x86_64 UP.
>>>>> The last initcall_debug function varies.  E.g.:
>>>>>
>>>>> 1/
>>>>> [0.140257] Calling initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f()
>>>>> [0.140266] initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f() returned 0.
>>>>> [0.140275] initcall 0x806f2fa8 ran for 0 msecs: init_misc_binfmt+0x0/0x3f()
>>>>> [0.140284] Calling initcall 0x806f2fe7: init_script_binfmt+0x0/0x12()
>>>>> [0.140293] initcall 0x806f2fe7: init_script_binfmt+0x0/0x12() returned 0.
>>>>> [0.140302] initcall 0x806f2fe7 ran for 0 msecs: init_script_binfmt+0x0/0x12()
>>>>> [0.140310] Calling initcall 0x806f2ff9: init_elf_binfmt+0x0/0x12()
>>>>> [0.140317] initcall 0x806f2ff9: init_elf_binfmt+0x0/0x12() returned 0.
>>>>> [0.140326] initcall 0x806f2ff9 ran for 0 msecs: init_elf_binfmt+0x0/0x12()
>>>>> [0.140335] Calling initcall 0x806f3de9: debugfs_init+0x0/0x4a()
>>>>> [0.140344] initcall 0x806f3de9: debugfs_init+0x0/0x4a() returned 0.
>>>>> [0.140351] initcall 0x806f3de9 ran for 0 msecs: debugfs_init+0x0/0x4a()
>>>>>
>>>>> 2/
>>>>> [0.140206] Calling initcall 0x806efeb1: ksysfs_init+0x0/0x29()
>>>>> [0.140215] initcall 0x806efeb1: ksysfs_init+0x0/0x29() returned 0.
>>>>> [0.140222] initcall 0x806efeb1 ran for 0 msecs: ksysfs_init+0x0/0x29()
>>>>> [0.140230] Calling initcall 0x806f25be: filelock_init+0x0/0x31()
>>>>> [0.140242] initcall 0x806f25be: filelock_init+0x0/0x31() returned 0.
>>>>> [0.140249] initcall 0x806f25be ran for 0 msecs: filelock_init+0x0/0x31()
>>>>> [0.140258] Calling initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f()
>>>>> [0.140266] initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f() returned 0.
>>>>> [0.140276] initcall 0x806f2fa8 ran for 0 msecs: init_misc_binfmt+0x0/0x3f()
>>>>> [0.140284] Calling initcall 0x806f2fe7: init_script_binfmt+0x0/0x12()
>>>>> [0.140293] initcall 0x806f2fe7: init_script_binfmt+0x0/0x12() returned 0.
>>>>>
>>>> So perhaps it locks during a timer interrupt.
>>>>
>>>>> .config is attached.
>>>>>
>>>>> Any ideas/suggestions?
>>>> Just the usual: nothing from sysrq or NMI watchdog?
>>> Nothing from either of those.  I'll jiggle some config options.
>> config option changes didn't help, but removing
>>   netconsole=
>> from the kernel command line makes it all happy.  :(
>
> argh.
>
>> Do we know of netconsole hang problems?  (anyone?)
>
> You have "time" as well?  I found on i386 uniproc that time+netconsole
> caused hangs because the printk timestamping code was taking
> xtime_lock for reading inside a write_seqlock.  But I thought that Andi
> fixed that.  Perhaps i386 got fixed but x86_64 did not.

Yes, I have CONFIG_PRINTK_TIME=y and disabling it allows it to boot.  Thanks.

Maybe the patch isn't merged yet?

Now if I can just remember this until the next time that I hit it...

--
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***


Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)

2007-04-30 Thread Andrew Morton
On Mon, 30 Apr 2007 16:51:01 -0700
Randy Dunlap <[EMAIL PROTECTED]> wrote:

> On Mon, 30 Apr 2007 08:16:53 -0700 Randy Dunlap wrote:
> 
> > On Sun, 29 Apr 2007 22:23:54 -0700 Andrew Morton wrote:
> > 
> > > On Sun, 29 Apr 2007 22:01:32 -0700 Randy Dunlap <[EMAIL PROTECTED]> wrote:
> > > 
> > > > On Wed, 25 Apr 2007 22:57:16 -0700 Andrew Morton wrote:
> > > > 
> > > > > 
> > > > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/
> > > > 
> > > > I'm getting a hang near the end of booting on x86_64 UP.
> > > > The last initcall_debug function varies.  E.g.:
> > > > 
> > > > 1/
> > > > [0.140257] Calling initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f()
> > > > [0.140266] initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f() returned 0.
> > > > [0.140275] initcall 0x806f2fa8 ran for 0 msecs: init_misc_binfmt+0x0/0x3f()
> > > > [0.140284] Calling initcall 0x806f2fe7: init_script_binfmt+0x0/0x12()
> > > > [0.140293] initcall 0x806f2fe7: init_script_binfmt+0x0/0x12() returned 0.
> > > > [0.140302] initcall 0x806f2fe7 ran for 0 msecs: init_script_binfmt+0x0/0x12()
> > > > [0.140310] Calling initcall 0x806f2ff9: init_elf_binfmt+0x0/0x12()
> > > > [0.140317] initcall 0x806f2ff9: init_elf_binfmt+0x0/0x12() returned 0.
> > > > [0.140326] initcall 0x806f2ff9 ran for 0 msecs: init_elf_binfmt+0x0/0x12()
> > > > [0.140335] Calling initcall 0x806f3de9: debugfs_init+0x0/0x4a()
> > > > [0.140344] initcall 0x806f3de9: debugfs_init+0x0/0x4a() returned 0.
> > > > [0.140351] initcall 0x806f3de9 ran for 0 msecs: debugfs_init+0x0/0x4a()
> > > > 
> > > > 2/
> > > > [0.140206] Calling initcall 0x806efeb1: ksysfs_init+0x0/0x29()
> > > > [0.140215] initcall 0x806efeb1: ksysfs_init+0x0/0x29() returned 0.
> > > > [0.140222] initcall 0x806efeb1 ran for 0 msecs: ksysfs_init+0x0/0x29()
> > > > [0.140230] Calling initcall 0x806f25be: filelock_init+0x0/0x31()
> > > > [0.140242] initcall 0x806f25be: filelock_init+0x0/0x31() returned 0.
> > > > [0.140249] initcall 0x806f25be ran for 0 msecs: filelock_init+0x0/0x31()
> > > > [0.140258] Calling initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f()
> > > > [0.140266] initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f() returned 0.
> > > > [0.140276] initcall 0x806f2fa8 ran for 0 msecs: init_misc_binfmt+0x0/0x3f()
> > > > [0.140284] Calling initcall 0x806f2fe7: init_script_binfmt+0x0/0x12()
> > > > [0.140293] initcall 0x806f2fe7: init_script_binfmt+0x0/0x12() returned 0.
> > > > 
> > > 
> > > So perhaps it locks during a timer interrupt.
> > > 
> > > > .config is attached.
> > > > 
> > > > Any ideas/suggestions?
> > > 
> > > Just the usual: nothing from sysrq or NMI watchdog?
> > 
> > Nothing from either of those.  I'll jiggle some config options.
> 
> config option changes didn't help, but removing
>   netconsole=
> from the kernel command line makes it all happy.  :(

argh.

> Do we know of netconsole hang problems?  (anyone?)

You have "time" as well?  I found on i386 uniproc that time+netconsole
caused hangs because the printk timestamping code was taking
xtime_lock for reading inside a write_seqlock.  But I thought that Andi
fixed that.  Perhaps i386 got fixed but x86_64 did not.



Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)

2007-04-30 Thread Randy Dunlap
On Mon, 30 Apr 2007 08:16:53 -0700 Randy Dunlap wrote:

> On Sun, 29 Apr 2007 22:23:54 -0700 Andrew Morton wrote:
> 
> > On Sun, 29 Apr 2007 22:01:32 -0700 Randy Dunlap <[EMAIL PROTECTED]> wrote:
> > 
> > > On Wed, 25 Apr 2007 22:57:16 -0700 Andrew Morton wrote:
> > > 
> > > > 
> > > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/
> > > 
> > > I'm getting a hang near the end of booting on x86_64 UP.
> > > The last initcall_debug function varies.  E.g.:
> > > 
> > > 1/
> > > [0.140257] Calling initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f()
> > > [0.140266] initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f() returned 0.
> > > [0.140275] initcall 0x806f2fa8 ran for 0 msecs: init_misc_binfmt+0x0/0x3f()
> > > [0.140284] Calling initcall 0x806f2fe7: init_script_binfmt+0x0/0x12()
> > > [0.140293] initcall 0x806f2fe7: init_script_binfmt+0x0/0x12() returned 0.
> > > [0.140302] initcall 0x806f2fe7 ran for 0 msecs: init_script_binfmt+0x0/0x12()
> > > [0.140310] Calling initcall 0x806f2ff9: init_elf_binfmt+0x0/0x12()
> > > [0.140317] initcall 0x806f2ff9: init_elf_binfmt+0x0/0x12() returned 0.
> > > [0.140326] initcall 0x806f2ff9 ran for 0 msecs: init_elf_binfmt+0x0/0x12()
> > > [0.140335] Calling initcall 0x806f3de9: debugfs_init+0x0/0x4a()
> > > [0.140344] initcall 0x806f3de9: debugfs_init+0x0/0x4a() returned 0.
> > > [0.140351] initcall 0x806f3de9 ran for 0 msecs: debugfs_init+0x0/0x4a()
> > > 
> > > 2/
> > > [0.140206] Calling initcall 0x806efeb1: ksysfs_init+0x0/0x29()
> > > [0.140215] initcall 0x806efeb1: ksysfs_init+0x0/0x29() returned 0.
> > > [0.140222] initcall 0x806efeb1 ran for 0 msecs: ksysfs_init+0x0/0x29()
> > > [0.140230] Calling initcall 0x806f25be: filelock_init+0x0/0x31()
> > > [0.140242] initcall 0x806f25be: filelock_init+0x0/0x31() returned 0.
> > > [0.140249] initcall 0x806f25be ran for 0 msecs: filelock_init+0x0/0x31()
> > > [0.140258] Calling initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f()
> > > [0.140266] initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f() returned 0.
> > > [0.140276] initcall 0x806f2fa8 ran for 0 msecs: init_misc_binfmt+0x0/0x3f()
> > > [0.140284] Calling initcall 0x806f2fe7: init_script_binfmt+0x0/0x12()
> > > [0.140293] initcall 0x806f2fe7: init_script_binfmt+0x0/0x12() returned 0.
> > > 
> > 
> > So perhaps it locks during a timer interrupt.
> > 
> > > .config is attached.
> > > 
> > > Any ideas/suggestions?
> > 
> > Just the usual: nothing from sysrq or NMI watchdog?
> 
> Nothing from either of those.  I'll jiggle some config options.

config option changes didn't help, but removing
netconsole=<params>
from the kernel command line makes it all happy.  :(

Do we know of netconsole hang problems?  (anyone?)

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***


Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)

2007-04-30 Thread Andrew Morton
On Mon, 30 Apr 2007 16:51:01 -0700
Randy Dunlap <[EMAIL PROTECTED]> wrote:

> On Mon, 30 Apr 2007 08:16:53 -0700 Randy Dunlap wrote:
> 
> > On Sun, 29 Apr 2007 22:23:54 -0700 Andrew Morton wrote:
> > 
> > > On Sun, 29 Apr 2007 22:01:32 -0700 Randy Dunlap <[EMAIL PROTECTED]> wrote:
> > > 
> > > > On Wed, 25 Apr 2007 22:57:16 -0700 Andrew Morton wrote:
> > > > 
> > > > > 
> > > > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/
> > > > 
> > > > I'm getting a hang near the end of booting on x86_64 UP.
> > > > The last initcall_debug function varies.  E.g.:
> > > > 
> > > > 1/
> > > > [0.140257] Calling initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f()
> > > > [0.140266] initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f() returned 0.
> > > > [0.140275] initcall 0x806f2fa8 ran for 0 msecs: init_misc_binfmt+0x0/0x3f()
> > > > [0.140284] Calling initcall 0x806f2fe7: init_script_binfmt+0x0/0x12()
> > > > [0.140293] initcall 0x806f2fe7: init_script_binfmt+0x0/0x12() returned 0.
> > > > [0.140302] initcall 0x806f2fe7 ran for 0 msecs: init_script_binfmt+0x0/0x12()
> > > > [0.140310] Calling initcall 0x806f2ff9: init_elf_binfmt+0x0/0x12()
> > > > [0.140317] initcall 0x806f2ff9: init_elf_binfmt+0x0/0x12() returned 0.
> > > > [0.140326] initcall 0x806f2ff9 ran for 0 msecs: init_elf_binfmt+0x0/0x12()
> > > > [0.140335] Calling initcall 0x806f3de9: debugfs_init+0x0/0x4a()
> > > > [0.140344] initcall 0x806f3de9: debugfs_init+0x0/0x4a() returned 0.
> > > > [0.140351] initcall 0x806f3de9 ran for 0 msecs: debugfs_init+0x0/0x4a()
> > > > 
> > > > 2/
> > > > [0.140206] Calling initcall 0x806efeb1: ksysfs_init+0x0/0x29()
> > > > [0.140215] initcall 0x806efeb1: ksysfs_init+0x0/0x29() returned 0.
> > > > [0.140222] initcall 0x806efeb1 ran for 0 msecs: ksysfs_init+0x0/0x29()
> > > > [0.140230] Calling initcall 0x806f25be: filelock_init+0x0/0x31()
> > > > [0.140242] initcall 0x806f25be: filelock_init+0x0/0x31() returned 0.
> > > > [0.140249] initcall 0x806f25be ran for 0 msecs: filelock_init+0x0/0x31()
> > > > [0.140258] Calling initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f()
> > > > [0.140266] initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f() returned 0.
> > > > [0.140276] initcall 0x806f2fa8 ran for 0 msecs: init_misc_binfmt+0x0/0x3f()
> > > > [0.140284] Calling initcall 0x806f2fe7: init_script_binfmt+0x0/0x12()
> > > > [0.140293] initcall 0x806f2fe7: init_script_binfmt+0x0/0x12() returned 0.
> > > > 
> > > 
> > > So perhaps it locks during a timer interrupt.
> > > 
> > > > .config is attached.
> > > > 
> > > > Any ideas/suggestions?
> > > 
> > > Just the usual: nothing from sysrq or NMI watchdog?
> > 
> > Nothing from either of those.  I'll jiggle some config options.
> 
> config option changes didn't help, but removing
> netconsole=<params>
> from the kernel command line makes it all happy.  :(

argh.

> Do we know of netconsole hang problems?  (anyone?)

You have "time" as well?  I found on i386 uniproc that time+netconsole
caused hangs because the printk timestamping code was taking
xtime_lock for reading inside a write_seqlock.  But I thought that Andi
fixed that.  Perhaps i386 got fixed but x86_64 did not.



Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)

2007-04-30 Thread Randy Dunlap

Andrew Morton wrote:

> On Mon, 30 Apr 2007 16:51:01 -0700
> Randy Dunlap <[EMAIL PROTECTED]> wrote:
> 
> > On Mon, 30 Apr 2007 08:16:53 -0700 Randy Dunlap wrote:
> > 
> > > On Sun, 29 Apr 2007 22:23:54 -0700 Andrew Morton wrote:
> > > 
> > > > On Sun, 29 Apr 2007 22:01:32 -0700 Randy Dunlap <[EMAIL PROTECTED]> wrote:
> > > > 
> > > > > On Wed, 25 Apr 2007 22:57:16 -0700 Andrew Morton wrote:
> > > > > 
> > > > > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/
> > > > > 
> > > > > I'm getting a hang near the end of booting on x86_64 UP.
> > > > > The last initcall_debug function varies.  E.g.:
> > > > > 
> > > > > 1/
> > > > > [0.140257] Calling initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f()
> > > > > [0.140266] initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f() returned 0.
> > > > > [0.140275] initcall 0x806f2fa8 ran for 0 msecs: init_misc_binfmt+0x0/0x3f()
> > > > > [0.140284] Calling initcall 0x806f2fe7: init_script_binfmt+0x0/0x12()
> > > > > [0.140293] initcall 0x806f2fe7: init_script_binfmt+0x0/0x12() returned 0.
> > > > > [0.140302] initcall 0x806f2fe7 ran for 0 msecs: init_script_binfmt+0x0/0x12()
> > > > > [0.140310] Calling initcall 0x806f2ff9: init_elf_binfmt+0x0/0x12()
> > > > > [0.140317] initcall 0x806f2ff9: init_elf_binfmt+0x0/0x12() returned 0.
> > > > > [0.140326] initcall 0x806f2ff9 ran for 0 msecs: init_elf_binfmt+0x0/0x12()
> > > > > [0.140335] Calling initcall 0x806f3de9: debugfs_init+0x0/0x4a()
> > > > > [0.140344] initcall 0x806f3de9: debugfs_init+0x0/0x4a() returned 0.
> > > > > [0.140351] initcall 0x806f3de9 ran for 0 msecs: debugfs_init+0x0/0x4a()
> > > > > 
> > > > > 2/
> > > > > [0.140206] Calling initcall 0x806efeb1: ksysfs_init+0x0/0x29()
> > > > > [0.140215] initcall 0x806efeb1: ksysfs_init+0x0/0x29() returned 0.
> > > > > [0.140222] initcall 0x806efeb1 ran for 0 msecs: ksysfs_init+0x0/0x29()
> > > > > [0.140230] Calling initcall 0x806f25be: filelock_init+0x0/0x31()
> > > > > [0.140242] initcall 0x806f25be: filelock_init+0x0/0x31() returned 0.
> > > > > [0.140249] initcall 0x806f25be ran for 0 msecs: filelock_init+0x0/0x31()
> > > > > [0.140258] Calling initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f()
> > > > > [0.140266] initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f() returned 0.
> > > > > [0.140276] initcall 0x806f2fa8 ran for 0 msecs: init_misc_binfmt+0x0/0x3f()
> > > > > [0.140284] Calling initcall 0x806f2fe7: init_script_binfmt+0x0/0x12()
> > > > > [0.140293] initcall 0x806f2fe7: init_script_binfmt+0x0/0x12() returned 0.
> > > > 
> > > > So perhaps it locks during a timer interrupt.
> > > > 
> > > > > .config is attached.
> > > > > 
> > > > > Any ideas/suggestions?
> > > > 
> > > > Just the usual: nothing from sysrq or NMI watchdog?
> > > 
> > > Nothing from either of those.  I'll jiggle some config options.
> > 
> > config option changes didn't help, but removing
> > netconsole=<params>
> > from the kernel command line makes it all happy.  :(
> 
> argh.
> 
> > Do we know of netconsole hang problems?  (anyone?)
> 
> You have "time" as well?  I found on i386 uniproc that time+netconsole
> caused hangs because the printk timestamping code was taking
> xtime_lock for reading inside a write_seqlock.  But I thought that Andi
> fixed that.  Perhaps i386 got fixed but x86_64 did not.


Yes, I have CONFIG_PRINTK_TIME=y and disabling it allows it to boot.  Thanks.

Maybe the patch isn't merged yet?

Now if I can just remember this until the next time that I hit it...

--
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***


Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)

2007-04-30 Thread Andrew Morton
On Mon, 30 Apr 2007 17:45:55 -0700 Randy Dunlap <[EMAIL PROTECTED]> wrote:

> Andrew Morton wrote:
> > On Mon, 30 Apr 2007 16:51:01 -0700
> > Randy Dunlap <[EMAIL PROTECTED]> wrote:
> > 
> > > On Mon, 30 Apr 2007 08:16:53 -0700 Randy Dunlap wrote:
> > > 
> > > > On Sun, 29 Apr 2007 22:23:54 -0700 Andrew Morton wrote:
> > > > 
> > > > > On Sun, 29 Apr 2007 22:01:32 -0700 Randy Dunlap <[EMAIL PROTECTED]> wrote:
> > > > > 
> > > > > > On Wed, 25 Apr 2007 22:57:16 -0700 Andrew Morton wrote:
> > > > > > 
> > > > > > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/
> > > > > > 
> > > > > > I'm getting a hang near the end of booting on x86_64 UP.
> > > > > > The last initcall_debug function varies.  E.g.:
> > > > > > 
> > > > > > 1/
> > > > > > [0.140257] Calling initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f()
> > > > > > [0.140266] initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f() returned 0.
> > > > > > [0.140275] initcall 0x806f2fa8 ran for 0 msecs: init_misc_binfmt+0x0/0x3f()
> > > > > > [0.140284] Calling initcall 0x806f2fe7: init_script_binfmt+0x0/0x12()
> > > > > > [0.140293] initcall 0x806f2fe7: init_script_binfmt+0x0/0x12() returned 0.
> > > > > > [0.140302] initcall 0x806f2fe7 ran for 0 msecs: init_script_binfmt+0x0/0x12()
> > > > > > [0.140310] Calling initcall 0x806f2ff9: init_elf_binfmt+0x0/0x12()
> > > > > > [0.140317] initcall 0x806f2ff9: init_elf_binfmt+0x0/0x12() returned 0.
> > > > > > [0.140326] initcall 0x806f2ff9 ran for 0 msecs: init_elf_binfmt+0x0/0x12()
> > > > > > [0.140335] Calling initcall 0x806f3de9: debugfs_init+0x0/0x4a()
> > > > > > [0.140344] initcall 0x806f3de9: debugfs_init+0x0/0x4a() returned 0.
> > > > > > [0.140351] initcall 0x806f3de9 ran for 0 msecs: debugfs_init+0x0/0x4a()
> > > > > > 
> > > > > > 2/
> > > > > > [0.140206] Calling initcall 0x806efeb1: ksysfs_init+0x0/0x29()
> > > > > > [0.140215] initcall 0x806efeb1: ksysfs_init+0x0/0x29() returned 0.
> > > > > > [0.140222] initcall 0x806efeb1 ran for 0 msecs: ksysfs_init+0x0/0x29()
> > > > > > [0.140230] Calling initcall 0x806f25be: filelock_init+0x0/0x31()
> > > > > > [0.140242] initcall 0x806f25be: filelock_init+0x0/0x31() returned 0.
> > > > > > [0.140249] initcall 0x806f25be ran for 0 msecs: filelock_init+0x0/0x31()
> > > > > > [0.140258] Calling initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f()
> > > > > > [0.140266] initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f() returned 0.
> > > > > > [0.140276] initcall 0x806f2fa8 ran for 0 msecs: init_misc_binfmt+0x0/0x3f()
> > > > > > [0.140284] Calling initcall 0x806f2fe7: init_script_binfmt+0x0/0x12()
> > > > > > [0.140293] initcall 0x806f2fe7: init_script_binfmt+0x0/0x12() returned 0.
> > > > > 
> > > > > So perhaps it locks during a timer interrupt.
> > > > > 
> > > > > > .config is attached.
> > > > > > 
> > > > > > Any ideas/suggestions?
> > > > > 
> > > > > Just the usual: nothing from sysrq or NMI watchdog?
> > > > 
> > > > Nothing from either of those.  I'll jiggle some config options.
> > > 
> > > config option changes didn't help, but removing
> > > netconsole=<params>
> > > from the kernel command line makes it all happy.  :(
> > 
> > argh.
> > 
> > > Do we know of netconsole hang problems?  (anyone?)
> > 
> > You have "time" as well?  I found on i386 uniproc that time+netconsole
> > caused hangs because the printk timestamping code was taking
> > xtime_lock for reading inside a write_seqlock.  But I thought that Andi
> > fixed that.  Perhaps i386 got fixed but x86_64 did not.
> 
> Yes, I have CONFIG_PRINTK_TIME=y and disabling it allows it to boot.  Thanks.
> 
> Maybe the patch isn't merged yet?

Could be.  I don't recall whether Andi's statement was before or after
2.6.21-rc7-mm2 actually.

> Now if I can just remember this until the next time that I hit it...

Andi: uniprocessor x86_64 running rc7-mm2 is hanging early in boot at
randomish times (presumably in the timer irq handler) when netconsole and
printk-time are enabled.

I was hitting the same thing on i386 uniprocessor, but I thought it got
fixed.

The problem was that the printable string which is newly passed to
mark_tsc_unstable() is printed out inside write_seqlock(xtime_lock) but
printk timestamping (and perhaps netconsole tx?) want to take xtime_lock
for reading, which will hang.



Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)

2007-04-30 Thread Andi Kleen
> Andi: uniprocessor x86_64 running rc7-mm2 is hanging early in boot at
> randomish times (presumably in the timer irq handler) when netconsole and
> printk-time are enabled.

A backtrace would be good. Does nmi_watchdog=2 show anything
interesting or if not sysrq-t?

> 
> I was hitting the same thing on i386 uniprocessor, but I thought it got
> fixed.

Yes.

My current sched_clock does not take any locks anymore and it was removed
from the cpufreq handler too.

-Andi


Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)

2007-04-30 Thread Randy Dunlap
On Tue, 1 May 2007 05:43:30 +0200 Andi Kleen wrote:

> > Andi: uniprocessor x86_64 running rc7-mm2 is hanging early in boot at
> > randomish times (presumably in the timer irq handler) when netconsole and
> > printk-time are enabled.
> 
> A backtrace would be good. Does nmi_watchdog=2 show anything
> interesting or if not sysrq-t?

I can't get anything from sysrq or nmi_watchdog.

> > I was hitting the same thing on i386 uniprocessor, but I thought it got
> > fixed.
> 
> Yes.

Fixed where?  Merged into mainline or in your firstfloor patches?

> My current sched_clock does not take any locks anymore and it was removed
> from the cpufreq handler too.


---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***


Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)

2007-04-30 Thread Andrew Morton
On Mon, 30 Apr 2007 22:16:24 -0700 Randy Dunlap <[EMAIL PROTECTED]> wrote:

> > > I was hitting the same thing on i386 uniprocessor, but I thought it got
> > > fixed.
> > 
> > Yes.
> 
> Fixed where?  Merged into mainline or in your firstfloor patches?

The bug is in firstfloor only, and the fix (if present) will be there too.

<checks>

Nope,

ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt/patches/sched-clock-share

is identical to

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/broken-out/x86_64-mm-sched-clock-share.patch




Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)

2007-04-30 Thread Andi Kleen
On Mon, Apr 30, 2007 at 10:16:24PM -0700, Randy Dunlap wrote:
> On Tue, 1 May 2007 05:43:30 +0200 Andi Kleen wrote:
> 
> > > Andi: uniprocessor x86_64 running rc7-mm2 is hanging early in boot at
> > > randomish times (presumably in the timer irq handler) when netconsole and
> > > printk-time are enabled.
> > 
> > A backtrace would be good. Does nmi_watchdog=2 show anything
> > interesting or if not sysrq-t?
> 
> I can't get anything from sysrq or nmi_watchdog.

Hmm, ok when the console locks up those likely don't work.

> 
> > > I was hitting the same thing on i386 uniprocessor, but I thought it got
> > > fixed.
> > 
> > Yes.
> 
> Fixed where?  Merged into mainline or in your firstfloor patches?

None of the sched-clock changes are in mainline yet.

Can you perhaps test latest firstfloor alone (without rest of -mm)?

-Andi


Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)

2007-04-30 Thread Andi Kleen
> The bug is in firstfloor only, and the fix (if present) will be there too.
> 
> <checks>
> 
> Nope,
> 
> ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt/patches/sched-clock-share
> 
> is identical to
> 
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/broken-out/x86_64-mm-sched-clock-share.patch

Or perhaps the deadlock is in the cpufrequency handler. Does it happen without
CONFIG_CPUFREQ too?

[cpufreq handler calls ktime_get which might take xtime lock for reading] 

-Andi


Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)

2007-04-30 Thread Andrew Morton
On Tue, 1 May 2007 08:24:56 +0200 Andi Kleen <[EMAIL PROTECTED]> wrote:

> > The bug is in firstfloor only, and the fix (if present) will be there too.
> > 
> > <checks>
> > 
> > Nope,
> > 
> > ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt/patches/sched-clock-share
> > 
> > is identical to
> > 
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/broken-out/x86_64-mm-sched-clock-share.patch
> 
> Or perhaps the deadlock is in the cpufrequency handler. Does it happen without
> CONFIG_CPUFREQ too?
> 
> [cpufreq handler calls ktime_get which might take xtime lock for reading]
> 

Sounds right.  That's what was happening to me for a while.

Randy, it'd be interesting to try:

--- a/arch/x86_64/kernel/tsc.c~a
+++ a/arch/x86_64/kernel/tsc.c
@@ -84,8 +84,8 @@ static int time_cpufreq_notifier(struct 
 		cpufreq_scale(loops_per_jiffy_ref, ref_freq, freq->new);
 
 		tsc_khz = cpufreq_scale(tsc_khz_ref, ref_freq, freq->new);
-		if (!(freq->flags & CPUFREQ_CONST_LOOPS))
-			mark_tsc_unstable("cpufreq changes");
+//		if (!(freq->flags & CPUFREQ_CONST_LOOPS))
+//			mark_tsc_unstable("cpufreq changes");
 	}
 
 	return 0;
_

and if that fixes it, disable netconsole and do

--- a/arch/x86_64/kernel/tsc.c~a
+++ a/arch/x86_64/kernel/tsc.c
@@ -85,7 +85,7 @@ static int time_cpufreq_notifier(struct 
 
 		tsc_khz = cpufreq_scale(tsc_khz_ref, ref_freq, freq->new);
 		if (!(freq->flags & CPUFREQ_CONST_LOOPS))
-			mark_tsc_unstable("cpufreq changes");
+			dump_stack();
 	}
 
 	return 0;
_
