SOLVED - Re: Simple question re: oops

2005-07-30 Thread Lee Revell
On Sun, 2005-07-31 at 10:40 +1000, Dave Airlie wrote:
> > panic_on_oops has no effect, a bunch of stuff flies past and the last
> > thing I see is "gam_server: scheduling while atomic" then a stack trace
> > of the core dump path then "Aiee, killing interrupt handler".
> > 
> > I am starting to suspect the hard drive, does that sound plausible?
> > It's as if it locks up when it hits a certain disk block.
> 
> run memtest on it... you might have bad RAM..

This was some kind of (ACPI related?) kernel bug.  I upgraded from Hoary
(2.6.10) to Breezy (2.6.12) and the problem, which had been 100%
reproducible, went away.

One odd thing I noticed was some APM/ACPI related messages in the logs
when starting X ("APM: overridden by ACPI" or something).  Now I don't
get these and the X log just says "/dev/apm_bios: No such device".

Oh well, it's working now.

Lee

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Simple question re: oops

2005-07-30 Thread Lee Revell
On Sun, 2005-07-31 at 10:40 +1000, Dave Airlie wrote:
> > panic_on_oops has no effect, a bunch of stuff flies past and the last
> > thing I see is "gam_server: scheduling while atomic" then a stack trace
> > of the core dump path then "Aiee, killing interrupt handler".
> > 
> > I am starting to suspect the hard drive, does that sound plausible?
> > It's as if it locks up when it hits a certain disk block.
> 
> run memtest on it... you might have bad RAM..
> 

Already swapped it out, but I'll try memtest.

Any idea why printk_ratelimit does not work?  I set it to 1000 (per the
docs this should limit to 1 printk per second) and burst to 1 but I
still get screenfuls of text flying by.

Lee
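
One possible explanation, offered as an assumption rather than a
diagnosis: printk_ratelimit() is a check that individual call sites opt
into, so messages printed by code that never consults it are not
throttled by the sysctl.  The sketch below is a userspace token-bucket
illustration of that behaviour, not the kernel's code.  RateLimiter,
emit and emit_ratelimited are hypothetical names, and the interval of
1000 ticks with a burst of 1 simply mirrors the settings mentioned above.

```python
class RateLimiter:
    """Token bucket: at most one message per `interval` ticks,
    with up to `burst` messages' worth of tokens saved up."""

    def __init__(self, interval, burst):
        self.interval = interval
        self.burst = burst
        self.tokens = interval * burst  # start with a full bucket
        self.last = 0                   # tick of the previous check
        self.missed = 0                 # messages suppressed since last allow

    def allow(self, now):
        # Refill by the elapsed ticks, capped at the bucket size.
        self.tokens = min(self.tokens + (now - self.last),
                          self.interval * self.burst)
        self.last = now
        if self.tokens >= self.interval:
            self.tokens -= self.interval
            self.missed = 0
            return True
        self.missed += 1
        return False


def emit(log, message):
    """A call site that never asks the limiter: always prints."""
    log.append(message)


def emit_ratelimited(log, limiter, now, message):
    """A call site that opts in: prints only when the limiter allows it."""
    if limiter.allow(now):
        log.append(message)
        return True
    return False
```

Under this model, tuning the interval and burst only changes what the
opt-in call sites print; anything going through the plain emit() path
still floods the screen.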




Re: Simple question re: oops

2005-07-30 Thread Dave Airlie
> panic_on_oops has no effect, a bunch of stuff flies past and the last
> thing I see is "gam_server: scheduling while atomic" then a stack trace
> of the core dump path then "Aiee, killing interrupt handler".
> 
> I am starting to suspect the hard drive, does that sound plausible?
> It's as if it locks up when it hits a certain disk block.

run memtest on it... you might have bad RAM..

Dave.


Re: Simple question re: oops

2005-07-30 Thread Lee Revell
On Sun, 2005-07-31 at 02:11 +0200, Alexander Nyberg wrote:
> On Sat, Jul 30, 2005 at 07:48:11PM -0400 Lee Revell wrote:
> 
> > I have a machine here that oopses reliably when I start X, but the
> > interesting stuff scrolls away too fast, and a bunch more Oopses get
> > printed ending with "Aiee, killing interrupt handler".
> > 
> > How do I get the output to stop after the first Oops?
> > 
> 
> set /proc/sys/kernel/panic_on_oops to 1
> 
> What version of the kernel is that? It shouldn't do recursive oopses
> (of the same task) any more.
> 

panic_on_oops has no effect, a bunch of stuff flies past and the last
thing I see is "gam_server: scheduling while atomic" then a stack trace
of the core dump path then "Aiee, killing interrupt handler".

I am starting to suspect the hard drive, does that sound plausible?
It's as if it locks up when it hits a certain disk block.

Lee



Re: Simple question re: oops

2005-07-30 Thread Lee Revell
On Sun, 2005-07-31 at 02:11 +0200, Alexander Nyberg wrote:
> On Sat, Jul 30, 2005 at 07:48:11PM -0400 Lee Revell wrote:
> 
> > I have a machine here that oopses reliably when I start X, but the
> > interesting stuff scrolls away too fast, and a bunch more Oopses get
> > printed ending with "Aiee, killing interrupt handler".
> > 
> > How do I get the output to stop after the first Oops?
> > 
> 
> set /proc/sys/kernel/panic_on_oops to 1
> 
> What version of the kernel is that? It shouldn't do recursive oopses
> (of the same task) any more.
> 

2.6.10 (whatever comes with Ubuntu Hoary).  It's a demo install for a
client on cobbled-together hardware.  First I suspected the
bleeding-edge GeForce video card, but swapping it didn't help.  Now I
suspect the hard drive (or a kernel bug).

And I was wrong, it wasn't more Oopses, it was "scheduling while atomic"
messages that forced the interesting stuff offscreen.

Lee



Re: Simple question re: oops

2005-07-30 Thread Alexander Nyberg
On Sat, Jul 30, 2005 at 07:48:11PM -0400 Lee Revell wrote:

> I have a machine here that oopses reliably when I start X, but the
> interesting stuff scrolls away too fast, and a bunch more Oopses get
> printed ending with "Aiee, killing interrupt handler".
> 
> How do I get the output to stop after the first Oops?
> 

set /proc/sys/kernel/panic_on_oops to 1

What version of the kernel is that? It shouldn't do recursive oopses
(of the same task) any more.
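
A minimal sketch of setting that flag from a script, assuming it runs as
root on the affected machine.  The helper names write_sysctl and
read_sysctl and the root parameter are made up for this example; only
the /proc/sys/kernel/panic_on_oops path comes from the advice above.

```python
import os


def write_sysctl(name, value, root="/proc/sys"):
    """Write `value` to the sysctl file for a dotted name such as
    'kernel.panic_on_oops'.  `root` is overridable so the helper can be
    tried against a scratch directory instead of the live /proc/sys."""
    path = os.path.join(root, *name.split("."))
    with open(path, "w") as f:
        f.write(f"{value}\n")
    return path


def read_sysctl(name, root="/proc/sys"):
    """Read back the current value as a stripped string."""
    path = os.path.join(root, *name.split("."))
    with open(path) as f:
        return f.read().strip()
```

On the real system the call would be write_sysctl("kernel.panic_on_oops", 1),
followed by read_sysctl("kernel.panic_on_oops") to confirm the value
stuck before starting X.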


Re: Simple question re: oops

2005-07-30 Thread Lee Revell
On Sat, 2005-07-30 at 19:48 -0400, Lee Revell wrote:
> I have a machine here that oopses reliably when I start X, but the
> interesting stuff scrolls away too fast, and a bunch more Oopses get
> printed ending with "Aiee, killing interrupt handler".
> 
> How do I get the output to stop after the first Oops?
> 

Never mind, /proc/sys/kernel/panic_on_oops should do it.

Lee



