Re: kernel panic

2011-02-16 Thread Jerry Feldman
One of my old laptops had a similar issue. It had a bottom section that
contained the floppy and CD drive. It turned out to be a bad memory module.

On 02/15/2011 10:33 AM, Jack Coats wrote:
> I have seen such things be thermal related.
>
> Even though it is a laptop, open it up or have a tech do it and clean
> it out.  Dust, lint, etc. build up and can cause thermal overload.  It
> is very dusty where we live, and I have this as an issue in desktops
> here.
>
> My son had a laptop at college and found that it had a lot of lint in
> it.  I am guessing because he used it on a bed.
>
> The next thing I would check is memory.  Run a long-running memory
> test.  Memory sticks do go bad, but not often.


-- 
Jerry Feldman 
Boston Linux and Unix
PGP key id: 537C5846
PGP Key fingerprint: 3D1B 8377 A3C0 A5F2 ECBB  CA3B 4607 4319 537C 5846


___
Discuss mailing list
Discuss@blu.org
http://lists.blu.org/mailman/listinfo/discuss


Re: kernel panic

2011-02-15 Thread F Ozbek
On Tue, Feb 15, 2011 at 12:40 PM, Tom Metro wrote:

> Jack Coats wrote:
> > Even though it is a laptop, open it up...and clean it out.
>
> I do that about once a year and did so in January when I upgraded the RAM.
>
> The crash frequency has perhaps increased some since the
> RAM upgrade, but the problem was definitely there prior to the RAM
> upgrade. The increase might lend credibility to the thermal theory,
> assuming the 4 GB module gives off enough additional heat, compared to
> the 2 GB module it replaced, to make a difference.
>
>
Did you try running memtest86+?


Re: kernel panic

2011-02-15 Thread Derek Martin
On Tue, Feb 15, 2011 at 01:22:43AM -0500, Tom Metro wrote:
> Hmmm...maybe a "blue screen of death" isn't such a bad design.

I don't think anyone ever lamented the design... just how often it
presented itself. :(

-- 
Derek D. Martin    http://www.pizzashack.org/   GPG Key ID: 0xDFBEAD02
-=-=-=-=-
This message is posted from an invalid address.  Replying to it will result in
undeliverable mail due to spam prevention.  Sorry for the inconvenience.



Re: kernel panic

2011-02-15 Thread Jarod Wilson
On Feb 15, 2011, at 1:00 PM, Tom Metro wrote:

> Tom Metro wrote:
>> If this is a practical option, I'll dig deeper and see if I can turn up
>> a guide for using it with an Ubuntu kernel.
> 
> Installing kexec-tools did generate this warning:
> 
> update-rc.d: warning: kdump start runlevel arguments (2) do not match
> LSB Default-Start values (0 1 2 3 4 5)
> update-rc.d: warning: kdump stop runlevel arguments (none) do not match
> LSB Default-Stop values (6)
> 
> (Which I think this bug report addresses:
> https://bugs.launchpad.net/ubuntu/+source/kexec-tools/+bug/569980 )
> 
> and a kdump placeholder man page, so that's a good indication it is a
> patched kexec-tools.

Patched kexec-tools shouldn't be necessary these days, as far as I can
recall. It was when we were working on it in the pre-RHEL5.0 GA days,
in late 2006, but I'm pretty sure the necessary support has all been
in kexec-tools for some time now.


> I then ran across:
> https://wiki.ubuntu.com/Kernel/CrashdumpRecipe
> 
> which says to install the linux-crashdump package, then after a reboot,
> kernel panics should automatically be logged.

I've heard of that one, but never used it. Never got a ton of upstream
traction -- kdump is the upstream crash dump method of choice now.


>>> ...which collects an entire vmcore...that can be analyzed.
>> 
>> How does it record that?
> 
> Apparently it saves the dump to RAM, boots a special kernel that then
> provides access to that RAM and writes the dump to a safe area on disk.

Nope. You're on the right track, but not quite. You boot your primary
kernel with a reserved memory region. This memory region is completely
untouched by your system during normal use (you lose use of that area).
The crash kernel is loaded into that memory area, and given rights to
use it upon a panic. The trick is that we boot into the crash kernel
without resetting or touching the rest of RAM, leaving it in exactly
the state it was in when the panic occurred. Once the crash kernel has
booted, it essentially dumps a raw copy of the memory from when the
panic happened out to your dump target device, with optional filtering
of the vmcore to reduce size (don't save unused pages, skip userspace
memory, etc).
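
To make the reserved-region step concrete: on the kernels I know of, the
region is set aside with the crashkernel= boot parameter, and
/sys/kernel/kexec_crash_loaded reads 1 once a crash kernel has been
loaded into it. A minimal Python sketch that checks both (the function
names are made up for illustration, and it only reports state, it does
not configure anything):

```python
import re
from pathlib import Path


def crashkernel_reservation(cmdline: str):
    """Return the value of the crashkernel= boot parameter, or None."""
    match = re.search(r"(?:^|\s)crashkernel=(\S+)", cmdline)
    return match.group(1) if match else None


def crash_kernel_loaded(
    flag_file: Path = Path("/sys/kernel/kexec_crash_loaded"),
) -> bool:
    """True if the kernel reports that a crash kernel is loaded."""
    try:
        return flag_file.read_text().strip() == "1"
    except OSError:
        # File missing or unreadable: treat as "not loaded".
        return False


if __name__ == "__main__":
    try:
        cmdline = Path("/proc/cmdline").read_text()
    except OSError:
        cmdline = ""
    print("crashkernel reservation:", crashkernel_reservation(cmdline) or "none")
    print("crash kernel loaded:", crash_kernel_loaded())
```

If the first line reports "none", no memory was reserved at boot, so
there is nowhere for a crash kernel to be loaded regardless of which
packages are installed.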


> I'm curious to see if it can be configured to use a Flash drive. The
> turn-key Ubuntu process makes no mention of that.

Yes, it can. You can dump to essentially any storage device, if you
have the appropriate tools. In Red Hat Enterprise Linux, we provide
some additional infrastructure that allows creating an initrd with all
the bits necessary to dump over ssh to a remote system, over nfs, to
a fibre channel array, or wherever you like, without even having to
mount your root file system (which may be corrupted as a result of
the panic you're trying to capture a vmcore from). Those bits are a
bit distro-specific though, as it's a modified version of Red Hat's
initrd creation utility that is at the heart of this. But it's very
doable. Just not sure if Ubuntu has the hooks to do anything other
than a local fs dump.


-- 
Jarod Wilson
ja...@wilsonet.com





Re: kernel panic

2011-02-15 Thread Tom Metro
Tom Metro wrote:
> If this is a practical option, I'll dig deeper and see if I can turn up
> a guide for using it with an Ubuntu kernel.

Installing kexec-tools did generate this warning:

update-rc.d: warning: kdump start runlevel arguments (2) do not match
LSB Default-Start values (0 1 2 3 4 5)
update-rc.d: warning: kdump stop runlevel arguments (none) do not match
LSB Default-Stop values (6)

(Which I think this bug report addresses:
https://bugs.launchpad.net/ubuntu/+source/kexec-tools/+bug/569980 )

and a kdump placeholder man page, so that's a good indication it is a
patched kexec-tools.

I then ran across:
https://wiki.ubuntu.com/Kernel/CrashdumpRecipe

which says to install the linux-crashdump package, then after a reboot,
kernel panics should automatically be logged.

I also ran across a bunch of forum postings from people saying how
linux-crashdump didn't work for them. We'll see...


>> ...which collects an entire vmcore...that can be analyzed.
> 
> How does it record that?

Apparently it saves the dump to RAM, boots a special kernel that then
provides access to that RAM and writes the dump to a safe area on disk.

I'm curious to see if it can be configured to use a Flash drive. The
turn-key Ubuntu process makes no mention of that.

 -Tom

-- 
Tom Metro
Venture Logic, Newton, MA, USA
"Enterprise solutions through open source."
Professional Profile: http://tmetro.venturelogic.com/


Re: kernel panic

2011-02-15 Thread Tom Metro
Jack Coats wrote:
> Even though it is a laptop, open it up...and clean it out.

I do that about once a year and did so in January when I upgraded the RAM.

The crash frequency has perhaps increased some since the
RAM upgrade, but the problem was definitely there prior to the RAM
upgrade. The increase might lend credibility to the thermal theory,
assuming the 4 GB module gives off enough additional heat, compared to
the 2 GB module it replaced, to make a difference.

(The RAM upgrade raised other issues: the intention was to go from two
2 GB modules to two 4 GB modules. The Intel chipset supposedly supports
that, but I couldn't get the machine to POST with both new modules
installed. It would POST only with either 4 GB module alone, or with one
4 GB module combined with one of the old 2 GB modules. So I left it at
6 GB. Acer tells me a BIOS upgrade won't
address the problem, and their official stance is that the machine only
supports 4 GB max (clearly not true). I also notice on startup it
reports a memory bus speed of 800 MHz, instead of the expected 1066 MHz.
Could be the old 2 GB module "dragging it down." I haven't investigated
yet.)

 -Tom

-- 
Tom Metro
Venture Logic, Newton, MA, USA
"Enterprise solutions through open source."
Professional Profile: http://tmetro.venturelogic.com/


Re: kernel panic

2011-02-15 Thread Tom Metro
Jarod Wilson wrote:
> Tom Metro wrote:
>> My recollection is that the only way to
>> capture the output of a kernel panic is to capture the output of the
>> serial console. Is that still true?
> 
> No. There have been other ways for quite some time.

Ah, good.


> The most common upstream-supported way is kdump...

I ran across that, but the main page described it as a set of patches,
and didn't give the impression that it had been incorporated into any
distributions.

If this is a practical option, I'll dig deeper and see if I can turn up
a guide for using it with an Ubuntu kernel.

There are no Ubuntu (9.10) packages for kdump, kmsg_dump, or kmsgdump or
packages containing those substrings. (While these tools mainly reside
in the kernel, there still should be some user space tools in a separate
package, I assume.) Although I guess Ubuntu's kexec-tools might just be
the patched version.


> ...which collects an entire vmcore...that can be analyzed.

How does it record that?


> There's also kmsg_dump now, which can output the kernel message
> buffer (including the panic trace) to RAM, flash, or other target,
> when appropriately configured. It's fairly new upstream though, so it
> may or may not be in the kernel you're running.

This sounds more like what I want, but less likely to be something I can
get to work.

How would I determine if the kernel I have supports it?
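
One rough way to check, assuming the distro installs the kernel build
config as /boot/config-<release> (Ubuntu does): look for the relevant
options. As far as I can tell kmsg_dump has no config switch of its own,
so the sketch below looks for two of its consumers (mtdoops and ramoops)
plus the kexec/kdump options. The option list and the function names are
assumptions for illustration; adjust them for your kernel version.

```python
import platform
from pathlib import Path

# Assumed option names for kernels of this era; newer kernels may differ.
OPTIONS = ("CONFIG_KEXEC", "CONFIG_CRASH_DUMP", "CONFIG_MTD_OOPS", "CONFIG_RAMOOPS")


def parse_kernel_config(text: str) -> dict:
    """Map option name -> value ('y', 'm', ...) for every option that is set."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # "# CONFIG_FOO is not set" lines are skipped here
        name, sep, value = line.partition("=")
        if sep:
            settings[name] = value
    return settings


def report(settings: dict, options=OPTIONS) -> dict:
    """Return the value of each option of interest, or 'not set'."""
    return {opt: settings.get(opt, "not set") for opt in options}


if __name__ == "__main__":
    config_path = Path("/boot") / f"config-{platform.release()}"
    try:
        settings = parse_kernel_config(config_path.read_text())
    except OSError:
        print(f"cannot read {config_path}")
    else:
        for opt, value in report(settings).items():
            print(f"{opt}: {value}")
```

A value of "y" means built in and "m" means available as a module. An
option showing "not set" only tells you this particular build lacks it,
not whether the kernel version could support it.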

 -Tom

-- 
Tom Metro
Venture Logic, Newton, MA, USA
"Enterprise solutions through open source."
Professional Profile: http://tmetro.venturelogic.com/


Re: kernel panic

2011-02-15 Thread Jack Coats
I have seen such things be thermal related.

Even though it is a laptop, open it up or have a tech do it and clean
it out.  Dust, lint, etc. build up and can cause thermal overload.  It
is very dusty where we live, and I have this as an issue in desktops
here.

My son had a laptop at college and found that it had a lot of lint in
it.  I am guessing because he used it on a bed.

The next thing I would check is memory.  Run a long-running memory
test.  Memory sticks do go bad, but not often.

Let us know how it goes.


Re: kernel panic

2011-02-15 Thread Jarod Wilson
On Feb 15, 2011, at 1:22 AM, Tom Metro wrote:

> My laptop is crashing pretty regularly with a kernel panic after 24 to
> 48 hours of uptime. Rolling back to the last two kernel versions hasn't
> helped.
> 
> My attempts to leave the computer idle showing a virtual console also
> haven't been too successful at revealing the cause (see prior post on
> difficulty in disabling the screen blanking feature). It did catch one,
> but most of the information had scrolled off the screen. Lots of
> "do_invalid_op" messages.
> 
> Another time I was able to catch a fragment of the log showing in a GUI
> screenlet that shows dmesg output, and that one showed "iwlagn: can't
> stop RX DMA", implicating the Intel WiFi hardware or driver. That led me
> to some suggestions to try turning up the power saving level on the
> Intel card on the theory that it is overheating. I tried that. I
> monitored the temperature via
> /sys/bus/pci/drivers/iwlagn/:05:00.0/temperature
> before and after the power setting change, and it always hovered around
> 67, whatever that means (67 C?). And a day later the system crashed again.
> 
> The logs saved to disk don't show anything relevant, which I guess is
> usual for a kernel panic. My recollection is that the only way to
> capture the output of a kernel panic is to capture the output of the
> serial console. Is that still true?

No. There have been other ways for quite some time. The most
common upstream-supported way is kdump, which collects an entire
vmcore (or a filtered/trimmed one) that can be analyzed. There's
also kmsg_dump now, which can output the kernel message buffer
(including the panic trace) to RAM, flash, or other target, when
appropriately configured. It's fairly new upstream though, so it
may or may not be in the kernel you're running.
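
On the temperature monitoring in the quoted message: since ordinary logs
tend to lose whatever was written just before a panic, one option is to
log the sensor with an fsync after every reading, so the last value has
a better chance of reaching disk. A minimal sketch; the sensor path is
passed in as an argument rather than hard-coded, and the script and
function names are hypothetical:

```python
import os
import sys
import time


def read_sensor(path: str) -> str:
    """Return the raw contents of a sysfs-style sensor file."""
    with open(path) as sensor:
        return sensor.read().strip()


def append_reading(log_path: str, reading: str, now: float) -> None:
    """Append one timestamped reading and force it out to disk."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    with open(log_path, "a") as log:
        log.write(f"{stamp} {reading}\n")
        log.flush()
        os.fsync(log.fileno())  # do not leave the line sitting in a cache


def monitor(sensor_path: str, log_path: str, interval: float = 60.0, samples=None) -> None:
    """Log the sensor every `interval` seconds; run forever if samples is None."""
    taken = 0
    while samples is None or taken < samples:
        append_reading(log_path, read_sensor(sensor_path), time.time())
        taken += 1
        if samples is None or taken < samples:
            time.sleep(interval)


if __name__ == "__main__":
    if len(sys.argv) == 3:
        monitor(sys.argv[1], sys.argv[2])
    else:
        print("usage: templog.py SENSOR_FILE LOG_FILE")
```

After a crash, the last few lines of the log show whether the reading
was climbing beforehand, which would help confirm or rule out the
overheating theory.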


-- 
Jarod Wilson
ja...@wilsonet.com


