On  8 Apr, Anand Kumria wrote:
>  On Mon, Apr 08, 2002 at 12:00:53AM +1000, [EMAIL PROTECTED] wrote:
> > Not being on any of the kernel mailing lists, I wouldn't know if the
> > following subject has come up, and so I thought I'd ask here if anyone
> > knows of any planned work in this area...
>  
>  You did google for `Linux Kernel Crash Dumps', no?

Sorry.

>  Anyway, lkcd.sf.net is what you are after. It might get integrated,
>  it might not. You can always petition Alan Cox to include it in RedHat's
>  kernel when he is down under next year.

Yep, that sounds really good - thanks.

> > A friend at Sun remarked that supporting Linux at the enterprise level
> > is much harder than supporting Solaris, mainly because Linux has no
> > crash dump facility.  That is, when Solaris crashes, it leaves a dump
>  
>  I've only had Solaris crash 5 times (same number as Linux) and have only
>  had it generate a crash dump once. All the other times involved IO code
>  and/or hardware and the machine(s) just spontaneously rebooted.
>  
>  So Linus' thoughts on the desirability of having crash dump code in the
>  kernel are understandable; your friend's comments about ease of support
>  with crash dumps aren't, though. I don't think I'm alone in getting a
>  successful crash dump in only 20% of catastrophic failures.

The impression I had, talking to him, was that being able to get a
crash dump was normal under Solaris, not a 1 in 5 chance.  Hmm.

> > Having lived through a few Linux panics, I have to agree that it has
> > nothing like this - it has something only marginally better than
> > Windows NT's blue screen of death.  
>  
>  Slightly more than marginally; I'm in the process of restoring a corrupted
>  LVM partition of mine. The LVM code thinks there are more bits to the disk
>  than there really are and regularly generates faults.

Sorry, I wasn't clear.  I meant, it didn't seem to be much better than
NT as far as providing crash dump type info.  I agree that it's much
more robust in the face of serious errors.

>  I'm still using my machine despite it having `oops'ed about 45 minutes ago.
>  
>  I can't access the particular partition in question though, I'll need to
>  reboot, but having only that particular subsystem/hardware item be locked
>  off is damn handy.

Agreed.

> > (At least Linux has ksymoops so
> > that after you have laboriously copied down a text screen full of hex
> > numbers, and then typed them in, you can at least get some symbolic
> > debug info.  So it's better than Windows, but it's a painful process.)
>  
>  Normally ksymoops is tied into your logfile stuff so it automagically
>  decodes the entries that got logged without the need for you to copy
>  things down.

I didn't know that until tonight.  :-)  Though in my case, no Oops
stuff was logged, and the system completely and utterly froze.  All I
could do was copy stuff down from the screen.  Fortunately this was for
a problem that was so repeatable that I could provoke it without X
running.  (Normally, the console was not on display, only X.  I mean
the machine was utterly stopped.)

>  Even more important is that you can actually look at the code and see
>  where it all went to pieces.

Agreed.

> > Does anyone know whether something more like Solaris's kind of facility
> > is being planned for Linux?
>  
>  Hopefully this is more info than you wanted to know.

It's great - much appreciated.

luke

-- 
SLUG - Sydney Linux User Group Mailing List - http://slug.org.au/
More Info: http://lists.slug.org.au/listinfo/slug
