On 8 Apr, Anand Kumria wrote:
> On Mon, Apr 08, 2002 at 12:00:53AM +1000, [EMAIL PROTECTED] wrote:
> > Not being on any of the kernel mailing lists, I wouldn't know if the
> > following subject has come up, and so I thought I'd ask here if anyone
> > knows of any planned work in this area...
>
> You did google for `Linux Kernel Crash Dumps', no?

Sorry.

> Anyway, lkcd.sf.net is what you are after. It might get integrated,
> it might not. You can always petition Alan Cox to include it in RedHat's
> kernel when he is downunder next year.

Yep, that sounds really good - thanks.

> > A friend at Sun remarked that supporting Linux at the enterprise level
> > is much harder than supporting Solaris, mainly because Linux has no
> > crash dump facility. That is, when Solaris crashes, it leaves a dump
>
> I've only had Solaris crash 5 times (same number as Linux) and have only
> had it generate a crash dump once. All the other times involved IO code
> and/or hardware and the machine(s) just spontaneously rebooted.
>
> So Linus' thoughts on the desirability of having crash dump code in the
> kernel are understandable; your friend's comments about support ease with
> crash dumps aren't though. I don't think I'm alone in having only a 20%
> crash dump success rate on catastrophic failure.

The impression I had, talking to him, was that being able to get a crash
dump was normal under Solaris, not a 1 in 5 chance. Hmm.

> > Having lived through a few Linux panics, I have to agree that it has
> > nothing like this - it has something only marginally better than
> > Windows NT's blue screen of death.
>
> Slightly more than marginally; I'm in the process of restoring a corrupted
> LVM partition of mine. The LVM code thinks there are more bits to the disk
> than there really are and regularly generates faults.

Sorry, I wasn't clear. I meant, it didn't seem to be much better than NT
as far as providing crash dump type info. I agree that it's much more
robust in the face of serious errors.

> I'm still using my machine despite it having `oops'ed about 45 minutes ago.
>
> I can't access the particular partition in question though, I'll need to
> reboot, but having only that particular subsystem/hardware item be locked
> off is damn handy.

Agreed.

> > (At least Linux has ksymoops so
> > that after you have laboriously copied down a text screen full of hex
> > numbers, and then typed them in, you can at least get some symbolic
> > debug info. So it's better than Windows, but it's a painful process.)
>
> Normally ksymoops is tied into your logfile stuff so it automagically
> decodes the entries that got logged without the need for you to copy
> things down.

I didn't know that until tonight. :-) Though in my case, no Oops stuff
was logged, and the system completely and utterly froze. All I could do
was copy stuff down from the screen. Fortunately this was for a problem
that was so repeatable that I could provoke it without X running.
(Normally, the console was not on display, only X. I mean the machine
was utterly stopped.)

> Even more important is that you can actually look at the code and see
> where it all went to pieces.

Agreed.

> > Does anyone know whether something more like Solaris's kind of facility
> > is being planned for Linux?
>
> Hopefully this is more info than you wanted to know.

It's great - much appreciated.

luke

--
SLUG - Sydney Linux User Group Mailing List - http://slug.org.au/
More Info: http://lists.slug.org.au/listinfo/slug
