Re: Udev-165 (apparently) System Crash
On Thu Jan 6 2011 09:24 PM, William Immendorf wrote:
> Intriguing. This is something that you should report to both LKML and linux-hotplug (the Udev list). Also, be sure to provide the kdump of that kernel, the log, and the hardware that you think is causing the Udev issue.

Hi William,

First, apologies for getting back to you so late.

I gather you are familiar with both Kdump and (B)LFS, so maybe you can help me further, beyond sending me to LKML and the Udev list (anybody else is obviously welcome to chime in with their thoughts).

In a nutshell, the basic function of Kdump is, as we all know, for the second (dump-capture) kernel to boot as soon as the first (system) kernel crashes, and then to capture the preserved memory of the crashed system and write it somewhere for later analysis. So far so good.

On a (B)LFS system I see two problems (issues, as they say):

1. On boot, the dump-capture kernel goes through the same steps as the original (system) kernel (mountkernfs, modules, Udev, swap ...), finds a corrupt file system, fscks it, and then reboots into the first, bad, system, and all is lost.

2. In a situation like mine (as described in the intro to this thread), the crash happens early in the boot (at Udev activation), which makes the possible interaction between the crashing system and the dump-capture one even more intriguing.

As an aside, I suspect these two complications have something to do with the (B)LFS procedures seemingly not covering system crashes.

Thanks,
-- Alex

--
http://linuxfromscratch.org/mailman/listinfo/lfs-support
FAQ: http://www.linuxfromscratch.org/lfs/faq.html
Unsubscribe: See the above information page
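One possible way around problem 1, as a rough sketch only: a dump-capture kernel exposes the crashed kernel's memory as /proc/vmcore, so an early bootscript could test for that file and save it before the file system checks run. The function names, the VMCORE variable, and the destination directory below are hypothetical; nothing like this ships with the (B)LFS bootscripts, and the destination would have to be a writable file system other than the damaged one.

```shell
# Path of the dump exposed by a dump-capture kernel. Overridable for testing.
VMCORE=${VMCORE:-/proc/vmcore}

# Succeeds only when we appear to be running in the dump-capture kernel.
in_capture_kernel() {
    [ -e "$VMCORE" ]
}

# save_vmcore DIR: copy the dump into DIR and flush it to disk.
# Returns 1 without touching anything when no dump is present.
save_vmcore() {
    dest=$1
    in_capture_kernel || return 1
    mkdir -p "$dest" && cp "$VMCORE" "$dest/vmcore" && sync
}
```

A bootscript placed ahead of the file system check could then call `save_vmcore` on a separately mounted partition and halt instead of carrying on into the normal boot sequence.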
Re: Udev-165 (apparently) System Crash
On Thu, 2011-01-06 at 02:08 AM, Simon Geard wrote:
> ... udevd supports a couple of options that might help, --debug and --debug-trace.

1. FWIW, udevd-165 does not have (no longer has?) a --debug-trace option. Maybe it's undocumented now (shades of Undocumented DOS of yore :) Please see 'man udevd'.

2. For argument's sake, udevd writes to sys.log and daemon.log (identically). There are a few things that intrigue me (and which might help in my non-kdump troubleshooting, if answered positively):

2.1. Udevd is active way before the main partition becomes read-WRITE (stemming from a chicken-and-egg situation, I suppose). Be that as it may, where is the mechanism that eventually (i.e. with a delay) writes the accumulated udevd log from memory(?) to disk (/var/log/)?

2.2. Can udevd (syslogd?) output be redirected from the start to a different partition where it is written immediately, so I can find it postmortem and deduce the hardware element that bothers Udev? That would mean starting syslogd early for this test, etc.

2.3. More simplistically, can I temporarily mount the main partition writable as soon as Udev discovers it and somehow start syslogd? As I mentioned, the crash occurs a few lines before the end of the Udev run, i.e. long after Udev sees the partitions. Would syslogd start writing udevd output to disk immediately? That would involve creating an appropriate rule with some action in it, etc. Maybe splitting the udevd logic into two phases, etc.

I'd appreciate your thoughts. If 2.1-2.3 above are too crazy, please disregard, with my apologies.

Thanks,
-- Alex
Re: Udev-165 (apparently) System Crash
On Saturday 08 January 2011 13:44:23 al...@verizon.net wrote:
> On Thu, 2011-01-06 at 02:08 AM, Simon Geard wrote:
>> ... udevd supports a couple of options that might help, --debug and --debug-trace.
>
> 1. FWIW, udevd-165 does not (no-longer?) have a --debug-trace option. Maybe it's undocumented now (shades of Undocumented DOS of yore :) Please see 'man udevd'.
>
> 2. For argument sake, udevd writes to sys.log and daemon.log (identically). There are a few things that intrigue me (which might help in my non-kdump troubleshooting, if answered positively):
>
> 2.1. Udevd is active way before the main partition becomes read-WRITE (stemming from a chicken and egg situation, I suppose). Be that as it may, where's the mechanism that writes the accumulated udevd log from memory(?) to disk (/var/log/) ultimately (i.e. delayed)?
>
> 2.2. Can udevd (syslogd?) output be originally redirected to a different partition where it can go instantaneously, so I can find it postmortem and deduce the hardware element that bothers Udev? syslogd started early for this test, etc.
>
> 2.3. More simplistically, can I temporarily mount the main partition writable as soon as Udev discovers it and somehow start syslogd? As I mentioned, the crash occurs a few lines before the end of Udev run, i.e. long after Udev sees the partitions. Would syslogd start writing udevd output to disk immediately? That would involve creating an appropriate rule with some action in it, etc. Maybe splitting the udevd logic in two phases..., etc.
>
> I'd appreciate your thoughts. If 2.1-3 above too crazy, please disregard with my apologies.

While I was integrating udev into my test/dev version of Smoothwall, I had to go through some 'contortions' when the system was in early boot (running in the initramfs). At this point, syslogd is not running and there is no hard drive available; logging is pretty much WYSIWYG. What follows describes the 'generic' Linux boot process and may not mesh well with how LFS is actually booted.

You may have to get your hands *really* dirty. You can modify the initramfs image (unpack it, modify it, repack it, reboot) by changing its init script to direct udev's STDOUT and STDERR to a file (in the tmpfs, of course). After udev runs (in the initramfs' init script), you can run a shell to examine the file. The bootup process will be suspended until you exit the shell.

But you say 'the crash'. If the system is crashing (I, red-facedly, haven't paid attention), then you might try populating the initramfs' /dev with device nodes for your hard drive and modifying the initramfs' init script to mount the hard drive. Then run udev as verbosely as possible and redirect its stdout and stderr to a file on the hard drive.

Be warned that playing in initramfs (early boot) is almost a black art. Things we take for granted, like standard TTY features, standard device nodes, and ordinary debugging tools, may not exist. It's not as limiting as having only 18 toggles on the front panel, but it's still a bother.

If you are truly desperate, you can always put the entire LFS / tree into the initramfs image. It'll be slow to unpack, but you'll have all the tools at your fingertips. You *will* need a lot of RAM to do this, though.
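The redirection step described above might look roughly like the helper below, written for a POSIX shell such as the one an initramfs init script typically runs under. This is only a sketch: `run_logged` is a made-up name, and the device node, mount point, and log file in the usage comments are assumptions to be replaced with whatever matches the actual system.

```shell
# run_logged LOGFILE COMMAND [ARGS...]
# Run COMMAND with both stdout and stderr appended to LOGFILE and
# return COMMAND's exit status.
run_logged() {
    log=$1
    shift
    "$@" >>"$log" 2>&1
}

# Hypothetical use inside the init script (paths and device are assumptions):
#   mount -o sync /dev/sda1 /mnt                 # sync: write through at once
#   run_logged /mnt/udev-debug.log /sbin/udevd --debug &
#   /sbin/udevadm trigger
#   /sbin/udevadm settle
#   sh                                           # boot resumes when this shell exits
```

Mounting with `-o sync` is meant to get each write onto the disk before a hard freeze can lose it, at the cost of speed. Without `--daemon`, udevd stays in the foreground, hence the `&` in the usage lines.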
Re: Udev-165 (apparently) System Crash
On Saturday 08 January 2011 02:48 PM Neal Murphy wrote:
> While I was integrating udev into my test/dev version of Smoothwall ...

Hi Neal,

This is only to acknowledge and thank you for your detailed comments.

I haven't had time to go into any depth at all, what with the NFL playoffs and all, but judging strictly by the size of your comments, I may be on the brink of going kdump after all. I am a bit traumatized by kdump: I remember finding little support for the heavy procedure, plus the actual submission, which is another put-off.

If desperate, I'll go the whole 9 yards (maybe even longer). My only hope is that if I stall long enough I won't need to do anything any more:
online.wsj.com/article/SB10001424052970204527804576043803826627110.html?mod=WSJ_hp_mostpop_read

I'll try to absorb and follow your friendly and valuable advice soon.

Thanks again,
-- Alex
Re: Udev-165 (apparently) System Crash
On Thu, 2011-01-06 at 02:08:45 AM, Simon Geard wrote:
> ... if the system is completely frozen, it may be hardware related ...

Hi Simon,

Very good points, overall.

When it crashes it's frozen all right (i.e. a crash crash).

No hardware changes as of late. Seems to be some old hardware that the latest Udev iteration started disagreeing with. I have a software-"mirror" system, more modern, SATA based, which still boots very smoothly.

If I read the frozen screen correctly, and taking into account the Udev step where the crash occurs, it seems to be an IRQ conflict of some sort. I'll dig into it.

The important thing for me is you appear to support me, even if indirectly, in my attempt to avoid kdump in my troubleshooting, if at all possible :)

Thank you very much,
Best Wishes,
-- Alex
Re: Udev-165 (apparently) System Crash
On Thu, Jan 6, 2011 at 6:36 PM, al...@verizon.net wrote:
> If I correctly read the frozen screen, and taking into account the Udev step where the crash occurs, seems an IRQ conflict of some sort. I'll dig into it.

Intriguing. This is something that you should report to both LKML and linux-hotplug (the Udev list). Also, be sure to provide the kdump of that kernel, the log, and the hardware that you think is causing the Udev issue.

--
William Immendorf
The ultimate in free computing.
Messages in plain text, please, no HTML.
GPG key ID: 1697BE98
If it's not signed, it's not from me.
--
Every nonfree program has a lord, a master -- and if you use the program, he is your master.
Richard Stallman
Re: Udev-165 (apparently) System Crash
On Thu, 2011-01-06 at 18:36 -0600, al...@verizon.net wrote:
> The important thing for me is you appear to support me, even if indirectly, in my attempt to avoid kdump in my troubleshoot, if at all possible :)

Happy to help.

Simon.
Re: Udev-165 (apparently) System Crash
On Wed, 2011-01-05 at 20:12 -0600, al...@verizon.net wrote:
> QUESTION
> Is there a simple way of analyzing the crash, maybe with help of the specialists here, as opposed to formally reporting it?

Never tried, myself, but udevd supports a couple of options that might help, --debug and --debug-trace. You might try adding those to the command in /etc/rc.d/init.d/udev.

Also, if you mount the partition from another system, you can look at the logs in /var/log, and see if anything looks relevant.

Finally, if the system is completely frozen, it may be hardware related - either an actual hardware problem, or broken software talking to hardware. So, you could try removing/disconnecting any unnecessary devices, and see if the problem goes away.

Simon.
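Adding --debug to the bootscript could be done by hand or with something like the sketch below. It assumes the script starts the daemon with a line containing `udevd --daemon`, which may not match every version of /etc/rc.d/init.d/udev, so check the file first. The function name is made up, and only --debug is added here, since whether --debug-trace is available depends on the udevd version (see 'man udevd').

```shell
# add_udevd_debug FILE
# Append --debug to the "udevd --daemon" start line in FILE, keeping the
# original as FILE.orig. Does nothing if --debug is already present.
add_udevd_debug() {
    script=$1
    if grep -q -- 'udevd --daemon --debug' "$script"; then
        return 0
    fi
    cp "$script" "$script.orig" &&
        sed 's|udevd --daemon|udevd --daemon --debug|' "$script.orig" > "$script"
}

# Hypothetical use, from a rescue system with the LFS partition on /mnt/lfs:
#   add_udevd_debug /mnt/lfs/etc/rc.d/init.d/udev
```

Because the result is written back into the existing file rather than replacing it, the script keeps its permissions, and copying FILE.orig back undoes the change.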