On Wednesday, December 15, 2004 14:02:26 -0500 Terry Gliedt <[EMAIL PROTECTED]> wrote:
####### from /var/log/messages Watch for line wraps
Unable to handle kernel NULL pointer dereference at virtual address 00000004
printing eip:
f8b73af8
*pde = 2bcc0001
*pte = 00000000
Oops: 0000
CPU:    2
EIP:    0010:[<f8b73af8>]    Tainted: PF
EFLAGS: 00010282
eax: 20003312   ebx: f8c4be14   ecx: ec6b5dfc   edx: 00000000
esi: f8c4c038   edi: ec6b5da0   ebp: ec6b5da0   esp: ecbbfe40
ds: 0018   es: 0018   ss: 0018
Process cp (pid: 3288, stackpage=ecbbf000)
Stack: f9417000 ecbbe000 00000000 f8c4be14 f8c4c038 ecbbfe90 ec6b5da0 f8b776b2
       ec6b5da0 ec6b5dfc 00000002 ecbbfe90 c0360a00 ec71ad20 00000001 f9417000
       ec6b5dfc f8c4c038 ec6b5dfc 0000ffff 0001e194 00000040 f8ba22c0 f8b78a00
Call Trace: [<f8b776b2>] [<f8ba22c0>] [<f8b78a00>] [<c01611ed>] [<c0161a22>]
  [<c01620c9>] [<c0162429>] [<c0153443>] [<c016c8d1>] [<c0155f88>] [<c01befd5>]
  [<c01bf0df>] [<c010b8bc>]
Code: 39 42 04 0f 84 c7 00 00 00 e8 3a e7 ff ff 89 c5 50 8d 44 24
That's not surprising. In all of the cases you described where a process randomly segfaults, you should see output like that in /var/log/messages or in dmesg output. There is a wide variety of bad things that, if user code does them, cause the program to exit on a signal such as SIGSEGV or SIGBUS and drop a core file. In Linux, if one of these things happens in kernel code, the process exits on SIGSEGV (no core), and you get an "oops" message which contains information about the state of the kernel at the time of the failure. That's what the message you quoted is.
Unfortunately, the oops message is not useful in its raw form. All of the numbers you see in [<>] are actually addresses inside the kernel. For the backtrace to be useful, these need to be converted to symbolic form. This is usually done automatically by the logging software, if it can find the kernel symbol table, which is normally kept in a file called "System.map". Since the conversion did not happen automatically, you will need to either find and use ksymoops, or reconfigure the kernel logging software to do the translation and then reproduce the problem.
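To illustrate what that translation involves, here is a minimal Python sketch of the kind of lookup ksymoops and klogd perform; it is not their actual code. It assumes the usual System.map line format (`<hex address> <type> <name>`), that the map belongs to the exact kernel that produced the oops, and that the map contains an `_end` symbol marking the end of the core kernel image. The function names (`load_system_map`, `resolve`, `symbolize_oops`) and the sample map entries are hypothetical. Each address is matched to the nearest symbol at or below it and reported as `symbol+0xoffset`.

```python
import bisect
import re

# Matches the "[<c01611ed>]" address markers in a raw 2.4-style oops.
ADDR_RE = re.compile(r"\[<([0-9a-fA-F]+)>\]")


def load_system_map(lines):
    """Parse System.map lines ("<hex address> <type> <name>").

    Returns (sorted addresses, matching names, address of _end or None).
    """
    syms = []
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        try:
            syms.append((int(parts[0], 16), parts[2]))
        except ValueError:
            continue  # first field is not a hex address
    syms.sort()
    addrs = [a for a, _ in syms]
    names = [n for _, n in syms]
    end = next((a for a, n in syms if n == "_end"), None)
    return addrs, names, end


def resolve(addr, addrs, names, end):
    """Map addr to "symbol+0xoffset" using the nearest symbol at or below it.

    Returns None when addr lies below the first symbol or at/after _end,
    e.g. an address inside a loadable module, whose symbols System.map
    does not contain.
    """
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0 or (end is not None and addr >= end):
        return None
    return f"{names[i]}+0x{addr - addrs[i]:x}"


def symbolize_oops(oops_text, map_lines):
    """Return (hex address, symbol or None) for every [<...>] in oops_text."""
    addrs, names, end = load_system_map(map_lines)
    return [
        (m, resolve(int(m, 16), addrs, names, end))
        for m in ADDR_RE.findall(oops_text)
    ]


if __name__ == "__main__":
    # Hypothetical System.map excerpt: these names and addresses are invented
    # for illustration and do not come from the reporter's kernel.
    fake_map = [
        "c0161000 T hypothetical_func_a",
        "c0161a00 T hypothetical_func_b",
        "c0400000 A _end",
    ]
    trace = "Call Trace: [<f8b776b2>] [<c01611ed>] [<c0161a22>]"
    for addr, sym in symbolize_oops(trace, fake_map):
        print(addr, sym or "(unresolved: outside System.map)")
```

Addresses at or past `_end`, such as the f8xxxxxx and f9xxxxxx entries in the trace quoted above, most likely fall in module space (for example, inside openafs), and System.map alone cannot resolve them; that is why klogd's handling of module symbols matters.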
The simplest thing to do is to make sure that klogd is able to find the System.map file, and that it is not invoked with -x. You will probably get the best results by running klogd with -p, so it will reload symbol table information when it sees an error (otherwise it may not have a complete set of symbols for openafs).
FWIW, I have not heard of anyone getting OpenAFS and OpenMosix to work together, even to the extent that you've reported so far. We have had several reports of failures in the past, though...
-- Jeffrey T. Hutzelman (N3NHS) <[EMAIL PROTECTED]> Sr. Research Systems Programmer School of Computer Science - Research Computing Facility Carnegie Mellon University - Pittsburgh, PA
_______________________________________________ OpenAFS-devel mailing list [EMAIL PROTECTED] https://lists.openafs.org/mailman/listinfo/openafs-devel
