On Tuesday 12 October 2004 14:49, Herbert Poetzl wrote:
> > > > $ ksymoops -o /lib/modules/2.4.27-piv-smp-vs1.29-rc2/
> > > >     -m /boot/System.map-2.4.27-piv-smp-vs1.29-rc2 oops1.txt
> > > > ksymoops 2.4.5 on i686 2.4.27-piv-smp-vs1.29-rc2. Options used
> > > >      -V (default)
> > > >      -k /proc/ksyms (default)
> > > >      -l /proc/modules (default)
> > > >      -o /lib/modules/2.4.27-piv-smp-vs1.29-rc2/ (specified)
> > > >      -m /boot/System.map-2.4.27-piv-smp-vs1.29-rc2 (specified)
> > > >
> > > > Oct 12 02:11:34 aton kernel: Unable to handle kernel paging request at virtual address 4c4d3760
> > > > Oct 12 02:11:34 aton kernel: c015a00b
> > > > Oct 12 02:11:34 aton kernel: *pde = 00000000
> > > > Oct 12 02:11:34 aton kernel: Oops: 0000
> > > > Oct 12 02:11:34 aton kernel: CPU: 3
> > > > Oct 12 02:11:34 aton kernel: EIP: 0010:[do_select+379/576] Not tainted
> > > > Oct 12 02:11:34 aton kernel: EFLAGS: 00010202
> > > > Oct 12 02:11:34 aton kernel: eax: 4c4d374c ebx: 00000000 ecx: 00000145 edx: ef741d00
> > > > Oct 12 02:11:34 aton kernel: esi: d0f9d600 edi: 00000015 ebp: 00200000 esp: f1b83f20
> > > > Oct 12 02:11:34 aton kernel: ds: 0018 es: 0018 ss: 0018
> > > > Oct 12 02:11:34 aton kernel: Process caspeng (pid: 1022, stackpage=f1b83000)
> > > > Oct 12 02:11:34 aton kernel: Stack: c8218380 00000000 00000145 f1b82000 00000000 00000000 00000000 00000000
> > > > Oct 12 02:11:34 aton kernel: c4bcb000 00000000 00000400 c429b300 bf7ff95c c015a449 00000020 f1b83f90
> > > > Oct 12 02:11:34 aton kernel: f1b83f8c 00000000 00000080 00000080 0000041f c0380a08 fffffffd 00000020
> > > > Warning (Oops_read): Code line not seen, dumping what data is available
> > > >
> > > >
> > > > >>eax; 4c4d374c Before first symbol
> > > > >>edx; ef741d00 <_end+2f346ca8/38891008>
> > > > >>esi; d0f9d600 <_end+10ba25a8/38891008>
> > > > >>ebp; 00200000 Before first symbol
> > > > >>esp; f1b83f20 <_end+31788ec8/38891008>
> > > >
> > > > I don't have any more lines of oops output.
> > > >
> > > > $ addr2line -f -e vmlinux1 c015a00b
> > > > do_select
> > > > /usr/src/2.4.27/linux-2.4.27/fs/select.c:197
> > > >
> > > >            mask = POLLNVAL;
> > > >            if (file) {
> > > >                    mask = DEFAULT_POLLMASK;
> > > > OOPS-->            if (file->f_op && file->f_op->poll)
> > > >                            mask = file->f_op->poll(file, wait);
> > > >                    fput(file);
> > > >            }
> > > >
> > > hmm, file is checked above, so file->f_op should be
> > > fine ergo file->f_op->poll must be bad ... question
> > > is, why ...
> > >
> > > could you disasm (objdump) the relevant function
> > > to see how the deref is coded?
> >
> > $ objdump -d --start-address=0xC0159E90 --stop-address=0xc015a0d2 vmlinux1
> >
> > vmlinux1: file format elf32-i386
> >
> > Disassembly of section .text:
> >
> > c0159e90 <do_select>:
> > ...
>
> > c015a004: 8b 46 10    mov 0x10(%esi),%eax
> > c015a007: 85 c0       test %eax,%eax
> > c015a009: 74 0b       je c015a016 <do_select+0x186>
> > =================== OOPS @ <do_select+0x17b> ========================
>
> this means %eax contains file->f_op, but %eax
> is 0x4c4d374c and kernel space starts at 0xc0000000
> (in your case) so the address is bogous ...
>
> but, 0xc = 0x4 + 0x8 so a single bit failure might
> be enough to make this a valid address ...
> OTOH, 0x4c = L, 0x4d = M and 0x37 = 7 ;)
>
> best,
> Herbert

Right.
Now this hardware has ECC on RAM and CPU caches, which should be able
to correct 1-bit errors and detect more severe ones. After looking
through the IPMI logs I did not find any ECC or CPU errors, nor any ECC
error corrections. I guess it's one of the Xeons itself, probably a
ramification of a power supply that died a year ago.

Thanks very much.

-- 
lg, Chris
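
The arithmetic in the quoted analysis can be checked with a few lines of Python. This is only a sketch: all constants are copied from the oops and objdump output above, the variable names are made up for illustration, and the 0x14 displacement is simply the difference between the faulting address and %eax (the faulting instruction itself is not shown in the trimmed disassembly, so reading it as the offset of poll inside f_op is an assumption).

```python
import struct

# Values copied from the oops and objdump output quoted above.
DO_SELECT   = 0xC0159E90  # start address of do_select
EIP         = 0xC015A00B  # address of the faulting instruction
EAX         = 0x4C4D374C  # register contents, supposedly file->f_op
FAULT_ADDR  = 0x4C4D3760  # address of the failed paging request
KERNEL_BASE = 0xC0000000  # start of kernel space on this machine

# The EIP offset matches both "do_select+379" and "<do_select+0x17b>".
offset = EIP - DO_SELECT
print(f"EIP offset: {offset} ({offset:#x})")

# The fault address is %eax plus a small displacement, which fits a
# member access through the pointer held in %eax.
displacement = FAULT_ADDR - EAX
print(f"fault address - eax: {displacement:#x}")

# "0xc = 0x4 + 0x8": flipping only bit 31 yields a kernel-space address.
flipped = EAX ^ 0x80000000
print(f"eax with bit 31 flipped: {flipped:#x}")
print(f"in kernel space: {flipped >= KERNEL_BASE}")

# The same 32 bits read as ASCII, in little-endian memory order.
as_text = struct.pack("<I", EAX).decode("ascii")
print(f"eax as ASCII: {as_text!r}")
```

Both readings stay open: a single flipped bit would turn the value into a plausible kernel pointer, while the fact that all four bytes are printable ASCII would point towards string data having overwritten the pointer.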