On Tue, Oct 12, 2004 at 02:29:06PM +0200, Christian Mayrhuber wrote:
> On Tuesday 12 October 2004 13:45, Herbert Poetzl wrote:
> > On Tue, Oct 12, 2004 at 01:22:11PM +0200, Christian Mayrhuber wrote:
> > > Hi, 
> > > 
> > > caspeng strikes again...
> > > 
> > > $ cat /proc/version
> > > Linux version 2.4.27-piv-smp-vs1.29-rc2 ([EMAIL PROTECTED]) (gcc version 3.3.4 (Debian 1:3.3.4-6sarge1)) #1 SMP Tue Sep 21 13:33:16 CEST 2004
> > > 
> > > $ lsmod
> > > Module                  Size  Used by    Not tainted
> > > nfs                    74040   1  (autoclean)
> > > lockd                  50288   1  (autoclean) [nfs]
> > > sunrpc                 74304   1  (autoclean) [nfs lockd]
> > > autofs                 10388   1  (autoclean)
> > > loop                    9496   0  (autoclean)
> > > e1000                  68364   1
> > > rtc                     7080   0  (autoclean)
> > > 
> > > ReiserFS with Chris Mason's data logging patch, SCSI, scsi-disk and
> > > the megaraid2 driver are compiled into the kernel.
> > > 
> > > $ ksymoops -o /lib/modules/2.4.27-piv-smp-vs1.29-rc2/ -m /boot/System.map-2.4.27-piv-smp-vs1.29-rc2 oops1.txt
> > > ksymoops 2.4.5 on i686 2.4.27-piv-smp-vs1.29-rc2.  Options used
> > >      -V (default)
> > >      -k /proc/ksyms (default)
> > >      -l /proc/modules (default)
> > >      -o /lib/modules/2.4.27-piv-smp-vs1.29-rc2/ (specified)
> > >      -m /boot/System.map-2.4.27-piv-smp-vs1.29-rc2 (specified)
> > > 
> > > Oct 12 02:11:34 aton kernel: Unable to handle kernel paging request at virtual address 4c4d3760
> > > Oct 12 02:11:34 aton kernel: c015a00b
> > > Oct 12 02:11:34 aton kernel: *pde = 00000000
> > > Oct 12 02:11:34 aton kernel: Oops: 0000
> > > Oct 12 02:11:34 aton kernel: CPU:    3
> > > Oct 12 02:11:34 aton kernel: EIP:    0010:[do_select+379/576]    Not tainted
> > > Oct 12 02:11:34 aton kernel: EFLAGS: 00010202
> > > Oct 12 02:11:34 aton kernel: eax: 4c4d374c   ebx: 00000000   ecx: 00000145   edx: ef741d00
> > > Oct 12 02:11:34 aton kernel: esi: d0f9d600   edi: 00000015   ebp: 00200000   esp: f1b83f20
> > > Oct 12 02:11:34 aton kernel: ds: 0018   es: 0018   ss: 0018
> > > Oct 12 02:11:34 aton kernel: Process caspeng (pid: 1022, stackpage=f1b83000)
> > > Oct 12 02:11:34 aton kernel: Stack: c8218380 00000000 00000145 f1b82000 00000000 00000000 00000000 00000000
> > > Oct 12 02:11:34 aton kernel:        c4bcb000 00000000 00000400 c429b300 bf7ff95c c015a449 00000020 f1b83f90
> > > Oct 12 02:11:34 aton kernel:        f1b83f8c 00000000 00000080 00000080 0000041f c0380a08 fffffffd 00000020
> > > Warning (Oops_read): Code line not seen, dumping what data is available
> > > 
> > > 
> > > >>eax; 4c4d374c Before first symbol
> > > >>edx; ef741d00 <_end+2f346ca8/38891008>
> > > >>esi; d0f9d600 <_end+10ba25a8/38891008>
> > > >>ebp; 00200000 Before first symbol
> > > >>esp; f1b83f20 <_end+31788ec8/38891008>
> > > 
> > > I don't have any more lines of oops output.
> > > 
> > > $ addr2line -f -e vmlinux1 c015a00b
> > > do_select
> > > /usr/src/2.4.27/linux-2.4.27/fs/select.c:197
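The oops reports the EIP as do_select+379/576, and the raw address it logs is c015a00b. As a quick sanity check of how the two relate, here is a minimal sketch. The start address c0159e90 is taken from the objdump listing quoted later in this thread; resolve_eip() is a made-up helper for illustration, not part of ksymoops or addr2line.

```python
# Sketch: map "symbol+offset/size" from an oops line back to an absolute address.
# DO_SELECT_START is copied from the objdump listing in this thread.
# resolve_eip() is a hypothetical helper, not a ksymoops/addr2line function.

DO_SELECT_START = 0xC0159E90


def resolve_eip(symbol_start, offset, size):
    """Return symbol_start + offset after checking the offset is inside the symbol."""
    if not 0 <= offset < size:
        raise ValueError("offset lies outside the symbol")
    return symbol_start + offset


if __name__ == "__main__":
    eip = resolve_eip(DO_SELECT_START, 379, 576)
    print(f"do_select+379 = {eip:08x}")   # same address the oops logged
    print(f"379 decimal   = {379:#x}")    # offset as objdump would show it
```

The result agrees with both the address passed to addr2line and the <do_select+0x17b> label in the disassembly.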
> > > 
> > >                         mask = POLLNVAL;
> > >                         if (file) {
> > >                                 mask = DEFAULT_POLLMASK;
> > > OOPS-->                         if (file->f_op && file->f_op->poll)
> > >                                         mask = file->f_op->poll(file, wait);
> > >                                 fput(file);
> > >                         }
> > 
> > hmm, file is checked above, so file->f_op should be 
> > fine ergo file->f_op->poll must be bad ... question
> > is, why ...
> > 
> > could you disasm (objdump) the relevant function
> > to see how the deref is coded?
> 
> $ objdump -d --start-address=0xC0159E90 --stop-address=0xc015a0d2  vmlinux1
> 
> vmlinux1:     file format elf32-i386
> 
> Disassembly of section .text:
> 
> c0159e90 <do_select>:

 ...

> c015a004:       8b 46 10                mov    0x10(%esi),%eax
> c015a007:       85 c0                   test   %eax,%eax
> c015a009:       74 0b                   je     c015a016 <do_select+0x186>
> =================== OOPS @ <do_select+0x17b> ========================

this means %eax contains file->f_op, but %eax
is 0x4c4d374c and kernel space starts at 0xc0000000
(in your case) so the address is bogus ...
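
The numbers in the oops are consistent with this reading. The instruction at the faulting EIP (c015a00b) is `mov 0x14(%eax),%edx`, so the CPU tried to read from %eax + 0x14. A minimal sketch of that check follows. It assumes the 3G/1G split with kernel space starting at 0xc0000000, as stated above; the function names are illustrative only.

```python
# Sketch: reproduce the faulting address from the register dump.
# All values are copied from the oops and the disassembly in this thread.
# PAGE_OFFSET assumes the 3G/1G split; function names are hypothetical.

PAGE_OFFSET = 0xC0000000   # assumed start of kernel space
EAX = 0x4C4D374C           # loaded by "mov 0x10(%esi),%eax"
DISPLACEMENT = 0x14        # from "mov 0x14(%eax),%edx"


def effective_address(base, displacement):
    """Address of a base+displacement operand, with 32-bit wraparound."""
    return (base + displacement) & 0xFFFFFFFF


def is_kernel_address(addr):
    """True if addr lies at or above the assumed kernel base."""
    return addr >= PAGE_OFFSET


if __name__ == "__main__":
    fault = effective_address(EAX, DISPLACEMENT)
    print(f"computed fault address: {fault:08x}")
    print(f"eax is a kernel address: {is_kernel_address(EAX)}")
```

The computed address equals the "virtual address 4c4d3760" reported in the first line of the oops, which supports the view that the pointer in %eax was already bad before the dereference.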

but, 0xc = 0x4 + 0x8 so a single bit failure might
be enough to make this a valid address ...
OTOH, 0x4c = L, 0x4d = M and 0x37 = 7 ;) 
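
Both observations can be checked mechanically. The sketch below tries every single-bit flip of the bad pointer to see which ones land in kernel space, and decodes its four bytes as ASCII. Assumptions: kernel base 0xc0000000 as mentioned in this mail, and little-endian byte order in memory as on i386; the helper names are made up for illustration.

```python
# Sketch: test the "single bit failure" and "ASCII text" theories for the
# bad pointer value from the oops. Helper names are hypothetical.

PAGE_OFFSET = 0xC0000000   # assumed start of kernel space
BAD_POINTER = 0x4C4D374C   # %eax at the time of the oops


def single_bit_neighbours(value, width=32):
    """All values that differ from `value` in exactly one bit."""
    return [value ^ (1 << bit) for bit in range(width)]


def kernel_candidates(value):
    """Single-bit neighbours that fall at or above the kernel base."""
    return [v for v in single_bit_neighbours(value) if v >= PAGE_OFFSET]


def as_ascii(value, byteorder):
    """Decode a 32-bit value as four ASCII characters in the given byte order."""
    return value.to_bytes(4, byteorder).decode("ascii", errors="replace")


if __name__ == "__main__":
    for cand in kernel_candidates(BAD_POINTER):
        print(f"one-bit flip gives kernel address {cand:08x}")
    print("as written (big-endian):", as_ascii(BAD_POINTER, "big"))
    print("in memory (little-endian):", as_ascii(BAD_POINTER, "little"))
```

Only a flip of bit 31 yields a kernel-space value, and all four bytes are printable characters, so neither theory can be ruled out from the pointer value alone.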

best,
Herbert

> c015a00b:       8b 50 14                mov    0x14(%eax),%edx
> c015a00e:       85 d2                   test   %edx,%edx

> > > In 2.4.26-vs1.27 an oops triggered by caspeng occurred at:
> > > sock_readv_writev
> > > /usr/src/2.4.26/linux-2.4.26-vs1.27/net/socket.c:636
> > > 
> > > Caspeng inflicting two oopses at two totally different
> > > locations looks very strange to me.
> > > 
> > > Neither the vs1.29, nor the reiserfs data logging patch
> > > touches fs/select.c.
> > > 
> > > Should I forward this to linux-kernel?
> > 
> > could be silent data corruption, you could also
> > look for reiser doing strange things with f_op(->poll)
> > for sure the linux-vserver code doesn't touch that
> > either ...
> 
> # cd /usr/src/2.4.27/linux-2.4.27/fs/reiserfs
> # find . -name "*.c" -exec grep f_op \{\} \;
> # find . -name "*.c" -exec grep poll \{\} \;
> 
> Reiserfs touches neither file->f_op nor ->poll, it seems.
> 
> -- 
> lg, Chris
_______________________________________________
Vserver mailing list
[EMAIL PROTECTED]
http://list.linux-vserver.org/mailman/listinfo/vserver
