> Gah... My reading skills need help, apparently. That was
> 2.6.18-128.7.1.el5 with openafs 1.4.11.
I suspect we should move this into RT, but I thought recording the steps
taken so far might be useful to others.
I grabbed and installed the debug package for this module on a RHEL5
system. I then grabbed the kmod itself and extracted it using:
rpm2cpio <rpm> | cpio -i -d
Starting up gdb on the kernel module then lets you do some poking...
First, we want to find where we stopped. To do that, we need the base
address of the afs_GetDCache function...
(gdb) info line afs_GetDCache
Line 1497 of
"/usr/src/debug/openafs-kmod-1.4.11/_kmod_build_/src/libafs/MODLOAD-2.6.18-128.7.1.el5-MP/afs_dcache.c"
starts at address 0xb3cf <afs_GetDCache>
and ends at 0xb3d9 <afs_GetDCache+10>.
Next, we add the offset at which we died (0x1c0a) to that base address, to
find out where the problem we hit actually is...
(gdb) info line *(0xb3cf + 0x1c0a)
Line 2159 of
"/usr/src/debug/openafs-kmod-1.4.11/_kmod_build_/src/libafs/MODLOAD-2.6.18-128.7.1.el5-MP/afs_dcache.c"
starts at address 0xcfc8 <afs_GetDCache+7161 at
/usr/src/debug/openafs-kmod-1.4.11/_kmod_build_/src/libafs/MODLOAD-2.6.18-128.7.1.el5-MP/afs_dcache.c:2159>
and ends at 0xcfe6 <afs_GetDCache+7191 at
/usr/src/debug/openafs-kmod-1.4.11/_kmod_build_/src/libafs/MODLOAD-2.6.18-128.7.1.el5-MP/afs_dcache.c:2160>.
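The arithmetic behind that info line step is easy to double-check. A minimal C sketch using only the numbers from the gdb session above; the helper names are made up for illustration:

```c
#include <stdbool.h>
#include <stdio.h>

/* Absolute module address = function start + offset within the function. */
static unsigned fault_address(unsigned func_base, unsigned offset) {
    return func_base + offset;
}

/* gdb's "starts at X ... and ends at Y" describes the half-open range [X, Y):
 * Y is where the code for the next source line begins. */
static bool in_line_range(unsigned addr, unsigned start, unsigned end) {
    return addr >= start && addr < end;
}

static void print_fault(unsigned func_base, unsigned offset) {
    /* gdb prints <symbol+N> with N in decimal, hence the %u. */
    printf("fault at 0x%x <afs_GetDCache+%u>\n",
           fault_address(func_base, offset), offset);
}
```

For example, print_fault(0xb3cf, 0x1c0a) prints "fault at 0xcfd9 <afs_GetDCache+7178>", and 0xcfd9 falls inside the [0xcfc8, 0xcfe6) range gdb reports for line 2159.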
Line 2159 of afs_dcache.c (in this version) is:
if (code == RXGEN_OPCODE || afs_serverHasNo64Bit(tc)) {
afs_serverHasNo64Bit is a macro that expands to:
((tc)->srvr->server->flags & SNO_64BIT)
So we want to check whether we're really in the right place in the code.
Let's take a look at the machine code we were actually running...
(gdb) disass 0xcfc8 0xcfe6
Dump of assembler code from 0xcfc8 to 0xcfe6:
0x0000cfc8 <afs_GetDCache+7161>: cmpl $0xfffffe39,0x3c(%esp)
0x0000cfd0 <afs_GetDCache+7169>: je 0xcfe6 <afs_GetDCache+7191 at
/usr/src/debug/openafs-kmod-1.4.11/_kmod_build_/src/libafs/MODLOAD-2.6.18-128.7.1.el5-MP/afs_dcache.c:2160>
0x0000cfd2 <afs_GetDCache+7171>: mov 0x68(%esp),%ebp
0x0000cfd6 <afs_GetDCache+7175>: mov 0xc(%ebp),%eax
0x0000cfd9 <afs_GetDCache+7178>: mov 0x8(%eax),%eax
0x0000cfdc <afs_GetDCache+7181>: testb $0x2,0x3d(%eax)
0x0000cfe0 <afs_GetDCache+7185>: je 0xd0de <afs_GetDCache+7439 at
/usr/src/debug/openafs-kmod-1.4.11/_kmod_build_/src/libafs/MODLOAD-2.6.18-128.7.1.el5-MP/afs_dcache.c:2176>
End of assembler dump.
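Both halves of the `||` on line 2159 can be decoded from this dump. The cmpl immediate 0xfffffe39 is the 32-bit two's-complement form of -455, which is the value of RXGEN_OPCODE; 0x3c(%esp) is presumably where the compiler keeps code. The testb narrows the flags check to a single byte. The sketch below assumes, for illustration only, that flags is a 32-bit little-endian field starting at offset 0x3c of struct server (not confirmed anywhere in this post); under that assumption the tested bit corresponds to a mask of 0x200. ASSUMED_FLAGS_OFFSET and IMPLIED_MASK are hypothetical names, not OpenAFS definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Interpret a 32-bit immediate as the signed value the CPU compares with,
 * without relying on an implementation-defined unsigned->signed cast. */
static int32_t imm32_as_signed(uint32_t imm) {
    if (imm <= (uint32_t)INT32_MAX)
        return (int32_t)imm;
    return -(int32_t)(UINT32_MAX - imm) - 1;
}

/* ASSUMPTION: flags is a 32-bit little-endian word at offset 0x3c. */
#define ASSUMED_FLAGS_OFFSET 0x3c
/* Bit 0x2 of byte 0x3d is bit 1 of byte 1 of that word, i.e. mask 0x200. */
#define IMPLIED_MASK (0x2u << 8)

/* What the source asks for: (flags & mask) != 0. */
static bool word_test(uint32_t flags) {
    return (flags & IMPLIED_MASK) != 0;
}

/* What the compiler emitted: testb $0x2,0x3d(%eax). */
static bool byte_test(const unsigned char *server) {
    return (server[0x3d] & 0x2u) != 0;
}

/* Lay flags out little-endian at the assumed offset; check both tests agree. */
static bool byte_test_matches(uint32_t flags) {
    unsigned char server[0x40] = {0};
    for (int i = 0; i < 4; i++)
        server[ASSUMED_FLAGS_OFFSET + i] = (unsigned char)(flags >> (8 * i));
    return byte_test(server) == word_test(flags);
}
```

Narrowing a word-sized bit test to a byte-sized one like this is a common x86 code-generation choice, which is why the dump shows no 32-bit and/test on flags.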
We die at 0x0000cfd9. By looking up the structure member offsets used by that
macro, we can correlate them with the above code. Given that tc is a pointer to
a struct afs_conn, we have...
(gdb) print &((struct afs_conn *)0)->srvr
$1 = (struct srvAddr **) 0xc
(gdb) print &((struct srvAddr *)0)->server
$2 = (struct server **) 0x8
(So srvr is 0xc bytes into the structure pointed to by 'tc', and server is 0x8
bytes into the srvAddr structure that srvr points to. These match the offsets
used by the mov instructions at 0xcfd6 and 0xcfd9, indicating that we're
looking in the right place.)
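With the real headers to hand, the same offset check can be made at compile time with offsetof, the standard-C counterpart of gdb's &((T *)0)->member idiom. The structs below are hypothetical stand-ins, not the OpenAFS definitions: only the members on the tc->srvr->server path are named, everything before them is placeholder padding, and pointers are modelled as uint32_t to mimic the 32-bit (i386) build this module came from.

```c
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL mock of struct afs_conn on i386: 12 bytes precede srvr. */
struct mock_afs_conn {
    uint32_t pad[3];   /* placeholder for whatever precedes srvr */
    uint32_t srvr;     /* a struct srvAddr * in the real 32-bit build */
};

/* HYPOTHETICAL mock of struct srvAddr on i386: 8 bytes precede server. */
struct mock_srvAddr {
    uint32_t pad[2];   /* placeholder for whatever precedes server */
    uint32_t server;   /* a struct server * in the real 32-bit build */
};

/* Compile-time equivalents of the two gdb "print &((T *)0)->member" checks. */
_Static_assert(offsetof(struct mock_afs_conn, srvr) == 0xc,
               "should match mov 0xc(%ebp),%eax at 0xcfd6");
_Static_assert(offsetof(struct mock_srvAddr, server) == 0x8,
               "should match mov 0x8(%eax),%eax at 0xcfd9");
```

Against the real struct definitions, a failing _Static_assert would flag a header/binary mismatch before ever loading the module.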
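Given that we fail at 0xcfd9, where EAX holds the srvr pointer just loaded by
the previous instruction, it looks like for some reason the structure pointed
to by 'tc' contains an invalid value (0x63, if you look at the contents of EAX
in the panic dump) for its srvr element, so the attempt to read srvr->server
faults.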
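To see why a srvr value of 0x63 kills us at exactly 0xcfd9, here is a toy model of the two pointer loads at 0xcfd6 and 0xcfd9. It is a sketch under stated assumptions: a flat 32-bit little-endian address space, with everything below 0x1000 (the first page) treated as unmapped, as low addresses normally are in a Linux kernel. All addresses other than 0x63 and the 0xc/0x8 offsets are made up.

```c
#include <stdbool.h>
#include <stdint.h>

#define MEM_SIZE     0x2000u  /* size of the toy address space */
#define FIRST_MAPPED 0x1000u  /* ASSUMPTION: the first page is never mapped */

static unsigned char mem[MEM_SIZE];

static void store32(uint32_t addr, uint32_t v) {
    for (int i = 0; i < 4; i++)
        mem[addr + i] = (unsigned char)(v >> (8 * i));  /* little-endian */
}

/* Load a 32-bit word; on a bad address record it in *fault and return false. */
static bool load32(uint32_t addr, uint32_t *out, uint32_t *fault) {
    if (addr < FIRST_MAPPED || addr > MEM_SIZE - 4) {
        *fault = addr;
        return false;
    }
    *out = (uint32_t)mem[addr] | (uint32_t)mem[addr + 1] << 8 |
           (uint32_t)mem[addr + 2] << 16 | (uint32_t)mem[addr + 3] << 24;
    return true;
}

/* Follow tc->srvr->server; return the faulting address, or 0 if none. */
static uint32_t walk(uint32_t tc) {
    uint32_t srvr, server, fault = 0;
    if (!load32(tc + 0xc, &srvr, &fault))      /* 0xcfd6: mov 0xc(%ebp),%eax */
        return fault;
    if (!load32(srvr + 0x8, &server, &fault))  /* 0xcfd9: mov 0x8(%eax),%eax */
        return fault;
    (void)server;  /* 0xcfdc would go on to test server's flags */
    return 0;
}

/* Hypothetical layout: a conn at 0x1000 whose srvr field holds srvr_value,
 * plus one valid srvAddr at 0x1100 pointing at a server at 0x1200. */
static uint32_t demo(uint32_t srvr_value) {
    store32(0x1100u + 0x8, 0x1200u);
    store32(0x1000u + 0xc, srvr_value);
    return walk(0x1000u);
}
```

With a sane srvr (demo(0x1100)) both loads succeed. With srvr = 0x63 the first load succeeds and leaves 0x63 in "EAX", and the second faults on address 0x6b (0x63 + 0x8), i.e. at the instruction at 0xcfd9. A similarly low faulting address in the real oops would be consistent with this reading.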
S.
_______________________________________________
OpenAFS-info mailing list
https://lists.openafs.org/mailman/listinfo/openafs-info