Bart:

Try turning on the state machine gossip debug statements so you can see
which request the server is processing when it fails. While the servers
are running, run:  ...bin/pvfs2-set-debugmask -m <mountpoint> sm

My guess is that the pvfs lib has gotten corrupted somehow, or that there
is some sort of mismatch between the request the client sends and what the
server expects.
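
To narrow down which pointer is bad, it may help to look at the two halves of the failing test separately. Below is a minimal sketch of that idea, not the actual PVFS code: the struct definitions are simplified stand-ins invented for illustration, and check_sm_state is a hypothetical helper. It returns -1 instead of asserting, and reports whether current_state or trtbl is the NULL one.

```c
#include <stddef.h>
#include <stdio.h>

/* Simplified stand-ins for illustration only (assumption): the real
 * PVFS state machine structs carry many more fields than these. */
struct tran_tbl { int return_value; };
struct state    { const struct tran_tbl *trtbl; };
struct smcb     { const struct state *current_state; };

/* Hypothetical helper: returns 0 when both pointers are set, -1 when
 * either is NULL, and says on stderr which one was NULL. */
int check_sm_state(const struct smcb *smcb)
{
    if (!smcb->current_state) {
        fprintf(stderr, "current_state is NULL (smcb = %p)\n",
                (const void *)smcb);
        return -1;
    }
    if (!smcb->current_state->trtbl) {
        fprintf(stderr, "trtbl is NULL (smcb = %p, state = %p)\n",
                (const void *)smcb,
                (const void *)smcb->current_state);
        return -1;
    }
    return 0;
}
```

In the GDB output quoted below, current_state is set and trtbl is 0x0, so a check split this way would take the second branch.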

Becky
-- 
Becky Ligon
PVFS Developer
Clemson University
864-650-4065

> Hey guys,
>
> I am running into some server segfaults with the latest release. I have a
> single RHEL 5.5 64bit virtual machine with 4GB memory running four daemons
> using network attached storage. This is 2.8.2 plus a couple of recent
> patches.
>
> The log output looks like this:
>
> Sep 21 07:42:54 node1 PVFS2: [E] SM current state or trtbl is invalid (smcb = 0x11d9a390)
> Sep 21 07:42:54 node1 PVFS2: [E]      [bt] /usr/sbin//pvfs2-server(PINT_state_machine_next+0x81) [0x43c2a0]
> Sep 21 07:42:54 node1 PVFS2: [E]      [bt] /usr/sbin//pvfs2-server(PINT_state_machine_continue+0x1d) [0x43c4af]
> Sep 21 07:42:54 node1 PVFS2: [E]      [bt] /usr/sbin//pvfs2-server(main+0x54f) [0x410ef2]
> Sep 21 07:42:54 node1 PVFS2: [E]      [bt] /lib64/libc.so.6(__libc_start_main+0xf4) [0x368ae1d994]
> Sep 21 07:42:54 node1 PVFS2: [E]      [bt] /usr/sbin//pvfs2-server [0x4108e9]
>
> The GDB backtrace looks like this:
>
> #0  0x000000368ae30265 in raise () from /lib64/libc.so.6
> #1  0x000000368ae31d10 in abort () from /lib64/libc.so.6
> #2  0x000000368ae296e6 in __assert_fail () from /lib64/libc.so.6
> #3  0x000000000043c2b9 in PINT_state_machine_next (smcb=0x15d49f10, r=0x15c359e0) at ../pvfs2_src/src/common/misc/state-machine-fns.c:246
> #4  0x000000000043c4af in PINT_state_machine_continue (smcb=0x15d49f10, r=0x15c359e0) at ../pvfs2_src/src/common/misc/state-machine-fns.c:327
> #5  0x0000000000410ef2 in main (argc=6, argv=0x7fffb3baa138) at ../pvfs2_src/src/server/pvfs2-server.c:413
>
> It looks like an assert is failing in this block of state-machine-fns.c:
>
> if (!smcb->current_state || !smcb->current_state->trtbl)
> {
>     gossip_err("SM current state or trtbl is invalid "
>            "(smcb = %p)\n", smcb);
>     gossip_backtrace();
>     assert(0);
>     return -1;
> }
>
> Here is some GDB output of the relevant variables in the if test:
>
> (gdb) print smcb->current_state
> $1 = (struct PINT_state_s *) 0x6d1990
> (gdb) print smcb->current_state->trtbl
> $2 = (struct PINT_tran_tbl_s *) 0x0
>
> This particular server has segfaulted at least 3 times with the same logs
> and core data, so it does not seem to be a fluke. Does anyone recognize
> this error or know why that transition table pointer might be NULL?
>
> Bart.
> _______________________________________________
> Pvfs2-developers mailing list
> [email protected]
> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
>