Hey Becky,

Thanks for the recommendation. I turned on the debugging, but it could be days
or longer before another segfault occurs. The three errors I have seen occurred
over the course of about 5 weeks.
Is there any way I could test either of those scenarios you mentioned, or
otherwise narrow it down? I would really like to get this figured out before
any other file systems start experiencing problems.

Thanks,
Bart

On Tue, Sep 21, 2010 at 2:39 PM, Becky Ligon <[email protected]> wrote:
> Bart:
>
> Try turning on the state machine gossip debug statements, so you can see
> which request is trying to be processed: ...bin/pvfs2-set-debugmask -m
> <mountpoint> sm, while the servers are running.
>
> My guess is that the pvfs lib has gotten corrupted somehow or there is
> some sort of mismatch between the client request that is sent to the
> server.
>
> Becky
> --
> Becky Ligon
> PVFS Developer
> Clemson University
> 864-650-4065
>
> > Hey guys,
> >
> > I am running into some server segfaults with the latest release. I have a
> > single RHEL 5.5 64-bit virtual machine with 4GB memory running four
> > daemons using network-attached storage. This is 2.8.2 plus a couple of
> > recent patches.
> >
> > The log output looks like this:
> >
> > Sep 21 07:42:54 node1 PVFS2: [E] SM current state or trtbl is invalid (smcb = 0x11d9a390)
> > Sep 21 07:42:54 node1 PVFS2: [E] [bt] /usr/sbin//pvfs2-server(PINT_state_machine_next+0x81) [0x43c2a0]
> > Sep 21 07:42:54 node1 PVFS2: [E] [bt] /usr/sbin//pvfs2-server(PINT_state_machine_continue+0x1d) [0x43c4af]
> > Sep 21 07:42:54 node1 PVFS2: [E] [bt] /usr/sbin//pvfs2-server(main+0x54f) [0x410ef2]
> > Sep 21 07:42:54 node1 PVFS2: [E] [bt] /lib64/libc.so.6(__libc_start_main+0xf4) [0x368ae1d994]
> > Sep 21 07:42:54 node1 PVFS2: [E] [bt] /usr/sbin//pvfs2-server [0x4108e9]
> >
> > The GDB backtrace looks like this:
> >
> > #0  0x000000368ae30265 in raise () from /lib64/libc.so.6
> > #1  0x000000368ae31d10 in abort () from /lib64/libc.so.6
> > #2  0x000000368ae296e6 in __assert_fail () from /lib64/libc.so.6
> > #3  0x000000000043c2b9 in PINT_state_machine_next (smcb=0x15d49f10,
> >     r=0x15c359e0) at ../pvfs2_src/src/common/misc/state-machine-fns.c:246
> > #4  0x000000000043c4af in PINT_state_machine_continue (smcb=0x15d49f10,
> >     r=0x15c359e0) at ../pvfs2_src/src/common/misc/state-machine-fns.c:327
> > #5  0x0000000000410ef2 in main (argc=6, argv=0x7fffb3baa138) at
> >     ../pvfs2_src/src/server/pvfs2-server.c:413
> >
> > It looks like an assert is failing in this block of state-machine-fns.c:
> >
> >     if (!smcb->current_state || !smcb->current_state->trtbl)
> >     {
> >         gossip_err("SM current state or trtbl is invalid "
> >                    "(smcb = %p)\n", smcb);
> >         gossip_backtrace();
> >         assert(0);
> >         return -1;
> >     }
> >
> > Here is some GDB output of the relevant variables in the if test:
> >
> > (gdb) print smcb->current_state
> > $1 = (struct PINT_state_s *) 0x6d1990
> > (gdb) print smcb->current_state->trtbl
> > $2 = (struct PINT_tran_tbl_s *) 0x0
> >
> > This particular server has segfaulted at least 3 times with the same
> > logs and core data, so it does not seem to be a fluke. Does anyone
> > recognize this error or know why that transition table might be empty?
> >
> > Bart.
> > _______________________________________________
> > Pvfs2-developers mailing list
> > [email protected]
> > http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
