Also, your log file is going to get really big.  I would try recompiling
everything and restarting the servers and clients, just to be sure.
Otherwise, I'm not sure.  I've never seen that particular problem before,
so my guess is just a wild guess!

Becky
-- 
Becky Ligon
PVFS Developer
Clemson University
864-650-4065

> Hey Becky,
>
> Thanks for the recommendation. I turned on the debugging, but it could be
> days or longer before another segfault occurs. The three errors I have
> seen occurred over the course of about 5 weeks.
>
> Is there any way I could test either of those scenarios you mentioned or
> otherwise narrow it down? I would really like to get this figured out
> before any other file systems start experiencing problems.
>
> Thanks,
> Bart.
>
>
> On Tue, Sep 21, 2010 at 2:39 PM, Becky Ligon <[email protected]> wrote:
>
>> Bart:
>>
>> Try turning on the state machine gossip debug statements so you can see
>> which request the server is trying to process. While the servers are
>> running, run:  ...bin/pvfs2-set-debugmask -m <mountpoint> sm
>>
>> My guess is that the pvfs lib has gotten corrupted somehow, or there is
>> some sort of mismatch between the request the client sends and what the
>> server expects.
>>
>> Becky
>> --
>> Becky Ligon
>> PVFS Developer
>> Clemson University
>> 864-650-4065
>>
>> > Hey guys,
>> >
>> > I am running into some server segfaults with the latest release. I
>> > have a single RHEL 5.5 64bit virtual machine with 4GB memory running
>> > four daemons using network attached storage. This is 2.8.2 plus a
>> > couple of recent patches.
>> >
>> > The log output looks like this:
>> >
>> > Sep 21 07:42:54 node1 PVFS2: [E] SM current state or trtbl is invalid (smcb = 0x11d9a390)
>> > Sep 21 07:42:54 node1 PVFS2: [E]      [bt] /usr/sbin//pvfs2-server(PINT_state_machine_next+0x81) [0x43c2a0]
>> > Sep 21 07:42:54 node1 PVFS2: [E]      [bt] /usr/sbin//pvfs2-server(PINT_state_machine_continue+0x1d) [0x43c4af]
>> > Sep 21 07:42:54 node1 PVFS2: [E]      [bt] /usr/sbin//pvfs2-server(main+0x54f) [0x410ef2]
>> > Sep 21 07:42:54 node1 PVFS2: [E]      [bt] /lib64/libc.so.6(__libc_start_main+0xf4) [0x368ae1d994]
>> > Sep 21 07:42:54 node1 PVFS2: [E]      [bt] /usr/sbin//pvfs2-server [0x4108e9]
>> >
>> > The GDB backtrace looks like this:
>> >
>> > #0  0x000000368ae30265 in raise () from /lib64/libc.so.6
>> > #1  0x000000368ae31d10 in abort () from /lib64/libc.so.6
>> > #2  0x000000368ae296e6 in __assert_fail () from /lib64/libc.so.6
>> > #3  0x000000000043c2b9 in PINT_state_machine_next (smcb=0x15d49f10, r=0x15c359e0) at ../pvfs2_src/src/common/misc/state-machine-fns.c:246
>> > #4  0x000000000043c4af in PINT_state_machine_continue (smcb=0x15d49f10, r=0x15c359e0) at ../pvfs2_src/src/common/misc/state-machine-fns.c:327
>> > #5  0x0000000000410ef2 in main (argc=6, argv=0x7fffb3baa138) at ../pvfs2_src/src/server/pvfs2-server.c:413
>> >
>> > It looks like an assert is failing in this block of state-machine-fns.c:
>> >
>> > if (!smcb->current_state || !smcb->current_state->trtbl)
>> > {
>> >     gossip_err("SM current state or trtbl is invalid "
>> >            "(smcb = %p)\n", smcb);
>> >     gossip_backtrace();
>> >     assert(0);
>> >     return -1;
>> > }
>> >
>> > Here is some GDB output of the relevant variables in the if test:
>> >
>> > (gdb) print smcb->current_state
>> > $1 = (struct PINT_state_s *) 0x6d1990
>> > (gdb) print smcb->current_state->trtbl
>> > $2 = (struct PINT_tran_tbl_s *) 0x0
>> >
>> > This particular server has segfaulted at least 3 times with the same
>> > logs and core data, so it does not seem to be a fluke. Does anyone
>> > recognize this error or know why that transition table might be null?
>> >
>> > Bart.
>> > _______________________________________________
>> > Pvfs2-developers mailing list
>> > [email protected]
>> > http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
>> >
>>
>>
>
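
The guard Bart quotes from state-machine-fns.c fires on either of two
conditions, and the log message does not say which one. As a rough sketch
only (not the PVFS source): the structs below are stand-ins carrying just
the two fields the check reads, and sm_guard_reason() and
sm_next_checked() are hypothetical helper names made up for this example.
It splits the combined test so the log says whether current_state or its
trtbl was NULL, which matches what the GDB session had to find by hand.

```c
#include <stdio.h>
#include <stddef.h>

/* Stand-in types: only the fields the guard reads. These are NOT the
 * real PINT_state_s / PINT_tran_tbl_s / smcb definitions from PVFS. */
struct tran_tbl { int return_value; };
struct state    { const char *name; struct tran_tbl *trtbl; };
struct smcb     { struct state *current_state; };

/* Hypothetical helper: report why the guard would fire.
 * 0 = state is usable, 1 = current_state is NULL, 2 = trtbl is NULL. */
static int sm_guard_reason(const struct smcb *smcb)
{
    if (!smcb->current_state)
        return 1;
    if (!smcb->current_state->trtbl)
        return 2;
    return 0;
}

/* Same shape as the quoted check, but it names the failing condition
 * and returns -1 instead of calling assert(0). */
static int sm_next_checked(const struct smcb *smcb)
{
    int reason = sm_guard_reason(smcb);

    if (reason != 0) {
        fprintf(stderr,
                "SM current state or trtbl is invalid (smcb = %p): %s\n",
                (const void *)smcb,
                reason == 1 ? "current_state is NULL" : "trtbl is NULL");
        return -1;
    }
    return 0;
}
```

With a control block whose current_state points at a state with a NULL
trtbl (the situation in the GDB output above), sm_guard_reason() returns 2
and sm_next_checked() returns -1 after logging "trtbl is NULL". Whether
returning an error is preferable to aborting on the real server is a
separate question; the abort at least leaves a core file to inspect.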
