OK, for whatever reason, with debugging and backtraces enabled, I can't get it to fail the way it usually does. Now everything just hangs at the point where one of the server processes would normally die. I found out which process was actually hung by trying to kill each of them in turn. The one that would not die from a plain SIGTERM I then attached to with gdb and backtraced manually:

#0  0xb7f79410 in __kernel_vsyscall ()
#1  0x4e064a2e in __lll_mutex_lock_wait () from /lib/libc.so.6
#2  0x4e00b1bb in _L_lock_1782 () from /lib/libc.so.6
#3  0x4e00af54 in __tz_convert () from /lib/libc.so.6
#4  0x4e00956f in localtime () from /lib/libc.so.6
#5  0x0809c5e9 in gossip_debug_fp_va (fp=0x81000c8, prefix=69 'E',
    format=0x80d192c "\nPVFS2 server got signal %d (server_status_flag: %d)\n",
    ap=0xbfa3a5a4 "\017", ts=GOSSIP_LOGSTAMP_USEC)
    at src/common/gossip/gossip.c:474
#6  0x0809c39b in gossip_err (
    format=0x80d192c "\nPVFS2 server got signal %d (server_status_flag: %d)\n")
    at src/common/gossip/gossip.c:359
#7  0x08056105 in server_sig_handler (sig=15) at src/server/pvfs2-server.c:1547
#8  <signal handler called>
#9  0xb7f79410 in __kernel_vsyscall ()
#10 0x4e064a2e in __lll_mutex_lock_wait () from /lib/libc.so.6
#11 0x4e00b1bb in _L_lock_1782 () from /lib/libc.so.6
#12 0x4e00af54 in __tz_convert () from /lib/libc.so.6
#13 0x4e00956f in localtime () from /lib/libc.so.6
#14 0x0809c5e9 in gossip_debug_fp_va (fp=0x81000c8, prefix=69 'E',
    format=0x80d1380 "PVFS2 server: signal %d, faulty address is %p, from %p\n",
    ap=0xbfa3ae24 "\v", ts=GOSSIP_LOGSTAMP_USEC)
    at src/common/gossip/gossip.c:474
#15 0x0809c39b in gossip_err (
    format=0x80d1380 "PVFS2 server: signal %d, faulty address is %p, from %p\n")
    at src/common/gossip/gossip.c:359
#16 0x08055619 in bt_sighandler (sig=11, info=0xbfa3ae9c, secret=0xbfa3af1c)
    at src/server/pvfs2-server.c:1364
#17 <signal handler called>
#18 0x4dff0872 in _int_free () from /lib/libc.so.6
#19 0x4dff45b0 in free () from /lib/libc.so.6
#20 0x08097e6c in PINT_smcb_free (smcb=0xb520c480)
    at src/common/misc/state-machine-fns.c:541
#21 0x08056f1d in server_state_machine_terminate (smcb=0xb520c480,
    js_p=0x8101500) at src/server/pvfs2-server.c:2004
#22 0x080971bf in PINT_state_machine_terminate (smcb=0xb520c480, r=0x8101500)
    at src/common/misc/state-machine-fns.c:101
#23 0x08097913 in PINT_state_machine_continue (smcb=0xb520c480, r=0x8101500)
    at src/common/misc/state-machine-fns.c:332
#24 0x08053fd3 in main (argc=205922, argv=0x0)
    at src/server/pvfs2-server.c:635

So, it looks like a SIGSEGV (signal 11) during free(), called from PINT_smcb_free. Where do we go from here?

-- 
Ian Morgan
Software Developer
Teledyne Controls Simulation Ltd.
1-5480 Canotek Rd.
Ottawa, ON K1J 9H5
613-749-6980 x354

On Nov 19, 2007 12:50 PM, Sam Lang <[EMAIL PROTECTED]> wrote:
>
> Hi Ian,
>
> The log doesn't include any errors, so I have to assume the server is
> crashing before writing any to the log. Is the server compiled with
> debug symbols? Is there a core dump on the node where the server
> died? If so, can you send it to me? You might need to re-configure
> and re-compile the source with debugging symbols enabled:
>
> make clean
> CFLAGS=-g ./configure --enable-strict ....
> make
>
> Thanks,
> -sam
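
A note on the hang itself, separate from the crash in free(): in the backtrace, both bt_sighandler (signal 11) and server_sig_handler (signal 15) call gossip_err, which ends up in localtime() and then waits on a libc lock. localtime() is not on the POSIX list of async-signal-safe functions, so blocking there from inside a signal handler is a plausible reason the process hangs instead of dying. Below is a minimal sketch, not the PVFS2 code, of a handler that reports the signal using only write(), signal() and raise(), all of which POSIX lists as async-signal-safe. The names format_signal_msg and safe_fatal_handler are hypothetical and exist only for illustration.

```c
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

/* Copy a NUL-terminated string into buf, always leaving room for a NUL. */
static size_t append_str(char *buf, size_t pos, size_t cap, const char *s)
{
    while (*s != '\0' && pos + 1 < cap)
        buf[pos++] = *s++;
    return pos;
}

/* Append an int in decimal without printf (printf is not signal-safe). */
static size_t append_int(char *buf, size_t pos, size_t cap, int value)
{
    char digits[12];
    size_t n = 0;
    unsigned int v = (value < 0) ? 0u - (unsigned int)value
                                 : (unsigned int)value;

    if (value < 0 && pos + 1 < cap)
        buf[pos++] = '-';
    do {
        digits[n++] = (char)('0' + v % 10);
        v /= 10;
    } while (v != 0);
    while (n > 0 && pos + 1 < cap)
        buf[pos++] = digits[--n];
    return pos;
}

/* Hypothetical helper: build "server got signal N\n" in buf.
 * Returns the message length (excluding the NUL); truncates if needed. */
size_t format_signal_msg(char *buf, size_t cap, int sig)
{
    size_t pos = 0;

    if (cap == 0)
        return 0;
    pos = append_str(buf, pos, cap, "server got signal ");
    pos = append_int(buf, pos, cap, sig);
    pos = append_str(buf, pos, cap, "\n");
    buf[pos] = '\0';
    return pos;
}

/* Hypothetical handler: no timestamps, no stdio, no locks taken. */
void safe_fatal_handler(int sig)
{
    char msg[64];
    size_t len = format_signal_msg(msg, sizeof msg, sig);
    ssize_t ignored = write(STDERR_FILENO, msg, len);

    (void)ignored;
    /* Restore the default action and re-raise, so a fatal signal such as
     * SIGSEGV still terminates the process and can leave a core dump. */
    signal(sig, SIG_DFL);
    raise(sig);
}
```

Installing something like safe_fatal_handler with sigaction() for SIGSEGV and SIGTERM would trade the timestamped log line for a handler that cannot block on libc's internal locks. It does nothing about the underlying fault in free(), which still needs the core dump or a memory checker to track down.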
