OK, for whatever reason, with debugging and backtraces enabled, I
can't get it to fail the way it usually does. Now everything just hangs
at the point where one of the server processes would normally die. I
found out which process was actually hung by trying to kill each of
them in turn. The one that would not die on a plain SIGTERM I then
attached to with gdb and backtraced manually.

#0 0xb7f79410 in __kernel_vsyscall ()
#1 0x4e064a2e in __lll_mutex_lock_wait () from /lib/libc.so.6
#2 0x4e00b1bb in _L_lock_1782 () from /lib/libc.so.6
#3 0x4e00af54 in __tz_convert () from /lib/libc.so.6
#4 0x4e00956f in localtime () from /lib/libc.so.6
#5 0x0809c5e9 in gossip_debug_fp_va (fp=0x81000c8, prefix=69 'E',
format=0x80d192c "\nPVFS2 server got signal %d (server_status_flag: %d)\n",
ap=0xbfa3a5a4 "\017", ts=GOSSIP_LOGSTAMP_USEC)
at src/common/gossip/gossip.c:474
#6 0x0809c39b in gossip_err (
format=0x80d192c "\nPVFS2 server got signal %d (server_status_flag: %d)\n")
at src/common/gossip/gossip.c:359
#7 0x08056105 in server_sig_handler (sig=15) at src/server/pvfs2-server.c:1547
#8 <signal handler called>
#9 0xb7f79410 in __kernel_vsyscall ()
#10 0x4e064a2e in __lll_mutex_lock_wait () from /lib/libc.so.6
#11 0x4e00b1bb in _L_lock_1782 () from /lib/libc.so.6
#12 0x4e00af54 in __tz_convert () from /lib/libc.so.6
#13 0x4e00956f in localtime () from /lib/libc.so.6
#14 0x0809c5e9 in gossip_debug_fp_va (fp=0x81000c8, prefix=69 'E',
format=0x80d1380 "PVFS2 server: signal %d, faulty address is %p, from %p\n",
ap=0xbfa3ae24 "\v", ts=GOSSIP_LOGSTAMP_USEC)
at src/common/gossip/gossip.c:474
#15 0x0809c39b in gossip_err (
format=0x80d1380 "PVFS2 server: signal %d, faulty address is %p, from %p\n")
at src/common/gossip/gossip.c:359
#16 0x08055619 in bt_sighandler (sig=11, info=0xbfa3ae9c, secret=0xbfa3af1c)
at src/server/pvfs2-server.c:1364
#17 <signal handler called>
#18 0x4dff0872 in _int_free () from /lib/libc.so.6
#19 0x4dff45b0 in free () from /lib/libc.so.6
#20 0x08097e6c in PINT_smcb_free (smcb=0xb520c480)
at src/common/misc/state-machine-fns.c:541
#21 0x08056f1d in server_state_machine_terminate (smcb=0xb520c480,
js_p=0x8101500) at src/server/pvfs2-server.c:2004
#22 0x080971bf in PINT_state_machine_terminate (smcb=0xb520c480, r=0x8101500)
at src/common/misc/state-machine-fns.c:101
#23 0x08097913 in PINT_state_machine_continue (smcb=0xb520c480, r=0x8101500)
at src/common/misc/state-machine-fns.c:332
#24 0x08053fd3 in main (argc=205922, argv=0x0) at src/server/pvfs2-server.c:635

So it looks like a SIGSEGV (signal 11) during free(), called from
PINT_smcb_free. Where do we go from here?
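
Separately from the crash itself, the hang looks explainable from the
trace: both bt_sighandler (frame #16) and server_sig_handler (frame #7)
call gossip_err, which ends up in localtime(), and both are blocked on a
libc lock inside __tz_convert. localtime() is not on the POSIX list of
async-signal-safe functions, so a handler that calls it can block
forever if that lock is already held. Below is a minimal sketch of a
handler that logs using only write(2). The function names (append_uint,
build_signal_msg, safe_sig_handler, install_safe_handler) are
hypothetical and not part of PVFS2, and the sketch simply drops the
timestamp rather than reproducing gossip's log format.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Append the decimal digits of v to out[pos..cap); returns the new position.
 * Uses plain loops only, so it is safe to call from a signal handler. */
size_t append_uint(char *out, size_t pos, size_t cap, unsigned int v)
{
    char digits[12];            /* enough for a 32-bit unsigned value */
    size_t n = 0;

    do {
        digits[n++] = (char)('0' + v % 10);
        v /= 10;
    } while (v != 0 && n < sizeof digits);

    while (n > 0 && pos < cap)
        out[pos++] = digits[--n];
    return pos;
}

/* Build "server got signal N\n" in out and return the byte count.
 * No NUL terminator is written because write(2) takes a length. */
size_t build_signal_msg(char *out, size_t cap, int sig)
{
    static const char prefix[] = "server got signal ";
    size_t pos = 0;

    for (size_t i = 0; i < sizeof prefix - 1 && pos < cap; i++)
        out[pos++] = prefix[i];
    pos = append_uint(out, pos, cap, (unsigned int)sig);
    if (pos < cap)
        out[pos++] = '\n';
    return pos;
}

/* Hypothetical handler (not the PVFS2 one): no stdio, no localtime(). */
void safe_sig_handler(int sig)
{
    char msg[64];
    size_t len = build_signal_msg(msg, sizeof msg, sig);

    /* write() is on the POSIX async-signal-safe list. */
    ssize_t rc = write(STDERR_FILENO, msg, len);
    (void)rc;
}

/* Install the handler for signo. SA_RESETHAND restores the default action
 * on entry to the handler, so a repeated fault terminates the process
 * instead of re-entering the handler. Returns 0 on success, -1 on error. */
int install_safe_handler(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = safe_sig_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESETHAND;
    return sigaction(signo, &sa, NULL);
}
```

Calling install_safe_handler(SIGSEGV) and install_safe_handler(SIGTERM)
early in main() would be one way to try this. With SA_RESETHAND, a
SIGSEGV handler that returns lets the faulting instruction run again
under the default action, so the process should still die and leave a
core file if core dumps are enabled.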

-- 
Ian Morgan
Software Developer
Teledyne Controls Simulation Ltd.
1-5480 Canotek Rd.
Ottawa, ON  K1J 9H5
613-749-6980 x354

On Nov 19, 2007 12:50 PM, Sam Lang <[EMAIL PROTECTED]> wrote:
>
> Hi Ian,
>
> The log doesn't include any errors, so I have to assume the server is
> crashing before writing any to the log.  Is the server compiled with
> debug symbols?  Is there a core dump on the node where the server
> died?  If so, can you send it to me?  You might need to re-configure
> and re-compile the source with debugging symbols enabled:
>
> make clean
> CFLAGS=-g ./configure --enable-strict ....
> make
>
> Thanks,
> -sam
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
