Hey everyone -

Nick and I have been digging into trove and have found a bug.  When
trove debugging is enabled (via the "trove" flag in the config file),
the server crashes under I/O calls (namely pvfs2-cp).  It sometimes
runs for a few seconds before crashing, but the crash is consistent
enough that the server segfaults every time I try to transfer a 256 MB
file onto or off of it.  I tested on both 32-bit RHEL5 and 64-bit
Fedora 10, with release 2.8.1 on both.  Only one server was running,
and it was acting as both a metadata and an I/O server.

I believe it's something to do with threading since it happens when
printing out a status message (I'm fairly certain the call to
gossip_debug() on line 237 of dbpf-bstream.c is the culprit).  Here is
the last bit of the log file and the stack trace from gdb on 32-bit
RHEL5:


[D 05/26 13:39] aio_progress_notification: BSTREAM_READ_LIST complete: aio_return() says 262144 [fd = 11]
[D 05/26 13:39] *** starting delayed ops if any (state is LIST_PROC_ALLPOSTED)
[D 05/26 13:39] DBPF I/O ops in progress: 1
[New Thread 0xb56a0b90 (LWP 1272)]
[Thread 0xb2cfeb90 (LWP 1271) exited]
[D 05/26 13:39] issue_or_delay_io_operation: lio_listio posted 0xa0d0ec8 (handle 9223372036854775805, ret 0)
[D 05/26 13:39]  --- aio_progress_notification called with handle 9223372036854775805 (0xa0d0ec8)
[D 05/26 13:39] aio_progress_notification: BSTREAM_READ_LIST complete: aio_return() says 262144 [fd = 11]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb56a0b90 (LWP 1272)]
0x00c04993 in strlen () from /lib/libc.so.6
(gdb) bt
#0  0x00c04993 in strlen () from /lib/libc.so.6
#1  0x00bd4bce in vfprintf () from /lib/libc.so.6
#2  0x00bf53b4 in vsnprintf () from /lib/libc.so.6
#3  0x08059bed in gossip_debug_fp_va (fp=0xb569fb5c,
   prefix=<value optimized out>,
   format=0xb569fc80 "*** starting delayed ops if any (state is ST
complete: aio_return() says 262144 [fd = 11]\n", ap=0xb56a00d0 "t:
hpz\016\b", ts=13455348)
   at src/common/gossip/gossip.c:506
#4  0x0805a041 in __gossip_debug (mask=65536, prefix=63 '?',
   format=0x80dc3b0 "*** starting delayed ops if any (state is %s)\n")
   at src/common/gossip/gossip.c:281
#5  0x080a9ed9 in aio_progress_notification (sig=
     {sival_int = 168627912, sival_ptr = 0xa0d0ec8})
   at src/io/trove/trove-dbpf/dbpf-bstream.c:237
#6  0x080ba89c in alt_lio_thread (foo=0xa0d0ce8)
   at src/io/trove/trove-dbpf/dbpf-alt-aio.c:275
#7  0x00d0f49b in start_thread () from /lib/libpthread.so.0
#8  0x00c6642e in clone () from /lib/libc.so.6
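
One reading of frame #3 above: the format string gdb prints looks like
pieces of two different messages run together, which would fit two
threads formatting into the same buffer at once.  That is only a guess
from the trace, not something confirmed in the gossip source.  A minimal
sketch of a debug print that avoids that kind of race is below.  The
names demo_debug, demo_log_lock and DEMO_MSG_MAX are hypothetical and
are not part of PVFS2; the idea is simply a per-call stack buffer plus a
mutex around the write.

```c
#include <assert.h>
#include <pthread.h>
#include <stdarg.h>
#include <stdio.h>

#define DEMO_MSG_MAX 512   /* hypothetical cap on one formatted message */

/* Hypothetical lock that serializes writes to the log stream. */
static pthread_mutex_t demo_log_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Hypothetical thread-safe debug print (not the gossip API).
 * The message is formatted into a buffer on the calling thread's own
 * stack, so no other thread can scribble on it, and only the final
 * write to fp is done under the lock.
 * Returns the number of bytes written, or -1 on a formatting error.
 */
int demo_debug(FILE *fp, const char *fmt, ...)
{
    char msg[DEMO_MSG_MAX];
    va_list ap;
    int n;

    assert(fp != NULL && fmt != NULL);

    va_start(ap, fmt);
    n = vsnprintf(msg, sizeof(msg), fmt, ap);
    va_end(ap);

    if (n < 0)
        return -1;
    if ((size_t)n >= sizeof(msg))
        n = (int)sizeof(msg) - 1;   /* output was truncated to fit */

    pthread_mutex_lock(&demo_log_lock);
    fwrite(msg, 1, (size_t)n, fp);
    fflush(fp);
    pthread_mutex_unlock(&demo_log_lock);

    return n;
}
```

A call such as demo_debug(stderr, "state is %s\n", state_name) could then
be made from several AIO callback threads at once without the messages
overlapping.  Build with -pthread.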



Thanks,
- Dave
