Hi Randy,
I don't have any idea where the problem is, but could you try running
the server in gdb? That may give you a better backtrace. The code
that generates the backtrace and writes it to the log when a segfault
occurs isn't very reliable and may not work on your system. You could
also run the server under valgrind to see if there are memory errors
elsewhere; that may pinpoint the problem more precisely.
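For example, something along these lines (the config file path is just
a placeholder for your own, and I'm going from memory that -d keeps
the server in the foreground):

# run the server in the foreground under gdb; once it faults,
# "bt full" should give a usable backtrace
gdb --args pvfs2-server -d /etc/pvfs2/fs.conf

# or under valgrind, to catch memory errors leading up to the crash
valgrind pvfs2-server -d /etc/pvfs2/fs.conf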
Was PVFS configured with optimizations (--enable-fast), or without
(--enable-strict)? And did you specify CFLAGS=-g when running
configure? For debugging environments I usually run configure like
this:
this:
CFLAGS=-g ./configure --enable-strict
That enables the most debugging and should hopefully produce better
backtraces.
-sam
On Jul 15, 2009, at 7:47 AM, Randall Martin wrote:
I occasionally get a server crash in what appears to be
src/io/flow/flowproto-bmi-trove/flowproto-multiqueue.c. The backtrace
is useless. I'm running off the head branch code that I compiled on 7/3.
[E 07/14 18:06] PVFS2 server: signal 11, faulty address is (nil), from (nil)
[E 07/14 18:06] [bt] [(nil)]
[D 07/15 08:19] PVFS2 Server version 2.8.1pre1-2009-07-03-123548 starting.
I added a few extra gossip_err statements in the handle_io_error
routine and narrowed it down to the following few lines:
else if (src == TROVE_ENDPOINT && dest == BMI_ENDPOINT)
{
    ret = cancel_pending_trove(&flow_data->src_list,
                               flow_data->parent->src.u.trove.coll_id);
    flow_data->cleanup_pending_count += ret;
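For reference, the checkpoints were along these lines (paraphrased,
not my exact edits; gossip_err takes printf-style arguments):

/* paraphrased checkpoints bracketing the cancel call quoted above */
gossip_err("handle_io_error: src=%d dest=%d flow_data=%p\n",
           (int)src, (int)dest, (void *)flow_data);
...
gossip_err("handle_io_error: cancel_pending_trove returned %d\n", ret);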
Any ideas?
Thanks,
Randy
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers