Hi Randy,

I don't know offhand where the problem is, but could you try running the server under gdb? That may give you a better backtrace. The code that generates the backtrace and writes it to the log when a segfault occurs isn't that reliable and may not work on your system. You could also run the server under valgrind to see if there are memory errors elsewhere; that may pinpoint the problem better.
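Something along these lines (the config file path here is just an example, adjust it for your install; if your build daemonizes by default, check pvfs2-server's usage output for the option that keeps it in the foreground so gdb stays attached):

gdb --args pvfs2-server /etc/pvfs2-fs.conf
(gdb) run
  ... reproduce the crash ...
(gdb) bt full

valgrind pvfs2-server /etc/pvfs2-fs.conf

Keep in mind valgrind slows the server down considerably, so the crash may take longer to trigger.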

Was PVFS configured with optimizations (--enable-fast), or without (--enable-strict)? And did you specify CFLAGS=-g when running configure? For debugging environments I usually run configure like this:

CFLAGS=-g ./configure --enable-strict

That will enable the most debugging and should give you better backtraces.

-sam

On Jul 15, 2009, at 7:47 AM, Randall Martin wrote:

I occasionally get a server crash in what appears to be src/io/flow/flowproto-bmi-trove/flowproto-multiqueue.c. The backtrace is useless. I'm running off the head branch code that I compiled on 7/3.

[E 07/14 18:06] PVFS2 server: signal 11, faulty address is (nil), from (nil)
[E 07/14 18:06] [bt] [(nil)]
[D 07/15 08:19] PVFS2 Server version 2.8.1pre1-2009-07-03-123548 starting.

I added a few extra gossip_err statements in the handle_io_error routine and narrowed it down to the following few lines:

        else if (src == TROVE_ENDPOINT && dest == BMI_ENDPOINT)
        {
            ret = cancel_pending_trove(&flow_data->src_list,
                                       flow_data->parent->src.u.trove.coll_id);
            flow_data->cleanup_pending_count += ret;
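(The extra statements were just printf-style markers bracketing the call, along the lines of the sketch below; the message text here is approximate, not the exact statements I added:)

gossip_err("%s: TROVE->BMI error branch, flow_data %p\n", __func__, flow_data);
ret = cancel_pending_trove(&flow_data->src_list, flow_data->parent->src.u.trove.coll_id);
gossip_err("%s: cancel_pending_trove returned %d\n", __func__, ret);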

Any ideas?

Thanks,
Randy
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
