FWIW, I've run the server in gdb to see what is causing the seg fault when
a client tries to connect remotely:
> [D 08/28 19:40] bmi_mx: CONN_REQ from mx://begbie:0:0.
> [D 08/28 19:40] bmi_mx: bmx_unexpected_recv rx match= 0xc000000100000100 length= 16.
> [D 08/28 19:40] bmi_mx: bmx_handle_conn_req returned RX match 0xc000000100000100 with Success.
>
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7ffff06c0910 (LWP 3026)]
> bmx_handle_conn_req () at src/io/bmi/bmi_mx/mx.c:2403
> 2403 } else if (sid != peer->mxp_sid) { /* reconnecting peer */
> (gdb) bt
> #0 bmx_handle_conn_req () at src/io/bmi/bmi_mx/mx.c:2403
> #1 bmx_connection_handlers () at src/io/bmi/bmi_mx/mx.c:2561
> #2 0x0000000000476102 in BMI_mx_testunexpected (incount=196610, outcount=0xb8f31b2a007fc118, ui=0x7ffff06bfe38, max_idle_time=-261357980)
> at src/io/bmi/bmi_mx/mx.c:2820
> #3 0x00000000004549b2 in BMI_testunexpected (incount=<value optimised out>, outcount=<value optimised out>, info_array=<value optimised out>, max_idle_time_ms=0)
> at src/io/bmi/bmi.c:1000
> #4 0x000000000044d5c0 in bmi_thread_function (ptr=<value optimised out>)
> at src/io/job/thread-mgr.c:182
> #5 0x00007ffff7292a04 in start_thread () from /lib/libpthread.so.0
> #6 0x00007ffff6bcdd4d in clone () from /lib/libc.so.6
> #7 0x0000000000000000 in ?? ()
Line 2403 of mx.c is:
> } else if (sid != peer->mxp_sid) { /* reconnecting peer */
So I examined those variables in gdb:
> (gdb) print sid
> $1 = 3102939946
> (gdb) print peer
> $2 = (struct bmx_peer *) 0x5fb7569100030002
> (gdb) print peer->mxp_sid
> Cannot access memory at address 0x5fb756910003002a
I then looked back at where peer is set, in lines 2358 to 2362:
> {
>         void *peerp = &peer;
>         mx_get_endpoint_addr_context(status.source, &peerp);
>         peer = (struct bmx_peer *) peerp;
> }
I checked status.source, which seems to be OK:
> (gdb) print status.source
> $3 = {stuff = {8415496, 13327025589530378520}}
I set a breakpoint at line 2361 and ran it again:
> Breakpoint 1, bmx_handle_conn_req () at src/io/bmi/bmi_mx/mx.c:2361
> 2361 peer = (struct bmx_peer *) peerp;
> (gdb) print peerp
> $2 = (void *) 0xcfa4b15e00030001
> (gdb) step
> 2363 if (peer == NULL) { /* new peer */
> (gdb) print peer
> $3 = (struct bmx_peer *) 0x0
> (gdb) print peer->mxp_sid
> Cannot access memory at address 0x28
It seems that mx_get_endpoint_addr_context() may not be returning the
expected structure?
Josh.
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users