FWIW, I ran the server under gdb to see what causes the segfault when
a client tries to connect remotely:

> [D 08/28 19:40] bmi_mx: CONN_REQ from mx://begbie:0:0.
> [D 08/28 19:40] bmi_mx: bmx_unexpected_recv rx match= 0xc000000100000100 length= 16.
> [D 08/28 19:40] bmi_mx: bmx_handle_conn_req returned RX match 0xc000000100000100 with Success.
>
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7ffff06c0910 (LWP 3026)]
> bmx_handle_conn_req () at src/io/bmi/bmi_mx/mx.c:2403
> 2403                            } else if (sid != peer->mxp_sid) { /* reconnecting peer */
> (gdb) bt
> #0  bmx_handle_conn_req () at src/io/bmi/bmi_mx/mx.c:2403
> #1  bmx_connection_handlers () at src/io/bmi/bmi_mx/mx.c:2561
> #2  0x0000000000476102 in BMI_mx_testunexpected (incount=196610, outcount=0xb8f31b2a007fc118, ui=0x7ffff06bfe38, max_idle_time=-261357980)
>     at src/io/bmi/bmi_mx/mx.c:2820
> #3  0x00000000004549b2 in BMI_testunexpected (incount=<value optimised out>, outcount=<value optimised out>, info_array=<value optimised out>, max_idle_time_ms=0)
>     at src/io/bmi/bmi.c:1000
> #4  0x000000000044d5c0 in bmi_thread_function (ptr=<value optimised out>)
>     at src/io/job/thread-mgr.c:182
> #5  0x00007ffff7292a04 in start_thread () from /lib/libpthread.so.0
> #6  0x00007ffff6bcdd4d in clone () from /lib/libc.so.6
> #7  0x0000000000000000 in ?? ()


Line 2403 of mx.c is:

> } else if (sid != peer->mxp_sid) { /* reconnecting peer */

So I examined those variables in gdb:

> (gdb) print sid
> $1 = 3102939946
> (gdb) print peer
> $2 = (struct bmx_peer *) 0x5fb7569100030002
> (gdb) print peer->mxp_sid
> Cannot access memory at address 0x5fb756910003002a

I looked back to where peer was getting set, to lines 2358 to 2362:

>                         {
>                                 void *peerp = &peer;
>                                 mx_get_endpoint_addr_context(status.source, &peerp);
>                                 peer = (struct bmx_peer *) peerp;
>                         }

I checked status.source, which seems to be OK:

> (gdb) print status.source
> $3 = {stuff = {8415496, 13327025589530378520}}

I set a breakpoint at line 2361 and ran it again:

> Breakpoint 1, bmx_handle_conn_req () at src/io/bmi/bmi_mx/mx.c:2361
> 2361                                    peer = (struct bmx_peer *) peerp;
> (gdb) print peerp
> $2 = (void *) 0xcfa4b15e00030001
> (gdb) step
> 2363                            if (peer == NULL) { /* new peer */
> (gdb) print peer
> $3 = (struct bmx_peer *) 0x0
> (gdb) print peer->mxp_sid
> Cannot access memory at address 0x28
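
As a sanity check on the two "Cannot access memory" addresses: in both runs the faulting address is the pointer value plus 0x28, which suggests mxp_sid sits 0x28 bytes into struct bmx_peer. The struct definition is not shown here, so that offset is inferred only from the gdb output. A small sketch of the arithmetic (field_offset() is a helper name made up for illustration):

```c
#include <stdint.h>

/* Offset of the field gdb tried to read, given the pointer value and the
 * faulting address it reported. Hypothetical helper, not part of bmi_mx. */
static uint64_t field_offset(uint64_t ptr, uint64_t fault_addr)
{
    return fault_addr - ptr;
}

/* Values copied from the two gdb sessions above:
 *   first run:  peer = 0x5fb7569100030002, fault at 0x5fb756910003002a
 *   second run: peer = 0x0,                fault at 0x28
 */
static uint64_t first_run_offset(void)
{
    return field_offset(UINT64_C(0x5fb7569100030002),
                        UINT64_C(0x5fb756910003002a));
}

static uint64_t second_run_offset(void)
{
    return field_offset(UINT64_C(0x0), UINT64_C(0x28));
}
```

Both helpers return 0x28, so the two failures are the same read of the same field through two different bad pointers.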

It looks as though mx_get_endpoint_addr_context() may not be returning the
expected structure: peer came back as a bogus pointer in the first run and
as NULL in the second.
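
One way to narrow this down, sketched here under assumptions: start the out-parameter at NULL, check the lookup's return code, and only dereference the result once it is known to be non-NULL. lookup_context() below is a hypothetical stand-in for mx_get_endpoint_addr_context() so that the snippet compiles without the MX headers. The struct, its single field, and the return codes are likewise invented for illustration and are not the bmi_mx or MX definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; not the real MX or bmi_mx definitions. */
struct peer_sketch {
    uint32_t sid;
};

enum lookup_rc { LOOKUP_OK = 0, LOOKUP_FAIL = 1 };

/* Stand-in for a context lookup: writes *context only on success, the way
 * an out-parameter API may leave the output untouched on failure. */
static enum lookup_rc lookup_context(void *stored, int have_context,
                                     void **context)
{
    if (!have_context)
        return LOOKUP_FAIL;
    *context = stored;
    return LOOKUP_OK;
}

/* Returns the peer for this address, or NULL when no usable context
 * exists, so the caller can take the "new peer" branch safely. */
static struct peer_sketch *find_peer(void *stored, int have_context)
{
    void *peerp = NULL; /* start from NULL rather than a live address */

    if (lookup_context(stored, have_context, &peerp) != LOOKUP_OK)
        return NULL;
    return (struct peer_sketch *) peerp;
}

/* Classifies a connection request without dereferencing a NULL peer:
 * 0 = new peer, 1 = reconnecting peer (sid changed), 2 = known peer. */
static int classify_conn_req(const struct peer_sketch *peer, uint32_t sid)
{
    if (peer == NULL)
        return 0;
    if (sid != peer->sid)
        return 1;
    return 2;
}
```

With the return code checked and the pointer starting at NULL, a failed or empty lookup shows up as the "new peer" case instead of a wild pointer reaching the sid comparison. It would not explain where a garbage value like 0x5fb7569100030002 comes from if the lookup itself reports success.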

Josh.



_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
