I thought this looked eerily familiar.
Thanks for the patch, Pete!
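
For anyone skimming the thread: as I read the quoted patch below, the crash came from ib_close_connection being called on a connection whose remote_map had not been set yet, and the fix is a NULL guard around that dereference. Here is a minimal standalone sketch of that guard. The struct and function names (conn, method_addr, ib_addr, close_connection) are simplified stand-ins I made up for illustration, not the real bmi_ib types, and the return value exists only to make the behaviour easy to check.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the BMI structures. */
struct method_addr { void *method_data; };
struct ib_addr     { void *c; };  /* back-pointer to the connection */
struct conn        { struct method_addr *remote_map; };

/*
 * Tear down a connection. The remote map is never freed here, only
 * marked unconnected, and only if it was ever attached.
 * Returns 1 if a remote map was detached, 0 if there was none.
 */
static int close_connection(struct conn *c)
{
    int detached = 0;

    assert(c != NULL);
    if (c->remote_map) {
        /* Safe to dereference only inside the guard. */
        struct ib_addr *ibmap = c->remote_map->method_data;
        ibmap->c = NULL;
        detached = 1;
    }
    free(c);
    return detached;
}
```

Calling close_connection on a freshly calloc'd conn (remote_map still NULL) returns 0 instead of faulting, which corresponds to the early-failure path in the backtrace.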
> [EMAIL PROTECTED] wrote on Tue, 30 Jan 2007 16:16 -0600:
>> Working off of release 2.6.2, I found a reproducible segfault in
>> ib_close_connection by doing `pvfs2-ls` (ppc64-openib). The hardware
>> appears to be functioning properly, and I have reproduced this on both
>> eHCA and Mellanox cards. I'm running NetPIPE over IB right now.
>>
>> Here's the backtrace:
>>
>>
>> [E 15:53:25.692442] Warning: exchange_data: partial read, 1/4 bytes.
>>
>> Program received signal SIGSEGV, Segmentation fault.
>> [Switching to Thread 4398046676096 (LWP 25430)]
>> 0x00000000100b441c in ib_close_connection (c=0x10129790)
>> at src/io/bmi/bmi_ib/ib.c:1613
>> 1613 ibmap = c->remote_map->method_data;
>> (gdb) bt
>> #0 0x00000000100b441c in ib_close_connection (c=0x10129790)
>> at src/io/bmi/bmi_ib/ib.c:1613
>> #1 0x00000000100b41f4 in ib_new_connection (sock=10,
>> peername=0xfffffcc4e9c "da5:3336", is_server=0)
>> at src/io/bmi/bmi_ib/ib.c:1583
>
> Thanks. This was fixed in HEAD on 29 Dec. The 2.6 branch is pretty
> old as far as IB goes. I don't have enough discipline to separate
> the "fixes" from the "new development" required to maintain a
> branch. Here's a bit of a diff. The line numbers are probably off as I
> cut out just the relevant bits.
>
> Note that your setup still won't work. The server side must not be
> running the same version, or is somehow different. This crash
> happens after the client realizes it is not getting a good answer
> from the server (hence your warning).
>
> Also know that the head has a bunch of nice improvements that should
> make it perform better too.
>
> -- Pete
>
>
> --- src/io/bmi/bmi_ib/ib.c 2007-01-17 12:09:49.000000000 -0500
> +++ ../pvfs2/src/io/bmi/bmi_ib/ib.c 2007-01-21 15:56:26.000000000 -0500
> @@ -1593,8 +1662,6 @@
> */
> static void ib_close_connection(ib_connection_t *c)
> {
> - ib_method_addr_t *ibmap;
> -
> debug(2, "%s: closing connection to %s", __func__, c->peername);
> c->closed = 1;
> if (c->refcnt != 0) {
> @@ -1610,8 +1677,10 @@
> free(c->eager_recv_buf_head_contig);
> /* never free the remote map, for the life of the executable, just
> * mark it unconnected since BMI will always have this structure. */
> - ibmap = c->remote_map->method_data;
> - ibmap->c = NULL;
> + if (c->remote_map) {
> + ib_method_addr_t *ibmap = c->remote_map->method_data;
> + ibmap->c = NULL;
> + }
> free(c->peername);
> qlist_del(&c->list);
> free(c);
> @@ -1792,8 +1792,7 @@ static int ib_tcp_server_check_new_conne
> c = ib_new_connection(s, peername, 1);
> if (!c) {
> free(hostname);
> - close(s);
> - return 0;
> + goto out_unlock;
> }
>
> c->remote_map = ib_alloc_method_addr(c, hostname, port);
> @@ -1804,12 +1803,12 @@ static int ib_tcp_server_check_new_conne
>
> debug(2, "%s: accepted new connection %s at server", __func__,
> c->peername);
> + ret = 1;
>
> +out_unlock:
> gen_mutex_unlock(&interface_mutex);
> -
> if (close(s) < 0)
> error_errno("%s: close new sock", __func__);
> - ret = 1;
> }
> return ret;
> }
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers