I thought this looked eerily familiar.
Thanks for the patch, Pete!

> [EMAIL PROTECTED] wrote on Tue, 30 Jan 2007 16:16 -0600:
>> Working off of release 2.6.2, I found a reproducible segfault in
>> ib_close_connection when running `pvfs2-ls` (ppc64-openib). The hardware
>> appears to be functioning properly, and I have reproduced the crash on
>> both eHCA and Mellanox cards. I'm running NetPIPE over IB right now.
>>
>> Here's the backtrace:
>>
>>
>> [E 15:53:25.692442] Warning: exchange_data: partial read, 1/4 bytes.
>>
>> Program received signal SIGSEGV, Segmentation fault.
>> [Switching to Thread 4398046676096 (LWP 25430)]
>> 0x00000000100b441c in ib_close_connection (c=0x10129790)
>>    at src/io/bmi/bmi_ib/ib.c:1613
>> 1613        ibmap = c->remote_map->method_data;
>> (gdb) bt
>> #0  0x00000000100b441c in ib_close_connection (c=0x10129790)
>>    at src/io/bmi/bmi_ib/ib.c:1613
>> #1  0x00000000100b41f4 in ib_new_connection (sock=10,
>>    peername=0xfffffcc4e9c "da5:3336", is_server=0)
>>    at src/io/bmi/bmi_ib/ib.c:1583
>
> Thanks.  This was fixed in head on 29 Dec.  The 2.6 branch is pretty
> old as far as IB goes.  I don't have enough discipline to separate
> the "fixes" from the "new development" required to maintain a
> branch.  Here's a bit of a diff.  The numbers are probably off as I
> cut out just the relevant bits.
>
> Note that your setup still won't work.  The server side must not be
> running the same version, or is somehow different.  This crash
> happens after the client realizes it is not getting a good answer
> from the server (hence your warning).
>
> Also know that the head has a bunch of nice improvements that should
> make it perform better too.
>
>               -- Pete
>
>
> --- src/io/bmi/bmi_ib/ib.c    2007-01-17 12:09:49.000000000 -0500
> +++ ../pvfs2/src/io/bmi/bmi_ib/ib.c   2007-01-21 15:56:26.000000000 -0500
> @@ -1593,8 +1662,6 @@
>   */
>  static void ib_close_connection(ib_connection_t *c)
>  {
> -    ib_method_addr_t *ibmap;
> -
>      debug(2, "%s: closing connection to %s", __func__, c->peername);
>      c->closed = 1;
>      if (c->refcnt != 0) {
> @@ -1610,8 +1677,10 @@
>      free(c->eager_recv_buf_head_contig);
>      /* never free the remote map, for the life of the executable, just
>       * mark it unconnected since BMI will always have this structure. */
> -    ibmap = c->remote_map->method_data;
> -    ibmap->c = NULL;
> +    if (c->remote_map) {
> +     ib_method_addr_t *ibmap = c->remote_map->method_data;
> +     ibmap->c = NULL;
> +    }
>      free(c->peername);
>      qlist_del(&c->list);
>      free(c);
> @@ -1792,8 +1792,7 @@ static int ib_tcp_server_check_new_conne
>         c = ib_new_connection(s, peername, 1);
>         if (!c) {
>             free(hostname);
> -           close(s);
> -           return 0;
> +           goto out_unlock;
>         }
>
>         c->remote_map = ib_alloc_method_addr(c, hostname, port);
> @@ -1804,12 +1803,12 @@ static int ib_tcp_server_check_new_conne
>
>         debug(2, "%s: accepted new connection %s at server", __func__,
>           c->peername);
> +       ret = 1;
>
> +out_unlock:
>         gen_mutex_unlock(&interface_mutex);
> -
>         if (close(s) < 0)
>             error_errno("%s: close new sock", __func__);
> -       ret = 1;
>      }
>      return ret;
>  }
>
>
>
>
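
For anyone reading this in the archive, here is a minimal standalone sketch of
the two ideas in the quoted patch: guard the remote_map dereference during
teardown, and send the accept failure path through one exit label so the
unlock and close run once on every path. It is not the PVFS2 code. The struct
layouts, conn_new, conn_close, accept_connection, the toy lock, and fake_close
are all hypothetical stand-ins, and passing a NULL peername is only a way to
simulate ib_new_connection failing.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the BMI IB structures; only the fields
 * needed for the sketch are modelled. */
struct conn;
struct method_addr { struct conn *c; };
struct remote_map  { struct method_addr *method_data; };
struct conn {
    char *peername;
    struct remote_map *remote_map;  /* stays NULL if setup never finishes */
};

static int lock_held;    /* toy replacement for interface_mutex */
static int close_calls;  /* counts calls to fake_close() */

static void lock(void)        { assert(!lock_held); lock_held = 1; }
static void unlock(void)      { assert(lock_held);  lock_held = 0; }
static void fake_close(int s) { (void)s; close_calls++; }

/* Allocate a connection with no remote map attached yet. */
static struct conn *conn_new(const char *peername)
{
    struct conn *c = calloc(1, sizeof(*c));
    if (!c)
        return NULL;
    c->peername = malloc(strlen(peername) + 1);
    if (!c->peername) {
        free(c);
        return NULL;
    }
    strcpy(c->peername, peername);
    return c;
}

/* Tear down a connection.  The map is never freed here, only marked
 * unconnected, and only if one was ever attached. */
static void conn_close(struct conn *c)
{
    assert(c != NULL);
    if (c->remote_map) {
        struct method_addr *map = c->remote_map->method_data;
        map->c = NULL;
    }
    free(c->peername);
    free(c);
}

/* Accept path with a single exit label.  A NULL peername simulates
 * connection setup failing (an assumption made for this sketch). */
static int accept_connection(int s, const char *peername, struct conn **out)
{
    int ret = 0;
    struct conn *c;

    lock();
    c = peername ? conn_new(peername) : NULL;
    if (!c)
        goto out_unlock;
    *out = c;
    ret = 1;
out_unlock:
    unlock();        /* runs on success and on failure */
    fake_close(s);   /* the socket is closed exactly once either way */
    return ret;
}
```

Calling conn_close on a connection whose remote_map was never set is the case
the backtrace hit; with the guard it simply skips the map update. Setting ret
only on the success path means the label can be shared without an extra flag.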


_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers