[EMAIL PROTECTED] wrote on Fri, 29 Dec 2006 14:43 -0500:
> [EMAIL PROTECTED] wrote on Fri, 29 Dec 2006 11:58 -0600:
> > [E 11:54:12.041386] Warning: exchange_data: partial read, 1/8 bytes.
> > [D 11:54:12.041441] ib_close_connection: closing connection to 
> > 10.1.4.57:34756.
> > [E 11:54:12.042161] SIGSEGV: skipping cleanup; exit now!
> > 
> > I realize this may also be a hardware issue, but I'd like to see the 
> > server not barf when clients fail to connect.
> > I also tried commenting out those checks. Yes, it's a hardware problem on 
> > the client side for now. Oddly enough, though, NetPIPE over IB works 
> > fine, and so do all of my standard IB tests.
> 
> Is this a multi-arch setup again?  Perhaps I'll send you a patch
> in a jiffy guessing that that may be the issue.

Try this to see if the exchange_data part is related, although it
won't do anything about the insufficient WR issue.
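
For illustration, here is a minimal sketch of why a 16-bit lid in
front of a 32-bit qp_num can be awkward to ship over a socket as a raw
struct. The compiler is free to insert padding between the two fields,
and those padding bytes are never initialized. The names ch_old,
ch_new, ch_old_gap, ch_new_gap and ch_new_pack are made up for this
example and are not in the PVFS2 tree. htonl() stands in for
htobmi32(), on the assumption that both produce big-endian output as
the comment in openib.c says. This is not a claim about what actually
caused the partial read.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Layout before the patch: 16-bit lid followed by 32-bit qp_num. */
struct ch_old {
    uint16_t lid;
    uint32_t qp_num;
};

/* Layout after the patch: both fields are 32 bits wide. */
struct ch_new {
    uint32_t lid;
    uint32_t qp_num;
};

/* Bytes of compiler-inserted padding between lid and qp_num (old layout).
 * Typically 2 on common ABIs, but this is implementation-defined. */
static size_t ch_old_gap(void)
{
    return offsetof(struct ch_old, qp_num) - sizeof(uint16_t);
}

/* Same measurement for the new layout; no gap is expected here. */
static size_t ch_new_gap(void)
{
    return offsetof(struct ch_new, qp_num) - sizeof(uint32_t);
}

/* Fill the new layout in network byte order.
 * Assumption: htonl() is a stand-in for PVFS2's htobmi32(). */
static void ch_new_pack(struct ch_new *out, uint16_t lid, uint32_t qp_num)
{
    memset(out, 0, sizeof(*out));
    out->lid = htonl((uint32_t) lid);   /* widen the 16-bit lid first */
    out->qp_num = htonl(qp_num);
}
```

Printing ch_old_gap() and sizeof(struct ch_old) on each machine in a
mixed setup is a quick way to check whether both ends agree on the
layout. With ch_new every byte sent is one that was explicitly
written.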

                -- Pete

Index: src/io/bmi/bmi_ib/openib.c
===================================================================
RCS file: /projects/cvsroot/pvfs2/src/io/bmi/bmi_ib/openib.c,v
retrieving revision 1.10
diff -u -p -r1.10 openib.c
--- src/io/bmi/bmi_ib/openib.c  7 Dec 2006 21:47:47 -0000       1.10
+++ src/io/bmi/bmi_ib/openib.c  29 Dec 2006 19:47:02 -0000
@@ -99,13 +99,15 @@ static int openib_new_connection(ib_conn
     int num_wr;
     size_t len;
     struct ibv_qp_init_attr att;
+
     /*
      * Values passed through TCP to permit IB connection.  These
      * are transformed to appear in network byte order (big endian)
-     * on the network.
+     * on the network.  The lid is pushed up to 32 bits to avoid struct
+     * alignment issues.
      */
     struct {
-       uint16_t lid;
+       uint32_t lid;
        uint32_t qp_num;
     } ch_in, ch_out;
 
@@ -182,14 +184,14 @@ static int openib_new_connection(ib_conn
              att.cap.max_send_wr);
 
     /* exchange data, converting info to network order and back */
-    ch_out.lid = htobmi16(od->nic_lid);
+    ch_out.lid = htobmi32(od->nic_lid);
     ch_out.qp_num = htobmi32(oc->qp->qp_num);
 
     ret = exchange_data(sock, is_server, &ch_in, &ch_out, sizeof(ch_in));
     if (ret)
        goto out;
 
-    oc->remote_lid = bmitoh16(ch_in.lid);
+    oc->remote_lid = bmitoh32(ch_in.lid);
     oc->remote_qp_num = bmitoh32(ch_in.qp_num);
 
     /* bring the two QPs up to RTR */
