[EMAIL PROTECTED] wrote on Fri, 29 Dec 2006 14:43 -0500:
> [EMAIL PROTECTED] wrote on Fri, 29 Dec 2006 11:58 -0600:
> > [E 11:54:12.041386] Warning: exchange_data: partial read, 1/8 bytes.
> > [D 11:54:12.041441] ib_close_connection: closing connection to 10.1.4.57:34756.
> > [E 11:54:12.042161] SIGSEGV: skipping cleanup; exit now!
> >
> > I realize this may also be a hardware issue, but I'd like to see the
> > server not barf when clients fail to connect.
> > I also tried commenting out those checks. Yes, it's a hardware problem
> > on the client side for now. Oddly enough, though, netpipe over IB works
> > fine, and so do all of my standard IB tests.
>
> Is this a multi-arch setup again? Perhaps I'll send you a patch
> in a jiffy guessing that that may be the issue.
Try this to see if the exchange_data part is related, although it
won't do anything about the insufficient WR issue.
-- Pete
Index: src/io/bmi/bmi_ib/openib.c
===================================================================
RCS file: /projects/cvsroot/pvfs2/src/io/bmi/bmi_ib/openib.c,v
retrieving revision 1.10
diff -u -p -r1.10 openib.c
--- src/io/bmi/bmi_ib/openib.c 7 Dec 2006 21:47:47 -0000 1.10
+++ src/io/bmi/bmi_ib/openib.c 29 Dec 2006 19:47:02 -0000
@@ -99,13 +99,15 @@ static int openib_new_connection(ib_conn
int num_wr;
size_t len;
struct ibv_qp_init_attr att;
+
/*
* Values passed through TCP to permit IB connection. These
* are transformed to appear in network byte order (big endian)
- * on the network.
+ * on the network. The lid is pushed up to 32 bits to avoid struct
+ * alignment issues.
*/
struct {
- uint16_t lid;
+ uint32_t lid;
uint32_t qp_num;
} ch_in, ch_out;
@@ -182,14 +184,14 @@ static int openib_new_connection(ib_conn
att.cap.max_send_wr);
/* exchange data, converting info to network order and back */
- ch_out.lid = htobmi16(od->nic_lid);
+ ch_out.lid = htobmi32(od->nic_lid);
ch_out.qp_num = htobmi32(oc->qp->qp_num);
ret = exchange_data(sock, is_server, &ch_in, &ch_out, sizeof(ch_in));
if (ret)
goto out;
- oc->remote_lid = bmitoh16(ch_in.lid);
+ oc->remote_lid = bmitoh32(ch_in.lid);
oc->remote_qp_num = bmitoh32(ch_in.qp_num);
/* bring the two QPs up to RTR */
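
To illustrate the alignment point in the patched comment, here is a minimal
sketch (not the PVFS2 code) that compares the two struct layouts and
round-trips the patched one through network byte order. The names ch_old,
ch_new, ch_old_padding, ch_new_padding, ch_pack and ch_unpack are made up
for this example, and POSIX htonl/ntohl stand in for htobmi32/bmitoh32,
whose definitions are not shown in this message.

```c
#include <arpa/inet.h>  /* htonl, ntohl (POSIX); stand-ins for htobmi32/bmitoh32 */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout before the patch: a compiler will normally insert padding after
 * lid so that qp_num is 4-byte aligned. Those bytes are never written by
 * the sender but still travel over the socket. */
struct ch_old {
    uint16_t lid;
    uint32_t qp_num;
};

/* Layout after the patch: two 32-bit fields, so no padding is expected. */
struct ch_new {
    uint32_t lid;
    uint32_t qp_num;
};

/* Bytes in each struct that do not belong to any field. */
static size_t ch_old_padding(void)
{
    return sizeof(struct ch_old) - sizeof(uint16_t) - sizeof(uint32_t);
}

static size_t ch_new_padding(void)
{
    return sizeof(struct ch_new) - 2 * sizeof(uint32_t);
}

/* Build the outgoing message; the 16-bit lid is widened before conversion. */
static struct ch_new ch_pack(uint16_t lid, uint32_t qp_num)
{
    struct ch_new m;
    m.lid = htonl((uint32_t) lid);
    m.qp_num = htonl(qp_num);
    return m;
}

/* Decode a received message back to host order. */
static void ch_unpack(const struct ch_new *m, uint16_t *lid, uint32_t *qp_num)
{
    assert(m && lid && qp_num);
    *lid = (uint16_t) ntohl(m->lid);
    *qp_num = ntohl(m->qp_num);
}
```

On the usual x86 and x86-64 ABIs I would expect ch_old_padding() to return 2
and ch_new_padding() to return 0, but the C standard leaves padding up to the
implementation, so treat the first number as typical rather than guaranteed.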