[ewg] RE: [ofa-general] [PATCH] ib_send_bw -b can hang due to too few CQ entries

2009-08-06 Thread Sean Hefty
> -	ctx->cq = ibv_create_cq(ctx->context, ctx->rx_depth, NULL, ctx->channel, 0);
> +	ctx->cq = ibv_create_cq(ctx->context, ctx->tx_depth + ctx->rx_depth,
> +				NULL, ctx->channel, 0);

I'm looking at a Windows port of this test, but at least there, rx_depth is set
to rx_depth + tx_depth.

___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


[ewg] RE: [ofa-general] [PATCH] ib_send_bw -b can hang due to too few CQ entries

2009-08-06 Thread Ralph Campbell
On Thu, 2009-08-06 at 14:37 -0700, Sean Hefty wrote:
> -	ctx->cq = ibv_create_cq(ctx->context, ctx->rx_depth, NULL, ctx->channel, 0);
> +	ctx->cq = ibv_create_cq(ctx->context, ctx->tx_depth + ctx->rx_depth,
> +				NULL, ctx->channel, 0);
>
> I'm looking at a Windows port of this test, but at least there, rx_depth is set
> to rx_depth + tx_depth.

Sure. Just above the call to ibv_create_cq(), ctx->rx_depth is set to
	ctx->rx_depth = rx_depth + tx_depth
but the rest of the code does ibv_post_send() and ibv_post_recv()
based on ctx->tx_depth and ctx->rx_depth which means the CQ needs
to be ctx->tx_depth + ctx->rx_depth big.
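
The sizing argument above can be sketched in plain C, without libibverbs, so
that it compiles on its own. This is only an illustration: `struct depths`,
`init_depths()`, `cq_size_before()` and `cq_size_after()` are hypothetical
names, not taken from the test source, and the sketch assumes a single CQ
shared by the send and receive queues, as in the patch.

```c
#include <assert.h>

/* Hypothetical stand-in for the two ctx fields discussed in the thread. */
struct depths {
	int tx_depth;
	int rx_depth;
};

/* Mirrors the assignment quoted above: ctx->rx_depth = rx_depth + tx_depth. */
struct depths init_depths(int tx_depth, int rx_depth)
{
	struct depths d;

	assert(tx_depth > 0 && rx_depth > 0);
	d.tx_depth = tx_depth;
	d.rx_depth = rx_depth + tx_depth;
	return d;
}

/* CQ size passed to ibv_create_cq() before the patch. */
int cq_size_before(const struct depths *d)
{
	return d->rx_depth;
}

/* CQ size after the patch: room for every posted send and every posted receive. */
int cq_size_after(const struct depths *d)
{
	return d->tx_depth + d->rx_depth;
}
```

For example, with command-line depths of 300 and 300, ctx->rx_depth becomes
600, the old CQ holds 600 entries, and the patched CQ holds 900.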



[ewg] RE: [ofa-general] [PATCH] ib_send_bw -b can hang due to too few CQ entries

2009-08-06 Thread Sean Hefty
> Sure. Just above the call to ibv_create_cq(), ctx->rx_depth is set to
> 	ctx->rx_depth = rx_depth + tx_depth
> but the rest of the code does ibv_post_send() and ibv_post_recv()
> based on ctx->tx_depth and ctx->rx_depth which means the CQ needs
> to be ctx->tx_depth + ctx->rx_depth big.

If the tx_depth is the same on both sides, why would there ever be more than the
initial tx_depth and rx_depth completions on the CQ?  How many receive
completions can there be on the CQ, and what throttles the sender? 

- Sean



[ewg] RE: [ofa-general] [PATCH] ib_send_bw -b can hang due to too few CQ entries

2009-08-06 Thread Ralph Campbell
On Thu, 2009-08-06 at 14:56 -0700, Sean Hefty wrote:
> > Sure. Just above the call to ibv_create_cq(), ctx->rx_depth is set to
> > 	ctx->rx_depth = rx_depth + tx_depth
> > but the rest of the code does ibv_post_send() and ibv_post_recv()
> > based on ctx->tx_depth and ctx->rx_depth which means the CQ needs
> > to be ctx->tx_depth + ctx->rx_depth big.
>
> If the tx_depth is the same on both sides, why would there ever be more than
> the initial tx_depth and rx_depth completions on the CQ?  How many receive
> completions can there be on the CQ, and what throttles the sender?
>
> - Sean

Remember that this fix only affects the bi-directional test.
Both client and server are going to post ctx->rx_depth receives
and ctx->tx_depth sends and then check for completions.
It won't post more sends or receives until the completions are
seen.
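
A rough model of that posting pattern, again in stand-alone C rather than
real verbs calls. It assumes the worst case described above, where every
posted send and receive completes before the CQ is polled. The function name
`cqes_lost()` is hypothetical and the CQ is reduced to a simple counter.

```c
#include <assert.h>

/*
 * Hypothetical model of one side of the bi-directional test: post all
 * receives and sends, let every one of them complete, and only then poll.
 * Returns the number of completions that do not fit in a CQ of cq_size
 * entries.
 */
int cqes_lost(int cq_size, int tx_depth, int rx_depth)
{
	int queued = 0;
	int lost = 0;

	assert(cq_size >= 0 && tx_depth >= 0 && rx_depth >= 0);

	/* One completion per posted work request, none polled yet. */
	for (int i = 0; i < tx_depth + rx_depth; i++) {
		if (queued < cq_size)
			queued++;	/* fits in the CQ */
		else
			lost++;		/* CQ already full */
	}
	return lost;
}
```

With ctx->tx_depth = 300 and ctx->rx_depth = 600, a CQ of 600 entries comes
up 300 short in this model, while a CQ of 900 entries loses nothing.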



[ewg] RE: [ofa-general] [PATCH] ib_send_bw -b can hang due to too few CQ entries

2009-08-06 Thread Sean Hefty
> Remember that this fix only affects the bi-directional test.
> Both client and server are going to post ctx->rx_depth receives
> and ctx->tx_depth sends and then check for completions.
> It won't post more sends or receives until the completions are
> seen.

Okay - I think I understand what's happening.

The maximum number of outstanding sends is limited to tx_depth / 2.  After
posting that many sends, the code waits for completions.  Once some sends
complete, additional sends may be posted, up to the iteration count.  There's
nothing that coordinates posting the sends with completing receives on the
remote side.  (This is what I was missing.)  Eventually, all posted receives
could complete and generate CQ entries.  The send side is basically throttled
by RNR NACKs.
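
To make the throttling argument concrete, here is a small stand-alone C
model. It is a sketch under stated assumptions, not the test's actual loop:
the function name `recv_completions_before_rnr()` is hypothetical, the
receiver is assumed never to poll its CQ while the sender runs, and each
send is assumed to consume exactly one posted receive.

```c
#include <assert.h>

/*
 * Hypothetical model: the sender keeps at most tx_depth / 2 sends
 * outstanding and posts up to iters sends in total. Returns how many
 * receive completions pile up on the receiver's CQ before the sender is
 * stopped by an RNR NACK or runs out of iterations.
 */
int recv_completions_before_rnr(int iters, int tx_depth, int remote_rx_posted)
{
	int max_outstanding = tx_depth / 2;	/* send window described above */
	int sent = 0;
	int outstanding = 0;
	int remote_cqes = 0;

	assert(iters >= 0 && tx_depth >= 2 && remote_rx_posted >= 0);

	while (sent < iters) {
		/* Post sends until the window is full or iterations run out. */
		while (outstanding < max_outstanding && sent < iters) {
			outstanding++;
			sent++;
		}
		/* Deliver the window: each send consumes one remote receive. */
		while (outstanding > 0) {
			if (remote_cqes == remote_rx_posted)
				return remote_cqes;	/* no receive left: RNR NACK */
			remote_cqes++;
			outstanding--;
		}
	}
	return remote_cqes;
}
```

With 1000 iterations, tx_depth = 300 and 600 posted receives, the model stops
at 600: every posted receive has produced a CQ entry before the sender is
held back, which is why the CQ has to hold all receive completions in
addition to the local send completions.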

Now I don't understand the purpose behind doubling the rx_depth...

- Sean
