Sayantan> I am getting a segmentation fault after a couple of
Sayantan> thousand messages are sent over SRQ (using ping-pong
Sayantan> latency test). Here is a snippet from the generated
Sayantan> core file.
Is it possible that you are posting one more receive to the SRQ than
the max capacity you requested when creating the SRQ?
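One way to rule this out on the application side is to count how many receives are outstanding on the SRQ and refuse to post past the capacity requested at creation. Below is a minimal sketch, assuming the application remembers the `max_wr` it asked for. `struct srq_budget` and its two helpers are hypothetical names, not part of libibverbs or libmthca.

```c
#include <stdbool.h>

/* Hypothetical per-SRQ bookkeeping kept by the application. */
struct srq_budget {
	int max_wr;      /* capacity requested when the SRQ was created */
	int outstanding; /* receives posted but not yet completed */
};

/* Call before each post; returns false rather than letting the
 * application post more receives than max_wr. */
bool srq_budget_take(struct srq_budget *b)
{
	if (b->outstanding >= b->max_wr)
		return false;
	b->outstanding++;
	return true;
}

/* Call once for each receive completion polled from the CQ. */
void srq_budget_release(struct srq_budget *b)
{
	if (b->outstanding > 0)
		b->outstanding--;
}
```

In a real program, srq_budget_take() would gate each ibv_post_srq_recv() call and srq_budget_release() would run for every receive work completion. If take ever returns false, the application is trying to keep more receives posted than the SRQ was created for.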
What happens with the patch below applied to libmthca?
Thanks,
Roland
--- libmthca/src/srq.c (revision 3664)
+++ libmthca/src/srq.c (working copy)
@@ -110,6 +110,13 @@ int mthca_tavor_post_srq_recv(struct ibv
 		wqe = get_wqe(srq, ind);
 		next_ind = *wqe_to_link(wqe);
+
+		if (next_ind < 0) {
+			err = -1;
+			*bad_wr = wr;
+			break;
+		}
+
 		prev_wqe = srq->last;
 		srq->last = wqe;
@@ -197,6 +204,12 @@ int mthca_arbel_post_srq_recv(struct ibv
 		wqe = get_wqe(srq, ind);
 		next_ind = *wqe_to_link(wqe);
+		if (next_ind < 0) {
+			err = -1;
+			*bad_wr = wr;
+			break;
+		}
+
 		((struct mthca_next_seg *) wqe)->nda_op =
 			htonl((next_ind << srq->wqe_shift) | 1);
 		((struct mthca_next_seg *) wqe)->ee_nds = 0;
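The check the patch adds can be pictured with a free list threaded through an array. Each free slot's link holds the index of the next free slot, and -1 ends the list, much like the get_wqe()/*wqe_to_link() reads in the hunks above. This is a simplified, hypothetical model: `struct model_srq` and its functions are made-up names, not libmthca code. As in the patch, a post is refused when the slot it would take is the last entry on the free list, so `first_free` in this model never becomes -1.

```c
/* Hypothetical model: link[i] holds the index of the next free slot,
 * and -1 marks the end of the free list. */
#define MODEL_SRQ_SLOTS 4

struct model_srq {
	int link[MODEL_SRQ_SLOTS]; /* stands in for *wqe_to_link(wqe) */
	int first_free;            /* head of the free list */
};

void model_srq_init(struct model_srq *srq)
{
	for (int i = 0; i < MODEL_SRQ_SLOTS - 1; i++)
		srq->link[i] = i + 1;
	srq->link[MODEL_SRQ_SLOTS - 1] = -1;
	srq->first_free = 0;
}

/* Returns the slot index used, or -1 if the post is refused. */
int model_srq_post(struct model_srq *srq)
{
	int ind = srq->first_free;
	if (ind < 0)
		return -1;      /* free list empty; never index with -1 */

	int next_ind = srq->link[ind];
	if (next_ind < 0)
		return -1;      /* the patch's check: slot is the last free one */

	srq->first_free = next_ind;
	return ind;
}

/* Pushes a completed slot back onto the head of the free list. */
void model_srq_free(struct model_srq *srq, int ind)
{
	srq->link[ind] = srq->first_free;
	srq->first_free = ind;
}
```

With four slots, three posts succeed and the fourth is refused. Freeing one slot lets exactly one more post succeed. In this model the check always leaves one slot unused, so a queue meant to hold N receives would need N + 1 slots.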
_______________________________________________
openib-general mailing list
http://openib.org/mailman/listinfo/openib-general