    Sayantan> I am getting a segmentation fault after a couple of
    Sayantan> thousand messages are sent over SRQ (using ping-pong
    Sayantan> latency test). Here is a snippet from the core
    Sayantan> generated.

Is it possible that you are posting one more receive to the SRQ than
the max capacity you requested when creating the SRQ?
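
To make that scenario concrete, here is a toy model of a free list
whose entries are linked by index, with -1 marking the end. It is
only a sketch, not the libmthca code: the toy_srq names, the fixed
capacity, and the single spare slot are all assumptions made for
illustration. The point is that once every usable slot has been
posted, the next link read is negative, and a post routine that does
not check for that would carry on with an invalid index.

```c
#include <assert.h>

#define TOY_SRQ_MAX 8   /* hypothetical capacity requested at creation */

/* Toy model: link[i] is the index of the next free slot, -1 ends the list. */
struct toy_srq {
    int link[TOY_SRQ_MAX + 1];  /* one spare slot so the list never runs empty */
    int first_free;
    int last_free;
};

static void toy_srq_init(struct toy_srq *srq)
{
    int i;

    for (i = 0; i < TOY_SRQ_MAX; ++i)
        srq->link[i] = i + 1;
    srq->link[TOY_SRQ_MAX] = -1;
    srq->first_free = 0;
    srq->last_free  = TOY_SRQ_MAX;
}

/* Take one slot for a receive; returns its index, or -1 if the queue is full. */
static int toy_srq_post(struct toy_srq *srq)
{
    int ind      = srq->first_free;
    int next_ind = srq->link[ind];

    /* Same kind of guard as in the patch: a negative link means no room. */
    if (next_ind < 0)
        return -1;

    srq->first_free = next_ind;
    return ind;
}

/* Return a completed slot to the tail of the free list. */
static void toy_srq_free(struct toy_srq *srq, int ind)
{
    assert(ind >= 0 && ind <= TOY_SRQ_MAX);

    srq->link[srq->last_free] = ind;
    srq->link[ind]            = -1;
    srq->last_free            = ind;
}
```

In this model, TOY_SRQ_MAX posts succeed and the next one is refused
until a slot is handed back with toy_srq_free().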

What happens with the patch below applied to libmthca?

Thanks,
  Roland


--- libmthca/src/srq.c  (revision 3664)
+++ libmthca/src/srq.c  (working copy)
@@ -110,6 +110,13 @@ int mthca_tavor_post_srq_recv(struct ibv
 
                wqe       = get_wqe(srq, ind);
                next_ind  = *wqe_to_link(wqe);
+
+               if (next_ind < 0) {
+                       err = -1;
+                       *bad_wr = wr;
+                       break;
+               }
+
                prev_wqe  = srq->last;
                srq->last = wqe;
 
@@ -197,6 +204,12 @@ int mthca_arbel_post_srq_recv(struct ibv
                wqe       = get_wqe(srq, ind);
                next_ind  = *wqe_to_link(wqe);
 
+               if (next_ind < 0) {
+                       err = -1;
+                       *bad_wr = wr;
+                       break;
+               }
+
                ((struct mthca_next_seg *) wqe)->nda_op =
                        htonl((next_ind << srq->wqe_shift) | 1);
                ((struct mthca_next_seg *) wqe)->ee_nds = 0;