> Apparently this is because writes to the doorbells from
> different CPUs are clobbering one another. The following
> patch adds mmiowb() calls after doorbell rings to ensure
> the doorbell register updates are ordered.
Makes sense. I was wondering if there would be any problems like this after John's message...

> We discovered a problem when running IPoIB applications on
> multiple CPUs on an Altix system. Many messages such as:
>
> ib_mthca 0002:01:00.0: SQ 000014 full (19941644 head, 19941707 tail, 64 max,
> 0 nreq)
>
> appear in syslog, and the driver wedges up.

However, this is a somewhat weird symptom, although I can imagine that out-of-order doorbells could cause extra completions or something like that, which would make IPoIB overrun the send queue. Does adding the mmiowb()s definitely fix things?

> Signed-off-by: <[EMAIL PROTECTED]>

Should this be

Signed-off-by: Arthur Kepner <[EMAIL PROTECTED]>

actually? (I just looked through the kernel git log to guess your name.)

> @@ -1730,6 +1732,9 @@ out:
>                 mthca_write64(doorbell,
>                               dev->kar + MTHCA_SEND_DOORBELL,
>                               MTHCA_GET_DOORBELL_LOCK(&dev->doorbell_lock));
> +               /* use mmiowb to ensure write to doorbell is ordered
> +                * before releasing spinlock */
> +               mmiowb();
>         }
>
>         qp->sq.next_ind = ind;

Any reason why this mmiowb() is placed slightly differently from the others (which are right before the spin_unlock())?

Thanks,
Roland

_______________________________________________
openib-general mailing list
[email protected]
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
