On Sun, 15 Oct 2006, Roland Dreier wrote:

> ....
> However, this is a somewhat weird symptom, although I can imagine that
> out-of-order doorbells cause extra completions or something like that,
> which causes IPoIB to overrun the send queue.
>
> Adding the mmiowb()s definitely fixes things?
>

At least with the workload that we used to reproduce this bug, yes. (The
workload was simply two ttcp processes, each placed on a different node of
an Altix.) Without the mmiowb()s, things would hang very reliably and very
quickly (within a second). With the additional mmiowb() calls, I never
observed a problem in tens of minutes of running.

> > Signed-off-by: <[EMAIL PROTECTED]>
>
> Should this be
>
> Signed-off-by: Arthur Kepner <[EMAIL PROTECTED]>
>
> actually?  (I just looked through the kernel git log to guess your name)

That's correct. Thanks.

> > @@ -1730,6 +1732,9 @@ out:
> >  		mthca_write64(doorbell,
> >  			      dev->kar + MTHCA_SEND_DOORBELL,
> >  			      MTHCA_GET_DOORBELL_LOCK(&dev->doorbell_lock));
> > +		/* use mmiowb to ensure write to doorbell is ordered
> > +		 * before releasing spinlock */
> > +		mmiowb();
> >  	}
> >  
> >  	qp->sq.next_ind = ind;
>
> Any reason why this mmiowb() is placed slightly differently from the
> others (which are right before the spin_unlock)?

I wanted to put it inside the "if (likely(nreq))" block so that we only do
the mmiowb() when a doorbell was actually written. It's a very minor
optimization, though a co-worker reports that it produces a small but
measurable performance improvement.

-- 
Arthur

_______________________________________________
openib-general mailing list
[email protected]
http://openib.org/mailman/listinfo/openib-general
