 > Apparently this is because writes to the doorbells from
 > different CPUs are clobbering one another. The following
 > patch adds mmiowb() calls after doorbell rings to ensure
 > the doorbell register updates are ordered.

Makes sense.  I was wondering if there would be any problems like this
after John's message...

 > We discovered a problem when running IPoIB applications on
 > multiple CPUs on an Altix system. Many messages such as:
 > 
 > ib_mthca 0002:01:00.0: SQ 000014 full (19941644 head, 19941707 tail, 64 max, 0 nreq)
 > 
 > appear in syslog, and the driver wedges up.

This is a somewhat weird symptom, though.  I can imagine that
out-of-order doorbells cause extra completions or something like that,
which would let IPoIB overrun the send queue, but that is a guess.

Adding the mmiowb()s definitely fixes things?

 > Signed-off-by: <[EMAIL PROTECTED]>

Should this be

Signed-off-by: Arthur Kepner <[EMAIL PROTECTED]>

actually?  (I just looked through the kernel git log to guess your name)

 > @@ -1730,6 +1732,9 @@ out:
 >              mthca_write64(doorbell,
 >                            dev->kar + MTHCA_SEND_DOORBELL,
 >                            MTHCA_GET_DOORBELL_LOCK(&dev->doorbell_lock));
 > +            /* use mmiowb to ensure write to doorbell is ordered 
 > +             * before releasing spinlock */
 > +            mmiowb();
 >      }
 > 
 >      qp->sq.next_ind = ind;

Any reason why this mmiowb() is placed slightly differently from the
others (which are right before the spin_unlock)?
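
To be concrete about the placement I mean, here is a rough userspace
sketch.  The lock, mmiowb() and doorbell functions below are stand-ins
that only record the order of calls (the real ones come from the kernel
headers), and ring_doorbell() and post_send_sketch() are made-up names
for illustration, not anything in mthca.  As far as I understand it,
either placement should be OK as long as the mmiowb() sits between the
last doorbell write and the unlock; this just shows the "right before
spin_unlock" form the other hunks use.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Userspace stand-ins so the sketch compiles outside the kernel.
 * They only log the order in which they are called. */
typedef struct { int locked; } spinlock_t;

#define MAX_EVENTS 8
static const char *event_log[MAX_EVENTS];
static int n_events;

static void record(const char *what)
{
	if (n_events < MAX_EVENTS)
		event_log[n_events++] = what;
}

static void spin_lock(spinlock_t *l)   { l->locked = 1; record("lock"); }
static void spin_unlock(spinlock_t *l) { l->locked = 0; record("unlock"); }
static void mmiowb(void)               { record("mmiowb"); }

/* Hypothetical doorbell write, standing in for mthca_write64(). */
static void ring_doorbell(volatile uint64_t *reg, uint64_t val)
{
	*reg = val;
	record("doorbell");
}

/* Hypothetical post-send path: the doorbell write happens under the
 * lock, and mmiowb() is the last thing done before the unlock. */
static void post_send_sketch(spinlock_t *lock, volatile uint64_t *reg,
			     uint64_t val)
{
	spin_lock(lock);
	ring_doorbell(reg, val);
	/* other bookkeeping (e.g. updating next_ind) could go here */
	mmiowb();
	spin_unlock(lock);
}
```

With the barrier last, it is easy to see at a glance that no MMIO write
can slip in between it and the unlock if someone adds code later.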

Thanks,
  Roland
