Helen> Not in realtime.  My observations were made after the fact.
    Helen> I suppose I can launch another test and watch the counter in
    Helen> realtime if you believe that is necessary?

That might be interesting.

Assuming the HCA continues to work fine, and IPoIB recovers, the only
theory I can come up with is that something is causing interrupts to be
held off for a long time, so the IPoIB driver doesn't get to see sends
completing.  But I don't know what such a workload might be.  Perhaps
something else you're running (Lustre?, iSCSI?) holds a lock for a
long time and causes the timeout.  But it's not clear to me why the TX
watchdog would get to run if the interrupt handler doesn't get to run.
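
If you do watch things in real time, it might also be worth checking
whether the HCA's interrupt count keeps advancing while the problem is
happening.  Below is a rough sketch of one way to do that, assuming a
Linux /proc/interrupts and assuming the HCA's IRQ line contains the
string "mthca" (that string is a guess; substitute whatever name your
driver registers).  The helper names are made up for illustration.

```python
import sys
import time


def parse_interrupts(text, needle):
    """Return {irq_label: total count across CPUs} for lines containing needle."""
    lines = text.splitlines()
    if not lines:
        return {}
    # The header row lists one column per CPU (CPU0, CPU1, ...).
    ncpu = len(lines[0].split())
    totals = {}
    for line in lines[1:]:
        if needle not in line or ":" not in line:
            continue
        label, rest = line.split(":", 1)
        counts = []
        for field in rest.split()[:ncpu]:
            if not field.isdigit():
                break  # reached the controller/device description
            counts.append(int(field))
        totals[label.strip()] = sum(counts)
    return totals


def stalled(prev, cur):
    """Return the IRQ labels whose totals did not change between two samples."""
    return sorted(k for k in cur if k in prev and cur[k] == prev[k])


if __name__ == "__main__":
    # "mthca" is an assumed default; pass your driver's name as argv[1].
    needle = sys.argv[1] if len(sys.argv) > 1 else "mthca"
    prev = {}
    while True:
        with open("/proc/interrupts") as f:
            cur = parse_interrupts(f.read(), needle)
        for irq in stalled(prev, cur):
            print(time.strftime("%H:%M:%S"), "IRQ", irq, "did not advance")
        prev = cur
        time.sleep(1)
```

An IRQ that does not advance on an idle link is normal, so this is only
meaningful while traffic is flowing.  If the count stops moving at the
same moment the TX timeout fires, that would point toward interrupts
being held off rather than a problem in the HCA itself.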

 - R.
_______________________________________________
openib-general mailing list
[email protected]
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
