Helen> Not in realtime. My observations were made after the fact.
Helen> I suppose I can launch another test and watch the counter in
Helen> realtime if you believe that is necessary?
That might be interesting.
Assuming the HCA continues to work fine, and IPoIB recovers, the only
theory I can come up with is that something is causing interrupts to be
held off for a long time, so the IPoIB driver doesn't get to see sends
completing. But I don't know what such a workload might be. Perhaps
something else you're running (Lustre? iSCSI?) holds a lock for a
long time and causes the timeout. But it's not clear to me why the TX
watchdog would get to run if the interrupt handler doesn't get to run.
- R.