Helen> BTW, the state of the IPoIB network seemed fine after the
Helen> failed test, and the mthca counters are moving up nicely.
Even on the server on3-ib?
Helen> Do you still think this is a crash of the HCA firmware?
Helen> Should I call Mellanox?
Not if IPoIB is working on the systems printing the TX time out
messages. However, if everything stops working on one of your
systems, then yes, an HCA crash is likely.
I'm still a bit unclear on what is happening. Do you see TX time
out messages on a particular server, but IPoIB and mthca counters
still work fine on that same server? Or is it just the rest of the
fabric that continues working?
Thanks,
Roland