On Thu, 2005-07-07 at 15:27, Roland Dreier wrote:
> Hal> 1. NETDEV WATCHDOG: ib0: transmit timed out ib0: transmit
> Hal> timeout: latency 360052
>
> Hal> This occurs once a minute on heavy pings.
>
> Exactly once a minute, or is this just a rough frequency? What is a
> heavy ping? Does the IPoIB interface still work when you see this?

I should have said they occur exactly once per second (rather than
once per minute):

Jun 19 19:34:01 mo1 kernel: NETDEV WATCHDOG: ib0: transmit timed out
Jun 19 19:34:01 mo1 kernel: ib0: transmit timeout: latency 3879
Jun 19 19:34:02 mo1 kernel: NETDEV WATCHDOG: ib0: transmit timed out
Jun 19 19:34:02 mo1 kernel: ib0: transmit timeout: latency 4879
Jun 19 19:34:03 mo1 kernel: NETDEV WATCHDOG: ib0: transmit timed out
Jun 19 19:34:03 mo1 kernel: ib0: transmit timeout: latency 5879
...

A "heavy ping" is twenty concurrent flood pings. During these warnings
the IPoIB interface does seem to work.

> Is there more traceback that shows who is doing the allocation that
> fails? In any case it looks like you are just running low on memory.
>
> Hal> Any idea on what could be causing these or how to go about
> Hal> isolating them ?
>
> The __alloc_pages() warnings look fairly benign and can probably be
> fixed by tuning /proc/sys/vm/min_free_kbytes appropriately.
>
> The TX timeout is somewhat odd. I guess we need to figure out why the
> netdevice queue is stopped for a long time.

How can that be determined? Also, is there code missing for handling
IPoIB timeouts?

-- Hal
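
The log excerpt in this message can be checked mechanically. The sketch
below is only an illustration, not part of any IPoIB tooling: it parses
the three "transmit timeout" lines quoted in this message and reports
the step between consecutive entries. The names parse_timeouts and
steps are hypothetical, and the unit of the latency value is not stated
in the thread, so the code treats it as an opaque number.

```python
import re

# The three log entries quoted in this message.
LOG = """\
Jun 19 19:34:01 mo1 kernel: NETDEV WATCHDOG: ib0: transmit timed out
Jun 19 19:34:01 mo1 kernel: ib0: transmit timeout: latency 3879
Jun 19 19:34:02 mo1 kernel: NETDEV WATCHDOG: ib0: transmit timed out
Jun 19 19:34:02 mo1 kernel: ib0: transmit timeout: latency 4879
Jun 19 19:34:03 mo1 kernel: NETDEV WATCHDOG: ib0: transmit timed out
Jun 19 19:34:03 mo1 kernel: ib0: transmit timeout: latency 5879
"""

# Matches only the "transmit timeout: latency N" lines, not the
# NETDEV WATCHDOG lines.
PATTERN = re.compile(
    r"^\w{3}\s+\d+\s+(\d{2}):(\d{2}):(\d{2})\s+\S+\s+kernel:\s+"
    r"(\S+):\s+transmit timeout: latency (\d+)$"
)


def parse_timeouts(text):
    """Return (seconds_of_day, interface, latency) for each timeout line."""
    events = []
    for line in text.splitlines():
        m = PATTERN.match(line.strip())
        if m:
            h, mi, s = (int(m.group(i)) for i in (1, 2, 3))
            # Only the time of day is used, so a log that crosses
            # midnight would need extra handling.
            events.append((h * 3600 + mi * 60 + s, m.group(4), int(m.group(5))))
    return events


def steps(events):
    """Return (seconds elapsed, latency increase) between consecutive events."""
    return [(b[0] - a[0], b[2] - a[2]) for a, b in zip(events, events[1:])]


events = parse_timeouts(LOG)
print(steps(events))
```

For the quoted entries each step is one second of wall-clock time and
an increase of 1000 in the reported latency. That pattern may indicate
that the latency is measured from a single stalled transmit rather
than from a fresh stall every second, but the thread does not confirm
this.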
