On Thu, 2005-07-07 at 15:27, Roland Dreier wrote:
>     Hal> 1. NETDEV WATCHDOG: ib0: transmit timed out ib0: transmit
>     Hal> timeout: latency 360052
> 
>     Hal> This occurs once a minute on heavy pings.
> 
> Exactly once a minute, or is this just a rough frequency?  What is a
> heavy ping?  Does the IPoIB interface still work when you see this?

I should have said they occur exactly once per second (not once per
minute).

Jun 19 19:34:01 mo1 kernel: NETDEV WATCHDOG: ib0: transmit timed out
Jun 19 19:34:01 mo1 kernel: ib0: transmit timeout: latency 3879
Jun 19 19:34:02 mo1 kernel: NETDEV WATCHDOG: ib0: transmit timed out
Jun 19 19:34:02 mo1 kernel: ib0: transmit timeout: latency 4879
Jun 19 19:34:03 mo1 kernel: NETDEV WATCHDOG: ib0: transmit timed out
Jun 19 19:34:03 mo1 kernel: ib0: transmit timeout: latency 5879
...
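
As a rough cross-check of the excerpt above, the timestamps can be
compared against the reported latency values. This is only a minimal
Python sketch: it assumes the syslog line format shown above, and the
names LOG, parse_timeouts and deltas are made up for illustration.

```python
import re

# The three "transmit timeout" reports from the excerpt above, copied verbatim.
LOG = """\
Jun 19 19:34:01 mo1 kernel: NETDEV WATCHDOG: ib0: transmit timed out
Jun 19 19:34:01 mo1 kernel: ib0: transmit timeout: latency 3879
Jun 19 19:34:02 mo1 kernel: NETDEV WATCHDOG: ib0: transmit timed out
Jun 19 19:34:02 mo1 kernel: ib0: transmit timeout: latency 4879
Jun 19 19:34:03 mo1 kernel: NETDEV WATCHDOG: ib0: transmit timed out
Jun 19 19:34:03 mo1 kernel: ib0: transmit timeout: latency 5879
"""

# Assumed line layout: "<Mon> <day> HH:MM:SS <host> kernel: <dev>: transmit timeout: latency <n>"
PATTERN = re.compile(
    r"^\w{3}\s+\d+\s+(\d\d):(\d\d):(\d\d)\s+\S+\s+kernel:\s+"
    r"(\S+):\s+transmit timeout: latency (\d+)$"
)


def parse_timeouts(text):
    """Return (seconds_of_day, device, latency) for each latency line."""
    events = []
    for line in text.splitlines():
        m = PATTERN.match(line)
        if m:
            h, mnt, s = int(m.group(1)), int(m.group(2)), int(m.group(3))
            events.append((h * 3600 + mnt * 60 + s, m.group(4), int(m.group(5))))
    return events


def deltas(events):
    """Return (elapsed_seconds, latency_increase) between consecutive reports."""
    return [(b[0] - a[0], b[2] - a[2]) for a, b in zip(events, events[1:])]


if __name__ == "__main__":
    for elapsed, growth in deltas(parse_timeouts(LOG)):
        print(f"{elapsed} s elapsed, latency grew by {growth}")
```

For the lines above, each report comes 1 second after the previous one
and the latency value grows by 1000 each time. That would be consistent
with the latency being counted from one fixed point in time in
millisecond-sized units, but that reading is an assumption and not
something the log itself confirms.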

A "heavy ping" is twenty concurrent flood pings.
The IPoIB interface does seem to keep working while these warnings
are being logged.

> Is there more traceback that shows who is doing the allocation that
> fails?  In any case it looks like you are just running low on memory.
> 
>     Hal> Any idea on what could be causing these or how to go about
>     Hal> isolating them ?
> 
> The __alloc_pages() warnings look fairly benign and can probably be
> fixed by tuning /proc/sys/vm/min_free_kbytes appropriately.
> 
> The TX timeout is somewhat odd.  I guess we need to figure out why the
> netdevice queue is stopped for a long time.

How can that be determined?

Also, is there code missing for handling IPoIB timeouts?

-- Hal
