Helen> It doesn't seem like shrinking the TCP window helped.
Helen> I captured the dmesg log from the Lustre server and the
Helen> associated client reporting the IOZONE error.
What is the state of the system after you start seeing the ib0
transmit timeout messages? Does IPoIB work at all? Is the HCA
responsive at all? For example, what do you see if you run

    cat /sys/class/infiniband/mthca0/ports/1/state

or

    cat /sys/class/infiniband/mthca0/ports/1/counters/*
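
If it is easier to read, the same two checks can be wrapped so each
counter is printed next to its name. This is only a rough sketch: the
function name dump_ib_port is made up for illustration, and the default
path assumes the mthca0 device and port 1 from the commands above.

```shell
# Sketch: print the port state and each counter with its name, so a
# stuck or unreadable value is easy to spot.
# Assumption: the default path matches the mthca0 / port 1 setup above.
dump_ib_port() {
    port_dir="${1:-/sys/class/infiniband/mthca0/ports/1}"

    # Port state, e.g. "4: ACTIVE" on a healthy link.
    if [ -r "$port_dir/state" ]; then
        echo "state: $(cat "$port_dir/state")"
    else
        echo "state: unreadable"
    fi

    # One line per counter file: "<name>: <value>".
    for f in "$port_dir"/counters/*; do
        [ -f "$f" ] || continue
        echo "$(basename "$f"): $(cat "$f" 2>/dev/null || echo unreadable)"
    done
}
```

Running dump_ib_port with no argument uses the default path. If the
reads hang or come back unreadable, that would itself suggest the HCA
is no longer responding.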
Helen> BTW, this problem is a moving target, so it is hard to
Helen> believe that it is hardware related(?) BTW, I am using the
Helen> Mellanox DDR switch and HCA.
Not sure what you mean by a moving target... the symptoms really look
like a crash of the HCA firmware to me.
Thanks,
Roland