On Jul 31, 2008, at 12:42 PM, Rafael Folco wrote:
Thanks for the response, Pasha. Yes, I agree this is some issue with the
IB network. I came to the list looking for previous experience from
other users... I wonder why 10.2.1.90 works with all other nodes and
10.2.1.50 works with all other nodes as well, but they can't work
together. Maybe OFED li
The "RETRY EXCEEDED ERROR" error is related to IB, not to MTT.
The error says that IB failed to send an IB packet from
machine 10.2.1.90 to 10.2.1.50.
You need to run your IB network monitoring tool and find the issue.
Usually it is a bad cable in the IB fabric that causes such errors.
Regards,
P
Hi,
I need some help, please.
I'm running a set of MTT tests on my cluster and I have issues with a
particular node:
[0,1,7][btl_openib_component.c:1332:btl_openib_component_progress] from
10.2.1.90 to: 10.2.1.50 error polling HP CQ with status RETRY EXCEEDED
ERROR status number 12 for wr_id 268
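When an MTT run produces many lines like the one above, it can help to extract the endpoints and status number mechanically and count which host pairs fail. If the failures are confined to the 10.2.1.90 and 10.2.1.50 pair, that would suggest a problem on the path between those two nodes rather than on either node alone. The following is a minimal Python sketch. It assumes the message format shown above and that each message sits on a single line, because the mail client wrapped it here. The names parse_error and failing_pairs are hypothetical helpers. The status table is only a subset of libibverbs' enum ibv_wc_status, in which 12 is IBV_WC_RETRY_EXC_ERR.

```python
import re
from collections import Counter

# Matches the openib BTL completion-error message quoted in this thread.
# Assumption: the whole message is on one line (unwrapped).
ERR_RE = re.compile(
    r"from\s+(?P<src>\S+)\s+to:\s+(?P<dst>\S+)\s+error polling \w+ CQ "
    r"with status (?P<name>.+?) status number (?P<num>\d+)"
)

# Subset of libibverbs' enum ibv_wc_status, keyed by numeric value.
WC_STATUS = {
    5: "IBV_WC_WR_FLUSH_ERR",
    12: "IBV_WC_RETRY_EXC_ERR",
    13: "IBV_WC_RNR_RETRY_EXC_ERR",
}


def parse_error(line):
    """Return (src, dst, status_number, verbs_name), or None if no match."""
    m = ERR_RE.search(line)
    if m is None:
        return None
    num = int(m.group("num"))
    return m.group("src"), m.group("dst"), num, WC_STATUS.get(num, "unknown")


def failing_pairs(lines):
    """Count completion errors per unordered host pair."""
    pairs = Counter()
    for line in lines:
        parsed = parse_error(line)
        if parsed is not None:
            src, dst, _, _ = parsed
            # Sort the two hosts so that A->B and B->A share one key.
            pairs[tuple(sorted((src, dst)))] += 1
    return pairs


if __name__ == "__main__":
    sample = (
        "[0,1,7][btl_openib_component.c:1332:btl_openib_component_progress] "
        "from 10.2.1.90 to: 10.2.1.50 error polling HP CQ with status "
        "RETRY EXCEEDED ERROR status number 12 for wr_id 268"
    )
    print(parse_error(sample))
    # prints ('10.2.1.90', '10.2.1.50', 12, 'IBV_WC_RETRY_EXC_ERR')
```

Sorting the host pair means failures in either direction are counted together. This fits the symptom described in this thread, where each node works with every other node but not with the other one.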