2009/2/26 Brett Pemberton <br...@vpac.org>:
> [[1176,1],0][btl_openib_component.c:2905:handle_wc] from tango092.vpac.org
> to: tango090 error polling LP CQ with status RETRY EXCEEDED ERROR status
> number 12 for wr_id 38996224 opcode 0 qp_idx 0

What OS are you using?  I've seen this error and many other
InfiniBand-related errors on Red Hat Enterprise Linux 4 Update 4 with
ConnectX cards and various versions of OFED, up to version 1.3.
Depending on the MCA parameters, I also see hangs often enough to make
native InfiniBand unusable on this OS.

However, the openib BTL works just fine on the same hardware and the
same OFED/Open MPI stack when used with CentOS 4.6.  I suspect that
something about the kernel is contributing to these problems, but I
haven't had a chance to test the 4.6 kernel on 4.4.

mch
