I recently had problems running ompi master on our Omni-Path cluster; 3.0 and
3.1 work without problems. After some digging, I found that I have to
set the environment variable PSM2_MULTI_EP for master to work at all for us.
I am not sure whether this is intended or an inadvertent consequence of
something else, but without this environment variable set, all jobs abort
with the following error message:

--snip-
--------------------------------------------------------------------------
PSM2 was unable to open an endpoint. Please make sure that the network link is
active on the node and the hardware is functioning.

  Error: Exceeded supported amount of endpoints
--snip-
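
For anyone who wants to reproduce the workaround, here is a minimal sketch of a launcher that sets the variable before starting the job. The value "1" is an assumption (the report above only says the variable has to be set), the helper names psm2_env and mpirun_command are made up for illustration, and ./a.out is a placeholder for the actual MPI binary. The -x option of mpirun is used to forward the variable to all ranks.

```python
import os


def psm2_env(base=None):
    """Return a copy of the environment with PSM2_MULTI_EP set.

    The value "1" is an assumption; the report only states that the
    variable must be set for master to work.
    """
    env = dict(os.environ if base is None else base)
    env["PSM2_MULTI_EP"] = "1"
    return env


def mpirun_command(nprocs, program):
    """Build an mpirun command line that forwards PSM2_MULTI_EP.

    "-x NAME" asks mpirun to export NAME from the launching
    environment to the launched processes.
    """
    return ["mpirun", "-x", "PSM2_MULTI_EP", "-np", str(nprocs), program]
```

The command can then be started with something like subprocess.run(mpirun_command(2, "./a.out"), env=psm2_env()), which should be equivalent to exporting the variable in the shell before calling mpirun.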

As a side note, the ofi mtl component on master also fails on the same
cluster, though with a different error message:

--snip-
mtl_ofi.h:318: fi_tinjectddata failed: Function not implemented(-38)
--snip-

This looks like a version mismatch that the configure script should perhaps
catch. Anyway, I just wanted to bring this up as a data point.

Thanks
Edgar

--
Edgar Gabriel
Associate Professor
Department of Computer Science
University of Houston
Philip G. Hoffman Hall, Room 228        Houston, TX-77204, USA
Tel: +1 (713) 743-3857                  Fax: +1 (713) 743-3335
