I recently had problems running ompi master on our Omnipath cluster; 3.0 and 3.1, however, work without problems. After some digging, I found that I have to set the environment variable PSM2_MULTI_EP for master to work at all for us. I am not sure whether this is intended or an inadvertent consequence of something else, but without this environment variable set, all jobs abort with the following error message:

--snip--
--------------------------------------------------------------------------
PSM2 was unable to open an endpoint. Please make sure that the network link is active on the node and the hardware is functioning.

Error: Exceeded supported amount of endpoints
--snip--

As a side note, the ofi mtl component on master also fails on the same cluster, with a different error message though, namely,

--snip--
mtl_ofi.h:318: fi_tinjectddata failed: Function not implemented(-38)
--snip--

Looks like a version mismatch that maybe the configure script should catch. Anyway, just wanted to bring this up as a datapoint.

Thanks
Edgar

--
Edgar Gabriel
Associate Professor
Department of Computer Science
University of Houston
Philip G. Hoffman Hall, Room 228
Houston, TX-77204, USA
Tel: +1 (713) 743-3857
Fax: +1 (713) 743-3335