Matt Thompson reported an issue over the summer that prevented the use of OpenMPI when it is built with the PGI compilers and Mellanox OFED (at least the 3.x series). The thread is here: https://www.mail-archive.com/users@lists.open-mpi.org/msg29698.html. Information about how PGI interacts with libibverbs is in this thread: https://www.pgroup.com/userforum/viewtopic.php?t=5249.

For those googling for a solution, the error message you get at runtime is this:

[borgr138][[16866,1],38][btl_openib_component.c:1648:init_one_device] error obtaining device attributes for mlx5_0 errno says Success
[borgr137][[16866,1],4][btl_openib_component.c:1648:init_one_device] error obtaining device attributes for mlx5_0 errno says Success
[borgr137][[16866,1],14][btl_openib_component.c:1648:init_one_device] error obtaining device attributes for mlx5_0 errno says Success

To get this combination to work, I tried disabling experimental verbs through various mechanisms that I can no longer recall, but none of them worked.

The solution I came up with, in case anyone else runs into this problem, is this:

1. Build OpenMPI with PGI (e.g. CC=pgcc CXX=pgc++ F77=pgf77 FC=pgf90 ./configure --with-verbs=/usr && make)
2. cd into opal/mca/btl/openib in the source/build directory
3. make clean
4. make CC="gcc -std=gnu99"
5. make install
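
For anyone who wants to script the steps above, here is a minimal sketch rather than a tested recipe. The function name rebuild_openib_with_gcc, the run helper, and the DRY_RUN variable are my own hypothetical names; the commands themselves are the ones from the list. It runs "make install" from the component directory, as the steps read literally; if the rest of the tree has not been installed yet, a top-level "make install" is probably needed as well.

```shell
#!/bin/sh
# Sketch only: build OpenMPI with PGI, then rebuild just the openib BTL with gcc.
# rebuild_openib_with_gcc, run, and DRY_RUN are hypothetical names for illustration.

# run: print a command, then execute it unless DRY_RUN=1.
# Note: the printed form does not preserve quoting of arguments.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != "1" ]; then
        "$@"
    fi
}

# $1 is the top of the OpenMPI source/build tree.
rebuild_openib_with_gcc() {
    (
        # Step 1: configure and build everything with the PGI compilers.
        run cd "$1" &&
        run env CC=pgcc CXX=pgc++ F77=pgf77 FC=pgf90 ./configure --with-verbs=/usr &&
        run make &&
        # Steps 2-3: discard the PGI-built objects of the openib component.
        run cd opal/mca/btl/openib &&
        run make clean &&
        # Step 4: rebuild only this component with gcc.
        run make CC="gcc -std=gnu99" &&
        # Step 5: install from the component directory.
        run make install
    )
}
```

Setting DRY_RUN=1 before calling rebuild_openib_with_gcc with the path to your source tree prints the commands without running anything, which is a cheap way to check the sequence before committing to a full build.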

It's not elegant but it appears to work.

-Aaron

--
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776
