Boris, as Gilles says - first do some lower-level checks of your InfiniBand network. I suggest running ibdiagnet and ibhosts, and then, as Gilles says, 'ibstat' on each node.
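A minimal sketch of the health check suggested above. On a real cluster you would run `ibstat` directly on each node (and `ibdiagnet` / `ibhosts` for fabric-wide checks); here a *sample* ibstat output is embedded (assumed for illustration, not taken from the thread) to show the two fields to verify: an Active port state and the SM lid.

```shell
#!/bin/sh
# Parse a sample `ibstat` port listing (assumed output, for illustration only)
# and extract the fields Gilles asks about.
sample_ibstat='CA mlx4_0
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 40
                Base lid: 3
                SM lid: 1'

# Pull out the port state and the subnet-manager lid.
state=$(printf '%s\n' "$sample_ibstat" | awk -F': ' '/^ *State:/ {print $2}')
smlid=$(printf '%s\n' "$sample_ibstat" | awk -F': ' '/^ *SM lid:/ {print $2}')

echo "port state: $state"   # should be "Active" on every node
echo "SM lid: $smlid"       # should be identical across all nodes
```

If the state is not Active on some node, or nodes report different SM lids, fix the fabric before touching any Open MPI options.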
On 14 July 2017 at 03:58, Gilles Gouaillardet <gil...@rist.or.jp> wrote:

> Boris,
>
> Open MPI should automatically detect the InfiniBand hardware, and use
> openib (and *not* tcp) for inter-node communications, and a
> shared-memory-optimized btl (e.g. sm or vader) for intra-node
> communications.
>
> Note that if you specify "-mca btl openib,self", you tell Open MPI to use
> the openib btl between all tasks, including tasks running on the same node
> (which is less efficient than using sm or vader).
>
> At first, I suggest you make sure InfiniBand is up and running on all your
> nodes. (Just run ibstat: at least one port should be listed, its state
> should be Active, and all nodes should have the same SM lid.)
>
> Then try to run two tasks on two nodes.
>
> If this does not work, you can run
>
>     mpirun --mca btl_base_verbose 100 ...
>
> and post the logs so we can investigate from there.
>
> Cheers,
>
> Gilles
>
> On 7/14/2017 6:43 AM, Boris M. Vulovic wrote:
>>
>> I would like to know how to invoke InfiniBand hardware on a CentOS 6.x
>> cluster with Open MPI (static libs.) for running my C++ code. This is how
>> I compile and run:
>>
>> /usr/local/open-mpi/1.10.7/bin/mpic++ -L/usr/local/open-mpi/1.10.7/lib -Bstatic main.cpp -o DoWork
>>
>> /usr/local/open-mpi/1.10.7/bin/mpiexec -mca btl tcp,self --hostfile hostfile5 -host node01,node02,node03,node04,node05 -n 200 DoWork
>>
>> Here, "*-mca btl tcp,self*" reveals that *TCP* is used, and the cluster
>> has InfiniBand.
>>
>> What should be changed in the compile and run commands for InfiniBand to
>> be invoked? If I just replace "*-mca btl tcp,self*" with "*-mca btl
>> openib,self*", then I get plenty of errors, with the relevant one saying:
>>
>> At least one pair of MPI processes are unable to reach each other for
>> MPI communications. This means that no Open MPI device has indicated that
>> it can be used to communicate between these processes. This is an error;
>> Open MPI requires that all MPI processes be able to reach each other. This
>> error can sometimes be the result of forgetting to specify the "self" BTL.
>>
>> Thanks very much!!!
>>
>> *Boris*
>>
>> _______________________________________________
>> users mailing list
>> users@lists.open-mpi.org
>> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
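The invocations discussed in the thread can be sketched as below. This is a dry-run script that only echoes the command lines; the install path, `hostfile5`, and `DoWork` are taken from Boris's post and will differ on your cluster.

```shell
#!/bin/sh
# Dry-run sketch: echo (do not execute) the mpirun variants suggested above.
MPIEXEC=/usr/local/open-mpi/1.10.7/bin/mpiexec

# 1. Preferred: do not force --mca btl at all, so Open MPI picks openib
#    between nodes and a shared-memory btl (sm/vader) within a node.
echo "$MPIEXEC --hostfile hostfile5 -n 200 DoWork"

# 2. If you must name BTLs, include a shared-memory one so intra-node
#    traffic does not go over openib.
echo "$MPIEXEC --mca btl openib,vader,self --hostfile hostfile5 -n 200 DoWork"

# 3. Debug BTL selection on a small two-task run and post the logs.
echo "$MPIEXEC --mca btl_base_verbose 100 --hostfile hostfile5 -n 2 DoWork"
```

The key point from Gilles's reply: `-mca btl openib,self` alone is both the likely source of the reachability error and slower for same-node ranks, so either drop the option entirely or add a shared-memory btl to the list.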