Boris, as Gilles says, first do some lower-level checks of your
InfiniBand network.
I suggest running:
ibdiagnet
ibhosts
and then, as Gilles says, 'ibstat' on each node.
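
For reference, healthy ibstat output for a port looks roughly like this
(a sketch; the CA name, rate and lid values will differ on your fabric):

    State: Active
    Physical state: LinkUp
    Rate: 56
    Base lid: 2
    SM lid: 1

If the state is Down or Initializing, or the SM lid is 0, that port is
not yet part of a working fabric.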



On 14 July 2017 at 03:58, Gilles Gouaillardet <gil...@rist.or.jp> wrote:

> Boris,
>
>
> Open MPI should automatically detect the InfiniBand hardware and use
> openib (and *not* tcp) for inter-node communications, and a
> shared-memory-optimized btl (e.g. sm or vader) for intra-node
> communications.
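>
> For example (a sketch, reusing the hostfile and binary names from your
> message below), just omit the btl list entirely and let Open MPI pick:
>
>     mpirun --hostfile hostfile5 -n 200 DoWork
>
> Open MPI will then use openib between nodes and sm/vader within a node.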
>
>
> Note that with "-mca btl openib,self", you tell Open MPI to use the
> openib btl between all tasks, including tasks running on the same node
> (which is less efficient than using sm or vader); see the sketch below
> for how to keep shared memory in the mix.
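>
> If you do want to force openib for inter-node traffic, keep a shared
> memory btl in the list as well, e.g. (on 1.10.x, where sm is the
> default shared memory btl):
>
>     mpirun --mca btl openib,sm,self --hostfile hostfile5 -n 200 DoWork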
>
>
> First, I suggest you make sure InfiniBand is up and running on all your
> nodes: just run ibstat; at least one port should be listed, its state
> should be Active, and all nodes should report the same SM lid. A quick
> check is sketched below.
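>
> One way to check that from the head node (a sketch; adjust the node
> names to your cluster):
>
>     for h in node01 node02 node03 node04 node05; do
>         echo "=== $h ==="; ssh $h "ibstat | grep -E 'State:|SM lid'"
>     done
>
> Every node should show "State: Active" and the same SM lid.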
>
>
> Then try to run two tasks on two nodes.
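>
> For example, reusing two of the node names from your command (with one
> task per node, forcing openib,self is fine, since there is no
> intra-node communication):
>
>     mpirun --mca btl openib,self -np 2 --host node01,node02 DoWork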
>
>
> If this does not work, you can rerun with
>
> mpirun --mca btl_base_verbose 100 ...
>
> and post the logs so we can investigate from there.
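>
> For example, capturing both stdout and stderr to a file:
>
>     mpirun --mca btl openib,self --mca btl_base_verbose 100 -np 2 \
>         --host node01,node02 DoWork 2>&1 | tee btl.log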
>
>
> Cheers,
>
>
> Gilles
>
>
>
> On 7/14/2017 6:43 AM, Boris M. Vulovic wrote:
>
>>
>> I would like to know how to invoke the InfiniBand hardware on a CentOS
>> 6.x cluster with Open MPI (static libs) for running my C++ code. This is
>> how I compile and run:
>>
>> /usr/local/open-mpi/1.10.7/bin/mpic++ -L/usr/local/open-mpi/1.10.7/lib
>> -Bstatic main.cpp -o DoWork
>>
>> /usr/local/open-mpi/1.10.7/bin/mpiexec -mca btl tcp,self --hostfile
>> hostfile5 -host node01,node02,node03,node04,node05 -n 200 DoWork
>>
>> Here, "*-mca btl tcp,self*" reveals that *TCP* is used, and the cluster
>> has InfiniBand.
>>
>> What should be changed in the compile and run commands for InfiniBand
>> to be used? If I just replace "*-mca btl tcp,self*" with "*-mca btl
>> openib,self*" then I get plenty of errors, with the relevant one saying:
>>
>> /At least one pair of MPI processes are unable to reach each other for
>> MPI communications. This means that no Open MPI device has indicated that
>> it can be used to communicate between these processes. This is an error;
>> Open MPI requires that all MPI processes be able to reach each other. This
>> error can sometimes be the result of forgetting to specify the "self" BTL./
>>
>> Thanks very much!!!
>>
>>
>> *Boris *
>>
>>
>>
>>
>