Michael,

this is odd, I will have a look.

Can you confirm you are running on a single node?


First, you need to understand which component Open MPI uses for communications.

There are several options here, and since I do not know how Open MPI was built, nor which dependencies are installed,

I can only list a few:


- pml/cm with mtl/psm2 => Omni-Path is used for both inter- and intra-node communications

- pml/cm with mtl/ofi => libfabric is used for both inter- and intra-node communications. It definitely uses libpsm2 for inter-node communications, but I do not know enough about the internals to tell how intra-node communications are handled

- pml/ob1 is used; I guess it uses btl/ofi for inter-node communications and btl/vader for intra-node communications (in that case the NIC device is not used for intra-node communications)

There could be others I am missing (does UCX support Omni-Path? Could btl/ofi also be used for intra-node communications?)
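You can also narrow this down by experiment, forcing a given path explicitly with MCA parameters. A sketch (assuming these components were built into your Open MPI installation):

```shell
# Force the cm PML with the psm2 MTL (Omni-Path carries all traffic)
mpirun --mca pml cm --mca mtl psm2 -n 16 IMB-MPI1 alltoallv

# Force the ob1 PML, with vader for intra-node shared memory
mpirun --mca pml ob1 --mca btl self,vader,ofi -n 16 IMB-MPI1 alltoallv
```

If a forced component cannot be selected, mpirun should abort with an error rather than silently fall back, which also tells you whether that component is available.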


mpirun --mca pml_base_verbose 10 --mca btl_base_verbose 10 --mca mtl_base_verbose 10 ...

should tell you which components are used (feel free to compress and post the full output if you have a hard time understanding the logs)
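The verbose output can be long; a sketch for capturing it to a file and filtering for the selection lines (the grep pattern is just a guess at the interesting lines, adjust as needed):

```shell
# Redirect benchmark output and diagnostics separately, then filter
mpirun --mca pml_base_verbose 10 --mca btl_base_verbose 10 \
       --mca mtl_base_verbose 10 -n 16 IMB-MPI1 alltoallv \
       > imb.out 2> selection.log
grep -i "select" selection.log
```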


Cheers,


Gilles

On 3/12/2019 1:41 AM, Michael Di Domenico wrote:
> On Mon, Mar 11, 2019 at 12:09 PM Gilles Gouaillardet
> <gilles.gouaillar...@gmail.com> wrote:
>> You can force
>> mpirun --mca pml ob1 ...
>> And btl/vader (shared memory) will be used for intra node communications ...
>> unless MPI tasks are from different jobs (read MPI_Comm_spawn())
>
> if i run
>
> mpirun -n 16 IMB-MPI1 alltoallv
>
> things run fine, 12us on average for all ranks
>
> if i run
>
> mpirun -n 16 --mca pml ob1 IMB-MPI1 alltoallv
>
> the program runs, but then it hangs at "List of benchmarks to run:
> #Alltoallv" and no tests run
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users
