Michael,
this is odd; I will have a look.
Can you confirm you are running on a single node?
First, you need to understand which component Open MPI uses for
communications.
There are several options here, and since I do not know how Open MPI was
built or which dependencies are installed,
I can only list a few:
- pml/cm uses mtl/psm2 => Omni-Path is used for both inter- and intra-node
communications
- pml/cm uses mtl/ofi => libfabric is used for both inter- and intra-node
communications. It definitely uses libpsm2 for inter-node
communications, but I do not know enough about the internals to tell how
intra-node communications are handled
- pml/ob1 is used; I guess it uses btl/ofi for inter-node communications
and btl/vader for intra-node communications (in that case the NIC device
is not used for intra-node communications)
There could be others I am missing (does UCX support Omni-Path? could
btl/ofi also be used for intra-node communications?)
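If you want to check the paths from the list above yourself, you can force a
given stack with MCA parameters. A sketch only; which components are
actually available depends on how your Open MPI was built:

mpirun --mca pml cm --mca mtl psm2 -n 16 IMB-MPI1 alltoallv
mpirun --mca pml cm --mca mtl ofi -n 16 IMB-MPI1 alltoallv

If a forced component is unavailable, mpirun will abort with an error
instead of silently falling back, which makes this a quick way to test.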
mpirun --mca pml_base_verbose 10 --mca btl_base_verbose 10 --mca mtl_base_verbose 10 ...
should tell you which components are used (feel free to compress and post
the full output if you have a hard time understanding the logs).
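For example, to capture the selection logs from the run that misbehaves
(the verbose output goes to stderr; "selection.log" is just an arbitrary
file name):

mpirun --mca pml_base_verbose 10 --mca btl_base_verbose 10 --mca mtl_base_verbose 10 -n 16 IMB-MPI1 alltoallv 2>&1 | tee selection.log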
Cheers,
Gilles
On 3/12/2019 1:41 AM, Michael Di Domenico wrote:
On Mon, Mar 11, 2019 at 12:09 PM Gilles Gouaillardet
<gilles.gouaillar...@gmail.com> wrote:
You can force
mpirun --mca pml ob1 ...
and btl/vader (shared memory) will be used for intra-node communications,
unless the MPI tasks are from different jobs (read: MPI_Comm_spawn()).
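(For completeness, a sketch of pinning the whole ob1 stack explicitly;
the component names assume a default build, and "self" is needed so a
rank can send to itself:)

mpirun --mca pml ob1 --mca btl vader,self -n 16 IMB-MPI1 alltoallv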
If I run
mpirun -n 16 IMB-MPI1 alltoallv
things run fine, 12us on average for all ranks.
If I run
mpirun -n 16 --mca pml ob1 IMB-MPI1 alltoallv
the program starts, but then it hangs at "List of benchmarks to run:
#Alltoallv" and no tests run.