Re: [OMPI users] local rank to rank comms
unfortunately it takes a while to export the data, but here's what i see

On Mon, Mar 11, 2019 at 11:02 PM Gilles Gouaillardet wrote:
> mpirun --mca pml_base_verbose 10 --mca btl_base_verbose 10 --mca
> mtl_base_verbose 10 ...
>
> should tell you what is used (feel free to compress and post the full
> output if you have some hard time understanding the logs)

ompi.run.ob1
Description: Binary data
ompi.run.cm
Description: Binary data
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users
Re: [OMPI users] local rank to rank comms
Michael,

this is odd, I will have a look.

Can you confirm you are running on a single node ?

At first, you need to understand which component is used by Open MPI for communications.

There are several options here, and since I do not know how Open MPI was built, nor which dependencies are installed, I can only list a few

- pml/cm uses mtl/psm2 => omnipath is used for both inter and intra node communications
- pml/cm uses mtl/ofi => libfabric is used for both inter and intra node communications. it definitely uses libpsm2 for inter node communications, and I do not know enough about the internals to tell how intra node communications are handled
- pml/ob1 is used, I guess it uses btl/ofi for inter node communications and btl/vader for intra node communications (in that case the NIC device is not used for intra node communications)

there could be others I am missing (does UCX support OmniPath ? could btl/ofi also be used for intra node communications ?)

mpirun --mca pml_base_verbose 10 --mca btl_base_verbose 10 --mca mtl_base_verbose 10 ...

should tell you what is used (feel free to compress and post the full output if you have a hard time understanding the logs)

Cheers,

Gilles
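As a rough illustration of the logging step described above, the verbose flags can be wrapped in a small helper that saves the full output to a file for later compression and posting. This is only a sketch: the function name run_verbose and the MPIRUN override variable are hypothetical, and only the three --mca verbose parameters come from the message itself.

```shell
#!/bin/sh
# Launcher to invoke; MPIRUN is a hypothetical override (e.g. MPIRUN=echo
# gives a dry run that just records the arguments).
MPIRUN=${MPIRUN:-mpirun}

# Run a job with PML, BTL and MTL selection logging enabled and save
# stdout and stderr to a log file.
# usage: run_verbose LOGFILE MPIRUN_ARGS...
run_verbose() {
    log=$1
    shift
    "$MPIRUN" --mca pml_base_verbose 10 \
              --mca btl_base_verbose 10 \
              --mca mtl_base_verbose 10 \
              "$@" > "$log" 2>&1
}
```

For example, "run_verbose ompi.run.cm -n 16 IMB-MPI1 alltoallv" followed by "gzip ompi.run.cm" would produce a compressed log suitable for attaching to a reply. The log file name here is just an example.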
Re: [OMPI users] local rank to rank comms
On Mon, Mar 11, 2019 at 12:09 PM Gilles Gouaillardet wrote:
> You can force
> mpirun --mca pml ob1 ...
> And btl/vader (shared memory) will be used for intra node communications ...
> unless MPI tasks are from different jobs (read MPI_Comm_spawn())

if i run

mpirun -n 16 IMB-MPI1 alltoallv

things run fine, 12us on average for all ranks

if i run

mpirun -n 16 --mca pml ob1 IMB-MPI1 alltoallv

the program runs, but then it hangs at "List of benchmarks to run: #Alltoallv" and no tests run
Re: [OMPI users] local rank to rank comms
On Mon, Mar 11, 2019 at 12:19 PM Ralph H Castain wrote:
> OFI uses libpsm2 underneath it when omnipath detected
>
> > On Mar 11, 2019, at 9:06 AM, Gilles Gouaillardet wrote:
> > It might show that pml/cm and mtl/psm2 are used. In that case, then yes,
> > the OmniPath library is used even for intra node communications. If this
> > library is optimized for intra node, then it will internally use shared
> > memory instead of the NIC.

would it be fair to assume that, if the opa library is optimized for intra-node using shared memory, there shouldn't be much of a difference between the opa library and the ompi library for local rank to rank comms?

is there a way or tool to measure that? i'd like to run the tests toggling opa vs ompi libraries and see if, or really how much of, a difference there is
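For the measurement asked about above, one possible approach is to run the IMB PingPong benchmark with two ranks on a single node, once per transport path, and compare the reported latencies. The sketch below only builds the command lines. The helper name pingpong_cmd and the labels psm2 and vader are hypothetical, and restricting ob1 to "btl self,vader" is an assumption intended to keep it on shared memory. Note that a plain "--mca pml ob1" run is reported to hang elsewhere in this thread, so the second command may not work as-is on that system.

```shell
#!/bin/sh
# Print the mpirun command line for a two-rank, single-node PingPong run
# over the requested path:
#   psm2  -> pml/cm with mtl/psm2 (Omni-Path library handles intra-node traffic)
#   vader -> pml/ob1 with btl/vader (Open MPI shared memory)
pingpong_cmd() {
    case $1 in
        psm2)  sel="--mca pml cm --mca mtl psm2" ;;
        vader) sel="--mca pml ob1 --mca btl self,vader" ;;
        *)     echo "unknown path: $1" >&2; return 1 ;;
    esac
    echo "mpirun -n 2 $sel IMB-MPI1 PingPong"
}
```

Running sh -c "$(pingpong_cmd psm2)" and then sh -c "$(pingpong_cmd vader)" on the same node would give two latency tables to compare side by side.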
Re: [OMPI users] local rank to rank comms
OFI uses libpsm2 underneath it when omnipath detected
Re: [OMPI users] local rank to rank comms
Michael,

You can run

mpirun --mca pml_base_verbose 10 --mca btl_base_verbose 10 --mca mtl_base_verbose 10 ...

It might show that pml/cm and mtl/psm2 are used. In that case, then yes, the OmniPath library is used even for intra node communications. If this library is optimized for intra node, then it will internally use shared memory instead of the NIC.

You can force

mpirun --mca pml ob1 ...

And btl/vader (shared memory) will be used for intra node communications ... unless MPI tasks are from different jobs (read MPI_Comm_spawn())

Cheers,

Gilles
Re: [OMPI users] local rank to rank comms
On Mon, Mar 11, 2019 at 11:51 AM Ralph H Castain wrote:
> You are probably using the ofi mtl - could be psm2 uses loopback method?

according to ompi_info i do in fact have mtl's ofi,psm,psm2. i haven't changed any of the defaults, so are you saying that in order to change the behaviour i have to run mpirun --mca mtl psm2? if true, what's the recourse to not using the ofi mtl?
Re: [OMPI users] local rank to rank comms
You are probably using the ofi mtl - could be psm2 uses loopback method?
[OMPI users] local rank to rank comms
i have a user that's claiming when two ranks on the same node want to talk with each other, they're using the NIC to talk rather than just talking directly.

i've never had to test such a scenario. is there a way for me to prove one way or another whether two ranks are talking through say the kernel (or however it actually works) or using the nic?

i didn't set any flags when i compiled openmpi to change this.

i'm running ompi 3.1, pmix 2.2.1, and slurm 18.05 running atop omnipath
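One way to approach the question above, independent of what Open MPI itself reports, might be to read a NIC transmit counter before and after a single-node run and look at the difference. The sketch below is an assumption-laden illustration, not a confirmed method for Omni-Path: the helper name counter_delta is hypothetical, and the counter path used in the usage note (the hfi1_0 device name and the port_xmit_data file under /sys/class/infiniband) is assumed and should be checked on the actual node.

```shell
#!/bin/sh
# Print how much a numeric counter file changed while a command ran.
# The command's own output is discarded and its exit status is ignored.
# usage: counter_delta COUNTER_FILE COMMAND...
counter_delta() {
    file=$1
    shift
    before=$(cat "$file") || return 1
    "$@" > /dev/null 2>&1
    after=$(cat "$file") || return 1
    echo $((after - before))
}
```

A call such as "counter_delta /sys/class/infiniband/hfi1_0/ports/1/counters/port_xmit_data mpirun -n 2 IMB-MPI1 PingPong" on an otherwise idle node would then print the counter change. A value near zero would suggest the traffic did not leave through the port, though other activity on the node can also move the counter.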