Re: [OMPI users] Linkage problem

2018-04-05 Thread Quentin Faure
I solved my problem. I uninstalled all the MPI software that was on the computer and reinstalled OpenMPI. It was still not working, so I uninstalled it again, reinstalled it, and it is working now. Apparently there was a problem with the installation. Thanks for the help. Quentin > On 4

Re: [OMPI users] disabling libraries?

2018-04-05 Thread Gilles Gouaillardet
Michael, in this case you can run mpirun --mca oob ^ud ... in order to blacklist the oob/ud component. An alternative is to add oob = ^ud in /.../etc/openmpi-mca-params.conf. If Open MPI is installed on a local filesystem, then this setting can be node-specific. That being said, the error suggests
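For reference, a minimal sketch of the two options Gilles describes; the application name and install prefix are placeholders:

  # Per-run: exclude the ud component of the oob framework
  mpirun --mca oob ^ud -np 4 ./my_app

  # Persistent: add the same exclusion to the MCA parameter file
  # (adjust the prefix to match the actual installation)
  echo "oob = ^ud" >> /opt/openmpi/etc/openmpi-mca-params.conf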

Re: [OMPI users] mpi send/recv pair hanging

2018-04-05 Thread Gilles Gouaillardet
Noam, you might also want to try mpirun --mca btl tcp,self ... to rule out BTL-related issues (shared memory and/or InfiniBand). Once you rebuild Open MPI with --enable-debug, I recommend you first check the arguments of the MPI_Send() and MPI_Recv() functions and make sure - same
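A minimal sketch of the suggested run, with the application name as a placeholder:

  # Restrict Open MPI to the TCP and self BTLs, bypassing the shared
  # memory and InfiniBand transports for this run
  mpirun --mca btl tcp,self -np 64 ./my_app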

Re: [OMPI users] Eager RDMA causing slow osu_bibw with 3.0.0

2018-04-05 Thread Ben Menadue
Hi Nathan, Howard, Thanks for the feedback. Yes, we do already have UCX compiled into our OpenMPI installations, but it’s disabled by default on our system because some users were reporting problems with it previously. But I’m not sure what the status of these is with OpenMPI 3.0, something
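A hedged sketch of how a single job could opt back in to UCX when the site default excludes it; the benchmark binary is a placeholder and the selection assumes the ucx PML component was built:

  # Select the UCX PML for this run only, overriding the site default
  mpirun --mca pml ucx -np 2 ./osu_bibw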

Re: [OMPI users] mpi send/recv pair hanging

2018-04-05 Thread George Bosilca
Yes, you can do this by adding --enable-debug to the OMPI configure (and make sure you don't have the configure flag --with-platform=optimize). George. On Thu, Apr 5, 2018 at 4:20 PM, Noam Bernstein wrote: > > On Apr 5, 2018, at 4:11 PM, George Bosilca
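A minimal sketch of such a rebuild; the install prefix and parallelism are placeholders:

  # Reconfigure with debugging enabled, without a --with-platform=optimize file
  ./configure --prefix=/opt/openmpi-debug --enable-debug
  make -j 8 && make install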

Re: [OMPI users] mpi send/recv pair hanging

2018-04-05 Thread Noam Bernstein
> On Apr 5, 2018, at 4:11 PM, George Bosilca wrote: > > I attach with gdb on the processes and do a "call mca_pml_ob1_dump(comm, 1)". > This allows the debugger to call our function and output internal > information about the library status. Great. But I guess I

Re: [OMPI users] mpi send/recv pair hanging

2018-04-05 Thread George Bosilca
I attach with gdb on the processes and do a "call mca_pml_ob1_dump(comm, 1)". This allows the debugger to call our function and output internal information about the library status. George. On Thu, Apr 5, 2018 at 4:03 PM, Noam Bernstein wrote: > On Apr
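A sketch of the debugging session George describes; the PID is a placeholder, and comm must be the communicator pointer visible in the chosen stack frame:

  # Attach to one of the hung MPI processes
  gdb -p 12345
  # At the gdb prompt, dump pending ob1 communications at verbosity 1
  (gdb) call mca_pml_ob1_dump(comm, 1)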

Re: [OMPI users] mpi send/recv pair hanging

2018-04-05 Thread Noam Bernstein
> On Apr 5, 2018, at 3:55 PM, George Bosilca wrote: > > Noam, > > The OB1 provides a mechanism to dump all pending communications in a > particular communicator. To do this I usually call mca_pml_ob1_dump(comm, 1), > with comm being the MPI_Comm and 1 being the verbose

Re: [OMPI users] mpi send/recv pair hanging

2018-04-05 Thread George Bosilca
Noam, The OB1 provides a mechanism to dump all pending communications in a particular communicator. To do this I usually call mca_pml_ob1_dump(comm, 1), with comm being the MPI_Comm and 1 being the verbose mode. I have no idea how you can find the pointer to the communicator out of your code, but

[OMPI users] disabling libraries?

2018-04-05 Thread Michael Di Domenico
I'm trying to compile OpenMPI to support all of our interconnects (psm/openib/mxm/etc.). This works fine: OpenMPI finds all the libs, and it compiles and runs on each of the respective machines. However, we don't install the libraries for everything everywhere, so when I run things like ompi_info and
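A small sketch of one way to check which transport components a particular node's installation can actually open; the grep pattern assumes the usual ompi_info output layout:

  # List the BTL components that load on the current node
  ompi_info | grep "MCA btl"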

Re: [OMPI users] Eager RDMA causing slow osu_bibw with 3.0.0

2018-04-05 Thread Nathan Hjelm
Honestly, this is a configuration issue with the openib btl. There is no reason to keep eager RDMA, nor is there a reason to pipeline RDMA. I haven't found an app where either of these "features" helps you with InfiniBand. You have the right idea with the parameter changes, but Howard is

Re: [OMPI users] mpi send/recv pair hanging

2018-04-05 Thread Edgar Gabriel
Is the file I/O that you mentioned using MPI I/O? If yes, what file system are you writing to? Edgar On 4/5/2018 10:15 AM, Noam Bernstein wrote: On Apr 5, 2018, at 11:03 AM, Reuti wrote: Hi, On 05.04.2018 at 16:16, Noam Bernstein wrote

Re: [OMPI users] Eager RDMA causing slow osu_bibw with 3.0.0

2018-04-05 Thread Howard Pritchard
Hello Ben, Thanks for the info. You would probably be better off installing UCX on your cluster and rebuilding your Open MPI with the --with-ucx configure option. Here’s what I’m seeing with Open MPI 3.0.1 on a ConnectX5-based cluster using the ob1/openib BTL: mpirun -map-by ppr:1:node -np 2
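A minimal sketch of the rebuild Howard suggests; the prefixes are placeholders and assume UCX is already installed under /opt/ucx:

  # Rebuild Open MPI against an existing UCX installation
  ./configure --prefix=/opt/openmpi-ucx --with-ucx=/opt/ucx
  make -j 8 && make install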

Re: [OMPI users] mpi send/recv pair hanging

2018-04-05 Thread Noam Bernstein
> On Apr 5, 2018, at 11:32 AM, Edgar Gabriel wrote: > > Is the file I/O that you mentioned using MPI I/O? If yes, what file > system are you writing to? No MPI I/O. Just MPI calls to gather the data, and plain Fortran I/O on the head node only. I should

Re: [OMPI users] mpi send/recv pair hanging

2018-04-05 Thread Noam Bernstein
> On Apr 5, 2018, at 11:03 AM, Reuti wrote: > > Hi, > >> On 05.04.2018 at 16:16, Noam Bernstein wrote: >> >> Hi all - I have a code that uses MPI (VASP), and it’s hanging in a strange >> way. Basically, there’s a Cartesian

Re: [OMPI users] mpi send/recv pair hanging

2018-04-05 Thread Reuti
Hi, > On 05.04.2018 at 16:16, Noam Bernstein wrote: > > Hi all - I have a code that uses MPI (VASP), and it’s hanging in a strange > way. Basically, there’s a Cartesian communicator, 4x16 (64 processes total), > and despite the fact that the communication

[OMPI users] mpi send/recv pair hanging

2018-04-05 Thread Noam Bernstein
Hi all - I have a code that uses MPI (VASP), and it’s hanging in a strange way. Basically, there’s a Cartesian communicator, 4x16 (64 processes total), and despite the fact that the communication pattern is rather regular, one particular send/recv pair hangs consistently. Basically, across

Re: [OMPI users] Eager RDMA causing slow osu_bibw with 3.0.0

2018-04-05 Thread Ben Menadue
Hi, Another interesting point. I noticed that the bandwidth at the last two message sizes tested (2MB and 4MB) is lower than expected for both osu_bw and osu_bibw. Increasing the minimum message size at which the RDMA pipeline is used to above these sizes brings those two data points up to scratch for both benchmarks: 3.0.0,
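A hedged sketch of the kind of adjustment described; the parameter name and value are assumptions that should be checked against ompi_info --param btl openib --level 9, and the benchmark binary is a placeholder:

  # Keep 2 MB and 4 MB messages below the RDMA-pipeline threshold
  mpirun --mca btl_openib_min_rdma_pipeline_size 8388608 -np 2 ./osu_bw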

[OMPI users] Eager RDMA causing slow osu_bibw with 3.0.0

2018-04-05 Thread Ben Menadue
Hi, We’ve just been running some OSU benchmarks with OpenMPI 3.0.0 and noticed that osu_bibw gives nowhere near the bandwidth I’d expect (this is on FDR IB). However, osu_bw is fine. If I disable eager RDMA, then osu_bibw gives the expected numbers. Similarly, if I increase the number of
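A minimal sketch of disabling eager RDMA for a single run; the parameter name assumes the openib BTL and should be verified with ompi_info, and the benchmark binary is a placeholder:

  # Turn off eager RDMA in the openib BTL for this run
  mpirun --mca btl_openib_use_eager_rdma 0 -np 2 ./osu_bibw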