Re: [OMPI users] Help Getting Started with Open MPI and PMIx and UCX

2019-01-18 Thread Matt Thompson
On Fri, Jan 18, 2019 at 1:13 PM Jeff Squyres (jsquyres) via users <users@lists.open-mpi.org> wrote: > On Jan 18, 2019, at 12:43 PM, Matt Thompson wrote: > > With some help, I managed to build an Open MPI 4.0.0 with: > We can discuss each of these params to let you know what they are.

Re: [OMPI users] Help Getting Started with Open MPI and PMIx and UCX

2019-01-18 Thread Jeff Squyres (jsquyres) via users
On Jan 18, 2019, at 12:43 PM, Matt Thompson wrote: > With some help, I managed to build an Open MPI 4.0.0 with: We can discuss each of these params to let you know what they are. > ./configure --disable-wrapper-rpath --disable-wrapper-runpath Did you have a reason for disabling these?
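For context on that question: the wrapper rpath/runpath options control whether mpicc/mpifort embed the Open MPI library path into the binaries they link. A quick, hedged way to inspect what the wrappers would pass to the linker (assuming a default install with the wrappers on the PATH):

    mpicc --showme:link     # print the link flags the C wrapper adds
    mpifort --showme:link   # same, for the Fortran wrapper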

Re: [OMPI users] Help Getting Started with Open MPI and PMIx and UCX

2019-01-18 Thread Matt Thompson
All, With some help, I managed to build an Open MPI 4.0.0 with: ./configure --disable-wrapper-rpath --disable-wrapper-runpath --with-psm2 --with-slurm --enable-mpi1-compatibility --with-ucx --with-pmix=/usr/nlocal/pmix/2.1 --with-libevent=/usr CC=icc CXX=icpc FC=ifort The MPI 1 is because I
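As a sanity check on a build like this, one hedged way to confirm that the external PMIx, libevent, and UCX were actually picked up (assuming the ompi_info from this install is on the PATH):

    ompi_info | grep -i -E 'pmix|libevent|ucx'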

Re: [OMPI users] Help Getting Started with Open MPI and PMIx and UCX

2019-01-18 Thread Cabral, Matias A
Hi Matt, a few comments/questions:
- If your cluster has Omni-Path, you won't need UCX. Instead you can run using PSM2, or alternatively OFI (a.k.a. Libfabric).
- With the command you shared below (4 ranks on the local node) (I think) a shared mem transport is being selected
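To make that suggestion concrete, a minimal sketch of forcing the transport choice on an Omni-Path cluster (assuming the psm2 and ofi MTLs were built, and with a.out standing in for the real binary):

    # run over Omni-Path via PSM2
    mpirun --mca pml cm --mca mtl psm2 -np 4 ./a.out
    # or via OFI (libfabric)
    mpirun --mca pml cm --mca mtl ofi -np 4 ./a.out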

[OMPI users] Fwd: pmix and srun

2019-01-18 Thread Michael Di Domenico
I compiled pmix, slurm, and openmpi:
--- pmix: ./configure --prefix=/hpc/pmix/2.2 --with-munge=/hpc/munge/0.5.13 --disable-debug
--- slurm: ./configure --prefix=/hpc/slurm/18.08 --with-munge=/hpc/munge/0.5.13 --with-pmix=/hpc/pmix/2.2
--- openmpi: ./configure --prefix=/hpc/ompi/3.1 --with-hwloc=external

Re: [OMPI users] Fwd: Minimum time between MPI_Bcast or MPI_Reduce calls?

2019-01-18 Thread Jeff Wentworth via users
Hi, Thanks for the quick response. But it looks like I am missing something because neither -mca nor --mca is being recognized by my mpirun command. % mpirun --mca coll_sync_priority 100 --mca coll_sync_barrier_after 10 -q -np 2 a.out

Re: [OMPI users] Fwd: Minimum time between MPI_Bcast or MPI_Reduce calls?

2019-01-18 Thread Gilles Gouaillardet
Jeff, that could be a copy/paste error and/or an email client issue. The syntax is: mpirun --mca variable value ... (short hyphen, short hyphen, m, c, a). The error message is about the missing —-mca executable (long hyphen, short hyphen, m, c, a). This is most likely the root cause of this
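In other words, retyping the command from the message above with plain ASCII hyphens (no other change) should be accepted by mpirun:

    mpirun --mca coll_sync_priority 100 --mca coll_sync_barrier_after 10 -q -np 2 a.out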

[OMPI users] Minimum time between MPI_Bcast or MPI_Reduce calls?

2019-01-18 Thread Jeff Wentworth via users
Greetings everyone, I have a scientific code using Open MPI (v3.1.3) that seems to work fine when MPI_Bcast() and MPI_Reduce() calls are well spaced out in time. Yet if the time between these calls is short, eventually one of the nodes hangs at some random point, never returning from the

[OMPI users] Fwd: Minimum time between MPI_Bcast or MPI_Reduce calls?

2019-01-18 Thread Nathan Hjelm via users
Since neither bcast nor reduce acts as a barrier, it is possible to run out of resources if either of these calls (or both) is used in a tight loop. The sync coll component exists for this scenario. You can enable it by adding the following to mpirun (or setting these variables through the
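The two variables in question are the same ones that appear in the mpirun command quoted earlier on this page, coll_sync_priority and coll_sync_barrier_after. A hedged sketch of the alternative route hinted at by the truncated sentence, using Open MPI's OMPI_MCA_ environment-variable convention (values taken from that follow-up):

    export OMPI_MCA_coll_sync_priority=100
    export OMPI_MCA_coll_sync_barrier_after=10
    mpirun -q -np 2 a.out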

Re: [OMPI users] Fwd: pmix and srun

2019-01-18 Thread Ralph H Castain
Aha - I found it. It’s a typo in the v2.2.1 release. Sadly, our Slurm plugin folks seem to be off somewhere for awhile and haven’t been testing it. Sigh. I’ll patch the branch and let you know - we’d appreciate the feedback. Ralph > On Jan 18, 2019, at 2:09 PM, Michael Di Domenico wrote:

Re: [OMPI users] pmix and srun

2019-01-18 Thread Michael Di Domenico
seems to be better now. jobs are running On Fri, Jan 18, 2019 at 6:17 PM Ralph H Castain wrote: > I have pushed a fix to the v2.2 branch - could you please confirm it? > > On Jan 18, 2019, at 2:23 PM, Ralph H Castain wrote: > > Aha - I found it. It’s a typo in the v2.2.1 release.

Re: [OMPI users] Fwd: pmix and srun

2019-01-18 Thread Ralph H Castain
Looks strange. I’m pretty sure Mellanox didn’t implement the event notification system in the Slurm plugin, but you should only be trying to call it if OMPI is registering a system-level event code - which OMPI 3.1 definitely doesn’t do. If you are using PMIx v2.2.0, then please note that there

Re: [OMPI users] pmix and srun

2019-01-18 Thread Ralph H Castain
I have pushed a fix to the v2.2 branch - could you please confirm it? > On Jan 18, 2019, at 2:23 PM, Ralph H Castain wrote: > Aha - I found it. It’s a typo in the v2.2.1 release. Sadly, our Slurm plugin folks seem to be off somewhere for awhile and haven’t been testing it. Sigh. I’ll

Re: [OMPI users] pmix and srun

2019-01-18 Thread Ralph H Castain
Good - thanks! > On Jan 18, 2019, at 3:25 PM, Michael Di Domenico wrote: > seems to be better now. jobs are running > On Fri, Jan 18, 2019 at 6:17 PM Ralph H Castain wrote: >> I have pushed a fix to the v2.2 branch - could you please confirm it? >>> On Jan 18, 2019, at

Re: [OMPI users] Fwd: pmix and srun

2019-01-18 Thread Michael Di Domenico
Here are the branches I'm using. I did a git clone on the repos and then a git checkout:
[ec2-user@labhead bin]$ cd /hpc/src/pmix/
[ec2-user@labhead pmix]$ git branch
  master
* v2.2
[ec2-user@labhead pmix]$ cd ../slurm/
[ec2-user@labhead slurm]$ git branch
* (detached from origin/slurm-18.08)
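To confirm the fix pushed to the v2.2 branch, something along these lines should work from that checkout (a hedged sketch: it reuses the pmix configure options quoted earlier on this page and assumes a git checkout that needs autogen.pl before configure):

    cd /hpc/src/pmix
    git checkout v2.2 && git pull
    ./autogen.pl
    ./configure --prefix=/hpc/pmix/2.2 --with-munge=/hpc/munge/0.5.13 --disable-debug
    make -j install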