Re: [OMPI users] Can't start jobs with srun.

2020-04-27 Thread Riebs, Andy via users
Lost a line… Also helpful to check: $ srun -N3 which ompi_info

Re: [OMPI users] Can't start jobs with srun.

2020-04-27 Thread Riebs, Andy via users
Y’know, a quick check on versions and PATHs might be a good idea here. I suggest something like $ srun -N3 ompi_info |& grep "MPI repo" to confirm that all nodes are running the same version of OMPI.

Re: [OMPI users] [External] Re: Can't start jobs with srun.

2020-04-27 Thread Prentice Bisbal via users
Ralph, PMI2 support works just fine. It's just PMIx that seems to be the problem. We rebuilt Slurm with PMIx 3.1.5, but the problem persists. I've opened a ticket with Slurm support to see if it's a problem on Slurm's end. Prentice

Re: [OMPI users] [External] RE: Re: Can't start jobs with srun.

2020-04-27 Thread Prentice Bisbal via users
Yes. "srun -N3 hostname" works. The problem only seems to occur when I specify the --mpi option, so the problem seems related to PMI. On 4/24/20 2:28 PM, Riebs, Andy wrote: Prentice, have you tried something trivial, like "srun -N3 hostname", to rule out non-OMPI problems? Andy

[OMPI users] Handle Ctrl+C in subprocesses

2020-04-27 Thread Jérémie Wenger via users
Hi, I recently installed Open MPI (4.0.3) using the procedure described here, as I'm trying to use Horovod for multiple GPU acceleration. I am looking for a way to handle a keyboard interrupt (save a deep learning model before shutting everything
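
[The usual pattern, shown here as a sketch in plain MPI C rather than the Python/Horovod stack: the handler only sets a flag, and the save happens at a safe point in the main loop. save_checkpoint is a hypothetical placeholder, and whether ranks ever see SIGINT at all depends on how srun/mpirun forwards signals.]

    #include <mpi.h>
    #include <signal.h>
    #include <stdio.h>
    #include <unistd.h>

    static volatile sig_atomic_t got_sigint = 0;

    /* Only set a flag here: MPI calls are not async-signal-safe. */
    static void on_sigint(int sig) { (void)sig; got_sigint = 1; }

    int main(int argc, char **argv)
    {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        signal(SIGINT, on_sigint);

        for (int step = 0; step < 1000 && !got_sigint; step++) {
            sleep(1);   /* stands in for one training step */
        }

        if (got_sigint && rank == 0)
            printf("caught SIGINT; saving before shutdown\n");
            /* save_checkpoint();  hypothetical: persist the model here */

        MPI_Finalize(); /* clean shutdown instead of dying mid-write */
        return 0;
    }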

Re: [OMPI users] RMA in openmpi

2020-04-27 Thread Joseph Schuchart via users
Hi Claire, You cannot use MPI_Get (or any other RMA communication routine) on a window for which no access epoch has been started. MPI_Win_fence starts an active target access epoch; MPI_Win_lock[_all] starts a passive target access epoch. Window locks are synchronizing in the sense that they
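
[To make the active target case concrete, a minimal sketch, assuming at least two ranks, each exposing one int in the window; the MPI_Get is only legal because it sits between the two fences.]

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, remote = 0, local = 42;
        MPI_Win win;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Every rank exposes one int in the window. */
        MPI_Win_create(&local, sizeof(int), sizeof(int),
                       MPI_INFO_NULL, MPI_COMM_WORLD, &win);

        MPI_Win_fence(0, win);   /* start the access/exposure epoch */
        if (rank != 0)
            MPI_Get(&remote, 1, MPI_INT, 0, 0, 1, MPI_INT, win);
        MPI_Win_fence(0, win);   /* end epoch: the Get is now complete */

        if (rank != 0)
            printf("rank %d read %d from rank 0\n", rank, remote);

        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }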

Re: [OMPI users] RMA in openmpi

2020-04-27 Thread Claire Cashmore via users
Hi Joseph, thank you for your reply. From what I had been reading I thought they were both called "synchronization calls", just that one was passive (lock) and one was active (fence); sorry if I've got confused! So I'm asking: do I need either MPI_Win_fence or MPI_Win_lock/unlock in order to use

Re: [OMPI users] RMA in openmpi

2020-04-27 Thread Joseph Schuchart via users
Claire, > Is it possible to use the one-sided communication without combining it with synchronization calls? What exactly do you mean by "synchronization calls"? MPI_Win_fence is indeed synchronizing (basically flush+barrier) but MPI_Win_lock (and the passive target synchronization
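
[For the passive target case, a companion sketch under the same assumptions as the fence example above: only the origin rank calls lock/unlock, and the target makes no synchronization call at all.]

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, remote = 0, local;
        MPI_Win win;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        local = 100 + rank;   /* value each rank exposes */

        MPI_Win_create(&local, sizeof(int), sizeof(int),
                       MPI_INFO_NULL, MPI_COMM_WORLD, &win);

        if (rank == 1) {
            /* Passive target: rank 0 is not involved in the epoch. */
            MPI_Win_lock(MPI_LOCK_SHARED, 0, 0, win);
            MPI_Get(&remote, 1, MPI_INT, 0, 0, 1, MPI_INT, win);
            MPI_Win_unlock(0, win);   /* closes the epoch, completes the Get */
            printf("rank 1 read %d from rank 0\n", remote);
        }

        /* Not strictly required (MPI_Win_free is collective), but
         * makes the ordering explicit. */
        MPI_Barrier(MPI_COMM_WORLD);
        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }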