[slurm-dev] PMIx at SC'17

2017-11-03 Thread r...@open-mpi.org
My apologies for the shameless promotion, but for those interested, there will be a PMIx BoF meeting this year at SC’17 on Thursday, November 16, 2017, at 12:15pm: http://sc17.supercomputing.org/presentation/?id=bof104&sess=sess308

[slurm-dev] Re: Selecting a network interface with srun

2017-10-25 Thread r...@open-mpi.org
Good points. I would also caution against renaming nodes using interfaces. This frequently causes failure of 3rd party software packages that compare the return value of “hostname” to the list of allocated nodes for optimization or placement purposes - e.g., mpirun! A quick grep of the mailing l
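To illustrate the caution above, here is a minimal Python sketch of the kind of comparison such third-party tools perform. It is not taken from mpirun or any specific package; the helper names and the example node names are hypothetical. It shows why a node renamed after an interface no longer matches what "hostname" returns.

```python
import socket


def short_name(host):
    """Strip any domain suffix and normalise case (hypothetical helper)."""
    return host.split(".", 1)[0].lower()


def is_local_node(allocated_name, local_hostname=None):
    """Return True if an allocated node name refers to this machine.

    Assumption: the tool decides "am I on this node?" purely by comparing
    the allocated name with the value returned by hostname.
    """
    if local_hostname is None:
        local_hostname = socket.gethostname()
    return short_name(allocated_name) == short_name(local_hostname)


# A node listed under its real name matches; the same node listed under an
# interface-specific name (hypothetical "node01-ib0") does not.
print(is_local_node("node01", "node01.cluster.example"))
print(is_local_node("node01-ib0", "node01.cluster.example"))
```

When the second comparison fails, a launcher can conclude that the local machine is not part of the allocation, which is the failure mode described above.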

[slurm-dev] Re: Selecting a network interface with srun

2017-10-24 Thread r...@open-mpi.org
“ibface” isn’t an OpenMPI cmd line option, so I suspect you are using something other than OpenMPI. For OMPI, you could specify the interface via MCA param in the environment or default MCA parameter file. Most MPI implementations have a similar mechanism - you might check your documentation.
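As a rough sketch of the two mechanisms mentioned above: Open MPI reads MCA parameters from environment variables prefixed with OMPI_MCA_, and from "name = value" lines in a parameter file such as $HOME/.openmpi/mca-params.conf. The Python helpers below are hypothetical, the interface name "eth0" is a placeholder, and btl_tcp_if_include is used only as an example parameter (it applies to the TCP transport; the parameter that fits your network may differ).

```python
import os


def mca_env(params, base_env=None):
    """Return a copy of the environment with OMPI_MCA_<name>=<value> added."""
    env = dict(os.environ if base_env is None else base_env)
    for name, value in params.items():
        env["OMPI_MCA_" + name] = str(value)
    return env


def mca_param_file_lines(params):
    """Return the 'name = value' lines for an MCA parameter file."""
    return [f"{name} = {value}" for name, value in params.items()]


# Placeholder interface name; substitute the interface you actually want.
params = {"btl_tcp_if_include": "eth0"}

# Environment form, e.g. to pass as env= to subprocess.run(["mpirun", ...]).
print(mca_env(params, base_env={}))

# Parameter-file form, one setting per line.
print("\n".join(mca_param_file_lines(params)))
```

The environment form affects only the jobs launched with it, while the parameter file changes the default for every run by that user.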

[slurm-dev] Re: Slurm 17.02.7 and PMIx

2017-10-09 Thread r...@open-mpi.org
> On Oct 9, 2017, at 5:32 PM, Christopher Samuel wrote: > > > On 05/10/17 11:27, Christopher Samuel wrote: > >> PMIX v1.2.2: Slurm complains and tells me it wants v2. > > I think that was due to a config issue on the system I was helping out > with, after having to install some extra package

[slurm-dev] Re: Tasks distribution

2017-10-09 Thread r...@open-mpi.org
Just to clarify something here: OMPI 1.8 does not support PMIx. You need at least OMPI 2.0 for that. > On Oct 9, 2017, at 4:11 AM, Sysadmin CAOS wrote: > > After a lot of changes, I have recompiled all. I have executed these steps: > First of all, I have compiled contrib "pmi" package allocate

[slurm-dev] Re: srun vs mpirun

2017-10-06 Thread r...@open-mpi.org
Not stupid at all. I suspect the problem is that OMPI was not configured --with-pmi=. As a result, when you srun the application, each process thinks it is a singleton and nothing works correctly. OMPI does not pick up the slurm pmi support by default due to license issues, so you have to ma

[slurm-dev] Re: Setting up Environment Modules package

2017-10-05 Thread r...@open-mpi.org
> > On Oct 5, 2017, at 12:08 AM, Ole Holm Nielsen > wrote: > > > On 10/04/2017 06:11 PM, Mike Cammilleri wrote: >> I'm in search of a best practice for setting up Environment Modules for our >> Slurm 16.05.6 installation (we have not had the time to upgrade t

[slurm-dev] Re: openmpi, slurm and pmix

2017-08-30 Thread r...@open-mpi.org
SLURM currently supports PMIx v1.2, which is what you’d find in the OMPI v2.x series. As long as you stay within that OMPI release series, you should be fine as the internal OMPI library will match what you used for SLURM. I’m afraid that OMPI will always use its internal version unless you expl

[slurm-dev] Re: multiple MPI versions with slurm

2017-07-21 Thread r...@open-mpi.org
If you are using a recent (as in v16.05 or greater) version of SLURM, then you can also build it with the PMIx support and use that for OpenMPI - it will launch faster > On Jul 21, 2017, at 7:12 AM, Paul Edmon wrote: > > > I would build MPI using the pmi libraries and slurm with pmi support.

[slurm-dev] Re: slurm + openmpi + suspend problem

2017-07-19 Thread r...@open-mpi.org
> On Tue, Jul 18, 2017 at 1:07 PM, r...@open-mpi.org wrote: > Okay, I tracked it down and have a fix pending for OMPI master: > https://github.com/open-mpi/ompi/pull/3930

[slurm-dev] Re: slurm + openmpi + suspend problem

2017-07-18 Thread r...@open-mpi.org
elease managers. > On Jul 18, 2017, at 7:33 AM, r...@open-mpi.org wrote: > > Just looking at it today... > >> On Jul 18, 2017, at 7:25 AM, Eugene Dedits wrote: >> >> Hi Ralph, >> >> >> di

[slurm-dev] Re: slurm + openmpi + suspend problem

2017-07-18 Thread r...@open-mpi.org
t xhpl processes are stopped. Resuming > them with -CONT also works. > > Again, this is with OpenMPI 1.8.3 > > Once again, thank you for all the help. > > Cheers, > Eugene. > > > > >> On Jul 11, 2017, at 12:08 PM, r...@open-mpi.org <mailto:r...@o

[slurm-dev] Re: slurm + openmpi + suspend problem

2017-07-11 Thread r...@open-mpi.org
observer that there were 16 mpi processes > running > at 100% on all 10 nodes where the job was started. > > Thanks, > Eugene. > > > > > > >> On Jul 11, 2017, at 10:35 AM, r...@open-mpi.org wrote: >> >> >> Odd - I'm on trav

[slurm-dev] Re: slurm + openmpi + suspend problem

2017-07-11 Thread r...@open-mpi.org
ried 3.0.0rc1 and problems still persists there… > > Thanks, > E. > > > >> On Jul 11, 2017, at 10:20 AM, r...@open-mpi.org wrote: >> >> >> Just checked the planning board and saw that my PR to bring that change to >> 2.1.2 is pending and not

[slurm-dev] Re: slurm + openmpi + suspend problem

2017-07-11 Thread r...@open-mpi.org
Okay, it has been committed so you can grab a tarball tomorrow if you like. > On Jul 11, 2017, at 9:20 AM, "r...@open-mpi.org" wrote: > > Just checked the planning board and saw that my PR to bring that change to > 2.1.2 is pending and not yet in th

[slurm-dev] Re: slurm + openmpi + suspend problem

2017-07-11 Thread r...@open-mpi.org
Just checked the planning board and saw that my PR to bring that change to 2.1.2 is pending and not yet in the release branch. I’ll try to make that happen soon. > On Jul 11, 2017, at 8:03 AM, "r...@open-mpi.org" wrote: > > > There is an mca param es

[slurm-dev] Re: slurm + openmpi + suspend problem

2017-07-11 Thread r...@open-mpi.org
Could you point me to some discussion of this? > > Thanks, > Eugene. > >> On Jul 11, 2017, at 6:17 AM, r...@open-mpi.org wrote: >> >> >> There is an issue with how the signal is forwarded. This has been fixed in >> the latest OMPI release so you might

[slurm-dev] Re: slurm + openmpi + suspend problem

2017-07-11 Thread r...@open-mpi.org
There is an issue with how the signal is forwarded. This has been fixed in the latest OMPI release, so you might want to upgrade. Ralph > On Jul 11, 2017, at 2:53 AM, Dennis Tants > wrote: > > > Hello Eugene, > > it is just a wild guess, but could you try "srun --mpi=pmi2

[slurm-dev] Re: Slurm with Torque

2017-04-16 Thread r...@open-mpi.org
Sure - all you have to do is pull some nodes out of the Torque configuration (so Torque doesn’t know they exist), and then install Slurm on those nodes (adding just those nodes to the slurm.conf file). > On Apr 16, 2017, at 7:12 AM, Mahmood Naderan > wrote: > > Hi

[slurm-dev] Re: Questions about Openmpi, PMI-1 and slurm.conf

2017-04-08 Thread r...@open-mpi.org
> On Apr 8, 2017, at 8:44 PM, Doug Meyer wrote: > > Running 15.x and have run into a next step that is probably us tripping over > our feet. > > Engineers were happy clams with SGE but it was time to move on. We have > adopted slurm and are moving users forward. So far, much joy. As we w

[slurm-dev] Re: strange srun problem

2017-01-23 Thread r...@open-mpi.org
Note that 16.05 contains support for PMIx, so if you are using OMPI 2.0 or above, you should ensure that the slurm PMIx support is configured “on” and use that for srun (I believe you have to tell srun the pmi version to use, so perhaps “srun --mpi=pmix”?) > On Jan 23, 2017, at 7:10 AM, TO_Web
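For reference, srun selects the PMI flavor with the --mpi option, and "srun --mpi=list" reports which plugin types a given installation offers. The Python sketch below only assembles such a command line; the helper name and the application path "./a.out" are hypothetical, and whether "pmix" is available depends on how Slurm was built.

```python
import shlex


def srun_command(app_args, mpi_plugin="pmix"):
    """Build an srun argument list that requests a specific PMI plugin."""
    return ["srun", f"--mpi={mpi_plugin}", *app_args]


# Hypothetical application binary; replace with the real executable.
cmd = srun_command(["./a.out"])
print(shlex.join(cmd))

# The resulting list can be passed directly to subprocess.run(cmd)
# from inside an allocation.
```

Passing the plugin explicitly avoids depending on the site-wide MpiDefault setting in slurm.conf.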

[slurm-dev] Re: slurm-dev srun openmpi errors

2017-01-20 Thread r...@open-mpi.org
Alternatively, you could upgrade OMPI to something more recent - we don’t even support the 1.6 series any more. I’d upgrade to at least 1.10.5 > On Jan 20, 2017, at 6:26 AM, Vicker, Darby (JSC-EG311) > wrote: > > I would try using mpiexec or mpirun instead of srun to launch the job. Those >

[slurm-dev] Run maintenance job

2016-10-26 Thread r...@open-mpi.org
Hey folks This is likely a dumb question, so I appreciate your patience in advance. I need to schedule a job that takes a node down, flashes the firmware, and reboots it. I can obviously ask SLURM to allocate two nodes for me, and run my job script on the node I don’t intend to service. Howeve

[slurm-dev] Fwd: Supercomputing 2016: PMIx Birds-of-a-Feather meeting

2016-10-23 Thread r...@open-mpi.org
FYI: pardon the promotion, but this might be of interest to some. Begin forwarded message: From: "r...@open-mpi.org" Subject: Supercomputing 2016: Birds-of-a-Feather meeting Date: October 14, 2016 at 8:50:01 AM PDT To: pmix Hello all This year, we will again be hosting a Birds-of

[slurm-dev] Re: strange going-ons with OpenMPI and Infiniband

2016-08-25 Thread r...@open-mpi.org
Check your IB setup, Michael - you probably don’t have UD enabled on it > On Aug 25, 2016, at 11:42 AM, Michael Di Domenico > wrote: > > > although i see this with and without slurm, so there very well maybe > something wrong with my ompi compile > > On Thu, Aug 25, 2016 at 2:04 PM, Michael