[slurm-dev] Re: slurm + openmpi + suspend problem

2017-07-19 Thread Eugene Dedits
> On Jul 18, 2017, at 7:25 AM, Eugene Dedits <eugene.ded...@gmail.com> wrote:
>
> Hi Ralph,
>
> did you have a chance to take a look at this problem?
>
> Thanks!
> Eugene.
>
> On Tue, Jul 11, 2017 at 12:51 PM, Eugene Dedits <eugene.ded..

[slurm-dev] Re: slurm + openmpi + suspend problem

2017-07-18 Thread Eugene Dedits
Hi Ralph,

did you have a chance to take a look at this problem?

Thanks!
Eugene.

On Tue, Jul 11, 2017 at 12:51 PM, Eugene Dedits <eugene.ded...@gmail.com> wrote:
> Thanks! I really appreciate your help.
> In the meantime I’ve tried experimenting with 1.8.3. Here is what I’ve

[slurm-dev] Re: slurm + openmpi + suspend problem

2017-07-11 Thread Eugene Dedits
1.8.3

Once again, thank you for all the help.

Cheers,
Eugene.

> On Jul 11, 2017, at 12:08 PM, r...@open-mpi.org wrote:
>
> Very odd - let me explore when I get back. Sorry for delay
>
> Sent from my iPad
>
> On Jul 11, 2017, at 10:59 AM, Eugene Dedits <

[slurm-dev] Re: slurm + openmpi + suspend problem

2017-07-11 Thread Eugene Dedits
1, 2017, at 10:35 AM, r...@open-mpi.org wrote:
>
> Odd - I'm on travel this week but can look at it next week. One possibility -
> have you tried hitting us with SIGTSTP instead of SIGSTOP? The difference is in
> the ability to trap and forward.
>
> Sent from my iPad
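
To illustrate the trap-and-forward point: the terminal stop signal SIGTSTP can be caught by a process, so a launcher could in principle catch it and pass it on to its children, whereas SIGSTOP cannot be caught, blocked, or ignored. The following is a minimal Python sketch and is not taken from the thread. The helper name can_install_handler is hypothetical, and the sketch assumes a POSIX system and that it runs in the main thread. It says nothing about how mpirun or slurm actually deliver signals.

```python
import signal


def can_install_handler(signum):
    """Return True if a user-level handler can be installed for signum.

    Hypothetical helper for illustration only.
    """
    try:
        # Try to install a no-op handler; the kernel rejects this for
        # signals that cannot be caught (SIGSTOP, SIGKILL).
        previous = signal.signal(signum, lambda s, f: None)
    except (OSError, ValueError):
        return False
    # Restore whatever disposition was in place before the probe.
    if previous is not None:
        signal.signal(signum, previous)
    return True


# Probe the job-control signals discussed in the thread.
trappable = {
    name: can_install_handler(getattr(signal, name))
    for name in ("SIGTSTP", "SIGSTOP", "SIGCONT")
}

for name, ok in trappable.items():
    print(f"{name}: {'can be trapped' if ok else 'cannot be trapped'}")
```

On a POSIX system this should report SIGTSTP and SIGCONT as trappable and SIGSTOP as not, which is why a process that is sent SIGSTOP directly gets no chance to relay it to anything it launched.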

[slurm-dev] Re: slurm + openmpi + suspend problem

2017-07-11 Thread Eugene Dedits
I just looked at source code and see that it wasn't
>> backported. Sigh.
>>
>> You could try the 3.0.0 branch as it is in release candidate and should go
>> out within a week. I'd suggest just cloning that branch of the OMPI repo to
>> get the latest state. T

[slurm-dev] Re: slurm + openmpi + suspend problem

2017-07-11 Thread Eugene Dedits
>> it is just a wild guess, but could you try "srun --mpi=pmi2" (you said
>> you built OMPI with pmi support) instead of "mpirun"?
>> srun is built in and I think the preferred way of running parallel
>> processes. Maybe scontrol is able to suspend it this way.

[slurm-dev] Re: slurm + openmpi + suspend problem

2017-07-11 Thread Eugene Dedits
is built in and I think the preferred way of running parallel
> processes. Maybe scontrol is able to suspend it this way.
>
> Regards,
> Dennis
>
> On 10.07.2017 at 22:20, Eugene Dedits wrote:
>> Hello SLURM-DEV
>>
>> I have a problem with slurm, ope

[slurm-dev] slurm + openmpi + suspend problem

2017-07-10 Thread Eugene Dedits
Hello SLURM-DEV

I have a problem with slurm, openmpi, and “scontrol suspend”.

My setup is:
96-node cluster with IB, running RHEL 6.8
slurm 17.02.1
openmpi 2.0.0 (built using Intel 2016 compiler)

I am running some application (hpl in this particular case) using a batch script similar to: