Hi Ralph,

Did you have a chance to take a look at this problem?

Thanks!
Eugene.




On Tue, Jul 11, 2017 at 12:51 PM, Eugene Dedits <[email protected]>
wrote:

> Thanks! I really appreciate your help.
> In the meantime I’ve tried experimenting with 1.8.3. Here is what I’ve
> noticed.
>
> 1. Running the job with “sbatch ./my_script”, where my script calls
> mpirun -np 160 -mca orte_forward_job_control 1 ./xhpl
>
> and then suspending the job with “scontrol suspend JOBID”
> does not work. Of the 10 nodes assigned to my job, 4 are still running
> 16 MPI processes of xhpl.
>
> 2. Running exactly the same job and then sending SIGTSTP to the mpirun
> process does work: all 10 nodes show that the xhpl processes are stopped.
> Resuming them with SIGCONT also works (see the sketch below).
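> For reference, here is a minimal sketch of that sequence (the pgrep lookup is
> only illustrative; what matters is signalling the mpirun PID on the launch node):
> -----------------------------
> # on the node where mpirun was launched
> MPIRUN_PID=$(pgrep -u "$USER" -x mpirun)
> kill -TSTP "$MPIRUN_PID"   # suspend: mpirun forwards this to the xhpl ranks
> # ... later ...
> kill -CONT "$MPIRUN_PID"   # resume all ranks
> -----------------------------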
>
> Again, this is with OpenMPI 1.8.3
>
> Once again, thank you for all the help.
>
> Cheers,
> Eugene.
>
>
>
>
> On Jul 11, 2017, at 12:08 PM, [email protected] wrote:
>
> Very odd - let me explore when I get back. Sorry for delay
>
> Sent from my iPad
>
> On Jul 11, 2017, at 10:59 AM, Eugene Dedits <[email protected]>
> wrote:
>
> Ralph,
>
>
> Are you suggesting doing something similar to this:
> https://www.open-mpi.org/faq/?category=sge#sge-suspend-resume
>
> If yes, here is what I’ve done:
> - start a job using SLURM and “mpirun -mca orte_forward_job_control 1 -np 160 xhpl”
> - ssh to the node where mpirun is launched
> - “kill -STOP PID”, where PID is the mpirun PID
> - “kill -TSTP PID”
>
> In both cases (STOP and TSTP) I observed that there were 16 MPI processes
> running at 100% on all 10 nodes where the job was started.
>
> Thanks,
> Eugene.
>
>
>
>
>
>
> On Jul 11, 2017, at 10:35 AM, [email protected] wrote:
>
>
> Odd - I'm on travel this week but can look at it next week. One
> possibility - have you tried hitting it with SIGTSTP instead of SIGSTOP?
> There's a difference in our ability to trap and forward them.
>
> Sent from my iPad
>
> On Jul 11, 2017, at 9:29 AM, Eugene Dedits <[email protected]>
> wrote:
>
>
> I’ve just tried 3.0.0rc1 and the problem still persists there…
>
> Thanks,
> E.
>
>
>
> On Jul 11, 2017, at 10:20 AM, [email protected] wrote:
>
>
> Just checked the planning board and saw that my PR to bring that change to
> 2.1.2 is pending and not yet in the release branch. I’ll try to make that
> happen soon
>
> Sent from my iPad
>
> On Jul 11, 2017, at 8:03 AM, "[email protected]" <[email protected]> wrote:
>
>
> There is an MCA param, ess_base_forward_signals, that controls which signals
> to forward. However, I just looked at the source code and see that it wasn't
> backported. Sigh.
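> For a version that does include it, something along these lines should select
> the signals to forward (the exact value format is my assumption - please check
> it with "ompi_info --param ess base --level 9"):
> -----------------------------
> mpirun -mca ess_base_forward_signals "SIGTSTP,SIGCONT" -np 160 ./xhpl
> -----------------------------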
>
> You could try the 3.0.0 branch, as it is at the release candidate stage and
> should go out within a week. I'd suggest just cloning that branch of the OMPI
> repo to get the latest state. The fix is definitely there.
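> Roughly like this (v3.0.x is what I'd expect the release branch to be called;
> the prefix and the PMI path are just placeholders):
> -----------------------------
> git clone -b v3.0.x https://github.com/open-mpi/ompi.git
> cd ompi
> ./autogen.pl
> ./configure --prefix=$HOME/ompi-3.0 --with-slurm --with-pmi=/path/to/my/slurm
> make -j install
> -----------------------------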
>
> Sent from my iPad
>
> On Jul 11, 2017, at 7:45 AM, Eugene Dedits <[email protected]>
> wrote:
>
>
> Hi Ralph,
>
>
> thanks for the reply. I’ve just tried upgrading to OMPI 2.1.1. Same
> problem… :-\
> Could you point me to some discussion of this?
>
> Thanks,
> Eugene.
>
> On Jul 11, 2017, at 6:17 AM, [email protected] wrote:
>
>
> There is an issue with how the signal is forwarded. This has been fixed in
> the latest OMPI release so you might want to upgrade
>
> Ralph
>
> Sent from my iPad
>
> On Jul 11, 2017, at 2:53 AM, Dennis Tants <[email protected]>
> wrote:
>
>
> Hello Eugene,
>
> it is just a wild guess, but could you try "srun --mpi=pmi2" (you said
> you built OMPI with PMI support) instead of "mpirun"?
> srun is built in and, I think, the preferred way of running parallel
> processes. Maybe scontrol is able to suspend it this way.
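> For example, in the batch script, something like this instead of the mpirun
> line (just a sketch; the task count is taken from your script):
> -----------------------------
> srun --mpi=pmi2 -n 160 ./xhpl | tee LOG
> -----------------------------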
>
> Regards,
> Dennis
>
> On 10.07.2017 at 22:20, Eugene Dedits wrote:
> Hello SLURM-DEV
>
>
> I have a problem with slurm, openmpi, and “scontrol suspend”.
>
> My setup is:
> 96-node cluster with InfiniBand, running RHEL 6.8
> SLURM 17.02.1
> OpenMPI 2.0.0 (built using the Intel 2016 compiler)
>
>
> I am running an application (HPL in this particular case) using a batch
> script similar to:
> -----------------------------
> #!/bin/bash
> #SBATCH --partition=standard
> #SBATCH -N 10
> #SBATCH --ntasks-per-node=16
>
> mpirun -np 160 xhpl | tee LOG
> -----------------------------
>
> So I am running it on 160 cores across 10 nodes.
>
> Once the job is submitted to the queue and is running, I suspend it using:
> ~# scontrol suspend JOBID
>
> I see that indeed my job stopped producing output. I go to each of the 10
> nodes that were assigned to my job and check whether the xhpl processes are
> running there with:
>
> ~# for i in {10..19}; do ssh node$i "top -b -n 1 | head -n 50 | grep xhpl | wc -l"; done
>
> I expect this little script to return 0 from every node (because suspend
> sent the SIGSTOP and the stopped processes shouldn’t show up in top).
> However, I see that processes are reliably suspended only on node10. I get:
> 0
> 16
> 16
> …
> 16
>
> So 9 out of 10 nodes still have 16 MPI processes of my xhpl application
> running at 100%.
>
> If I run “scontrol resume JOBID” and then suspend it again, I see that
> (sometimes) more nodes have the “xhpl” processes properly suspended. Every
> time I resume and suspend the job, I see different nodes returning 0 in my
> “ssh-run-top” script.
>
> So altogether it looks like the suspend mechanism doesn’t work properly in
> SLURM with OpenMPI. I’ve tried compiling OpenMPI with “--with-slurm
> --with-pmi=/path/to/my/slurm” and observed the same behavior.
>
> I would appreciate any help.
>
>
> Thanks,
> Eugene.
>
>
>
>
>
> --
> Dennis Tants
> Trainee: IT Specialist for System Integration (Fachinformatiker für Systemintegration)
>
> ZARM - Zentrum für angewandte Raumfahrttechnologie und Mikrogravitation
> ZARM - Center of Applied Space Technology and Microgravity
>
> Universität Bremen
> Am Fallturm
> 28359 Bremen, Germany
>
> Telefon: 0421 218 57940
> E-Mail: [email protected]
>
> www.zarm.uni-bremen.de
>
>
>
>
