Ralph,

It seems to work now. Thanks a bunch!
When do you think we should expect the 3.0.0 release?

Best,
Eugene.




On Tue, Jul 18, 2017 at 1:07 PM, [email protected] <[email protected]> wrote:

> Okay, I tracked it down and have a fix pending for OMPI master:
> https://github.com/open-mpi/ompi/pull/3930
>
> Once that cycles thru, I’ll create a PR for the 3.0 release. I’m not sure
> about taking it back to v2.x - I’ll have to check with those release
> managers.
>
> On Jul 18, 2017, at 7:33 AM, [email protected] wrote:
>
> Just looking at it today...
>
> On Jul 18, 2017, at 7:25 AM, Eugene Dedits <[email protected]>
> wrote:
>
> Hi Ralph,
>
>
> did you have a chance to take a look at this problem?
>
> Thanks!
> Eugene.
>
>
>
>
> On Tue, Jul 11, 2017 at 12:51 PM, Eugene Dedits <[email protected]>
> wrote:
>
>> Thanks! I really appreciate your help.
>> In the meantime I’ve tried experimenting with 1.8.3. Here is what I’ve
>> noticed.
>>
>> 1. Running the job with “sbatch ./my_script” where my script calls
>> mpirun -np 160 -mca orte_forward_job_control 1 ./xhpl
>>
>> and then suspending the job with “scontrol suspend JOBID”
>> does not work. Of the 10 nodes assigned to my job, 4 are still running
>> 16 MPI processes of xhpl.
>>
>> 2. Running exactly the same job and then sending SIGTSTP to the mpirun process
>> does work: all 10 nodes show that xhpl processes are stopped. Resuming
>> them with -CONT also works.
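>>
>> For reference, case 2 was along these lines, where <mpirun_PID> is a
>> placeholder for the pid of the mpirun process on the launch node:
>>
>> -----------------------------
>> # stop: mpirun traps SIGTSTP and forwards the stop to all xhpl ranks
>> kill -TSTP <mpirun_PID>
>> # resume everything
>> kill -CONT <mpirun_PID>
>> -----------------------------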
>>
>> Again, this is with OpenMPI 1.8.3
>>
>> Once again, thank you for all the help.
>>
>> Cheers,
>> Eugene.
>>
>>
>>
>>
>> On Jul 11, 2017, at 12:08 PM, [email protected] wrote:
>>
>> Very odd - let me explore when I get back. Sorry for the delay.
>>
>> Sent from my iPad
>>
>> On Jul 11, 2017, at 10:59 AM, Eugene Dedits <[email protected]>
>> wrote:
>>
>> Ralph,
>>
>>
>> Are you suggesting doing something similar to this:
>> https://www.open-mpi.org/faq/?category=sge#sge-suspend-resume
>>
>> If yes, here is what I’ve done:
>> - start a job using slurm and "mpirun -mca orte_forward_job_control 1 -np 160
>> xhpl”
>> - ssh to the node where mpirun is launched
>> - “kill -STOP PID” where PID is mpirun pid
>> - “kill -TSTP PID”
>>
>> In both cases (STOP and TSTP) I observed that there were 16 MPI processes
>> running
>> at 100% on all 10 nodes where the job was started.
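>>
>> (For the record, a more direct way to check the state on each node would be
>> something like the one-liner below; a "T" in the STAT column means the
>> process is stopped.)
>>
>> -----------------------------
>> for i in {10..19}; do ssh node$i 'ps -C xhpl -o pid,stat,comm'; done
>> -----------------------------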
>>
>> Thanks,
>> Eugene.
>>
>>
>>
>>
>>
>>
>> On Jul 11, 2017, at 10:35 AM, [email protected] wrote:
>>
>>
>> Odd - I'm on travel this week but can look at it next week. One
>> possibility - have you tried hitting mpirun with SIGTSTP instead of SIGSTOP?
>> There is a difference in the ability to trap and forward the two.
>>
>> Sent from my iPad
>>
>> On Jul 11, 2017, at 9:29 AM, Eugene Dedits <[email protected]>
>> wrote:
>>
>>
>> I’ve just tried 3.0.0rc1 and the problem still persists there…
>>
>> Thanks,
>> E.
>>
>>
>>
>> On Jul 11, 2017, at 10:20 AM, [email protected] wrote:
>>
>>
>> Just checked the planning board and saw that my PR to bring that change
>> to 2.1.2 is pending and not yet in the release branch. I’ll try to make
>> that happen soon
>>
>> Sent from my iPad
>>
>> On Jul 11, 2017, at 8:03 AM, "[email protected]" <[email protected]> wrote:
>>
>>
>> There is an mca param ess_base_forward_signals that controls which
>> signals to forward. However, I just looked at the source code and see that it
>> wasn't backported. Sigh.
>>
>> You could try the 3.0.0 branch, as it is at the release-candidate stage and should
>> go out within a week. I'd suggest just cloning that branch of the OMPI repo
>> to get the latest state. The fix is definitely there.
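>>
>> Roughly (from memory, so double-check the branch name):
>>
>> -----------------------------
>> git clone -b v3.0.x https://github.com/open-mpi/ompi.git
>> cd ompi
>> ./autogen.pl
>> ./configure --with-slurm --with-pmi=/path/to/your/slurm
>> make -j && make install
>> -----------------------------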
>>
>> Sent from my iPad
>>
>> On Jul 11, 2017, at 7:45 AM, Eugene Dedits <[email protected]>
>> wrote:
>>
>>
>> Hi Ralph,
>>
>>
>> thanks for the reply. I’ve just tried upgrading to OMPI 2.1.1. The same
>> problem… :-\
>> Could you point me to some discussion of this?
>>
>> Thanks,
>> Eugene.
>>
>> On Jul 11, 2017, at 6:17 AM, [email protected] wrote:
>>
>>
>> There is an issue with how the signal is forwarded. This has been fixed
>> in the latest OMPI release, so you might want to upgrade.
>>
>> Ralph
>>
>> Sent from my iPad
>>
>> On Jul 11, 2017, at 2:53 AM, Dennis Tants <[email protected]> wrote:
>>
>>
>> Hello Eugene,
>>
>> It is just a wild guess, but could you try "srun --mpi=pmi2" (you said
>> you built OMPI with PMI support) instead of "mpirun"?
>> srun is built-in and, I think, the preferred way of running parallel
>> processes. Maybe scontrol is able to suspend it this way.
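>>
>> I.e., in your batch script, replace the mpirun line with something along
>> the lines of:
>>
>> -----------------------------
>> srun --mpi=pmi2 -n 160 ./xhpl | tee LOG
>> -----------------------------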
>>
>> Regards,
>> Dennis
>>
>> On 10.07.2017 at 22:20, Eugene Dedits wrote:
>> Hello SLURM-DEV
>>
>>
>> I have a problem with SLURM, OpenMPI, and “scontrol suspend”.
>>
>> My setup is:
>> 96-node cluster with IB, running RHEL 6.8
>> SLURM 17.02.1
>> OpenMPI 2.0.0 (built using the Intel 2016 compiler)
>>
>>
>> I am running some application (HPL in this particular case) using a batch
>> script similar to:
>> -----------------------------
>> #!/bin/bash
>> #SBATCH --partition=standard
>> #SBATCH -N 10
>> #SBATCH --ntasks-per-node=16
>>
>> mpirun -np 160 xhpl | tee LOG
>> -----------------------------
>>
>> So I am running it on 160 cores across 10 nodes.
>>
>> Once the job is submitted to the queue and is running, I suspend it using
>> ~# scontrol suspend JOBID
>>
>> I see that indeed my job stopped producing output. I go to each of the 10
>> nodes that were assigned to my job and check whether the xhpl processes are
>> running there with:
>>
>> ~# for i in {10..19}; do ssh node$i "top -b -n 1 | head -n 50 | grep xhpl |
>> wc -l"; done
>>
>> I expect this little script to return 0 from every node (because suspend
>> sent SIGSTOP and the stopped processes shouldn’t show up in top). However, I
>> see that processes are reliably suspended only on node10. I get:
>> 0
>> 16
>> 16
>> …
>> 16
>>
>> So 9 out of 10 nodes still have 16 MPI processes of my xhpl application
>> running at 100%.
>>
>> If I run “scontrol resume JOBID” and then suspend it again, I see that
>> (sometimes) more
>> nodes have “xhpl” processes properly suspended. Every time I resume and
>> suspend the
>> job, I see different nodes returning 0 in my “ssh-run-top” script.
>>
>> So altogether it looks like the suspend mechanism doesn’t work properly
>> in SLURM with
>> OpenMPI. I’ve tried compiling OpenMPI with “--with-slurm
>> --with-pmi=/path/to/my/slurm”.
>> I’ve observed the same behavior.
>>
>> I would appreciate any help.
>>
>>
>> Thanks,
>> Eugene.
>>
>>
>>
>>
>>
>> --
>> Dennis Tants
>> Trainee: IT Specialist for Systems Integration
>>
>> ZARM - Zentrum für angewandte Raumfahrttechnologie und Mikrogravitation
>> ZARM - Center of Applied Space Technology and Microgravity
>>
>> Universität Bremen
>> Am Fallturm
>> 28359 Bremen, Germany
>>
>> Telefon: 0421 218 57940
>> E-Mail: [email protected]
>>
>> www.zarm.uni-bremen.de
>>
>>
>>
>>
>
>
>
