Ralph,
Are you suggesting doing something similar to this: https://www.open-mpi.org/faq/?category=sge#sge-suspend-resume

If yes, here is what I've done:
- start a job using slurm and "mpirun -mca orte_forward_job_control 1 -np 160 xhpl"
- ssh to the node where mpirun is launched
- "kill -STOP PID" where PID is the mpirun pid
- "kill -TSTP PID"

In both cases (STOP and TSTP) I observed that there were 16 MPI processes running at 100% on all 10 nodes where the job was started.

Thanks,
Eugene.

> On Jul 11, 2017, at 10:35 AM, [email protected] wrote:
>
> Odd - I'm on travel this week but can look at it next week. One possibility -
> have you tried hitting us with SIGTSTP instead of SIGSTOP? Difference in
> ability to trap and forward
>
> Sent from my iPad
>
>> On Jul 11, 2017, at 9:29 AM, Eugene Dedits <[email protected]> wrote:
>>
>> I've just tried 3.0.0rc1 and the problem still persists there… :-\
>>
>> Thanks,
>> E.
>>
>>> On Jul 11, 2017, at 10:20 AM, [email protected] wrote:
>>>
>>> Just checked the planning board and saw that my PR to bring that change to
>>> 2.1.2 is pending and not yet in the release branch. I'll try to make that
>>> happen soon.
>>>
>>> Sent from my iPad
>>>
>>>> On Jul 11, 2017, at 8:03 AM, "[email protected]" <[email protected]> wrote:
>>>>
>>>> There is an mca param ess_base_forward_signals that controls which signals
>>>> to forward. However, I just looked at the source code and see that it wasn't
>>>> backported. Sigh.
>>>>
>>>> You could try the 3.0.0 branch, as it is a release candidate and should go
>>>> out within a week. I'd suggest just cloning that branch of the OMPI repo
>>>> to get the latest state. The fix is definitely there.
>>>>
>>>> Sent from my iPad
>>>>
>>>>> On Jul 11, 2017, at 7:45 AM, Eugene Dedits <[email protected]> wrote:
>>>>>
>>>>> Hi Ralph,
>>>>>
>>>>> Thanks for the reply. I've just tried upgrading to OMPI 2.1.1.
>>>>> The same problem… :-\
>>>>> Could you point me to some discussion of this?
>>>>>
>>>>> Thanks,
>>>>> Eugene.
>>>>>
>>>>>> On Jul 11, 2017, at 6:17 AM, [email protected] wrote:
>>>>>>
>>>>>> There is an issue with how the signal is forwarded. This has been fixed
>>>>>> in the latest OMPI release, so you might want to upgrade.
>>>>>>
>>>>>> Ralph
>>>>>>
>>>>>> Sent from my iPad
>>>>>>
>>>>>>> On Jul 11, 2017, at 2:53 AM, Dennis Tants <[email protected]> wrote:
>>>>>>>
>>>>>>> Hello Eugene,
>>>>>>>
>>>>>>> It is just a wild guess, but could you try "srun --mpi=pmi2" (you said
>>>>>>> you built OMPI with PMI support) instead of "mpirun"?
>>>>>>> srun is built in and, I think, the preferred way of launching parallel
>>>>>>> processes. Maybe scontrol is able to suspend the job this way.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Dennis
>>>>>>>
>>>>>>>> On 10.07.2017 at 22:20, Eugene Dedits wrote:
>>>>>>>> Hello SLURM-DEV,
>>>>>>>>
>>>>>>>> I have a problem with slurm, openmpi, and "scontrol suspend".
>>>>>>>>
>>>>>>>> My setup is:
>>>>>>>> 96-node cluster with IB, running RHEL 6.8
>>>>>>>> slurm 17.02.1
>>>>>>>> openmpi 2.0.0 (built using the Intel 2016 compiler)
>>>>>>>>
>>>>>>>> I am running an application (HPL in this particular case) using a
>>>>>>>> batch script similar to:
>>>>>>>> -----------------------------
>>>>>>>> #!/bin/bash
>>>>>>>> #SBATCH --partition=standard
>>>>>>>> #SBATCH -N 10
>>>>>>>> #SBATCH --ntasks-per-node=16
>>>>>>>>
>>>>>>>> mpirun -np 160 xhpl | tee LOG
>>>>>>>> -----------------------------
>>>>>>>>
>>>>>>>> So I am running it on 160 cores across 10 nodes.
>>>>>>>>
>>>>>>>> Once the job is submitted to the queue and running, I suspend it with
>>>>>>>> ~# scontrol suspend JOBID
>>>>>>>>
>>>>>>>> I see that my job has indeed stopped producing output.
>>>>>>>> I go to each of the 10 nodes that were assigned to my job and check
>>>>>>>> whether the xhpl processes are still running there with:
>>>>>>>>
>>>>>>>> ~# for i in {10..19}; do ssh node$i "top -b -n 1 | head -n 50 | grep xhpl | wc -l"; done
>>>>>>>>
>>>>>>>> I expect this little script to return 0 from every node (because
>>>>>>>> suspend sent SIGSTOP, so the processes shouldn't show up in top).
>>>>>>>> However, I see that processes are reliably suspended only on node10.
>>>>>>>> I get:
>>>>>>>> 0
>>>>>>>> 16
>>>>>>>> 16
>>>>>>>> …
>>>>>>>> 16
>>>>>>>>
>>>>>>>> So 9 out of 10 nodes still have 16 MPI threads of my xhpl application
>>>>>>>> running at 100%.
>>>>>>>>
>>>>>>>> If I run "scontrol resume JOBID" and then suspend it again, I see that
>>>>>>>> (sometimes) more nodes have the "xhpl" processes properly suspended.
>>>>>>>> Every time I resume and suspend the job, I see different nodes
>>>>>>>> returning 0 in my "ssh-run-top" script.
>>>>>>>>
>>>>>>>> So altogether it looks like the suspend mechanism doesn't work
>>>>>>>> properly in SLURM with OpenMPI. I've tried compiling OpenMPI with
>>>>>>>> "--with-slurm --with-pmi=/path/to/my/slurm" and observed the same
>>>>>>>> behavior.
>>>>>>>>
>>>>>>>> I would appreciate any help.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Eugene.
>>>>>>>
>>>>>>> --
>>>>>>> Dennis Tants
>>>>>>> Trainee: IT Specialist for System Integration
>>>>>>>
>>>>>>> ZARM - Zentrum für angewandte Raumfahrttechnologie und Mikrogravitation
>>>>>>> ZARM - Center of Applied Space Technology and Microgravity
>>>>>>>
>>>>>>> Universität Bremen
>>>>>>> Am Fallturm
>>>>>>> 28359 Bremen, Germany
>>>>>>>
>>>>>>> Telefon: 0421 218 57940
>>>>>>> E-Mail: [email protected]
>>>>>>>
>>>>>>> www.zarm.uni-bremen.de
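
[Editor's note] A more direct way to verify suspension than counting lines of top output is to check the process state field that ps reports: a process that has received SIGSTOP shows state "T" (stopped). The sketch below demonstrates the idea locally with a sleep process standing in for an xhpl rank; the commented cluster loop at the end reuses the node names (node10..node19) and rank count (16 per node) from the thread, which are assumptions taken from the messages above, not a tested recipe.

```shell
#!/bin/sh
# Verify that a process is really stopped by checking the "T" state
# reported by ps, rather than grepping top output.

sleep 60 &          # stand-in for an MPI rank
pid=$!

kill -STOP "$pid"   # the same signal scontrol suspend delivers
state=$(ps -o stat= -p "$pid")
echo "state after STOP: $state"   # first character should be T (stopped)

kill -CONT "$pid"
state=$(ps -o stat= -p "$pid")
echo "state after CONT: $state"   # should be back to S (sleeping)

kill "$pid" 2>/dev/null

# On the cluster, the equivalent per-node check (hypothetical, using the
# node names from the thread) would be:
#   for i in {10..19}; do
#     ssh node$i 'ps -C xhpl -o stat= | grep -c "^T"'
#   done
# Each node should report 16 once every rank of the job is suspended.
```

Unlike the top-based loop, this also distinguishes "suspended" from "exited": a rank that died would simply be absent, while a suspended one is still listed with state T.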
