Very odd - let me explore when I get back. Sorry for the delay.

Sent from my iPad
> On Jul 11, 2017, at 10:59 AM, Eugene Dedits <[email protected]> wrote:
>
> Ralph,
>
> Are you suggesting doing something similar to this:
> https://www.open-mpi.org/faq/?category=sge#sge-suspend-resume
>
> If yes, here is what I’ve done:
> - start a job using slurm and "mpirun -mca orte_forward_job_control 1 -np 160 xhpl"
> - ssh to the node where mpirun is launched
> - "kill -STOP PID" where PID is the mpirun pid
> - "kill -TSTP PID"
>
> In both cases (STOP and TSTP) I observed that there were 16 mpi processes
> running at 100% on all 10 nodes where the job was started.
>
> Thanks,
> Eugene.
>
>> On Jul 11, 2017, at 10:35 AM, [email protected] wrote:
>>
>> Odd - I'm on travel this week but can look at it next week. One possibility -
>> have you tried hitting us with SIGTSTP instead of SIGSTOP? Difference in
>> ability to trap and forward.
>>
>> Sent from my iPad
>>
>>> On Jul 11, 2017, at 9:29 AM, Eugene Dedits <[email protected]> wrote:
>>>
>>> I’ve just tried 3.0.0rc1 and the problem still persists there…
>>>
>>> Thanks,
>>> E.
>>>
>>>> On Jul 11, 2017, at 10:20 AM, [email protected] wrote:
>>>>
>>>> Just checked the planning board and saw that my PR to bring that change to
>>>> 2.1.2 is pending and not yet in the release branch. I’ll try to make that
>>>> happen soon.
>>>>
>>>> Sent from my iPad
>>>>
>>>>> On Jul 11, 2017, at 8:03 AM, "[email protected]" <[email protected]> wrote:
>>>>>
>>>>> There is an mca param ess_base_forward_signals that controls which
>>>>> signals to forward. However, I just looked at the source code and see
>>>>> that it wasn't backported. Sigh.
>>>>>
>>>>> You could try the 3.0.0 branch, as it is a release candidate and should
>>>>> go out within a week. I'd suggest just cloning that branch of the OMPI
>>>>> repo to get the latest state.
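The trap/forward difference Ralph refers to can be shown in a small shell sketch (illustrative only, not part of the thread): the kernel never delivers SIGSTOP to a handler, while SIGTSTP has the same default action but can be trapped, which is what allows a launcher such as mpirun to intercept it and forward it to its ranks.

```shell
#!/bin/sh
# SIGSTOP unconditionally suspends a process and can never be caught;
# SIGTSTP defaults to the same action but CAN be trapped, so a parent
# process is able to intercept it and forward it to its children.

# 1) SIGSTOP: the target stops no matter what (state "T" in ps).
sleep 60 &
pid=$!
kill -STOP "$pid"
sleep 1
ps -o stat= -p "$pid" | grep -q '^T' && echo "SIGSTOP: process stopped"
kill -CONT "$pid"
kill "$pid" 2>/dev/null

# 2) SIGTSTP: with a trap installed, the handler runs and the
#    process keeps going instead of being suspended.
sh -c 'trap "echo SIGTSTP: caught, not suspended" TSTP
       kill -TSTP $$
       echo "still running"'
```

Under suspend, mpirun receives the signal first; only a catchable signal like SIGTSTP gives it the chance to run forwarding logic before its children are affected.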
>>>>> The fix is definitely there.
>>>>>
>>>>> Sent from my iPad
>>>>>
>>>>>> On Jul 11, 2017, at 7:45 AM, Eugene Dedits <[email protected]> wrote:
>>>>>>
>>>>>> Hi Ralph,
>>>>>>
>>>>>> thanks for the reply. I’ve just tried upgrading to ompi 2.1.1. The same
>>>>>> problem… :-\
>>>>>> Could you point me to some discussion of this?
>>>>>>
>>>>>> Thanks,
>>>>>> Eugene.
>>>>>>
>>>>>>> On Jul 11, 2017, at 6:17 AM, [email protected] wrote:
>>>>>>>
>>>>>>> There is an issue with how the signal is forwarded. This has been fixed
>>>>>>> in the latest OMPI release, so you might want to upgrade.
>>>>>>>
>>>>>>> Ralph
>>>>>>>
>>>>>>> Sent from my iPad
>>>>>>>
>>>>>>>> On Jul 11, 2017, at 2:53 AM, Dennis Tants <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Hello Eugene,
>>>>>>>>
>>>>>>>> it is just a wild guess, but could you try "srun --mpi=pmi2" (you said
>>>>>>>> you built OMPI with PMI support) instead of "mpirun"?
>>>>>>>> srun is built in and is, I think, the preferred way of running parallel
>>>>>>>> processes. Maybe scontrol is able to suspend it this way.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Dennis
>>>>>>>>
>>>>>>>>> On 10.07.2017 at 22:20, Eugene Dedits wrote:
>>>>>>>>> Hello SLURM-DEV,
>>>>>>>>>
>>>>>>>>> I have a problem with slurm, openmpi, and “scontrol suspend”.
>>>>>>>>>
>>>>>>>>> My setup is:
>>>>>>>>> - 96-node cluster with IB, running RHEL 6.8
>>>>>>>>> - slurm 17.02.1
>>>>>>>>> - openmpi 2.0.0 (built using the Intel 2016 compiler)
>>>>>>>>>
>>>>>>>>> I am running some application (hpl in this particular case) using a
>>>>>>>>> batch script similar to:
>>>>>>>>> -----------------------------
>>>>>>>>> #!/bin/bash
>>>>>>>>> #SBATCH --partition=standard
>>>>>>>>> #SBATCH -N 10
>>>>>>>>> #SBATCH --ntasks-per-node=16
>>>>>>>>>
>>>>>>>>> mpirun -np 160 xhpl | tee LOG
>>>>>>>>> -----------------------------
>>>>>>>>>
>>>>>>>>> So I am running it on 160 cores, 10 nodes.
>>>>>>>>>
>>>>>>>>> Once the job is submitted to the queue and is running, I suspend it
>>>>>>>>> using
>>>>>>>>> ~# scontrol suspend JOBID
>>>>>>>>>
>>>>>>>>> I see that indeed my job stopped producing output. I go to each of
>>>>>>>>> the 10 nodes that were assigned to my job and check whether the xhpl
>>>>>>>>> processes are running there with:
>>>>>>>>>
>>>>>>>>> ~# for i in {10..19}; do ssh node$i “top -b -n 1 | head -n 50 | grep
>>>>>>>>> xhpl | wc -l”; done
>>>>>>>>>
>>>>>>>>> I expect this little script to return 0 from every node (because
>>>>>>>>> suspend sent SIGSTOP and the processes shouldn’t show up in top).
>>>>>>>>> However, I see that processes are reliably suspended only on node10.
>>>>>>>>> I get:
>>>>>>>>> 0
>>>>>>>>> 16
>>>>>>>>> 16
>>>>>>>>> …
>>>>>>>>> 16
>>>>>>>>>
>>>>>>>>> So 9 out of 10 nodes still have 16 MPI threads of my xhpl application
>>>>>>>>> running at 100%.
>>>>>>>>>
>>>>>>>>> If I run “scontrol resume JOBID” and then suspend it again, I see
>>>>>>>>> that (sometimes) more nodes have “xhpl” processes properly suspended.
>>>>>>>>> Every time I resume and suspend the job, I see different nodes
>>>>>>>>> returning 0 in my “ssh-run-top” script.
>>>>>>>>>
>>>>>>>>> So altogether it looks like the suspend mechanism doesn’t work
>>>>>>>>> properly in SLURM with OpenMPI. I’ve tried compiling OpenMPI with
>>>>>>>>> “--with-slurm --with-pmi=/path/to/my/slurm”.
>>>>>>>>> I’ve observed the same behavior.
>>>>>>>>>
>>>>>>>>> I would appreciate any help.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Eugene.
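A more direct way to run the same per-node check is to ask ps for the process state rather than grepping top output: a SIGSTOPed process shows state "T". A minimal sketch along the lines of the loop above (the node names node10..node19 are taken from the example in the thread):

```shell
#!/bin/sh
# For each compute node, count xhpl processes that are NOT stopped.
# A stopped (SIGSTOPed) process has state "T" in ps output, so after
# "scontrol suspend" every node should report 0.
# Node names node10..node19 are assumed from the thread's example.

for i in $(seq 10 19); do
    n=$(ssh "node$i" "ps -C xhpl -o stat=" | grep -cv '^T')
    echo "node$i: $n xhpl processes still running"
done
```

This avoids the head -n 50 truncation, which can miss processes, and reports a running count even for processes that top would sort far down the list.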
>>>>>>>>
>>>>>>>> --
>>>>>>>> Dennis Tants
>>>>>>>> Trainee: IT Specialist for System Integration (Fachinformatiker für Systemintegration)
>>>>>>>>
>>>>>>>> ZARM - Zentrum für angewandte Raumfahrttechnologie und Mikrogravitation
>>>>>>>> ZARM - Center of Applied Space Technology and Microgravity
>>>>>>>>
>>>>>>>> Universität Bremen
>>>>>>>> Am Fallturm
>>>>>>>> 28359 Bremen, Germany
>>>>>>>>
>>>>>>>> Phone: 0421 218 57940
>>>>>>>> E-Mail: [email protected]
>>>>>>>>
>>>>>>>> www.zarm.uni-bremen.de
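For reference, Dennis's srun suggestion applied to the original batch script would look roughly like this (a sketch only; the partition name, node count, and tasks-per-node are the ones from Eugene's script, and --mpi=pmi2 assumes OMPI was built with PMI support, as stated in the thread):

```shell
#!/bin/bash
#SBATCH --partition=standard
#SBATCH -N 10
#SBATCH --ntasks-per-node=16

# Launch through srun instead of mpirun so that slurmstepd owns the
# MPI processes directly; "scontrol suspend" then signals them itself
# rather than relying on mpirun to forward SIGSTOP to every rank.
srun --mpi=pmi2 xhpl | tee LOG
```

The design point is that suspend works per-step through slurmstepd on each node, so removing mpirun from the signal path sidesteps the forwarding bug discussed above.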
