Slurm uses SIGSTOP/SIGCONT to time slice the jobs. It sends the signal
to all processes that it knows about (part of the job container).
Tasks spawned outside of Slurm are not signaled, but mpirun should be
using srun to spawn its tasks.
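
As a rough illustration of the stop/continue mechanism described above, here is a
minimal Python sketch. It is not Slurm code: the helper names (spawn_sleeper,
stop_and_check, resume) are hypothetical, and the "task" is just a sleeping
child process on a POSIX system. It only shows that a process which is known to
the sender can be stopped with SIGSTOP and resumed with SIGCONT.

```python
import os
import signal
import subprocess
import sys


def spawn_sleeper(seconds=30):
    """Start a stand-in 'task' (hypothetical): a child that just sleeps."""
    return subprocess.Popen(
        [sys.executable, "-c", f"import time; time.sleep({seconds})"]
    )


def stop_and_check(pid):
    """Send SIGSTOP to a child and report whether the kernel shows it stopped."""
    os.kill(pid, signal.SIGSTOP)
    # WUNTRACED makes waitpid return when the child stops, not only on exit.
    _, status = os.waitpid(pid, os.WUNTRACED)
    return os.WIFSTOPPED(status)


def resume(pid):
    """Send SIGCONT so the stopped process continues running."""
    os.kill(pid, signal.SIGCONT)
```

A process that the sender has no PID for (for example, one started on another
node by some other mechanism) is simply never passed to os.kill here, which is
the situation described for tasks spawned outside of Slurm.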
Quoting Jason Bacon <[email protected]>:
On 11/09/14 09:32, Marcin Stolarek wrote:
Re: [slurm-dev] OpenMPI, mpirun and suspend/gang
On Sunday, 9 November 2014, Ralph Castain <[email protected]> wrote:
What stop signal is being sent, and where? We will catch and
suspend the job on receipt of a SIGTSTP signal by mpirun.
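
The catch-and-forward behaviour mentioned above can be sketched roughly as
follows. This is not Open MPI's implementation; ToyLauncher, SLEEPER, and the
method names are hypothetical, and the sketch only forwards to children that
the launcher started itself on the local host. Relaying to processes on other
nodes is not attempted here.

```python
import os
import signal
import subprocess
import sys

# Stand-in for a locally launched rank (assumption: just a sleeping child).
SLEEPER = [sys.executable, "-c", "import time; time.sleep(30)"]


class ToyLauncher:
    """Hypothetical launcher that forwards job-control signals to its children."""

    def __init__(self):
        self.children = []

    def spawn(self, argv):
        proc = subprocess.Popen(argv)
        self.children.append(proc)
        return proc

    def on_tstp(self, signum, frame):
        # On SIGTSTP, stop every child this launcher knows about.
        for proc in self.children:
            os.kill(proc.pid, signal.SIGSTOP)

    def on_cont(self, signum, frame):
        # On SIGCONT, let the same children continue.
        for proc in self.children:
            os.kill(proc.pid, signal.SIGCONT)

    def install(self):
        # Register the handlers in the launcher process itself.
        signal.signal(signal.SIGTSTP, self.on_tstp)
        signal.signal(signal.SIGCONT, self.on_cont)
```

Note that the handlers are driven by SIGTSTP, which can be caught. SIGSTOP
cannot be caught, so a launcher that receives SIGSTOP directly is stopped
without any chance to forward anything.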
> On Nov 9, 2014, at 6:47 AM, Jason Bacon <[email protected]> wrote:
>
>
>
> Does anyone have SUSPEND,GANG working with openmpi via mpirun?
>
> I've set up a low-priority queue, which seems to be working,
> except that for openmpi jobs, only the processes on the MPI root
> node seem to be getting the stop signal.
>
> From slurm.conf:
>
> SelectType=select/cons_res
> SelectTypeParameters=CR_Core_Memory
> PreemptMode=SUSPEND,GANG
> PreemptType=preempt/partition_prio
>
> MpiDefault=none
>
> I've also tried --mca orte_forward_job_control 1, but it had no
> apparent effect.
>
> Thanks,
>
> Jason
Hi Jason,
I've been testing the setup with the freezer cgroup. It works fine
with MPI jobs, but intensive tests showed that, for multinode jobs,
the job that should be suspended sometimes keeps using CPUs on one
of the nodes. This happens in only a small percentage of tests.
Currently I'm not able to fully reproduce the issue, so I'm
allowing only single-node jobs on the lower-priority partition.
Cheers,
Marcin
Ralph,
My understanding is that the signal is sent by SLURM to the mpirun
process automatically due to a job in a higher priority queue
needing the resources. There seems to be limited documentation on
this for SLURM and for OpenMPI, but from what I've read, I think
mpirun should be able to forward the signal to all processes.
However, it only reaches those on the MPI root node, and all others
continue to run.
I know MPI jobs can also be run using srun if reserved ports are set
up, but I want to be sure that jobs are reliably suspended no matter
how a user submits them.
Marcin,
Thanks for the info.
I'll follow up if/when I find a solution. I'm going to check the
openmpi build options to see if the signal forwarding feature
requires anything specific to enable it.
Regards,
Jason
--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jason W. Bacon
[email protected]
Circumstances don't make a man:
They reveal him.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
--
Morris "Moe" Jette
CTO, SchedMD LLC