On 11/09/14 09:32, Marcin Stolarek wrote:
Re: [slurm-dev] OpenMPI, mpirun and suspend/gang
On Sunday, 9 November 2014, Ralph Castain <[email protected]> wrote:
What stop signal is being sent, and where? We will catch and
suspend the job on receipt of a SIGTSTP signal by mpirun.
> On Nov 9, 2014, at 6:47 AM, Jason Bacon <[email protected]> wrote:
>
>
>
> Does anyone have SUSPEND,GANG working with openmpi via mpirun?
>
> I've set up a low-priority queue, which seems to be working,
> except that for openmpi jobs, only the processes on the MPI root
> node seem to be getting the stop signal.
>
> From slurm.conf:
>
> SelectType=select/cons_res
> SelectTypeParameters=CR_Core_Memory
> PreemptMode=SUSPEND,GANG
> PreemptType=preempt/partition_prio
>
> MpiDefault=none
>
> I've also tried --mca orte_forward_job_control 1, but it had no
> apparent effect.
>
> Thanks,
>
> Jason
Hi Jason,
I've been testing this setup with the freezer cgroup. It works fine
with MPI jobs, but intensive testing has shown that, for multi-node
jobs, the job that should be suspended sometimes keeps using CPUs on
one of the nodes. This happens in only a small percentage of tests.
I'm currently unable to fully reproduce the issue, so I'm allowing
only single-node jobs on the lower-priority partition.
Cheers,
Marcin
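One way to narrow down the intermittent case described above might be
to check the freezer state on each node after a suspend. The sketch
below assumes the cgroup v1 freezer, whose freezer.state file reads
THAWED, FREEZING or FROZEN; a cgroup that stays in FREEZING still has
tasks that have not been frozen. The directory passed in is
hypothetical, since the actual path depends on how the cgroup
hierarchy is mounted and configured on the node.

```python
import time
from pathlib import Path


def read_freezer_state(cgroup_dir):
    """Return the contents of freezer.state: THAWED, FREEZING or FROZEN."""
    return Path(cgroup_dir, "freezer.state").read_text().strip()


def wait_until_frozen(cgroup_dir, timeout=5.0, interval=0.1):
    """Poll the cgroup until it reports FROZEN or the timeout expires.

    Returns the last state seen, so the caller can log a cgroup that
    is stuck in FREEZING (some tasks are still runnable).
    """
    deadline = time.monotonic() + timeout
    state = read_freezer_state(cgroup_dir)
    while state != "FROZEN" and time.monotonic() < deadline:
        time.sleep(interval)
        state = read_freezer_state(cgroup_dir)
    return state
```

Running wait_until_frozen() against the job's cgroup directory on
every allocated node, and logging any node that does not return
FROZEN, should at least show which node keeps running.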
Ralph,
My understanding is that the signal is sent by SLURM to the mpirun
process automatically due to a job in a higher priority queue needing
the resources. There seems to be limited documentation on this for
SLURM and for OpenMPI, but from what I've read, I think mpirun should be
able to forward the signal to all processes. However, it only reaches
those on the MPI root node, and all others continue to run.
I know MPI jobs can also be run using srun if reserved ports are set up,
but I want to be sure that jobs are reliably suspended no matter how a
user submits them.
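For reference, the forwarding pattern under discussion can be sketched
in a few lines of Python. This is only an illustration of the general
idea (a launcher catches SIGTSTP and relays a stop to its children),
not Open MPI's implementation. The helper names are made up for the
example, the "ranks" are local sleeping processes, and converting
SIGTSTP to SIGSTOP before relaying is an assumption of the sketch.

```python
import os
import signal
import subprocess
import sys


def spawn_sleepers(count):
    """Start `count` idle child processes that stand in for MPI ranks."""
    cmd = [sys.executable, "-c", "import time; time.sleep(30)"]
    return [subprocess.Popen(cmd) for _ in range(count)]


def make_forwarding_handler(pids):
    """Build a signal handler that relays job-control signals to `pids`.

    SIGTSTP is converted to SIGSTOP (which children cannot ignore);
    any other signal, such as SIGCONT, is passed through unchanged.
    """
    def handler(signum, frame):
        relayed = signal.SIGSTOP if signum == signal.SIGTSTP else signum
        for pid in pids:
            try:
                os.kill(pid, relayed)
            except ProcessLookupError:
                pass  # that child has already exited
    return handler


def wait_until_stopped(pid):
    """Block until child `pid` changes state; True if it was stopped."""
    _, status = os.waitpid(pid, os.WUNTRACED)
    return os.WIFSTOPPED(status)
```

A launcher would install the handler with signal.signal() for both
SIGTSTP and SIGCONT. Note that os.kill() only reaches processes on the
local node; relaying to ranks on other nodes needs a daemon on each of
them, which is the step that appears not to be happening here.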
Marcin,
Thanks for the info.
I'll follow up if/when I find a solution. I'm going to check the
openmpi build options to see if the signal forwarding feature requires
anything specific to enable it.
Regards,
Jason
--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jason W. Bacon
[email protected]
Circumstances don't make a man:
They reveal him.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~