What stop signal is being sent, and where? When mpirun receives a SIGTSTP signal, we catch it and suspend the job.
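The signal matters because SIGTSTP can be caught by a launcher and relayed, whereas SIGSTOP cannot be caught at all, so a process that is sent SIGSTOP directly has no chance to pass it on. A minimal Python sketch of the catch-and-relay pattern follows. It is only an illustration and not Open MPI's actual implementation: the function name `run_demo`, the single local child standing in for a remote rank, and the self-sent SIGTSTP are all assumptions made for the demo.

```python
import os
import signal
import subprocess
import sys
import time


def run_demo(timeout=5.0):
    """Return True if the child was stopped after the parent caught SIGTSTP."""
    # Hypothetical stand-in for one rank started by a launcher.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )

    def relay_stop(signum, frame):
        # The parent caught SIGTSTP (catchable); relay SIGSTOP (uncatchable)
        # so the child is stopped no matter what handlers it has installed.
        os.kill(child.pid, signal.SIGSTOP)

    previous = signal.signal(signal.SIGTSTP, relay_stop)
    try:
        # Simulate an external stop request arriving at the launcher.
        os.kill(os.getpid(), signal.SIGTSTP)

        # Poll until the kernel reports the child as stopped, or give up.
        deadline = time.monotonic() + timeout
        stopped = False
        while time.monotonic() < deadline and not stopped:
            pid, status = os.waitpid(child.pid, os.WNOHANG | os.WUNTRACED)
            stopped = pid == child.pid and os.WIFSTOPPED(status)
            if not stopped:
                time.sleep(0.05)
        return stopped
    finally:
        # Restore the earlier handler and clean up the child.
        signal.signal(
            signal.SIGTSTP,
            previous if previous is not None else signal.SIG_DFL,
        )
        child.kill()  # SIGKILL terminates even a stopped process
        child.wait()


if __name__ == "__main__":
    print("child stopped:", run_demo())
```

If the launcher itself were sent SIGSTOP instead, `relay_stop` would never run, which is why it helps to know exactly which signal is delivered and to which process.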
> On Nov 9, 2014, at 6:47 AM, Jason Bacon <[email protected]> wrote:
>
> Does anyone have SUSPEND,GANG working with openmpi via mpirun?
>
> I've set up a low-priority queue, which seems to be working, except that for
> openmpi jobs, only the processes on the MPI root node seem to be getting the
> stop signal.
>
> From slurm.conf:
>
> SelectType=select/cons_res
> SelectTypeParameters=CR_Core_Memory
> PreemptMode=SUSPEND,GANG
> PreemptType=preempt/partition_prio
>
> MpiDefault=none
>
> I've also tried --mca orte_forward_job_control 1, but it had no apparent
> effect.
>
> Thanks,
>
> Jason
