Hello SLURM-DEV

I have a problem with Slurm, Open MPI, and "scontrol suspend".

My setup is:
96-node cluster with InfiniBand, running RHEL 6.8
Slurm 17.02.1
Open MPI 2.0.0 (built with the Intel 2016 compiler)


I am running an application (HPL in this particular case) using a batch script
similar to:
-----------------------------
#!/bin/bash
#SBATCH --partition=standard
#SBATCH -N 10
#SBATCH --ntasks-per-node=16

mpirun -np 160 xhpl | tee LOG
-----------------------------
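I submit it with sbatch (the script name here is just a placeholder):

~# sbatch run_hpl.sh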

So I am running on 160 cores across 10 nodes.

Once the job is submitted and running, I suspend it with:
~# scontrol suspend JOBID
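
As far as I can tell the controller side is fine; checking the job state
(JOBID is a placeholder):

~# squeue -j JOBID -o "%i %T"

reports it as SUSPENDED.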

I see that the job indeed stops producing output. Then I go to each of the 10
nodes assigned to the job and check whether the xhpl processes are still
running there:

~# for i in {10..19}; do ssh node$i "top -b -n 1 | head -n 50 | grep xhpl | wc -l"; done

I expect this little script to return 0 from every node (suspend sends
SIGSTOP, so the stopped processes sit at 0% CPU and should not appear in the
first 50 lines of top's CPU-sorted output). However, the processes are
reliably suspended only on node10. I get:
0
16
16
…
16

So 9 out of 10 nodes still have 16 MPI processes of my xhpl application
running at 100% CPU.
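
For a check that doesn't depend on top's sort order, this counts the xhpl
processes on each node that are not in the stopped (T) state; a properly
suspended node should report 0:

~# for i in {10..19}; do ssh node$i "ps -C xhpl -o stat= | grep -cv '^T'"; done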

If I run "scontrol resume JOBID" and then suspend the job again, I see that
(sometimes) more nodes have the xhpl processes properly suspended. Every time
I resume and suspend the job, a different set of nodes returns 0 in my
ssh-and-top check.
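
To make the pattern easier to see, something like this (a rough sketch;
JOBID and the node range as above) cycles resume/suspend and reruns the
per-node count after each cycle:
-----------------------------
for n in {1..5}; do
    scontrol resume JOBID;  sleep 10
    scontrol suspend JOBID; sleep 10
    # count un-stopped xhpl processes on each node
    for i in {10..19}; do
        ssh node$i "ps -C xhpl -o stat= | grep -cv '^T'"
    done
    echo ---
done
-----------------------------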

So altogether it looks like the suspend mechanism doesn't work properly in
Slurm with Open MPI. I've tried compiling Open MPI with
"--with-slurm --with-pmi=/path/to/my/slurm" and observed the same behavior.
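
For reference, the configure invocation was roughly (the Intel compiler
wrappers are assumed here; the PMI path is a placeholder, as above):
-----------------------------
# icc/icpc/ifort assumed from the Intel 2016 toolchain
./configure CC=icc CXX=icpc FC=ifort \
    --with-slurm --with-pmi=/path/to/my/slurm
-----------------------------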

I would appreciate any help.   


Thanks,
Eugene. 



 
