We have an application that uses MPI in a master-slave mechanism.  The master 
(rank 0) does almost no work itself and simply directs the slaves (ranks 1-N).  
The application uses GPUs (NVIDIA K80s, 4 GPUs per node), which are a precious 
commodity on our cluster.

Ideally, the application should be run with an uneven distribution of tasks, 
such that the first node allocated will have one additional task to serve as 
the master.  This can be done for 1 or 2 nodes running 4 or 8 slaves (5 or 9 
tasks, respectively), like so:

$ sbatch --nodes=1 --ntasks=5 batch.script

places 5 tasks on a single 4xGPU node.

$ sbatch --ntasks=9 --ntasks-per-node=5 batch.script

places 5 tasks on the first 4xGPU node and 4 tasks on the second.
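
For context, batch.script is nothing exotic; it essentially just does an srun 
of the MPI binary inside the allocation, roughly along these lines (the binary 
name is just a placeholder):

#!/bin/bash
# batch.script -- rough sketch; ./mpi_app stands in for the real binary
#SBATCH --gres=gpu:4    # each node's 4 K80s
srun ./mpi_app          # rank 0 acts as the master, ranks 1-N drive the GPUs

The point is that srun simply inherits whatever task count and layout sbatch 
settles on, so the uneven placement has to be decided at submission time.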

However, for anything more than 2 nodes, Slurm does not allow this because of a 
conflict between --nodes, --ntasks, and --ntasks-per-node:

$ sbatch --nodes=3 --ntasks=13 --ntasks-per-node=4 batch.script
$ sbatch --nodes=4 --ntasks=17 --ntasks-per-node=4 batch.script

Is there a way of placing N+1 tasks on the first node and N tasks on each of 
the remaining allocated nodes?  Ideally, something like this:

$ sbatch --nodes=4 --ntasks=5,4,4,4 batch.script

or

$ sbatch --nodes=4 --ntasks-per-node=5,4,4,4 batch.script

Wondering,

David Hoover
HPC @ NIH
