Sorry, I should have done a better job of vetting my concern before
sending the earlier mail.
If I have a script, say runner.sh, as follows:
-----
#!/bin/bash
srun --ntasks-per-node=4 hostname
-----
And execute it with
$ salloc -N2 ./runner.sh
Under Slurm 14.11.2 and earlier, the result is
-----
node01
node01
node01
node01
node02
node02
node02
node02
-----
But under Slurm 14.11.3, the result is
-----
node01
node02
-----
According to the salloc and srun man pages, --ntasks-per-node
"Request[s] that ntasks be invoked on each node. If used with the
--ntasks option, the --ntasks option will take precedence and the
--ntasks-per-node will be treated as a maximum count of tasks per
node."
If srun is executed on its own, the correct thing happens. If it is
invoked inside an existing allocation (e.g., under salloc, as above),
it seems to get confused.
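In the meantime, a possible workaround (just a sketch, assuming
SLURM_JOB_NUM_NODES is exported to the script, which salloc normally
does) is to compute --ntasks explicitly inside runner.sh:
-----
#!/bin/bash
# Workaround sketch: derive the task count from the allocation size so
# srun does not rely on the (apparently broken) default expansion of
# --ntasks-per-node. salloc exports SLURM_JOB_NUM_NODES to the job.
srun --ntasks=$(( SLURM_JOB_NUM_NODES * 4 )) --ntasks-per-node=4 hostname
-----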
Again, apologies for the misinformation in the earlier note.
Andy
On 02/24/15 09:12, Andy Riebs wrote:
Serves me right for always running a version behind -- thanks for
the info!
Andy
On 02/24/15 09:10, CB wrote:
Re: [slurm-dev] Problem with --nnodes, --ntasks, and
--ntasks-per-node?
It seems that it's fixed in Slurm 14.11.4:
-----
$ sinfo -V
slurm 14.11.4
$ srun -N2 --ntasks-per-node=2 hostname
compute-1
compute-0
compute-1
compute-0
-----
On Tue, Feb 24, 2015 at 8:51 AM, Andy Riebs <[email protected]> wrote:
When we moved from Slurm 14.11.2 to 14.11.3, a bunch of
our Slurm scripts broke!
In the past, if --nnodes and --ntasks-per-node were
specified, --ntasks would default to
(nnodes * ntasks-per-node); i.e.,
$ srun -N2 --ntasks-per-node=2 hostname
hadesn02
hadesn02
hadesn01
hadesn01
$
With Slurm 14.11.3, we see
$ srun -N2 --ntasks-per-node=2 hostname
hadesn02
hadesn01
$
Was this change intentional?
Andy
--
Andy Riebs
Hewlett-Packard Company
High Performance Computing
+1 404 648 9024
My opinions are not necessarily those of HP