Quoting [email protected]:
-----Original Message-----
From: Moe Jette [mailto:[email protected]]
Sent: Tuesday, 3 March 2015 9:54 AM
-snip-
The options for srun, sbatch, and salloc are almost identical with
respect to specification of a job's allocation requirements.
Yes. Part of my problem comes down to what it means to nest them,
since they are common options. I was surprised to find that this
(submitted with sbatch) runs one task for each core, rather than one
for each node:
#!/bin/bash
#SBATCH --nodes=2 --ntasks-per-node=16
srun --ntasks-per-node=1 uname -a
I picked apart what openmpi/mpirun does with --pernode and found I
can do the following:
#!/bin/bash
#SBATCH --nodes=2 --ntasks-per-node=16
srun --ntasks=$SLURM_NNODES uname -a
That is a fine workaround (as is mpirun --pernode), though I suspect
it might break with other layout options that pack the tasks into the
first node. The intent of --ntasks-per-node=1 seems clearer, so it is
unfortunate that it does not work as I want (or that my understanding
of what should be wanted is poor).
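One possibly more robust variant (just a sketch, not tested here) is
to pin the step's node count as well as its task count, so the tasks
cannot be packed onto the first node:
#!/bin/bash
#SBATCH --nodes=2 --ntasks-per-node=16
# one task per node for this step; --nodes keeps the step spread
# across every node of the allocation
srun --nodes=$SLURM_NNODES --ntasks=$SLURM_NNODES uname -a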
Yes, some of the srun options apply to a job allocation, some to a job
step, and some to both. I'll agree this needs more documentation and
will put that on our to-do list.
> 2) We currently have a few unrelated usage patterns where jobs request
> multiple nodes but only some of the cores (perhaps to match jobs that
> they used on our previous cluster configuration). How would you deal
> with that case where --exclusive is not necessarily appropriate? A big
> stick might be an option (and advice to use whole nodes) though the
> users are in different cities so it might have to be a virtual stick.
Perhaps the salloc/sbatch/srun options: --cpus-per-task and/or
--ntasks-per-node
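For example (a sketch only; my_app is a placeholder), a job that
wants four cores on each of two nodes rather than whole nodes could
request:
#!/bin/bash
#SBATCH --nodes=2 --ntasks-per-node=4 --cpus-per-task=1
# only part of each node is requested, so --exclusive is not needed
srun ./my_app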
The problem only really arises when mixing different job steps that
need to use the resources with different patterns. This should be
unusual except perhaps for the per-node pre/post processing case. I
think I'd ideally prefer all the layout info to be in the sbatch
request and the 'main' step to run mpirun or srun with no particular
options, so the remaining part is how to handle the special per-node
pre/post processing case.
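A rough sketch of that pattern (pre.sh, main_app, and post.sh are
placeholders, and the per-node steps use the same workaround as
above):
#!/bin/bash
#SBATCH --nodes=2 --ntasks-per-node=16
# per-node pre-processing: one task on each allocated node
srun --nodes=$SLURM_NNODES --ntasks=$SLURM_NNODES ./pre.sh
# main step inherits the full layout from the sbatch options
srun ./main_app
# per-node post-processing
srun --nodes=$SLURM_NNODES --ntasks=$SLURM_NNODES ./post.sh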
The sbatch options controlling job layout (task count, node count,
cpus per task, etc.) are used to construct environment variables for
the batch script (the same holds for salloc), so the srun command gets
those options by default. A typical use case is for all resource
specification options to appear on the sbatch submit line (or in the
script), with the srun commands within the script only identifying
the application to be run.
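For example (my_app and job.sh are placeholders), the resource
options can all go on the submit line:
sbatch --nodes=2 --ntasks=32 --cpus-per-task=1 job.sh
with job.sh containing nothing but the application launch:
#!/bin/bash
# layout is taken from the SLURM_* environment variables set by sbatch
srun ./my_app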
There is another mode of operation in which a single job executes a
multitude of srun commands within an allocation, using different size
and layout options. These various job steps (each srun invocation)
can run serially or in parallel using overlapping or separate
resources (see srun's --exclusive option).
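As a sketch of that mode (step_a and step_b are placeholders),
several steps can be launched in the background and given dedicated
CPUs with --exclusive:
#!/bin/bash
#SBATCH --nodes=2 --ntasks=32
# two job steps run in parallel on separate halves of the allocation
srun --exclusive --ntasks=16 ./step_a &
srun --exclusive --ntasks=16 ./step_b &
# wait for both steps to finish before the batch script exits
wait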
--
Morris "Moe" Jette
CTO, SchedMD LLC
Commercial Slurm Development and Support