Hello,

Consider the following batch script:
#!/bin/bash
#SBATCH --nodes=6                 # number of nodes
#SBATCH --ntasks-per-node=3       # processes per node

srun --ntasks-per-node=2 -n 3 ./<executable>

The environment of one of the processes looks like this (only the relevant part is shown):

SLURM_JOB_CPUS_PER_NODE='4(x6)'
SLURM_JOB_NODELIST='cndev[1-4,8-9]'   <-- As expected 6 nodes
SLURM_JOB_NUM_NODES=3                 <-- Why 3 and not 6???
SLURM_NODELIST='cndev[1-4,8-9]'       <-- As expected 6 nodes
SLURM_NPROCS=3                        <-- Why 3 and not 6???
SLURM_NTASKS=3
SLURM_STEP_NODELIST='cndev[1-3]'      <-- As expected 3 nodes
SLURM_STEP_NUM_NODES=3                <-- As expected 3 nodes
SLURM_STEP_NUM_TASKS=3
SLURM_STEP_TASKS_PER_NODE='1(x3)'     <-- duplication?!
SLURM_TASKS_PER_NODE='1(x3)'          <-- duplication?!
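
For anyone who wants to reproduce a listing like this, a small filter along the following lines should be enough. It is only a sketch: the function name `slurm_topic_env` is made up for illustration, and the variable list is simply the set of names shown in this message, not an exhaustive list of what Slurm exports.

```shell
#!/bin/bash
# Hypothetical helper (name invented for this example): print only the
# node-count and task-count related SLURM_* variables of the current
# process, sorted by name.
slurm_topic_env() {
    env \
      | grep -E '^SLURM_(JOB_|STEP_)?(CPUS_PER_NODE|NODELIST|NUM_NODES|NUM_TASKS|NPROCS|NTASKS|TASKS_PER_NODE)=' \
      | sort
}
```

Saved as a script that ends with a call to `slurm_topic_env`, it can be started in place of `./<executable>` in the srun line so that each task prints its own view of these variables.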

1. Is it correct that SLURM_JOB_NUM_NODES = SLURM_STEP_NUM_NODES? I thought
that SLURM_JOB_NUM_NODES should keep its initial value for the whole job.

2. From the srun man page I cannot tell for certain whether the last two
variables duplicate each other or not:
- SLURM_STEP_TASKS_PER_NODE - Number of processes per node within the step.
- SLURM_TASKS_PER_NODE  - Number of tasks to be initiated on each node...

I understand that SLURM is flexible and that I may be missing some
configurations in which these two values differ. If that is the case, could
you provide a use case? Or was this done for backward compatibility reasons?
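
To check whether the two variables ever differ in a given configuration, their values can be compared after expanding the compressed notation. The sketch below assumes the format described in the srun man page: comma-separated task counts, where `N(xM)` means the count N repeated for M consecutive nodes. The function names `expand_per_node` and `same_layout` are invented for this example.

```shell
#!/bin/bash
# Expand a compressed per-node spec such as '2(x3),1' into '2 2 2 1'.
# Assumes the comma-separated "count(xrepeat)" notation; bash is required.
expand_per_node() {
    local spec="$1" entry count reps i
    local re='^([0-9]+)\(x([0-9]+)\)$'
    local out=()
    local IFS=','
    for entry in $spec; do
        if [[ $entry =~ $re ]]; then
            count=${BASH_REMATCH[1]}
            reps=${BASH_REMATCH[2]}
        else
            count=$entry
            reps=1
        fi
        for ((i = 0; i < reps; i++)); do
            out+=("$count")
        done
    done
    IFS=' '
    echo "${out[*]}"
}

# Succeeds when two specs describe the same per-node task layout.
same_layout() {
    [[ "$(expand_per_node "$1")" == "$(expand_per_node "$2")" ]]
}

# Only meaningful inside a job step, so skip quietly when the variables are unset.
if [[ -n ${SLURM_TASKS_PER_NODE:-} && -n ${SLURM_STEP_TASKS_PER_NODE:-} ]]; then
    if same_layout "$SLURM_TASKS_PER_NODE" "$SLURM_STEP_TASKS_PER_NODE"; then
        echo "job-level and step-level task layouts match"
    else
        echo "layouts differ: $SLURM_TASKS_PER_NODE vs $SLURM_STEP_TASKS_PER_NODE"
    fi
fi
```

Running this as the step's executable under different srun options would show directly whether a configuration exists in which the two values diverge.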

-- 
Best regards, Artem Y. Polyakov
