Hi Folks,

We ran into an interesting issue with the SGI MPT mpiexec_mpt script, which
references the SLURM_STEP_NODELIST and SLURM_STEP_TASKS_PER_NODE variables
instead of the job-level SLURM_JOB_NODELIST and SLURM_TASKS_PER_NODE
variables. It reads these environment variables and then ultimately calls
srun, which creates a new job step. This is probably incorrect, since it
can't actually launch into the job step to which those SLURM_STEP_
variables apply, and we're working with SGI on this. It manifests itself as
errors during interactive jobs where the interactive shell is spawned by
"salloc srun -N1 -n1 --preserve-env --pty $SHELL". mpiexec_mpt reads the
SLURM_STEP_ variables that srun sets in that shell's environment, concludes
that only one task is available, and launches just a single task regardless
of the allocation available to the job.
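
To make the mismatch concrete, here is a rough sketch (not taken from
mpiexec_mpt itself) that totals the tasks described by a tasks-per-node
string in Slurm's compact "count(xrepeat)" notation. The function name
count_tasks and the sample values are made up for illustration.

```python
import re


def count_tasks(tasks_per_node):
    """Total tasks in a Slurm tasks-per-node string such as '2(x3),1'."""
    total = 0
    for part in tasks_per_node.split(","):
        # Each entry is "<tasks>" or "<tasks>(x<number of nodes>)".
        m = re.fullmatch(r"(\d+)(?:\(x(\d+)\))?", part.strip())
        if m is None:
            raise ValueError(f"unrecognized entry: {part!r}")
        total += int(m.group(1)) * int(m.group(2) or 1)
    return total


# Hypothetical values: what the job was allocated vs. what the shell's
# single-task step reports.
job_tasks = count_tasks("16(x4)")   # e.g. from SLURM_TASKS_PER_NODE
step_tasks = count_tasks("1")       # e.g. from SLURM_STEP_TASKS_PER_NODE
```

With those sample values the job-level variable describes 64 tasks while
the step-level one describes 1, which matches the single-task launches we
see.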

It got me thinking: something feels a little wrong about the SLURM_STEP_
variables being set when launching an interactive shell as the first step
of an salloc allocation. On the flip side, I can also understand why
they're set, and it does make sense. Is there a way to make this initial
srun for the interactive shell behave like the batch script does in a batch
job? The batch script itself doesn't seem to be treated as a job step
(please correct me if I'm wrong), and it certainly doesn't get the
SLURM_STEP_ variables set.

I'm curious to get other people's thoughts on this. I know I could just
unset the SLURM_STEP_ variables in some type of interactive wrapper but
that feels a little hacky and possibly dangerous.
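
For reference, a minimal sketch of what such a wrapper might look like,
assuming it is enough to drop every variable whose name starts with
SLURM_STEP_ before launching the command. The helper names
strip_step_vars and run_without_step_vars are hypothetical, and this is
untested against mpiexec_mpt.

```python
import os
import subprocess


def strip_step_vars(env):
    """Return a copy of env without any SLURM_STEP_* entries."""
    return {k: v for k, v in env.items() if not k.startswith("SLURM_STEP_")}


def run_without_step_vars(argv):
    """Run argv with SLURM_STEP_* removed from its environment only."""
    # The current process's environment is left untouched; only the child
    # sees the filtered copy.
    return subprocess.call(argv, env=strip_step_vars(os.environ))
```

Something like run_without_step_vars(["mpiexec_mpt", ...]) would then hide
the step variables from the launcher alone rather than from the whole
shell, which limits the blast radius, but anything the launcher starts
that relies on those variables would still be affected.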

-Aaron
