Hi Folks,

We ran into an interesting issue with the SGI MPT mpiexec_mpt script: it references the step-level SLURM_STEP_NODELIST and SLURM_STEP_TASKS_PER_NODE variables instead of the job-level SLURM_JOB_NODELIST and SLURM_TASKS_PER_NODE variables. It reads these environment variables and then ultimately calls srun, which creates a new job step. This is probably incorrect, since it can't actually launch into the job step to which those SLURM_STEP_* variables apply, and we're working with SGI on this.

The problem shows up as errors during interactive jobs where the interactive shell is spawned by "salloc srun -N1 -n1 --preserve-env --pty $SHELL". mpiexec_mpt reads the SLURM_STEP_* variables set in the shell launched by srun, concludes that only one task is available, and then launches only a single task regardless of the allocation available to the job.
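For anyone who wants to see the mismatch on their own system, here is a rough sketch that can be sourced inside the interactive shell. The function name show_slurm_views is made up for this post; the four variable names are the ones mentioned above. What it prints will depend on your allocation.

```shell
# Hypothetical helper (name invented for illustration): print the job-level
# and step-level views of the allocation side by side.
show_slurm_views() {
    echo "job nodelist:    ${SLURM_JOB_NODELIST:-<unset>}"
    echo "job tasks/node:  ${SLURM_TASKS_PER_NODE:-<unset>}"
    echo "step nodelist:   ${SLURM_STEP_NODELIST:-<unset>}"
    echo "step tasks/node: ${SLURM_STEP_TASKS_PER_NODE:-<unset>}"
}
```

Calling show_slurm_views from the shell started by the srun above should make it easy to compare what the step variables say against what the job actually has.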
It got me thinking: something feels a little wrong about the SLURM_STEP_* variables being set when an interactive shell is launched as the first step of an salloc allocation. On the flip side, I can understand why they're set, and it does make sense. Is there a way to make this initial srun for the interactive shell behave the way the batch script does in a batch job? The batch script itself doesn't seem to be treated as a job step (please correct me if I'm wrong), and it certainly doesn't get the SLURM_STEP_* variables set.

I'm curious to get other people's thoughts on this. I know I could just unset the SLURM_STEP_* variables in some type of interactive wrapper, but that feels a little hacky and possibly dangerous.

-Aaron
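P.S. For reference, the wrapper workaround I have in mind would look roughly like this. It is only a sketch: the function name unset_slurm_step_vars is invented for this post, and it assumes every step-level variable we care about starts with the SLURM_STEP_ prefix.

```shell
# Hypothetical helper (name invented for illustration): remove every
# environment variable whose name starts with SLURM_STEP_ from this shell.
unset_slurm_step_vars() {
    # List exported variables and keep only the matching names.
    for name in $(env | sed -n 's/^\(SLURM_STEP_[A-Za-z0-9_]*\)=.*/\1/p'); do
        unset "$name"
    done
}

# In a wrapper script, this would be followed by starting the user's shell:
#   unset_slurm_step_vars
#   exec "${SHELL:-/bin/sh}"
```

The wrapper would then take the place of $SHELL in the srun command above. It only hides the variables from whatever is launched from that shell, so anything that legitimately relies on them would also stop seeing them, which is the part that worries me.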
