Dear Community,

I'm seeing strange behavior from sbatch with different --nodelist (-w) options on my two-node cluster. Here are my test scripts:
~/slurm$ cat mpirun.slm
#!/bin/bash
#SBATCH --job-name=mpirun_2x1
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1
#SBATCH --exclusive
source /usr/mpi/gcc/openmpi-4.1.7a1/bin/mpivars.sh
mpirun ./a.sh

~/slurm$ cat a.sh
#!/bin/bash
echo "`uname -n` OMPI_COMM_WORLD_RANK = $OMPI_COMM_WORLD_RANK SLURM_NODEID = $SLURM_NODEID SLURM_JOB_NODELIST = $SLURM_JOB_NODELIST"

If I do not specify any -w option, or if I include both nodes in the -w option, I get the expected result:

~/slurm$ sbatch mpirun.slm
Submitted batch job 71
~/slurm$ cat slurm-71.out
std-199 OMPI_COMM_WORLD_RANK = 0 SLURM_NODEID = 0 SLURM_JOB_NODELIST = std-[199,271]
std-271 OMPI_COMM_WORLD_RANK = 1 SLURM_NODEID = 0 SLURM_JOB_NODELIST = std-[199,271]

However, if I specify only one node with -w but still request two nodes, one choice of node gives the expected result and the other does not: the unexpected run dispatches both MPI tasks to the same node.

This one is expected - the two MPI tasks run across the two nodes:

~/slurm$ sbatch -w std-199 mpirun.slm
Submitted batch job 72
~/slurm$ cat slurm-72.out
std-199 OMPI_COMM_WORLD_RANK = 0 SLURM_NODEID = 0 SLURM_JOB_NODELIST = std-[199,271]
std-271 OMPI_COMM_WORLD_RANK = 1 SLURM_NODEID = 0 SLURM_JOB_NODELIST = std-[199,271]

This one is unexpected - both MPI tasks end up on the same node, even though SLURM_JOB_NODELIST still shows the correct two nodes:

~/slurm$ sbatch -w std-271 mpirun.slm
Submitted batch job 73
ubuntu@bright-anchovy-controller:~/slurm$ cat slurm-73.out
std-199 OMPI_COMM_WORLD_RANK = 0 SLURM_NODEID = 0 SLURM_JOB_NODELIST = std-[199,271]
std-199 OMPI_COMM_WORLD_RANK = 1 SLURM_NODEID = 0 SLURM_JOB_NODELIST = std-[199,271]

I first saw the problem on a larger cluster, where I needed to specify both the -w and -x options to include and exclude nodes, and then narrowed it down to this two-node cluster. I tried adding options such as --hostfile, --rankfile, and -npernode, but none of them changes how the tasks are dispatched to the nodes (a rough sketch of what I tried is in the P.S. below). The problem is reproducible on these systems:

Slurm 23.02.5 on Ubuntu 22.04.5 LTS
Slurm 24.05.1 on Ubuntu 22.04.4 LTS

How can I make the last case work, i.e., request -w std-271 and still have the job run on two nodes? I'd appreciate any help!

Regards,
Xinghong
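P.S. For reference, the mpirun variants I tried looked roughly like the sketch below; the hostfile/rankfile names and contents here are only illustrative placeholders, not copied from my actual runs. None of these changed which node the two ranks landed on:

# in mpirun.slm, in place of the plain "mpirun ./a.sh" (illustrative only):
mpirun -npernode 1 ./a.sh
mpirun --hostfile ./hosts ./a.sh   # hosts: placeholder file listing std-199 and std-271
mpirun --rankfile ./ranks ./a.sh   # ranks: placeholder file pinning rank 0 and rank 1 to different nodes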