Sten,

Some thoughts which may or may not be of any help…
1. The slurmctld determines the number of tasks to allocate on each node. You need the task/affinity plugin enabled for the slurmd to actually bind tasks to sockets, cores, or threads.
2. A good description of how this happens can be found here: http://www.schedmd.com/slurmdocs/cpu_management.html
3. The code that handles the rich set of related options (some of which you explored below) is very complicated and has undergone numerous improvements in the past.
4. The way to determine whether there is a bug in the v2.3.2 code you exercised is to compare it against the latest v2.4 behavior. If it works as expected in v2.4, then v2.3.2 has a bug.

Don

From: Sten Wolf [mailto:[email protected]]
Sent: Tuesday, March 13, 2012 11:41 PM
To: slurm-dev
Subject: [slurm-dev] Re: forcing tasks per socket constraint with openmpi

Got it working in the end:

sbatch -N 8 -B 2:4:1 --ntasks-per-socket=4 --ntasks-per-node=8 myapp.sh

It seems redundant to have to add --ntasks-per-node=8 if the node description already includes 2 sockets and --ntasks-per-socket is defined. Is that a bug?

On 14/03/2012 05:51, Sten Wolf wrote:

Hi,

I am using Slurm 2.3.2 with Open MPI 1.4.5 on dual-socket, 6-core, single-thread Intel CPUs (defined in slurm.conf, along with SelectTypeParameters=CR_Core_Memory). I am trying to run a 64-logical-CPU MPI app on 8 nodes, using 4 tasks per socket, but everything I have tried from sbatch uses either 6 nodes (12 cores per node x 5, plus 4 cores) or 8 nodes with 8 cores per node, assigned 6+2 instead of 4+4.

I know I can solve this easily at the mpirun level by providing a correct machine file etc., but I am hoping I can use the scheduler to assign resources correctly.

I have created a simple batch file containing 2 lines:

$ cat myapp.sh
#!/bin/bash
mpirun --mca btl openib,self,sm $HOME/myapp

So far I have tried the following:

1. sbatch -n 64 --ntasks-per-socket=4 myapp.sh
2. sbatch -n 64 -B 2:4:1 myapp.sh
3. sbatch -N 8 -B 2:4:1 myapp.sh
4. sbatch -N 8 --ntasks-per-socket=4 myapp.sh
5. sbatch -N 8 -B 2:4:1 --ntasks-per-node=8 myapp.sh

1-4) allocate 12 tasks per node; 5) allocates 8 tasks per node (6+2 allocation). What am I doing wrong?

For some reason, when using -B, even though the allocation is the same as without it, I get better results (total runtime-wise). I assume -B is only a constraint (only allocate nodes which support at least the -B geometry), but I was hoping there was some way for Slurm to pass my preferences to Open MPI.

Thanks in advance
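Regarding the task/affinity point in Don's reply: a minimal slurm.conf sketch for this kind of cluster might look like the following. The node name range and exact line layout are placeholders for illustration; the real config will differ.

```
# slurm.conf fragment (illustrative; NodeName range is a placeholder)
TaskPlugin=task/affinity          # required for slurmd to bind tasks to sockets/cores/threads
SelectType=select/cons_res        # consumable-resource selection
SelectTypeParameters=CR_Core_Memory
NodeName=node[01-08] Sockets=2 CoresPerSocket=6 ThreadsPerCore=1
```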
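For reference, the working submission from the thread can also be expressed as #SBATCH directives inside the script itself, which keeps the geometry next to the mpirun line. This is a sketch; the per-flag comments reflect my reading of the thread, not authoritative documentation.

```
#!/bin/bash
#SBATCH -N 8                        # exactly 8 nodes
#SBATCH -B 2:4:1                    # constraint: nodes with 2 sockets, >=4 cores/socket, 1 thread/core
#SBATCH --ntasks-per-socket=4      # at most 4 tasks bound to each socket
#SBATCH --ntasks-per-node=8        # 8 tasks per node -> 64 tasks total on 8 nodes
mpirun --mca btl openib,self,sm $HOME/myapp
```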
