We are running a Slurm controller (2.6.5) with the built-in scheduler. No
matter which options I give to sbatch and srun, all of my tasks end up
running on the same single core.

I have thousands of independent tasks I want to run. I should be able to
run each of them on its own core, right? I don't care about memory
bandwidth. I do care about using a dedicated core for each task.

All the compute nodes have 8 cores, and on each node I want to run 8 tasks,
each on its own dedicated core. So task 1 should run on core 1, and so on,
up to task 8 on core 8. What happens instead is that all 8 tasks run on
core 1. I do not want this. I also experimented with --exclusive and
--shared. The partition I use is set to exclusive mode.
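
For reference, here is a minimal sketch of how the CPU binding of a step
could be checked from inside a step script. It assumes Linux compute nodes
where /proc/self/status exposes Cpus_allowed_list. The helper name
allowed_cpus is hypothetical and not part of our actual scripts:

```shell
#!/bin/bash
# Hypothetical diagnostic helper; not part of the original job script.
# Prints the CPU list this process may run on (Linux only).
allowed_cpus() {
    if [ -r /proc/self/status ]; then
        # awk inherits the affinity mask of the calling shell, so its own
        # /proc/self/status reports the same Cpus_allowed_list.
        awk '/^Cpus_allowed_list:/ {print $2}' /proc/self/status
    else
        echo "unknown"
    fi
}

# SLURM_STEP_ID is set by srun inside a job step; fall back to "none"
# when the script is run outside of Slurm.
echo "host=${HOSTNAME:-unknown} step=${SLURM_STEP_ID:-none} cpus=$(allowed_cpus)"
```

If every step reports the full range (for example 0-7 on an 8-core node),
the steps are not bound to any particular core. If every step reports the
same single CPU, they are all pinned to that one core.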

Here is an example batch script I use:
#!/bin/bash
#SBATCH --partition=m610 -N9 --output=~/experiments/scripts/slurm-out.log
#SBATCH --open-mode=append --cpus-per-task=1 --ntasks-per-core=1 --ntasks-per-node=8
#steps 1 - 500
srun -n1 -N1 --exclusive --time=35 \
  ~/experiments/scripts/steps/step_718f5c96-18da-421d-840a-ee94d4ddee18.sh &
... thousands more similar tasks ...

The full list of scheduling options is:
# SCHEDULING
#DefMemPerCPU=0
FastSchedule=1
#MaxMemPerCPU=0
#SchedulerRootFilter=1
#SchedulerTimeSlice=30
SchedulerType=sched/builtin
SchedulerPort=7321
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
SchedulerParameters=defer

Any ideas what I am doing wrong?
