The shared option is working as designed; see "man slurm.conf". Your batch script asks for sharing (#SBATCH --share) and the partition permits it (Shared=YES), so select/cons_res is allowed to over-subscribe the CPUs of those jobs: by default up to four willing-to-share jobs may be allocated to each resource (Shared=YES is equivalent to Shared=YES:4). That is why a 13th job starts on node01 even though none of the first 12 has finished.
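If you want each CPU never to be over-subscribed, one fix is at the partition level: with SelectType=select/cons_res and Shared=NO, every CPU is allocated to exactly one job, and a 13th two-CPU job on a 24-CPU node has to wait. A minimal sketch, reusing the partition line from your slurm.conf quoted below:

---------------- 8< ---- snip ---- 8< ---------------------------
# Keep nodes shared between jobs, but never over-subscribe an
# individual CPU (any --share request by a job is then ignored).
PartitionName=serial AllowGroups=ALL Nodes=node[01-03] Default=YES
Shared=NO DefMemPerCPU=4096 MaxMemPerCPU=16384 MaxTime=480 State=UP
---------------- 8< ---- snap ---- 8< ---------------------------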
Quoting Olaf Gellert <[email protected]>:

> Hi there,
>
> I am running a serial partition on three nodes. So the nodes
> are shared and each one should run some jobs (as many as memory
> or cpus/cores permit). The nodes have 2 sockets, each CPU has
> 6 cores and SMT (2 threads), and slurm "knows" that:
>
> [2013-07-03T09:22:42+02:00] node:node01 cpus:24 c:6 s:2 t:2 mem:90112
> a_mem:0 state:0
> [2013-07-03T09:22:42+02:00] node:node02 cpus:24 c:6 s:2 t:2 mem:90112
> a_mem:0 state:0
> [2013-07-03T09:22:42+02:00] node:node03 cpus:24 c:6 s:2 t:2 mem:90112
> a_mem:0 state:0
>
> Memory constraints seem to work. When I start jobs using 2 CPUs
> each, slurm puts the first 12 jobs on node01, the next 12 jobs on
> node02, the next 12 jobs on node03, and then starts the next jobs
> additionally on node01 (though none of the first 12 jobs has
> finished yet).
>
> The jobs have this in their header:
>
> ---------------- 8< ---- snip ---- 8< ---------------------------
> #SBATCH -p serial
> #SBATCH --nodes=1
> #SBATCH --ntasks-per-node=2
> #SBATCH --ntasks=2
> #SBATCH --cpus-per-task=1
> #SBATCH --mem=60mb
> #SBATCH --share
> #SBATCH --time=01:01:00
>
> srun --mpi=openmpi ./testprog
> ---------------- 8< ---- snap ---- 8< ---------------------------
>
> For each job, processes like these are started:
>
> 11:59 00:00:00 srun
> 11:59 00:00:00 srun
> 11:59 00:00:11 /what/ever/./testprog
> 11:59 00:00:11 /what/ever/./testprog
>
> For each job, slurm seems to know that it needs 2 CPUs:
>
> [2013-07-03T12:06:40+02:00] job_id:907 nhosts:1 ncpus:2 node_req:0
> nodes=miklip08
> [2013-07-03T12:06:40+02:00] Node[0]:
> [2013-07-03T12:06:40+02:00] Mem(MB):60:0 Sockets:2 Cores:6 CPUs:2:0
> [2013-07-03T12:06:40+02:00] Socket[0] Core[0] is allocated
> [2013-07-03T12:06:40+02:00] --------------------
> [2013-07-03T12:06:40+02:00] cpu_array_value[0]:2 reps:1
> [2013-07-03T12:06:40+02:00] ====================
>
> So after 12 jobs on a node, the next one should have to wait
> (until 2 CPUs are available)... Anything that I am missing?
>
> Regards, Olaf
>
> P.S.: The relevant section from slurm.conf:
>
> ---------------- 8< ---- snip ---- 8< ---------------------------
> # SCHEDULING
> #DefMemPerCPU=0
> FastSchedule=1
> #MaxMemPerCPU=0
> #SchedulerRootFilter=1
> #SchedulerTimeSlice=30
> SchedulerType=sched/backfill
> SchedulerPort=7321
> SelectType=select/cons_res
> #SelectTypeParameters=CR_Core_Memory
> SelectTypeParameters=CR_CPU_Memory
>
> # COMPUTE NODES
> NodeName=node[01-03] RealMemory=90112 Sockets=2 CoresPerSocket=6
> ThreadsPerCore=2 State=UNKNOWN
> PartitionName=serial AllowGroups=ALL Nodes=node[01-03] Default=YES
> Shared=YES DefMemPerCPU=4096 MaxMemPerCPU=16384 MaxTime=480 State=UP
> ---------------- 8< ---- snap ---- 8< ---------------------------
>
> --
> Dipl. Inform. Olaf Gellert          email [email protected]
> Deutsches Klimarechenzentrum GmbH   phone +49 (0)40 460094 214
> Bundesstrasse 45a                   fax   +49 (0)40 460094 270
> D-20146 Hamburg, Germany            www   http://www.dkrz.de
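Alternatively, fix it per job: with Shared=YES, cons_res only over-subscribes resources for jobs that explicitly request it, so it is enough to drop the --share line from the batch header. A sketch, otherwise identical to your quoted script:

---------------- 8< ---- snip ---- 8< ---------------------------
#SBATCH -p serial
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=2
#SBATCH --ntasks=2
#SBATCH --cpus-per-task=1
#SBATCH --mem=60mb
#SBATCH --time=01:01:00
# no --share here: this job's CPUs cannot be over-subscribed

srun --mpi=openmpi ./testprog
---------------- 8< ---- snap ---- 8< ---------------------------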

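Either way, you can verify the effective sharing level with scontrol; the output below is illustrative, not copied from a real run (on your current configuration I would expect the implicit job count of 4 to show up; newer Slurm releases report this field as OverSubscribe):

---------------- 8< ---- snip ---- 8< ---------------------------
$ scontrol show partition serial | grep -oi 'shared=[^ ]*'
Shared=YES:4
---------------- 8< ---- snap ---- 8< ---------------------------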