Hi there, I am running a serial partition on three nodes. The nodes are shared, and each one should run several jobs at once (as many as memory and CPUs/cores permit). Each node has 2 sockets with a 6-core CPU per socket and SMT (2 threads per core), and slurm "knows" that:
[2013-07-03T09:22:42+02:00] node:node01 cpus:24 c:6 s:2 t:2 mem:90112 a_mem:0 state:0
[2013-07-03T09:22:42+02:00] node:node02 cpus:24 c:6 s:2 t:2 mem:90112 a_mem:0 state:0
[2013-07-03T09:22:42+02:00] node:node03 cpus:24 c:6 s:2 t:2 mem:90112 a_mem:0 state:0

Memory constraints seem to work. When I start jobs using 2 CPUs each, slurm puts the first 12 jobs on node01, the next 12 jobs on node02, the next 12 jobs on node03, and then starts the following jobs on node01 again (although none of the first 12 jobs has finished yet). The jobs have this in their header:

---------------- 8< ---- snip ---- 8< ---------------------------
#SBATCH -p serial
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=2
#SBATCH --ntasks=2
#SBATCH --cpus-per-task=1
#SBATCH --mem=60mb
#SBATCH --share
#SBATCH --time=01:01:00

srun --mpi=openmpi ./testprog
---------------- 8< ---- snap ---- 8< ---------------------------

For each job, processes like these are started:

11:59 00:00:00 srun
11:59 00:00:00 srun
11:59 00:00:11 /what/ever/./testprog
11:59 00:00:11 /what/ever/./testprog

For each job slurm seems to know that it needs 2 CPUs:

[2013-07-03T12:06:40+02:00] job_id:907 nhosts:1 ncpus:2 node_req:0 nodes=miklip08
[2013-07-03T12:06:40+02:00] Node[0]:
[2013-07-03T12:06:40+02:00]   Mem(MB):60:0  Sockets:2  Cores:6  CPUs:2:0
[2013-07-03T12:06:40+02:00]   Socket[0] Core[0] is allocated
[2013-07-03T12:06:40+02:00] --------------------
[2013-07-03T12:06:40+02:00] cpu_array_value[0]:2 reps:1
[2013-07-03T12:06:40+02:00] ====================

So after 12 jobs on a node, the next one should have to wait (until 2 CPUs become available again)... Is there anything I am missing?

Regards,

Olaf

P.S.: The relevant section from slurm.conf:

---------------- 8< ---- snip ---- 8< ---------------------------
# SCHEDULING
#DefMemPerCPU=0
FastSchedule=1
#MaxMemPerCPU=0
#SchedulerRootFilter=1
#SchedulerTimeSlice=30
SchedulerType=sched/backfill
SchedulerPort=7321
SelectType=select/cons_res
#SelectTypeParameters=CR_Core_Memory
SelectTypeParameters=CR_CPU_Memory

# COMPUTE NODES
NodeName=node[01-03] RealMemory=90112 Sockets=2 CoresPerSocket=6 ThreadsPerCore=2 State=UNKNOWN

PartitionName=serial AllowGroups=ALL Nodes=node[01-03] Default=YES Shared=YES DefMemPerCPU=4096 MaxMemPerCPU=16384 MaxTime=480 State=UP
---------------- 8< ---- snap ---- 8< ---------------------------

--
Dipl. Inform. Olaf Gellert            email   [email protected]
Deutsches Klimarechenzentrum GmbH     phone   +49 (0)40 460094 214
Bundesstrasse 45a                     fax     +49 (0)40 460094 270
D-20146 Hamburg, Germany              www     http://www.dkrz.de

Sitz der Gesellschaft: Hamburg
Geschäftsführer: Prof. Dr. Thomas Ludwig
Registergericht: Amtsgericht Hamburg, HRB 39784
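
A minimal sketch of how the per-node CPU allocation could be cross-checked while such jobs are running, assuming the node names node[01-03] from the slurm.conf excerpt above and that scontrol and GNU grep are available (CPUAlloc and CPUTot are fields reported by "scontrol show node"):

---------------- 8< ---- snip ---- 8< ---------------------------
#!/bin/bash
# Sketch: print allocated vs. total CPUs per node. With 24 CPUs per node
# and 2 CPUs per job, CPUAlloc should reach 24 after 12 running jobs and
# any further 2-CPU job should stay pending on that node.
for n in node01 node02 node03; do
    printf '%s: ' "$n"
    scontrol show node "$n" | grep -Eo 'CPUAlloc=[0-9]+|CPUTot=[0-9]+' | tr '\n' ' '
    echo
done
---------------- 8< ---- snap ---- 8< ---------------------------

If the 13th job starts on node01 while CPUAlloc there already reports 24, the node is being oversubscribed despite the 2-CPU allocations shown in the job log above; if CPUAlloc is still below 24 at that point, the per-job allocation itself would be the thing to look at.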
