Thanks for your suggestions. I did what you suggested: I restarted the daemon using a Gentoo init.d script adapted from the one included with SLURM (my work is on GitHub at https://github.com/nbigaouette/ebuilds/tree/master/sys-cluster/slurm). The script does kill the daemon before restarting it.
"scontrol show node" did not show anything interesting:

NodeName=node71 Arch=x86_64 CoresPerSocket=4 CPUAlloc=0 CPUErr=0 CPUTot=16
   Features=(null) Gres=(null) OS=Linux RealMemory=23000 Sockets=2 State=IDLE
   ThreadsPerCore=2 TmpDisk=0 Weight=1
   BootTime=2011-03-22T19:46:32 SlurmdStartTime=2011-03-22T22:27:36
   Reason=(null)

Everything seems to be in order... I had set SelectTypeParameters=CR_CPU, but before trying it I revisited the slurm.conf man page:

> CR_CPU
>   CPUs are consumable resources. There is *no notion of sockets, cores or
>   threads*; *do not define those values in the node specification*. If
>   these are defined, unexpected results will happen when hyper-threading
>   is enabled. *Procs= should be used instead.* On a multi-core system,
>   each core will be considered a CPU. On a multi-core and hyper-threaded
>   system, each thread will be considered a CPU. On single-core systems,
>   each CPU will be considered a CPU.

I thus set:

NodeName=node[69-71] RealMemory=23000 Procs=16 State=UNKNOWN

restarted everything, and now 16 jobs can run at the same time on each node!

Thanks for your support.

I do have another question though. Is it possible to "reserve" more CPUs than needed, so that a job which is slowed down by hyperthreading can reserve a whole node while only running 8 processes? I think I saw a previous email about a setting in the submission script, but I can't find it anymore...

Regards,

Nicolas

On Tue, Mar 22, 2011 at 11:05 PM, <[email protected]> wrote:
> The only thing that comes to mind is explicitly configuring:
>     SelectTypeParameters=CR_CPU
> (that should be the default) and restarting the slurmctld
> daemon (don't just run "scontrol reconfig" or send SIGHUP).
>
> I'd also execute "scontrol show node" to confirm that the
> values are what you configured.
>
> Quoting Nicolas Bigaouette <[email protected]>:
>
>> Hi Jette,
>>
>> Thanks for your ultra fast answer. Unfortunately (again), it does not
>> affect the number of jobs running.
>> I previously tried it without success:
>>
>> NodeName=node[69-71] RealMemory=23000 Procs=16 Sockets=2
>> CoresPerSocket=4 ThreadsPerCore=2 State=UNKNOWN
>>
>> Could there be another limit somewhere? Here is my complete
>> configuration file: https://gist.github.com/882507
>>
>> Thanks
>>
>> Nicolas
>>
>> On Tue, Mar 22, 2011 at 10:25 PM, <[email protected]> wrote:
>>
>>> Try adding "Procs=16" to the NodeName line.
>>> By default, SLURM schedules one task per core.
>>>
>>> Quoting Nicolas Bigaouette <[email protected]>:
>>>
>>>> Hi all,
>>>>
>>>> I want to be able to submit 16 serial jobs on my compute nodes at the
>>>> same time, since each node has 2 sockets, 4 cores per socket, and
>>>> hyperthreading. We see a speedup when saturating a node with 16
>>>> different serial jobs (launched manually), so I want to take
>>>> advantage of this with SLURM.
>>>>
>>>> I thought it would be easy...
>>>>
>>>> Unfortunately, I always get at most 8 jobs running per node.
>>>>
>>>> Here is the relevant (I think) part of /etc/slurm.conf:
>>>>
>>>> # SCHEDULING
>>>> #DefMemPerCPU=0
>>>> FastSchedule=1
>>>> #MaxMemPerCPU=0
>>>> #SchedulerRootFilter=1
>>>> #SchedulerTimeSlice=30
>>>> SchedulerType=sched/backfill
>>>> SchedulerPort=7321
>>>> SelectType=select/cons_res
>>>> NodeName=node[69-71] RealMemory=23000 Sockets=2 CoresPerSocket=4
>>>> ThreadsPerCore=2 State=UNKNOWN
>>>> PartitionName=test Nodes=node[69-71] MaxTime=INFINITE State=UP
>>>>
>>>> The logs don't show anything interesting. For example, setting
>>>> ThreadsPerCore to 1 will print a warning on the compute nodes that
>>>> the number of hardware CPUs is not the same as the config's, so the
>>>> compute nodes are correctly detecting the number of threads
>>>> available.
>>>>
>>>> How can I achieve this?
>>>>
>>>> Thanks!
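
On the whole-node question raised above: a minimal submission-script sketch. SLURM's `--exclusive` option asks for the allocated node(s) to be dedicated to this job; `--ntasks=8` then launches only 8 processes on it. The job name and binary (`ht_run`, `./my_app`) are placeholders, not names from this thread.

```shell
#!/bin/bash
#SBATCH --job-name=ht_run   # placeholder job name
#SBATCH --nodes=1           # allocate one whole node...
#SBATCH --ntasks=8          # ...but run only 8 tasks on it
#SBATCH --exclusive         # no other jobs may share the node

srun ./my_app               # ./my_app is a placeholder binary
```

Submitted with "sbatch" as usual; the 8 remaining hardware threads stay idle instead of being given to other jobs.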

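
For anyone checking their own nodes: CPUTot in the "scontrol show node" output is the value the controller actually schedules against, and it can be pulled out with a quick pipeline. A sketch, run here against a saved (abridged) copy of the output above rather than a live scontrol call:

```shell
# Abridged line saved from "scontrol show node node71"; on a live cluster
# this string would come from: scontrol show node node71
node_info='NodeName=node71 Arch=x86_64 CoresPerSocket=4 CPUAlloc=0 CPUTot=16 Sockets=2 ThreadsPerCore=2'

# Split the key=value pairs onto lines, keep CPUTot=..., strip the key
cputot=$(printf '%s\n' "$node_info" | tr ' ' '\n' | grep '^CPUTot=' | cut -d= -f2)
echo "$cputot"
```

With Procs=16 in effect, this should print 16 for each of node[69-71].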