There are a couple of options that should help, see the man pages for
details: --cpus-per-task option and --exclusive.
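For example, a job that wants a whole 16-CPU node to itself while only launching 8 processes could be submitted with a script along these lines (the script and application names are illustrative, not from this thread):

```shell
#!/bin/bash
# Sketch of a submission script: reserve an entire node but run only 8 tasks,
# so each task gets a full physical core rather than sharing it via HT.
#SBATCH --exclusive          # no other jobs may share the allocated node
#SBATCH --ntasks=8           # launch 8 processes
#SBATCH --cpus-per-task=2    # each task claims 2 of the 16 logical CPUs
srun ./my_serial_app         # my_serial_app is a placeholder binary
```

Either option alone can work: --exclusive keeps the node to one job regardless of task count, while --cpus-per-task=2 makes 8 tasks consume all 16 logical CPUs.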
Quoting Nicolas Bigaouette <[email protected]>:
Thanks for your suggestions.
I did what you suggested. I restarted the daemon using a gentoo init.d script,
adapted from the included one (my work is on github @
https://github.com/nbigaouette/ebuilds/tree/master/sys-cluster/slurm). It
does kill the daemon before restarting it.
"scontrol show node" did not show anything intersting:
NodeName=node71 Arch=x86_64 CoresPerSocket=4
CPUAlloc=0 CPUErr=0 CPUTot=16 Features=(null)
Gres=(null)
OS=Linux RealMemory=23000 Sockets=2
State=IDLE ThreadsPerCore=2 TmpDisk=0 Weight=1
BootTime=2011-03-22T19:46:32 SlurmdStartTime=2011-03-22T22:27:36
Reason=(null)
Everything seems to be in order...
I've set SelectTypeParameters=CR_CPU. But before trying it, I revisited the
man page of slurm.conf:
CR_CPU  CPUs are consumable resources. There is *no notion of sockets,
cores or threads*; *do not define those values in the node
specification*. If these are defined, unexpected results will happen
when hyper-threading is enabled; *Procs= should be used instead.* On a
multi-core system, each core will be considered a CPU. On a multi-core
and hyper-threaded system, each thread will be considered a CPU. On
single-core systems, each CPU will be considered a CPU.
I thus set:
NodeName=node[69-71] RealMemory=23000 Procs=16 State=UNKNOWN
restarted everything, and now 16 jobs can run at the same time on each node!
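For reference, the combination that works here boils down to these slurm.conf lines (values as configured above):

```
# slurm.conf (relevant lines only)
SelectType=select/cons_res
SelectTypeParameters=CR_CPU
# With CR_CPU, give only the CPU count; omit Sockets/CoresPerSocket/ThreadsPerCore
NodeName=node[69-71] RealMemory=23000 Procs=16 State=UNKNOWN
```

Both slurmctld and the slurmd daemons need a full restart after this change.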
Thanks for your support.
I do have another question though. Is it possible to "reserve" more CPUs than
needed, so that a job which is slowed down by HT can "reserve" a whole node
while only running 8 processes? I think I saw a previous email about a
setting in the submission script, but I can't find it anymore...
Regards,
Nicolas
On Tue, Mar 22, 2011 at 11:05 PM, <[email protected]> wrote:
The only thing that comes to mind is explicitly configuring:
SelectTypeParameters=CR_CPU
(that should be the default) and restarting the slurmctld
daemon (don't just run "scontrol reconfig" or send SIGHUP).
I'd also execute "scontrol show node" to confirm that the
values are what you configured.
Quoting Nicolas Bigaouette <[email protected]>:
Hi Jette,
Thanks for your ultra-fast answer. Unfortunately (again), it does not
affect the number of jobs running. I previously tried it without
success:
NodeName=node[69-71] RealMemory=23000 Procs=16 Sockets=2
CoresPerSocket=4 ThreadsPerCore=2 State=UNKNOWN
Could there be another limit somewhere? Here is my complete configuration
file:
https://gist.github.com/882507
Thanks
Nicolas
On Tue, Mar 22, 2011 at 10:25 PM, <[email protected]> wrote:
Try adding "Procs=16" to the NodeName line.
By default, SLURM schedules one task per core.
Quoting Nicolas Bigaouette <[email protected]>:
Hi all,
I want to be able to submit 16 serial jobs to my compute nodes at the
same time, since each node has 2 sockets, 4 cores per socket, with
hyperthreading. We see a speedup when saturating a node with 16
different serial jobs (launched manually), so I want to take advantage
of this with slurm.
I thought it would be easy...
Unfortunately, I always get at most 8 jobs running on nodes.
Here is the relevant (I think) part of /etc/slurm.conf:
# SCHEDULING
#DefMemPerCPU=0
FastSchedule=1
#MaxMemPerCPU=0
#SchedulerRootFilter=1
#SchedulerTimeSlice=30
SchedulerType=sched/backfill
SchedulerPort=7321
SelectType=select/cons_res
NodeName=node[69-71] RealMemory=23000 Sockets=2 CoresPerSocket=4
ThreadsPerCore=2 State=UNKNOWN
PartitionName=test Nodes=node[69-71]
MaxTime=INFINITE State=UP
The logs don't show anything interesting. For example, setting
ThreadsPerCore to 1 prints a warning that the compute nodes' hardware
CPU count does not match the configuration, so the compute nodes are
correctly detecting the number of threads available.
How can I achieve this?
Thanks!