Thanks for your suggestions. I did what you suggested: I restarted the daemon using a Gentoo init.d script adapted from the one included with SLURM (my work is on GitHub at https://github.com/nbigaouette/ebuilds/tree/master/sys-cluster/slurm). The script does kill the daemon before restarting it.
"scontrol show node" did not show anything interesting:

NodeName=node71 Arch=x86_64 CoresPerSocket=4 CPUAlloc=0 CPUErr=0 CPUTot=16
   Features=(null) Gres=(null) OS=Linux RealMemory=23000 Sockets=2 State=IDLE
   ThreadsPerCore=2 TmpDisk=0 Weight=1
   BootTime=2011-03-22T19:46:32 SlurmdStartTime=2011-03-22T22:27:36
   Reason=(null)

Everything seems to be in order... I had set SelectTypeParameters=CR_CPU, but before trying it I revisited the slurm.conf man page:

> CR_CPU
>   CPUs are consumable resources. There is *no notion of sockets, cores or
>   threads*; *do not define those values in the node specification*. If
>   these are defined, unexpected results will happen when hyper-threading
>   is enabled. *Procs= should be used instead.* On a multi-core system,
>   each core will be considered a CPU. On a multi-core and hyper-threaded
>   system, each thread will be considered a CPU. On single-core systems,
>   each CPU will be considered a CPU.

I thus set:

NodeName=node[69-71] RealMemory=23000 Procs=16 State=UNKNOWN

restarted everything, and now 16 jobs can run at the same time on each node!

Thanks for your support.

I do have another question though. Is it possible to "reserve" more CPUs than needed, so that a job which is slowed down by hyperthreading can reserve a whole node while only running 8 processes? I think I saw a previous email about a setting in the submission script, but I can't find it anymore...

Regards,

Nicolas

On Tue, Mar 22, 2011 at 11:05 PM, <[email protected]> wrote:
> The only thing that comes to mind is explicitly configuring:
>     SelectTypeParameters=CR_CPU
> (that should be the default) and restarting the slurmctld
> daemon (don't just run "scontrol reconfig" or send SIGHUP).
>
> I'd also execute "scontrol show node" to confirm that the
> values are what you configured.
>
> Quoting Nicolas Bigaouette <[email protected]>:
>
>> Hi Jette,
>>
>> Thanks for your ultra fast answer. Unfortunately (again), it does not
>> affect the number of jobs running.
>> I previously tried it without success:
>>
>> NodeName=node[69-71] RealMemory=23000 Procs=16 Sockets=2
>> CoresPerSocket=4 ThreadsPerCore=2 State=UNKNOWN
>>
>> Could there be another limit somewhere? Here is my complete
>> configuration file: https://gist.github.com/882507
>>
>> Thanks
>>
>> Nicolas
>>
>> On Tue, Mar 22, 2011 at 10:25 PM, <[email protected]> wrote:
>>
>>> Try adding "Procs=16" to the NodeName line.
>>> By default, SLURM schedules one task per core.
>>>
>>> Quoting Nicolas Bigaouette <[email protected]>:
>>>
>>>> Hi all,
>>>>
>>>> I want to be able to submit 16 serial jobs on my compute nodes at the
>>>> same time, since each node has 2 sockets, 4 cores per socket, and
>>>> hyperthreading. We see a speedup when saturating a node with 16
>>>> different serial jobs (launched manually), so I want to take
>>>> advantage of this with SLURM.
>>>>
>>>> I thought it would be easy...
>>>>
>>>> Unfortunately, I always get at most 8 jobs running per node.
>>>>
>>>> Here is the relevant (I think) part of /etc/slurm.conf:
>>>>
>>>> # SCHEDULING
>>>> #DefMemPerCPU=0
>>>> FastSchedule=1
>>>> #MaxMemPerCPU=0
>>>> #SchedulerRootFilter=1
>>>> #SchedulerTimeSlice=30
>>>> SchedulerType=sched/backfill
>>>> SchedulerPort=7321
>>>> SelectType=select/cons_res
>>>> NodeName=node[69-71] RealMemory=23000 Sockets=2 CoresPerSocket=4
>>>> ThreadsPerCore=2 State=UNKNOWN
>>>> PartitionName=test Nodes=node[69-71] MaxTime=INFINITE State=UP
>>>>
>>>> The logs don't show anything interesting. For example, setting
>>>> ThreadsPerCore to 1 will print a warning on the compute nodes that
>>>> the number of hardware CPUs is not the same as the config's, so the
>>>> compute nodes are correctly detecting the number of threads
>>>> available.
>>>>
>>>> How can I achieve this?
>>>>
>>>> Thanks!
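
On the whole-node question raised above: a minimal submission-script sketch. SLURM's `--exclusive` option asks for the allocated node(s) to be dedicated to this job; `--ntasks=8` then launches only 8 processes on it. The job name and binary (`ht_run`, `./my_app`) are placeholders, not names from this thread.

```shell
#!/bin/bash
#SBATCH --job-name=ht_run   # placeholder job name
#SBATCH --nodes=1           # allocate one whole node...
#SBATCH --ntasks=8          # ...but run only 8 tasks on it
#SBATCH --exclusive         # no other jobs may share the node

srun ./my_app               # ./my_app is a placeholder binary
```

Submitted with "sbatch" as usual; the 8 remaining hardware threads stay idle instead of being given to other jobs.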

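
For anyone checking their own nodes: CPUTot in the "scontrol show node" output is the value the controller actually schedules against, and it can be pulled out with a quick pipeline. A sketch, run here against a saved (abridged) copy of the output above rather than a live scontrol call:

```shell
# Abridged line saved from "scontrol show node node71"; on a live cluster
# this string would come from: scontrol show node node71
node_info='NodeName=node71 Arch=x86_64 CoresPerSocket=4 CPUAlloc=0 CPUTot=16 Sockets=2 ThreadsPerCore=2'

# Split the key=value pairs onto lines, keep CPUTot=..., strip the key
cputot=$(printf '%s\n' "$node_info" | tr ' ' '\n' | grep '^CPUTot=' | cut -d= -f2)
echo "$cputot"
```

With Procs=16 in effect, this should print 16 for each of node[69-71].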