Great, thanks again.

On Wed, Mar 23, 2011 at 9:44 AM,  <[email protected]> wrote:
> One more fine point. Your configuration of
>
> NodeName=node[69-71] RealMemory=23000 Procs=16 Sockets=2 CoresPerSocket=4
> ThreadsPerCore=2 State=UNKNOWN
>
> would support up to 16 tasks, but the logic avoids putting more than
> one job per core for performance reasons. Removing the Sockets/Cores/Threads
> specifications (like you did) eliminates this restriction. I'll try to
> clarify this in the documentation.
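
To summarize the point above, the two node definitions from this thread behave differently under CR_CPU (a sketch using the node names and memory values from the thread):

```
# Declares full topology: the scheduler avoids placing more than one job
# per physical core, so only 8 of the 16 hardware threads get separate jobs.
NodeName=node[69-71] RealMemory=23000 Procs=16 Sockets=2 CoresPerSocket=4 ThreadsPerCore=2 State=UNKNOWN

# Declares only a CPU count: each of the 16 hardware threads is treated as
# an independent consumable CPU, so 16 serial jobs can run at once.
NodeName=node[69-71] RealMemory=23000 Procs=16 State=UNKNOWN
```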
>
>
>
> Quoting [email protected]:
>
>> There are a couple of options that should help, see the man pages for
>> details: --cpus-per-task option and --exclusive.
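
As a sketch of how those two options could be combined (the job script contents and program name are illustrative, not from the thread): requesting 2 CPUs per task makes each task own both hardware threads of a core, so an 8-task job fills a 16-CPU node.

```
#!/bin/bash
#SBATCH --ntasks=8          # run 8 processes
#SBATCH --cpus-per-task=2   # give each process both hardware threads of a core
srun ./my_serial_app        # placeholder program name
```

Alternatively, submitting with --exclusive (e.g. `sbatch --exclusive job.sh`) should reserve the whole node for the job regardless of how many CPUs it actually uses.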
>>
>>
>> Quoting Nicolas Bigaouette <[email protected]>:
>>
>>> Thanks for your suggestions.
>>>
>>> I did what you suggested. I restarted the daemon using a Gentoo init.d
>>> script, adapted from the included one (my work is on GitHub at
>>> https://github.com/nbigaouette/ebuilds/tree/master/sys-cluster/slurm). It
>>> does kill the daemon before restarting it.
>>>
>>> "scontrol show node" did not show anything interesting:
>>> NodeName=node71 Arch=x86_64 CoresPerSocket=4
>>>  CPUAlloc=0 CPUErr=0 CPUTot=16 Features=(null)
>>>  Gres=(null)
>>>  OS=Linux RealMemory=23000 Sockets=2
>>>  State=IDLE ThreadsPerCore=2 TmpDisk=0 Weight=1
>>>  BootTime=2011-03-22T19:46:32 SlurmdStartTime=2011-03-22T22:27:36
>>>  Reason=(null)
>>> Everything seems to be in order...
>>>
>>> I've set SelectTypeParameters=CR_CPU. But before trying it, I revisited
>>> the slurm.conf man page:
>>>
>>>> CR_CPU  CPUs are consumable resources. There is *no notion of sockets,
>>>> cores or threads*; *do not define those values in the node specification*.
>>>> If these are defined, unexpected results will happen when hyper-threading
>>>> is enabled; *Procs= should be used instead.* On a multi-core system, each
>>>> core will be considered a CPU. On a multi-core and hyper-threaded system,
>>>> each thread will be considered a CPU. On single-core systems, each CPU
>>>> will be considered a CPU.
>>>>
>>>
>>> I thus set:
>>> NodeName=node[69-71] RealMemory=23000 Procs=16 State=UNKNOWN
>>> restarted everything, and now 16 jobs can run at the same time on each
>>> node!
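
The count of 16 matches the topology reported by "scontrol show node" above: CPUTot is the product Sockets × CoresPerSocket × ThreadsPerCore.

```shell
# CPU total as the product of the topology values from "scontrol show node"
sockets=2
cores_per_socket=4
threads_per_core=2
cpu_tot=$((sockets * cores_per_socket * threads_per_core))
echo "$cpu_tot"   # prints 16
```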
>>>
>>> Thanks for your support.
>>>
>>> I do have another question though. Is it possible to "reserve" more CPUs
>>> than needed, so that a job which is slowed down by HT can "reserve" a
>>> whole node while running only 8 processes? I think I saw a previous email
>>> about a setting in the submission script, but I can't find it anymore...
>>>
>>> Regards,
>>>
>>> Nicolas
>>>
>>>
>>>
>>> On Tue, Mar 22, 2011 at 11:05 PM, <[email protected]> wrote:
>>>
>>>> The only thing that comes to mind is explicitly configuring:
>>>> SelectTypeParameters=CR_CPU
>>>> (that should be the default) and restarting the slurmctld
>>>> daemon (don't just run "scontrol reconfig" or send SIGHUP).
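
On a Gentoo setup like the one in this thread, a full restart might look like the following (the init-script names are assumptions and may differ per install):

```
# On the controller host: a real restart, not just a reconfig/SIGHUP
/etc/init.d/slurmctld restart

# On each compute node
/etc/init.d/slurmd restart
```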
>>>>
>>>> I'd also execute "scontrol show node" to confirm that the
>>>> values are what you configured.
>>>>
>>>>
>>>>
>>>> Quoting Nicolas Bigaouette <[email protected]>:
>>>>
>>>>> Hi Jette,
>>>>>
>>>>> Thanks for your ultra-fast answer. Unfortunately (again), it does not
>>>>> affect the number of running jobs; I had previously tried it without
>>>>> success.
>>>>> NodeName=node[69-71] RealMemory=23000 Procs=16 Sockets=2
>>>>> CoresPerSocket=4 ThreadsPerCore=2 State=UNKNOWN
>>>>>
>>>>> Could there be another limit somewhere? Here is my complete
>>>>>  configuration
>>>>> file:
>>>>> https://gist.github.com/882507
>>>>>
>>>>> Thanks
>>>>>
>>>>> Nicolas
>>>>>
>>>>> On Tue, Mar 22, 2011 at 10:25 PM,  <[email protected]> wrote:
>>>>>
>>>>>> Try adding "Procs=16" to the NodeName line.
>>>>>> By default, SLURM schedules one task per core.
>>>>>>
>>>>>> Quoting Nicolas Bigaouette <[email protected]>:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I want to be able to run 16 serial jobs at the same time on my
>>>>>>> compute nodes, since each node has 2 sockets with 4 cores each, with
>>>>>>> hyperthreading. We see a speedup when saturating a node with 16
>>>>>>> different serial jobs (launched manually), so I want to take
>>>>>>> advantage of this with SLURM.
>>>>>>>
>>>>>>> I thought it would be easy...
>>>>>>>
>>>>>>> Unfortunately, I always get at most 8 jobs running on nodes.
>>>>>>>
>>>>>>> Here is the relevant (I think) part of /etc/slurm.conf:
>>>>>>> # SCHEDULING
>>>>>>> #DefMemPerCPU=0
>>>>>>> FastSchedule=1
>>>>>>> #MaxMemPerCPU=0
>>>>>>> #SchedulerRootFilter=1
>>>>>>> #SchedulerTimeSlice=30
>>>>>>> SchedulerType=sched/backfill
>>>>>>> SchedulerPort=7321
>>>>>>> SelectType=select/cons_res
>>>>>>> NodeName=node[69-71] RealMemory=23000 Sockets=2 CoresPerSocket=4
>>>>>>> ThreadsPerCore=2 State=UNKNOWN
>>>>>>> PartitionName=test         Nodes=node[69-71]
>>>>>>> MaxTime=INFINITE State=UP
>>>>>>>
>>>>>>> The logs don't show anything interesting. For example, setting
>>>>>>> ThreadsPerCore to 1 prints a warning on the compute nodes that the
>>>>>>> hardware CPU count does not match the configuration, so the compute
>>>>>>> nodes are correctly detecting the number of available threads.
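
One way to cross-check what the hardware actually reports, independent of the warnings, is slurmd's config-printing mode (run on a compute node):

```
# Print the node's detected hardware in slurm.conf format
slurmd -C
```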
>>>>>>>
>>>>>>> How can I achieve this?
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>
>
>
>
