On Wed, Aug 17, 2016 at 03:20:20AM -0700, Adrian Sevcenco wrote:

> 
> On 08/17/2016 08:41 AM, Paddy Doyle wrote:
> > 
> > Hi Adrian,
> Hi!
> 
> yeah, i thought that if i set CPUs=8 this will mean that the machine
> has 8 job slots
> 
> > You should define the node as follows:
> > 
> > NodeName=localhost Sockets=1 CoresPerSocket=4 ThreadsPerCore=2
> ok, thanks for clarification!
> 
> so.. should i understand that this must be done for each node?
> if i have a very heterogenous cluster with 100+ nodes (machines with
> different ages) do i have to do it for each machine?

You can define defaults for the majority of homogeneous nodes, and then define
different properties for ranges of nodes, for example something like:

  NodeName=DEFAULT Sockets=2 CoresPerSocket=4 ThreadsPerCore=1
  NodeName=node[001-016] RealMemory=15948 Feature=switch1 Weight=10
  NodeName=node[017-032] RealMemory=32171 Feature=switch2 Weight=20
  NodeName=node[033-048] RealMemory=64489 Feature=switch3 Weight=30
  NodeName=node[049-064] RealMemory=15948 Feature=switch4 Weight=10 Sockets=1 CoresPerSocket=4 ThreadsPerCore=2
  NodeName=node[065-080] RealMemory=15948 Feature=switch5 Weight=10 Sockets=2 CoresPerSocket=6 ThreadsPerCore=1
  etc


For that you might want to run "slurmd -C" on each node to report back what it
has detected, and use those results as the basis for creating the slurm.conf
entries.
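As a sketch of that approach (node names and the ssh loop are hypothetical, and the exact fields "slurmd -C" prints can vary by Slurm version), you can collect one line per node and then count the distinct hardware shapes, since each distinct line becomes one NodeName= group in slurm.conf:

```shell
# Gather one "slurmd -C" line per node (assumes passwordless ssh; node
# names are just examples). On a real cluster you would uncomment:
#   for n in node0{01..80}; do ssh "$n" 'slurmd -C | head -1'; done > slurmd_C.out
# Stand-in sample of the sort of output slurmd -C reports:
cat > slurmd_C.out <<'EOF'
NodeName=node001 CPUs=8 Sockets=1 CoresPerSocket=4 ThreadsPerCore=2 RealMemory=15948
NodeName=node002 CPUs=8 Sockets=1 CoresPerSocket=4 ThreadsPerCore=2 RealMemory=15948
NodeName=node003 CPUs=12 Sockets=2 CoresPerSocket=6 ThreadsPerCore=1 RealMemory=32171
EOF
# Drop the per-node NodeName field and count the distinct hardware
# shapes; each distinct remaining line is one node group to define.
sed 's/^NodeName=[^ ]* //' slurmd_C.out | sort | uniq -c | sort -rn
```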

Alternatively, set "FastSchedule=0" in your slurm.conf to tell slurm to use the
values detected by slurmd, instead of defining each node by hand.
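If you go that route, the relevant slurm.conf pieces would look something like
this (a sketch, not a tested config; node names are examples):

  FastSchedule=0
  NodeName=node[001-080] State=UNKNOWN

With FastSchedule=0 the scheduler takes whatever each slurmd reports when it
registers, so the NodeName lines mainly just need to exist.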

> well, i could script it somehow (put an include in slurm.conf)
> and create some node_def.conf .. but how can i detect the actual
> configuration? does slurm have some detection tool or should i do it
> by hand with lscpu ? 
> and if i can do it by hand, is it possible to add some kind of plugin to
> slurm that can automatically obtain the configuration by running lscpu on 
> nodes?
> 
> why it is needed to create some configuration by hand when slurm
> already found out the correct configuration ?

It's a performance trade-off. With "FastSchedule=1" (the default), the scheduler
can simply look up values in a few records, rather than checking the many
individual records from each slurmd.

See the entries in "man slurm.conf" for more.

It's up to you whether you go with defining them all by hand or use
"FastSchedule=0".
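On the scripting idea: slurm.conf does support an Include directive (see "man
slurm.conf"), so generated node definitions can live in their own file, for
example (path is just an illustration):

  # in slurm.conf
  Include /etc/slurm/node_def.conf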

> >> how can i define for my machine to run 8 jobs? also, for the next
> >> level, how can i set a job slot for each physical core (a job to
> >> run on the physical core + its ht partner)?
> > 
> > I don't quite understand what you're looking for here. You can
> > probably use some sbatch parameters to define what the job needs
> > (e.g. --threads-per-core=2) in an allocation.
> it does not need 2 threads .. this would be in order to not disable in bios
> the HT .. 
> if i define ThreadsPerCore=2 for the node and use CR_Cores would this
> automatically allocate the jobs per core (and so both threads will be
> associated with the job)?

I'm not sure really. We generally turn it off in the BIOS because most of our
codes are floating point heavy, and so the hyperthreading hasn't been of benefit
(when we've benchmarked with and without). So I don't know what the best
strategy is here. There are some notes in the man page for slurm.conf about
hyperthreading which may point you in the right direction. You may have to
experiment a little.
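For what it's worth, the combination you describe would look roughly like this
(untested here; check the CPU management notes in "man slurm.conf"):

  # slurm.conf
  SelectType=select/cons_res
  SelectTypeParameters=CR_Core
  NodeName=localhost Sockets=1 CoresPerSocket=4 ThreadsPerCore=2

With CR_Core the allocation unit is the physical core, so a job given a core
should also hold its hyperthread sibling. On the job side, the sbatch option
"--hint=nomultithread" asks for one task per physical core.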

Thanks,
Paddy

-- 
Paddy Doyle
Trinity Centre for High Performance Computing,
Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
Phone: +353-1-896-3725
http://www.tchpc.tcd.ie/
