Hi Alex! Please try this solution first. I just went through this
exact same issue over the past two weeks, and I believe that setting
anything related to cores/sockets/threads will not work.

Leave these as they are:
SelectType=select/cons_res
SelectTypeParameters=CR_CPU

But in the node configuration you need to specify only the number of
CPUs (virtual + real threads). Do not put anything about the number of
sockets, threads or cores, or else you will be limited to the number of
cores (18):

NodeName=node1 CPUs=32 RealMemory=128000  State=UNKNOWN
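
For reference, the 32 above is just sockets x cores per socket x threads
per core for the hardware in your message (2 x 8 x 2). Here is a minimal
Python sketch of that arithmetic; the function names logical_cpus and
node_line are my own, made up for illustration, and are not part of Slurm:

```python
def logical_cpus(sockets: int, cores_per_socket: int, threads_per_core: int) -> int:
    """Number of hardware threads, i.e. what goes into CPUs= on the node line."""
    return sockets * cores_per_socket * threads_per_core


def node_line(name: str, cpus: int, real_memory_mb: int) -> str:
    """Build a node definition that carries only the CPU count (no topology)."""
    return f"NodeName={name} CPUs={cpus} RealMemory={real_memory_mb} State=UNKNOWN"


# Two 8-core processors with hyperthreading, as described in the original question.
cpus = logical_cpus(sockets=2, cores_per_socket=8, threads_per_core=2)
print(node_line("node1", cpus, 128000))
```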

This was the solution I got from the mailing list that eventually
worked, and it can be referenced in the Slurm FAQ, #30:
http://slurm.schedmd.com/faq.html
[...]
30.  Slurm documentation refers to CPUs, cores and threads. What exactly
is considered a CPU?
If your nodes are configured with hyperthreading, then a CPU is
equivalent to a hyperthread. Otherwise a CPU is equivalent to a core.
You can determine if your nodes have more than one thread per core using
the command "scontrol show node" and looking at the values of
"ThreadsPerCore".
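
If you want to script the check the FAQ describes, something along these
lines should work. It is only a sketch: it assumes scontrol is on the PATH
and that its output contains a "ThreadsPerCore=<n>" field, and the two
function names are mine, not anything Slurm provides:

```python
import re
import subprocess


def parse_threads_per_core(scontrol_output: str) -> int:
    """Extract the ThreadsPerCore value from `scontrol show node` output."""
    match = re.search(r"\bThreadsPerCore=(\d+)", scontrol_output)
    if match is None:
        raise ValueError("ThreadsPerCore not found in scontrol output")
    return int(match.group(1))


def node_threads_per_core(node_name: str) -> int:
    """Run `scontrol show node <node_name>` and return its ThreadsPerCore."""
    result = subprocess.run(
        ["scontrol", "show", "node", node_name],
        capture_output=True,
        text=True,
        check=True,  # raise if scontrol exits with an error
    )
    return parse_threads_per_core(result.stdout)
```

On a machine in the cluster, node_threads_per_core("node1") returning a
value greater than 1 would mean Slurm counts each hyperthread as a CPU.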


Also, I don't know your IO setup, but just remember that if you are
reading and/or writing small files, then tossing more CPUs at it might
not be beneficial, as you can hit an IO wall. It is best to run test
cases with a varying number of CPUs and see how efficient each run is
(unless you're writing to independent drives separately).
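
One way to judge those test runs is to compare wall-clock times against
ideal scaling. A rough sketch, assuming you record the wall time yourself
for each CPU count; the function name and the input format (a dict of
CPU count to seconds) are my own choices:

```python
def parallel_efficiency(timings: dict[int, float]) -> dict[int, float]:
    """Return efficiency per CPU count, relative to the smallest count measured.

    timings maps number of CPUs -> wall-clock seconds for the same workload.
    1.0 means perfect scaling; values well below 1.0 suggest a bottleneck
    such as IO.
    """
    if not timings:
        raise ValueError("need at least one measurement")
    base_n = min(timings)
    base_t = timings[base_n]
    return {
        n: (base_t * base_n) / (t * n)  # speedup divided by the CPU ratio
        for n, t in sorted(timings.items())
    }
```

If efficiency drops sharply past some CPU count, adding more CPUs beyond
that point is probably not buying you much.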

-ashton

On Mon, Sep 12, 2016 at 7:55 AM, Uwe Sauter <uwe.sauter...@gmail.com> wrote:
>
> Also. CPUs=32 is wrong. You need
>
> Sockets=2 CoresPerSocket=8 ThreadsPerCore=2
>
> On 12.09.2016 at 16:02, alex straza wrote:
>> hello,
>>
>> We have some slurm nodes that have 32 CPUS - two 8-core processors with 
>> hyperthreading - and are trying to run some
>> "embarrassingly parallel" jobs.  However, no matter how we set values in 
>> slurm.conf, slurm will only run at most 16 jobs
>> at a time on each machine.  What are the right settings to use here?  this 
>> is what we have at the moment in the config:
>>
>> SchedulerType=sched/builtin
>> SelectType=select/cons_res
>> SelectTypeParameters=CR_CPU
>>
>> NodeName=node1 CoresPerSocket=8 CPUs=32 RealMemory=128000 ThreadsPerCore=2 
>> Sockets=2 State=UNKNOWN
>> PartitionName=debug Nodes=node1 Default=YES MaxTime=INFINITE State=UP 
>> DefMemPerCPU=4000 MaxMemPerCPU=32000
