Hi Alex! Please try this solution first. I just went through this exact same issue over the past two weeks, and I believe that setting anything related to cores/sockets/threads will not work.
Leave these as they are:

SelectType=select/cons_res
SelectTypeParameters=CR_CPU

But in the node configuration, specify only the number of CPUs (virtual + real threads). Do not put anything about the number of sockets, threads, or cores, or else you will be limited to the number of cores (16):

NodeName=node1 CPUs=32 RealMemory=128000 State=UNKNOWN

This was the solution I got from the mailing list that eventually worked, and it can be referenced in the Slurm FAQ, #30:

http://slurm.schedmd.com/faq.html

[...]
30. Slurm documentation refers to CPUs, cores and threads. What exactly is considered a CPU?
If your nodes are configured with hyperthreading, then a CPU is equivalent to a hyperthread. Otherwise a CPU is equivalent to a core. You can determine if your nodes have more than one thread per core using the command "scontrol show node" and looking at the values of "ThreadsPerCore".

Also, I don't know your IO setup, but remember that if you are reading and/or writing small files, throwing more CPUs at the job might not help, as you can hit an IO wall (unless you're writing to independent drives separately). Best to run test cases with varying numbers of CPUs and see how efficient each one is.

-ashton

On Mon, Sep 12, 2016 at 7:55 AM, Uwe Sauter <uwe.sauter...@gmail.com> wrote:
>
> Also. CPUs=32 is wrong. You need
>
> Sockets=2 CoresPerSocket=8 ThreadsPerCore=2
>
> On 12.09.2016 at 16:02, alex straza wrote:
>> hello,
>>
>> We have some slurm nodes that have 32 CPUS - two 8-core processors with
>> hyperthreading - and are trying to run some
>> "embarrassingly parallel" jobs. However, no matter how we set values in
>> slurm.conf, slurm will only run at most 16 jobs
>> at a time on each machine. What are the right settings to use here?
>> this is what we have at the moment in the config:
>>
>> SchedulerType=sched/builtin
>> SelectType=select/cons_res
>> SelectTypeParameters=CR_CPU
>>
>> NodeName=node1 CoresPerSocket=8 CPUs=32 RealMemory=128000 ThreadsPerCore=2
>> Sockets=2 State=UNKNOWN
>> PartitionName=debug Nodes=node1 Default=YES MaxTime=INFINITE State=UP
>> DefMemPerCPU=4000 MaxMemPerCPU=32000
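
For anyone applying the CPUs-only suggestion from the reply above to several node definitions, here is a minimal Python sketch of the rewrite. It is not part of Slurm: the function name cpus_only_node_line is hypothetical, and it assumes every field on the line is a key=value pair spelled with the same capitalization as in this thread. It drops Sockets, CoresPerSocket and ThreadsPerCore and keeps a single CPUs value equal to the number of hardware threads.

```python
# Hypothetical helper (not part of Slurm): rewrite a slurm.conf NodeName line
# so that it declares only the logical CPU count.
# Assumptions: every field is key=value, and keys are spelled exactly as below.

TOPOLOGY_KEYS = ("Sockets", "CoresPerSocket", "ThreadsPerCore")


def cpus_only_node_line(line: str) -> str:
    """Drop the topology keys and keep a single CPUs= value."""
    pairs = [field.split("=", 1) for field in line.split()]
    values = dict(pairs)

    if all(key in values for key in TOPOLOGY_KEYS):
        # Logical CPUs = sockets * cores per socket * threads per core.
        logical = 1
        for key in TOPOLOGY_KEYS:
            logical *= int(values[key])
    else:
        # No full topology given: trust the existing CPUs= value.
        logical = int(values["CPUs"])

    out = []
    for key, value in pairs:
        if key in TOPOLOGY_KEYS or key == "CPUs":
            continue  # these are replaced by the single CPUs= entry
        out.append(f"{key}={value}")
        if key == "NodeName":
            out.append(f"CPUs={logical}")  # place CPUs right after the node name
    return " ".join(out)


if __name__ == "__main__":
    original = ("NodeName=node1 CoresPerSocket=8 CPUs=32 RealMemory=128000 "
                "ThreadsPerCore=2 Sockets=2 State=UNKNOWN")
    print(cpus_only_node_line(original))
```

Run against the node line from the original question, this should print the same CPUs-only definition given in the reply. Whether that definition lifts the 16-job limit on a particular cluster still has to be checked on that cluster.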