For what it's worth, we have a similar setup, with one crucial
difference: we are handing out physical cores to jobs, not hyperthreads,
and we are *not* seeing this behaviour:

$ srun --cpus-per-task=1 -t 10 --mem-per-cpu=1g -A nn9999k -q devel echo foo
srun: job 5371678 queued and waiting for resources
srun: job 5371678 has been allocated resources
foo
$ srun --cpus-per-task=3 -t 10 --mem-per-cpu=1g -A nn9999k -q devel echo foo
srun: job 5371680 queued and waiting for resources
srun: job 5371680 has been allocated resources
foo

In slurm.conf we have

SelectType=select/cons_tres
SelectTypeParameters=CR_CPU_Memory

and node definitions like

NodeName=DEFAULT CPUs=40 Sockets=2 CoresPerSocket=20 ThreadsPerCore=2 RealMemory=182784 Gres=localscratch:330G Weight=1000

(so we set CPUs to the number of *physical cores*, not *hyperthreads*).
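
As a rough illustration of the arithmetic behind that node definition, here is a minimal Python sketch that checks whether CPUs matches the physical core count or the hyperthread count. It is not part of our configuration; the helper names parse_node_line and describe_cpus are hypothetical, and it only handles a simple single-line "Key=Value" definition.

```python
def parse_node_line(line):
    """Split a slurm.conf-style 'Key=Value Key=Value' line into a dict."""
    return dict(field.split("=", 1) for field in line.split())


def describe_cpus(node):
    """Return (physical cores, hardware threads, what CPUs corresponds to)."""
    cores = int(node["Sockets"]) * int(node["CoresPerSocket"])
    threads = cores * int(node["ThreadsPerCore"])
    cpus = int(node["CPUs"])
    if cpus == cores:
        kind = "physical cores"
    elif cpus == threads:
        kind = "hyperthreads"
    else:
        kind = "neither"
    return cores, threads, kind


# The node definition quoted above, written as a single line.
line = ("NodeName=DEFAULT CPUs=40 Sockets=2 CoresPerSocket=20 "
        "ThreadsPerCore=2 RealMemory=182784 Gres=localscratch:330G Weight=1000")

cores, threads, kind = describe_cpus(parse_node_line(line))
print(f"cores={cores} threads={threads} CPUs matches: {kind}")
```

With our values this gives 2 * 20 = 40 physical cores and 80 hyperthreads, so CPUs=40 corresponds to physical cores.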

-- 
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
