The srun --cores-per-socket option does not appear to be working 
correctly. See the following example:

SelectType=select/cons_res
SelectTypeParameters=CR_Core
NodeName=n6 NodeHostname=scotty NodeAddr=scotty Sockets=2 CoresPerSocket=4 ThreadsPerCore=1 Procs=8
NodeName=n7 NodeHostname=chekov NodeAddr=chekov Sockets=2 CoresPerSocket=4 ThreadsPerCore=1 Procs=8
NodeName=n8 NodeHostname=bones NodeAddr=bones Sockets=2 CoresPerSocket=4 ThreadsPerCore=1 Procs=8
PartitionName=bones-chekov-scotty  Nodes=n8,n7,n6  State=UP Default=YES
PartitionName=bones-only  Nodes=n8  State=UP

[sulu] (slurm) etc> srun -n6 --cores-per-socket=1 -l hostname | sort
0: bones
1: bones
2: bones
3: bones
4: bones
5: bones

Given "--cores-per-socket=1" and 2 sockets on each node, I would expect 
Slurm to allocate 2 cores (one per socket) on each of the three nodes.  
Instead, it has allocated all 6 cores on a single node.

The option also appears to produce incorrect results when using just one 
node, if --cpus-per-task > 1:

[sulu] (slurm) etc> srun -p bones-only -n2 -c3 --cores-per-socket=3  ...

In this case, instead of allocating 3 cores on each socket of node bones, 
Slurm allocates 4 cores on one socket and 2 on the other.  However, if I 
specify "-n6" instead of "-n2 -c3", Slurm does allocate 3 cores on each 
socket.

The srun man page states that --cores-per-socket specifies the number of 
cores to be allocated per socket.  But the code in cons_res seems to treat 
it only as a constraint when determining whether a node can be used, not 
as the number of cores to be allocated on a socket.  So I'm a bit confused 
as to whether this really is a bug or whether the option is behaving as 
intended.  In the example with a single node, I don't understand why the 
behavior is different for "-n6" vs "-n2 -c3".
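To make the two readings concrete, here is a toy Python model (not Slurm code, 
and not based on the cons_res internals) that contrasts them for the "-n6 
--cores-per-socket=1" case above: the "constraint" reading filters nodes and 
then packs tasks onto as few nodes as possible, while the "allocation count" 
reading takes the option as the number of cores to allocate per socket on each 
node used.  The node specs match the config quoted earlier; the packing logic 
is a deliberate simplification.

```python
# Toy model: two possible readings of --cores-per-socket.
# Nodes as in the config above: 2 sockets x 4 cores, 1 thread/core.
nodes = {name: {"sockets": 2, "cores_per_socket": 4}
         for name in ("n6", "n7", "n8")}

def constraint_reading(nodes, cores_per_socket, ntasks):
    """Treat the option only as a node-eligibility filter, then pack
    tasks onto as few eligible nodes as possible (simplified)."""
    alloc, remaining = {}, ntasks
    for name, spec in nodes.items():
        if spec["cores_per_socket"] < cores_per_socket:
            continue  # node ineligible
        take = min(spec["sockets"] * spec["cores_per_socket"], remaining)
        if take:
            alloc[name] = take
            remaining -= take
    return alloc

def allocation_reading(nodes, cores_per_socket, ntasks):
    """Treat the option as the number of cores to allocate on each
    socket of every node used, spreading tasks across nodes."""
    alloc, remaining = {}, ntasks
    for name, spec in nodes.items():
        take = min(spec["sockets"] * cores_per_socket, remaining)
        if take:
            alloc[name] = take
            remaining -= take
    return alloc

print(constraint_reading(nodes, 1, 6))  # all 6 tasks land on one node
print(allocation_reading(nodes, 1, 6))  # 2 tasks on each of 3 nodes
```

Under the constraint reading all three nodes are eligible (each has >= 1 core 
per socket), so nothing stops the scheduler from packing all 6 tasks onto 
bones, which matches what I observed; the allocation reading would give the 
2-per-node result I expected.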

There appears to be a similar problem with --sockets-per-node.   Are these 
real bugs, or am I misunderstanding the way these options are intended to 
work?  If they're real bugs, I'm willing to work on a fix.

Regards,
Martin
