Okay, Don.  Then I think we need to change the srun man page, which states
that --cores-per-socket specifies how many cores to allocate from each
socket.  The same applies to the other options.  Here is a suggested patch
against 2.2.4 that changes the descriptions of these options.
Regards,
Martin


Index: doc/man/man1/srun.1
===================================================================
RCS file: /cvsroot/slurm/slurm/doc/man/man1/srun.1,v
retrieving revision 1.1.1.53.2.2
diff -u -r1.1.1.53.2.2 srun.1
--- doc/man/man1/srun.1 1 Apr 2011 18:05:43 -0000       1.1.1.53.2.2
+++ doc/man/man1/srun.1 5 Apr 2011 20:48:41 -0000
@@ -163,9 +163,8 @@
 
 .TP
 \fB\-\-cores\-per\-socket\fR=<\fIcores\fR>
-Allocate the specified number of cores per socket. This may be used to avoid
-allocating more than one core per socket on multi\-core sockets. This option
-is used for job allocations, but ignored for job step allocations.
+Restrict node selection to nodes with at least the specified number of
+cores per socket.
 
 .TP
 \fB\-\-cpu_bind\fR=[{\fIquiet,verbose\fR},]\fItype\fR
@@ -1029,9 +1028,8 @@
 
 .TP
 \fB\-\-sockets\-per\-node\fR=<\fIsockets\fR>
-Allocate the specified number of sockets per node. This may be used to avoid
-allocating more than one task per node on multi\-socket nodes. This option
-is used for job allocations, but ignored for job step allocations.
+Restrict node selection to nodes with at least the specified number of
+sockets.
 
 .TP
 \fB\-T\fR, \fB\-\-threads\fR=<\fInthreads\fR>
@@ -1103,9 +1101,8 @@
 
 .TP
 \fB\-\-threads\-per\-core\fR=<\fIthreads\fR>
-Allocate the specified number of threads per core. This may be used to avoid
-allocating more than one task per core on hyper\-threaded nodes. This option
-is used for job allocations, but ignored for job step allocations.
+Restrict node selection to nodes with at least the specified number of
+threads per core.
 
 .TP
 \fB\-\-tmp\fR=<\fIMB\fR>
 
"Lipari, Don" <[email protected]>
Sent by: [email protected]
04/05/2011 01:11 PM
Please respond to: [email protected]
To: "[email protected]" <[email protected]>
Subject: RE: [slurm-dev] srun --cores-per-socket and --sockets-per-node options

Martin,
 
The --sockets-per-node, --cores-per-socket, and --threads-per-core options 
should be considered minimum requirements for any node allocated to the 
job.  They are not directives for allocating resources to tasks.
 
Instead, use --ntasks-per-node, --ntasks-per-socket, and --ntasks-per-core 
to influence the allocation of specific resources to tasks.
 
Don
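
To illustrate the distinction (these commands are not from the original thread; they are a sketch assuming a cluster like the three-node, 2-socket, 4-cores-per-socket configuration quoted below):

```shell
# Constraint option: only filters which nodes are *eligible*. All six
# tasks may still be packed onto a single node, as in the report below.
srun -n6 --cores-per-socket=4 -l hostname

# Distribution option: caps how many tasks each node receives, so six
# tasks spread two-per-node across three nodes.
srun -n6 --ntasks-per-node=2 -l hostname

# Distribution at the socket level: two tasks, one per socket, each
# task with three dedicated cores.
srun -n2 -c3 --ntasks-per-socket=1 -l hostname
```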
 
From: [email protected] 
[mailto:[email protected]] On Behalf Of [email protected]
Sent: Tuesday, March 29, 2011 11:08 AM
To: [email protected]
Cc: [email protected]; [email protected]
Subject: [slurm-dev] srun --cores-per-socket and --sockets-per-node 
options
 

The srun --cores-per-socket option does not appear to be working
correctly.  See the following example:

SelectType=select/cons_res
SelectTypeParameters=CR_Core
NodeName=n6  NodeHostname=scotty NodeAddr=scotty Sockets=2 CoresPerSocket=4 ThreadsPerCore=1 Procs=8
NodeName=n7  NodeHostname=chekov NodeAddr=chekov Sockets=2 CoresPerSocket=4 ThreadsPerCore=1 Procs=8
NodeName=n8  NodeHostname=bones NodeAddr=bones Sockets=2 CoresPerSocket=4 ThreadsPerCore=1 Procs=8
PartitionName=bones-chekov-scotty  Nodes=n8,n7,n6  State=UP Default=YES
PartitionName=bones-only  Nodes=n8  State=UP

[sulu] (slurm) etc> srun -n6 --cores-per-socket=1 -l hostname | sort 
0: bones 
1: bones 
2: bones 
3: bones 
4: bones 
5: bones 

Given "cores-per-socket=1" and 2 sockets on each node, I would expect
Slurm to allocate 2 cores on each of the three nodes.  Instead, it has
allocated 6 cores on one node.

The option also appears to produce incorrect results when using just one 
node, if --cpus-per-task > 1: 

[sulu] (slurm) etc> srun -p bones-only -n2 -c3 --cores-per-socket=3  ... 

In this case, instead of allocating 3 cores on each socket of node bones, 
Slurm allocates 4 cores on one socket and 2 on the other.  However, if I 
specify "-n6" instead of "-n2 -c3", Slurm does allocate 3 cores on each 
socket. 

The srun man page states that --cores-per-socket specifies the number of 
cores to be allocated per socket.  But the code in cons_res seems to treat 
it only as a constraint when determining whether a node can be used, not 
as the number of cores to be allocated on a socket.  So I'm a bit confused 
as to whether this really is a bug or whether the option is behaving as 
intended.  In the example with a single node, I don't understand why the 
behavior is different for "-n6" vs "-n2 -c3". 

There appears to be a similar problem with --sockets-per-node.   Are these 
real bugs, or am I misunderstanding the way these options are intended to 
work?  If they're real bugs, I'm willing to work on a fix. 

Regards, 
Martin
