Unfortunately, my previous patch for this problem breaks the
--ntasks-per-node option. Here is a corrected version.
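To illustrate what the corrected hunk computes, here is a minimal sketch of the arithmetic in isolation. The helper name usable_cpus_corrected is hypothetical. The sketch also makes two assumptions the diff does not show: num_tasks starts out equal to the number of free CPUs, and the MIN against ntasks_per_node is applied only when that option is set.

```c
#include <assert.h>

#define MIN(a, b) (((a) < (b)) ? (a) : (b))

/*
 * Hypothetical stand-alone model of the corrected hunk; not the real
 * _allocate_cores. ntasks_per_node == 0 means the option was not given.
 */
static int usable_cpus_corrected(int free_cpus, int cpus_per_task,
                                 int ntasks_per_node)
{
    int avail_cpus = free_cpus;
    int num_tasks = free_cpus;   /* assumption: starts at the free CPU count */

    assert(free_cpus >= 0 && cpus_per_task >= 1 && ntasks_per_node >= 0);

    if (ntasks_per_node)         /* assumption: clamp only when the option is set */
        num_tasks = MIN(num_tasks, ntasks_per_node);

    if (cpus_per_task < 2) {
        avail_cpus = num_tasks;
    } else {
        int j = avail_cpus / cpus_per_task;
        num_tasks = MIN(num_tasks, j);
        /* Corrected patch: shrink avail_cpus only for --ntasks-per-node jobs. */
        if (ntasks_per_node)
            avail_cpus = num_tasks * cpus_per_task;
    }
    return avail_cpus;
}
```

Under these assumptions, a node with 8 free CPUs and -c3 reports all 8 CPUs when --ntasks-per-node is absent, but only 3 when --ntasks-per-node=1 is given. The previous patch lost the second case by dropping the assignment unconditionally.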
Regards,
Martin
Index: src/plugins/select/cons_res/job_test.c
===================================================================
RCS file: /cvsroot/slurm/slurm/src/plugins/select/cons_res/job_test.c,v
retrieving revision 1.1.1.28
diff -u -r1.1.1.28 job_test.c
--- src/plugins/select/cons_res/job_test.c 3 Mar 2011 19:18:10 -0000 1.1.1.28
+++ src/plugins/select/cons_res/job_test.c 8 Apr 2011 22:08:29 -0000
@@ -512,7 +512,8 @@
} else {
j = avail_cpus / cpus_per_task;
num_tasks = MIN(num_tasks, j);
- avail_cpus = num_tasks * cpus_per_task;
+ if (job_ptr->details->ntasks_per_node)
+ avail_cpus = num_tasks * cpus_per_task;
}
if ((job_ptr->details->ntasks_per_node &&
Martin Perry/US/BULL
04/08/2011 02:15 PM
To: [email protected]
cc: [email protected], "[email protected]" <[email protected]>
Subject: cons_res core allocation problem
With cons_res, the default method for allocating cores within nodes is
cyclic allocation across sockets, as shown in the following examples.
Node bones (n8) CPU layout (Slurm numbering):
Socket 0: CPU_IDs 0,1,2,3
Socket 1: CPU_IDs 4,5,6,7
SelectType=select/cons_res
SelectTypeParameters=CR_Core
[sulu] (slurm) etc> srun -p bones-only -n6 -c1 scontrol --details show job | grep CPU_IDs
...
Nodes=n8 CPU_IDs=0-2,4-6 Mem=0
[sulu] (slurm) etc> srun -p bones-only -n3 -c2 scontrol --details show job | grep CPU_IDs
...
Nodes=n8 CPU_IDs=0-2,4-6 Mem=0
However, for certain combinations of node layout and --cpus-per-task > 1,
the default is not honored, and Slurm uses block allocation instead:
[sulu] (slurm) etc> srun -p bones-only -n2 -c3 scontrol --details show job | grep CPU_IDs
...
Nodes=n8 CPU_IDs=0-5 Mem=0
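The difference between the two layouts can be sketched in a few lines of C. This is only an illustration of cyclic versus block core selection on a node shaped like n8 (2 sockets, 4 cores each). The function names pick_cyclic and pick_block are hypothetical and the code is not taken from cons_res.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SOCKETS 2             /* layout of node n8 in the examples above */
#define CORES_PER_SOCKET 4
#define NUM_CORES (SOCKETS * CORES_PER_SOCKET)

/* Block: take the lowest-numbered cores first, filling socket 0 before socket 1. */
static void pick_block(int ncores, bool used[NUM_CORES])
{
    assert(ncores >= 0 && ncores <= NUM_CORES);
    memset(used, 0, NUM_CORES * sizeof(bool));
    for (int i = 0; i < ncores; i++)
        used[i] = true;
}

/* Cyclic: visit the sockets round-robin, taking the next unused core from each. */
static void pick_cyclic(int ncores, bool used[NUM_CORES])
{
    int next[SOCKETS] = {0};  /* next unused core index within each socket */
    int picked = 0;

    assert(ncores >= 0 && ncores <= NUM_CORES);
    memset(used, 0, NUM_CORES * sizeof(bool));
    while (picked < ncores) {
        for (int s = 0; s < SOCKETS && picked < ncores; s++) {
            if (next[s] < CORES_PER_SOCKET) {
                used[s * CORES_PER_SOCKET + next[s]] = true;
                next[s]++;
                picked++;
            }
        }
    }
}
```

Asking for 6 cores, pick_cyclic marks cores 0,1,2 and 4,5,6 (the CPU_IDs=0-2,4-6 result), while pick_block marks cores 0 through 5 (the CPU_IDs=0-5 result).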
The problem appears to be in function _allocate_cores in
src/plugins/select/cons_res/job_test.c. It sometimes returns an incorrect
value for the number of CPUs that can be used on the node. The patch below
fixes the problem in 2.2.4.
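A small worked example may make the arithmetic easier to follow. The sketch below models only the unpatched lines visible in the hunk. The helper name usable_cpus_unpatched is hypothetical, --ntasks-per-node is taken to be unset as in the srun examples, and num_tasks is assumed to start equal to the free CPU count, which the diff does not show.

```c
#include <assert.h>

#define MIN(a, b) (((a) < (b)) ? (a) : (b))

/*
 * Hypothetical stand-alone model of the unpatched arithmetic; not the real
 * _allocate_cores. Returns the CPU count the node would report as usable.
 */
static int usable_cpus_unpatched(int free_cpus, int cpus_per_task)
{
    int avail_cpus = free_cpus;
    int num_tasks = free_cpus;   /* assumption: starts at the free CPU count */

    assert(free_cpus >= 0 && cpus_per_task >= 1);

    if (cpus_per_task < 2) {
        avail_cpus = num_tasks;
    } else {
        int j = avail_cpus / cpus_per_task;   /* whole tasks that fit */
        num_tasks = MIN(num_tasks, j);
        /* Rounds avail_cpus down to a multiple of cpus_per_task. */
        avail_cpus = num_tasks * cpus_per_task;
    }
    return avail_cpus;
}
```

With 8 free CPUs this yields 8 for -c1 and -c2 but only 6 for -c3, which matches the one failing case above. It is an inference, not something the diff states, that the reduced count is what leaves only the first six cores selectable.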
Regards,
Martin
Index: src/plugins/select/cons_res/job_test.c
===================================================================
RCS file: /cvsroot/slurm/slurm/src/plugins/select/cons_res/job_test.c,v
retrieving revision 1.1.1.28
diff -u -r1.1.1.28 job_test.c
--- src/plugins/select/cons_res/job_test.c 3 Mar 2011 19:18:10 -0000 1.1.1.28
+++ src/plugins/select/cons_res/job_test.c 8 Apr 2011 20:37:29 -0000
@@ -508,11 +508,10 @@
num_tasks = MIN(num_tasks,
job_ptr->details->ntasks_per_node);
if (cpus_per_task < 2) {
- avail_cpus = num_tasks;
+ num_tasks = avail_cpus;
} else {
j = avail_cpus / cpus_per_task;
num_tasks = MIN(num_tasks, j);
- avail_cpus = num_tasks * cpus_per_task;
}
if ((job_ptr->details->ntasks_per_node &&