Dear developers,

I have found what looks like a bug in the Slurm code (file
src/plugins/select/cons_res/job_test.c, function _allocate_sc(...)).

For example, I have 2 nodes (slurm.conf):
GresTypes=gpu
NodeName=n[01-02] CPUs=20 Sockets=2 CoresPerSocket=10 ThreadsPerCore=1 Gres=gpu:kepler:2 RealMemory=64000 TmpDisk=16384 State=UNKNOWN
PartitionName=gpu Nodes=n[01-02] Shared=NO MaxCPUsPerNode=4 Default=YES MaxTime=INFINITE State=UP

gres.conf:
Name=gpu Type=kepler File=/dev/nvidia0 CPUs=0,1
Name=gpu Type=kepler File=/dev/nvidia1 CPUs=10,11

I use
SelectType=select/cons_res
SelectTypeParameters=CR_CPU

=> There are 4 GPUs in the cluster.

The problem is the following: I want to submit 4 single-process jobs, each
of which needs 1 GPU. Slurm runs only 2 of them; the other 2 stay pending.
I would expect all 4 to run.

Please look closely at the following code from the file
src/plugins/select/cons_res/job_test.c, function _allocate_sc(...):

/* Step 1: create and compute core-count-per-socket
 * arrays and total core counts */
free_cores = xmalloc(sockets * sizeof(uint16_t));
used_cores = xmalloc(sockets * sizeof(uint16_t));
used_cpu_array = xmalloc(sockets * sizeof(uint32_t));

for (c = core_begin; c < core_end; c++) {	// Cycle 1
	i = (uint16_t) (c - core_begin) / cores_per_socket;
	if (bit_test(core_map, c)) {
		free_cores[i]++;
		free_core_count++;
	} else {
		used_cores[i]++;	// <--- possible error here (line 1)
	}
	if (part_core_map && bit_test(part_core_map, c))
		used_cpu_array[i]++;
}

for (i = 0; i < sockets; i++) {	// Cycle 2
	/* if a socket is already in use and entire_sockets_only is
	 * enabled, it cannot be used by this job */
	if (entire_sockets_only && used_cores[i]) {
		free_core_count -= free_cores[i];
		used_cores[i] += free_cores[i];
		free_cores[i] = 0;
	}
	free_cpu_count += free_cores[i] * threads_per_core;
	if (used_cpu_array[i])
		used_cpu_count += used_cores[i] * threads_per_core;	// <--- possible error here (line 2)
}
xfree(used_cores);
xfree(used_cpu_array);

/* Ignore resources that would push a job allocation over the
 * partition CPU limit (if any) */
if ((job_ptr->part_ptr->max_cpus_per_node != INFINITE) &&
    (free_cpu_count + used_cpu_count >
     job_ptr->part_ptr->max_cpus_per_node)) {
	int excess = free_cpu_count + used_cpu_count -
		     job_ptr->part_ptr->max_cpus_per_node;
	for (c = core_begin; c < core_end; c++) {
		i = (uint16_t) (c - core_begin) / cores_per_socket;
		if (free_cores[i] > 0) {
			free_core_count--;
			free_cores[i]--;
			excess -= threads_per_core;
			if (excess <= 0)
				break;
		}
	}
}


I marked the two lines that I think contain errors. When GRES is used,
line 1 is wrong: every core that is not set in core_map is counted as used,
although it may simply be forbidden by GRES. Here GRES allows only cores
0, 1, 10 and 11; all other cores are forbidden. As a result, after cycle 2
the variable used_cpu_count is wrong, and the following if statement
prevents the node from being allocated to the job, because
used_cpu_count (10) + free_cpu_count (2) = 12 > 4 (MaxCPUsPerNode),
which is wrong.
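
To make the arithmetic reproducible, here is a minimal standalone sketch of the same counting logic. It is not Slurm code: plain bool arrays stand in for the core_map and part_core_map bitmaps, the struct cpu_counts type and the functions count_cpus() and third_job_scenario() are hypothetical names, entire_sockets_only is assumed to be off, and the INFINITE check is left out. The scenario is also an assumption on my side: one job of the partition already runs on core 0, and after GRES filtering only cores 10 and 11 remain set in core_map.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SOCKETS          2
#define CORES_PER_SOCKET 10
#define N_CORES          (SOCKETS * CORES_PER_SOCKET)

/* Hypothetical result type, not part of Slurm. */
struct cpu_counts {
	uint32_t free_cpus;     /* free_cpu_count after cycle 2 */
	uint32_t used_cpus;     /* used_cpu_count after cycle 2 */
	uint32_t usable_cores;  /* free_core_count after the limit check */
};

/* Mirrors the counting in the quoted code for a single node. */
struct cpu_counts count_cpus(const bool core_map[N_CORES],
			     const bool part_core_map[N_CORES],
			     uint32_t threads_per_core,
			     uint32_t max_cpus_per_node)
{
	uint16_t free_cores[SOCKETS] = {0}, used_cores[SOCKETS] = {0};
	uint32_t used_cpu_array[SOCKETS] = {0};
	struct cpu_counts r = {0, 0, 0};
	int c, i;

	assert(threads_per_core > 0);

	/* Cycle 1: every core not set in core_map is counted as used,
	 * even if it was only filtered out by GRES. */
	for (c = 0; c < N_CORES; c++) {
		i = c / CORES_PER_SOCKET;
		if (core_map[c]) {
			free_cores[i]++;
			r.usable_cores++;
		} else {
			used_cores[i]++;
		}
		if (part_core_map[c])
			used_cpu_array[i]++;
	}

	/* Cycle 2 (entire_sockets_only assumed to be off). */
	for (i = 0; i < SOCKETS; i++) {
		r.free_cpus += free_cores[i] * threads_per_core;
		if (used_cpu_array[i])
			r.used_cpus += used_cores[i] * threads_per_core;
	}

	/* Partition limit check: drop free cores until the excess is gone. */
	if (r.free_cpus + r.used_cpus > max_cpus_per_node) {
		int excess = (int) (r.free_cpus + r.used_cpus -
				    max_cpus_per_node);
		for (c = 0; c < N_CORES; c++) {
			i = c / CORES_PER_SOCKET;
			if (free_cores[i] > 0) {
				r.usable_cores--;
				free_cores[i]--;
				excess -= (int) threads_per_core;
				if (excess <= 0)
					break;
			}
		}
	}
	return r;
}

/* Assumed scenario: core 0 is busy, GRES leaves only cores 10 and 11. */
struct cpu_counts third_job_scenario(void)
{
	bool core_map[N_CORES] = {false};
	bool part_core_map[N_CORES] = {false};

	core_map[10] = core_map[11] = true;  /* cores bound to /dev/nvidia1 */
	part_core_map[0] = true;             /* running job of this partition */
	return count_cpus(core_map, part_core_map, 1, 4);
}
```

Under these assumptions third_job_scenario() yields used_cpus = 10, free_cpus = 2 and usable_cores = 0: the 8 excess CPUs remove both remaining free cores, so the node is rejected even though only 1 CPU is really in use.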

Could this be a bug?

Best regards, Vova.
