Dear Developers,

When I use the MaxCPUsPerNode option in a partition definition with 2 GPUs
per node (defined in gres) and SelectType=select/cons_res, Slurm allocates
only one GPU per node to jobs. The other GPU stays free, but Slurm cannot
allocate it to any job.

In man slurm.conf:

"MaxCPUsPerNode
              Maximum number of CPUs on any node available to all jobs from
              this partition. This can be especially useful to schedule GPUs.
              For example a node can be associated with two Slurm partitions
              (e.g. "cpu" and "gpu") and the partition/queue "cpu" could be
              limited to only a subset of the node’s CPUs, insuring that one
              or more CPUs would be available to jobs in the "gpu"
              partition/queue."

Can you help me? Why does the MaxCPUsPerNode option not work correctly with
SelectType=select/cons_res?

Best Regards, Vova.




2016-04-20 14:20 GMT+10:00 Vladimir Goy <[email protected]>:

> Dear developers,
>
> I have found something that looks like a bug in the Slurm code (file
> src/plugins/select/cons_res/job_test.c, _allocate_sc(...)).
>
> For example, I have 2 nodes:
> GresTypes=gpu
> NodeName=n[01-02] CPUs=20 Sockets=2 CoresPerSocket=10 ThreadsPerCore=1 Gres=gpu:kepler:2 RealMemory=64000 TmpDisk=16384 State=UNKNOWN
> PartitionName=gpu Nodes=n[01-02] Shared=NO MaxCPUsPerNode=4 Default=YES MaxTime=INFINITE State=UP
>
> gres.conf:
> Name=gpu Type=kepler File=/dev/nvidia0 CPUs=0,1
> Name=gpu Type=kepler File=/dev/nvidia1 CPUs=10,11
>
> I use
> SelectType=select/cons_res
> SelectTypeParameters=CR_CPU
>
> => I have 4 GPUs in the cluster.
>
> The problem is the following: I would like to submit 4 single-process
> jobs, each of which needs 1 GPU. Slurm runs 2 of them and the other 2 stay
> pending. This is wrong.
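The expectation above can be checked with a little arithmetic. The following is a minimal sketch of a simplified capacity model, not Slurm's scheduling logic: the function max_jobs_per_node and its parameters are hypothetical names introduced only for illustration, each job is assumed to need exactly one GPU, and the GPU-to-core binding from gres.conf is ignored.

```c
#include <assert.h>

/* Smaller of two ints. */
static int min_int(int a, int b) { return a < b ? a : b; }

/*
 * Hypothetical helper (not part of Slurm): how many jobs could run at once
 * on one node if each job needs cpus_per_job CPUs and exactly one GPU.
 * The CPU budget is the node's CPU count capped by MaxCPUsPerNode.
 */
int max_jobs_per_node(int node_cpus, int node_gpus,
                      int max_cpus_per_node, int cpus_per_job)
{
    assert(cpus_per_job > 0);
    int cpu_budget = min_int(node_cpus, max_cpus_per_node);
    return min_int(cpu_budget / cpus_per_job, node_gpus);
}
```

Under this model, max_jobs_per_node(20, 2, 4, 1) is 2, so the two nodes n[01-02] together should be able to run all 4 single-CPU, single-GPU jobs at the same time. MaxCPUsPerNode=4 is not the limiting factor here; the GPU count is.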
>
> Please look carefully at the following code from file
> src/plugins/select/cons_res/job_test.c, function _allocate_sc(...):
>
> /* Step 1: create and compute core-count-per-socket
>  * arrays and total core counts */
> free_cores = xmalloc(sockets * sizeof(uint16_t));
> used_cores = xmalloc(sockets * sizeof(uint16_t));
> used_cpu_array = xmalloc(sockets * sizeof(uint32_t));
>
> for (c = core_begin; c < core_end; c++) {        /* Cycle 1 */
>     i = (uint16_t) (c - core_begin) / cores_per_socket;
>     if (bit_test(core_map, c)) {
>         free_cores[i]++;
>         free_core_count++;
>     } else {
>         used_cores[i]++;    /* <---- possible error here (line 1) */
>     }
>     if (part_core_map && bit_test(part_core_map, c))
>         used_cpu_array[i]++;
> }
>
> for (i = 0; i < sockets; i++) {                  /* Cycle 2 */
>     /* if a socket is already in use and entire_sockets_only is
>      * enabled, it cannot be used by this job */
>     if (entire_sockets_only && used_cores[i]) {
>         free_core_count -= free_cores[i];
>         used_cores[i] += free_cores[i];
>         free_cores[i] = 0;
>     }
>     free_cpu_count += free_cores[i] * threads_per_core;
>     if (used_cpu_array[i])
>         /* <---- possible error here (line 2) */
>         used_cpu_count += used_cores[i] * threads_per_core;
> }
> xfree(used_cores);
> xfree(used_cpu_array);
>
> /* Ignore resources that would push a job allocation over the
>  * partition CPU limit (if any) */
> if ((job_ptr->part_ptr->max_cpus_per_node != INFINITE) &&
>     (free_cpu_count + used_cpu_count >
>      job_ptr->part_ptr->max_cpus_per_node)) {
>     int excess = free_cpu_count + used_cpu_count -
>                  job_ptr->part_ptr->max_cpus_per_node;
>     for (c = core_begin; c < core_end; c++) {
>         i = (uint16_t) (c - core_begin) / cores_per_socket;
>         if (free_cores[i] > 0) {
>             free_core_count--;
>             free_cores[i]--;
>             excess -= threads_per_core;
>             if (excess <= 0)
>                 break;
>         }
>     }
> }
>
>
> I marked two lines that I think contain errors. When Gres is used, line 1
> is wrong, because the cores it counts as used are not all in use: some of
> them may only be forbidden by Gres. Gres allows only cores 0, 1, 10 and 11;
> the other cores are forbidden. In this case the variable used_cpu_count is
> wrong after cycle 2, and the following if statement prevents this node from
> being allocated to the job, because
> (used_cpu_count = 10) + (free_cpu_count = 2) = 12 > 4
> => this is wrong!
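The arithmetic above can be reproduced outside Slurm. The following is a minimal sketch, not the Slurm source: it replaces the bitstr_t maps with plain bool arrays, omits the entire_sockets_only branch and the INFINITE check, and uses the hypothetical names count_node_cpus and struct cpu_counts. The node state is also an assumption chosen to match the numbers above: one job of this partition already runs on core 0, and the Gres filter has left only cores 10 and 11 in core_map for the next job.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_SOCKETS 8   /* illustrative bound, not a Slurm constant */

/* Result of the per-node counting (hypothetical type). */
struct cpu_counts {
    uint32_t free_cpus;     /* CPUs still selectable for this job */
    uint32_t used_cpus;     /* CPUs charged against MaxCPUsPerNode */
    uint32_t usable_cores;  /* free cores left after the limit is applied */
};

/*
 * core_map[c]:      core c may be given to this job (idle and allowed by Gres)
 * part_core_map[c]: core c is allocated to some job of this partition
 */
struct cpu_counts count_node_cpus(const bool *core_map,
                                  const bool *part_core_map,
                                  uint16_t cores, uint16_t sockets,
                                  uint16_t threads_per_core,
                                  uint32_t max_cpus_per_node)
{
    uint16_t free_cores[MAX_SOCKETS] = {0};
    uint16_t used_cores[MAX_SOCKETS] = {0};
    uint32_t used_cpu_array[MAX_SOCKETS] = {0};
    struct cpu_counts r = {0, 0, 0};

    assert(sockets > 0 && sockets <= MAX_SOCKETS);
    uint16_t cores_per_socket = cores / sockets;

    /* Cycle 1: every core missing from core_map is counted as used,
     * whether it is really busy or only excluded by Gres. */
    for (uint16_t c = 0; c < cores; c++) {
        uint16_t i = c / cores_per_socket;
        if (core_map[c]) {
            free_cores[i]++;
            r.usable_cores++;
        } else {
            used_cores[i]++;
        }
        if (part_core_map[c])
            used_cpu_array[i]++;
    }

    /* Cycle 2: a socket with any partition core charges ALL its
     * "used" cores to the partition. */
    for (uint16_t i = 0; i < sockets; i++) {
        r.free_cpus += free_cores[i] * threads_per_core;
        if (used_cpu_array[i])
            r.used_cpus += used_cores[i] * threads_per_core;
    }

    /* Partition limit: drop free cores until the excess is gone
     * or no free cores remain. */
    if (r.free_cpus + r.used_cpus > max_cpus_per_node) {
        int excess = (int)(r.free_cpus + r.used_cpus - max_cpus_per_node);
        for (uint16_t c = 0; c < cores; c++) {
            uint16_t i = c / cores_per_socket;
            if (free_cores[i] > 0) {
                free_cores[i]--;
                r.usable_cores--;
                excess -= threads_per_core;
                if (excess <= 0)
                    break;
            }
        }
    }
    return r;
}
```

Calling count_node_cpus with 20 cores, 2 sockets, 1 thread per core, a limit of 4, core_map set only for cores 10 and 11, and part_core_map set only for core 0 gives used_cpus = 10 and free_cpus = 2. Since 12 > 4, the trimming loop then removes both remaining free cores, so usable_cores ends at 0 and the node is rejected even though only one CPU of the partition is actually busy.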
>
> Could this be a bug?
>
> Best regards, Vova.
>
>
