[slurm-dev] Re: Bug report from Vladivostok.

jette Thu, 05 May 2016 15:15:12 -0700

Marco's patch has been committed to the Slurm version 16.05 code base in this commit:

https://github.com/SchedMD/slurm/commit/70aafa68b19a1d6819f1823ebdc0c1c103f2c9b6


Thank you for your contribution.


On 2016-05-04 05:08, Marco Ehlert wrote:

Hi Vova,

some weeks ago I proposed a bug fix for this problem.

https://groups.google.com/forum/#!searchin/slurm-devel/marco$20ehlert/slurm-devel/CRsW-eiUfms/MI2aAL4UGwAJ

But this solves only half of the problem if you are using distinct
partitioning of cores like this:

NodeName=n[01-02] CPUs=20 Sockets=2 CoresPerSocket=10 ThreadsPerCore=1 Gres=gpu:4
PartitionName=gpu Nodes=n[01-02] Shared=NO MaxCPUsPerNode=4
PartitionName=cpu Nodes=n[01-02] Shared=NO MaxCPUsPerNode=16

and bind specific CPU cores to GPU cards in gres.conf.

I am not sure whether this is the configuration you are using, but
it is in my case. Slurm does not correctly manage the core maps for two
distinct partitions on the same node. At some point Slurm gets
confused; I saw definitely wrong core_map/part_core_map variables.

That's why I decided not to use "CPUs" arguments in gres.conf but to set
the core map in a prolog script depending on the partition chosen by
the job.
Job scripts read this core map and then set the taskset
themselves.
Maybe it is not the best workaround, but it works to some
satisfaction at least.
It results in a difference between Slurm's picture of used/unused
cores and the actual core map in use. But it works anyway,
because the cores are now counted correctly.

Best,
Marco
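
Note: the last step of the workaround described above (a job pinning itself to a core list handed to it by the prolog) could look roughly like the following on Linux. This is only a sketch under assumptions: the core list is assumed to arrive as a comma-separated string such as "10,11", and the helper names parse_core_list and pin_to_core_list are hypothetical. It is not Marco's actual script, which used taskset from the job script.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdlib.h>

/*
 * Parse a comma-separated core list such as "10,11" into a cpu_set_t.
 * Returns the number of cores set, or -1 on malformed input.
 * (Hypothetical helper; the list format is an assumption.)
 */
static int parse_core_list(const char *list, cpu_set_t *set)
{
    const char *p = list;
    char *end;

    assert(list != NULL && set != NULL);
    CPU_ZERO(set);
    while (*p != '\0') {
        long core = strtol(p, &end, 10);
        if (end == p || core < 0 || core >= CPU_SETSIZE)
            return -1;              /* not a number, or out of range */
        CPU_SET((int) core, set);
        if (*end == ',')
            end++;                  /* skip the separator */
        else if (*end != '\0')
            return -1;              /* unexpected character */
        p = end;
    }
    return CPU_COUNT(set);
}

/* Pin the calling process to the cores in the list; returns 0 on success. */
static int pin_to_core_list(const char *list)
{
    cpu_set_t set;

    if (parse_core_list(list, &set) <= 0)
        return -1;
    return sched_setaffinity(0, sizeof(set), &set);
}
```

A job wrapper would call pin_to_core_list() with the string written by the prolog before starting the real work. As Marco notes, Slurm's own view of which cores are in use will then differ from the affinity actually applied.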

On Wed, 4 May 2016, Vladimir Goy wrote:
Dear Developers, when I use the MaxCPUsPerNode option in a partition definition with 2 GPUs per node (defined in gres) and SelectType=select/cons_res, Slurm allocates only one GPU per node to jobs; the other GPU is free, but Slurm cannot allocate it to jobs.
In man slurm.conf:

"MaxCPUsPerNode
Maximum number of CPUs on any node available to all jobs from this partition. This can be especially useful to schedule GPUs. For example a node can be associated with two Slurm partitions (e.g. "cpu" and "gpu") and the partition/queue "cpu" could be limited to only a subset of the node's CPUs, insuring that one or more CPUs would be available to jobs in the "gpu" partition/queue."
Can you help me? Why does the MaxCPUsPerNode option not work correctly with SelectType=select/cons_res?
Best Regards, Vova.




2016-04-20 14:20 GMT+10:00 Vladimir Goy <[email protected]>:
      Dear developers,
I have found something that looks like a bug in the Slurm code (file src/plugins/select/cons_res/job_test.c, _allocate_sc(...)).
For example, I have 2 nodes:
GresTypes=gpu
NodeName=n[01-02] CPUs=20 Sockets=2 CoresPerSocket=10 ThreadsPerCore=1 Gres=gpu:kepler:2 RealMemory=64000 TmpDisk=16384 State=UNKNOWN
PartitionName=gpu Nodes=n[01-02] Shared=NO MaxCPUsPerNode=4 Default=YES MaxTime=INFINITE State=UP
gres.conf:
Name=gpu Type=kepler File=/dev/nvidia0 CPUs=0,1
Name=gpu Type=kepler File=/dev/nvidia1 CPUs=10,11

I use
SelectType=select/cons_res
SelectTypeParameters=CR_CPU

=> I have 4 GPUs on the cluster.
The problem is the following: I would like to submit 4 single-process jobs, each of which needs 1 GPU. Slurm runs 2 of them; the others stay pending. This is a failure.
Please look more carefully at the following code from file src/plugins/select/cons_res/job_test.c, function _allocate_sc(...):
    /* Step 1: create and compute core-count-per-socket
     * arrays and total core counts */
    free_cores = xmalloc(sockets * sizeof(uint16_t));
    used_cores = xmalloc(sockets * sizeof(uint16_t));
    used_cpu_array = xmalloc(sockets * sizeof(uint32_t));

    for (c = core_begin; c < core_end; c++) {         //Cycle 1.
        i = (uint16_t) (c - core_begin) / cores_per_socket;
        if (bit_test(core_map, c)) {
            free_cores[i]++;
            free_core_count++;
        } else {
            used_cores[i]++; //<-------Here can be error!!! (line 1)
        }
        if (part_core_map && bit_test(part_core_map, c))
            used_cpu_array[i]++;
    }

    for (i = 0; i < sockets; i++) {       //Cycle 2.
        /* if a socket is already in use and entire_sockets_only is
         * enabled, it cannot be used by this job */
        if (entire_sockets_only && used_cores[i]) {
            free_core_count -= free_cores[i];
            used_cores[i] += free_cores[i];
            free_cores[i] = 0;
        }
        free_cpu_count += free_cores[i] * threads_per_core;
        if (used_cpu_array[i])
            used_cpu_count += used_cores[i] * threads_per_core; //<----Here can be error. (line 2)
    }
    xfree(used_cores);
    xfree(used_cpu_array);

    /* Ignore resources that would push a job allocation over the
     * partition CPU limit (if any) */
    if ((job_ptr->part_ptr->max_cpus_per_node != INFINITE) &&
        (free_cpu_count + used_cpu_count >
         job_ptr->part_ptr->max_cpus_per_node)) {
        int excess = free_cpu_count + used_cpu_count -
                     job_ptr->part_ptr->max_cpus_per_node;
        for (c = core_begin; c < core_end; c++) {
            i = (uint16_t) (c - core_begin) / cores_per_socket;
            if (free_cores[i] > 0) {
                free_core_count--;
                free_cores[i]--;
                excess -= threads_per_core;
                if (excess <= 0)
                    break;
            }
        }
    }
I have marked two lines that I think contain errors. When I use gres, line 1 is wrong, because some of these cores may be in use or may be forbidden by gres. Gres allows only cores 0, 1, 10 and 11 to be used; the other cores are forbidden. In this case, after cycle 2 the variable used_cpu_count is wrong, and at the next if statement I cannot allocate this node for the job, because ((used_cpu_count is equal to 10) + (free_cpu_count is equal to 2) = 12) > 4 => it is wrong!!!

Could this be a bug?

Best regards, Vova.
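
Note: the arithmetic in the report can be reproduced with a small standalone model of the counting logic. This is a sketch, not Slurm code: plain bool arrays stand in for Slurm's bitstrings, entire_sockets_only is treated as off, and the names sc_counts and count_cores are hypothetical. The input state used in the usage note below (one job already running on core 0, gres leaving only cores 10 and 11 in core_map) is an assumption chosen to match the numbers Vova gives.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_SOCKETS 8

/* Result of the counting pass; field names mirror the quoted excerpt. */
struct sc_counts {
    uint16_t free_cpu_count;
    uint16_t used_cpu_count;
    uint16_t free_core_count;   /* value after the partition-limit trim */
};

/* Simplified model of the counting in _allocate_sc() for one node. */
static struct sc_counts count_cores(const bool *core_map,
                                    const bool *part_core_map,
                                    int cores, int cores_per_socket,
                                    int threads_per_core,
                                    int max_cpus_per_node)
{
    uint16_t free_cores[MAX_SOCKETS] = {0};
    uint16_t used_cores[MAX_SOCKETS] = {0};
    uint32_t used_cpu_array[MAX_SOCKETS] = {0};
    struct sc_counts r = {0, 0, 0};
    int sockets = cores / cores_per_socket;
    int c, i;

    assert(sockets > 0 && sockets <= MAX_SOCKETS);

    for (c = 0; c < cores; c++) {           /* cycle 1 */
        i = c / cores_per_socket;
        if (core_map[c]) {
            free_cores[i]++;
            r.free_core_count++;
        } else {
            used_cores[i]++;    /* also counts cores masked out by gres */
        }
        if (part_core_map[c])
            used_cpu_array[i]++;
    }

    for (i = 0; i < sockets; i++) {         /* cycle 2 */
        r.free_cpu_count += free_cores[i] * threads_per_core;
        if (used_cpu_array[i])
            r.used_cpu_count += used_cores[i] * threads_per_core;
    }

    /* Trim free cores that would exceed the partition CPU limit. */
    if (r.free_cpu_count + r.used_cpu_count > max_cpus_per_node) {
        int excess = r.free_cpu_count + r.used_cpu_count - max_cpus_per_node;
        for (c = 0; c < cores; c++) {
            i = c / cores_per_socket;
            if (free_cores[i] > 0) {
                r.free_core_count--;
                free_cores[i]--;
                excess -= threads_per_core;
                if (excess <= 0)
                    break;
            }
        }
    }
    return r;
}
```

Calling count_cores() with 20 cores, 10 cores per socket, 1 thread per core and a limit of 4, with only core_map[10] and core_map[11] set and only part_core_map[0] set, gives used_cpu_count = 10 and free_cpu_count = 2. The sum 12 exceeds 4, so the trim loop removes both remaining free cores and the node is rejected, even though the partition is actually using a single CPU there.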
