Hi,

I've found another bug in slurmctld that kills it with a fatal error.
I've fixed it with the patch below, but I'm not sure it's the best
way to solve it, so instead of pushing it to git I'm sending it to the
list to discuss the best solution.
The symptom is the same as with the previous gres bug:
"fatal: cons_res: sync loop not progressing"
and the cause is again an error in how the available gres resources are
counted.

--- slurm-2.3.1/src/common/gres.c    2011-10-24 19:15:42.000000000 +0200
+++ ../slurm-2.3.1/src/common/gres.c    2011-11-21 18:50:34.256761175 +0100
@@ -2509,6 +2509,12 @@
         return NO_VAL;
     } else if (job_gres_ptr->gres_cnt_alloc && node_gres_ptr->topo_cnt) {
         /* Need to determine which specific CPUs can be used */
+        gres_avail = node_gres_ptr->gres_cnt_avail;
+                if (!use_total_gres)
+                        gres_avail -= node_gres_ptr->gres_cnt_alloc;
+                if (job_gres_ptr->gres_cnt_alloc > gres_avail)
+                        return (uint32_t) 0;    /* insufficient, gres to use */
+
         if (cpu_bitmap) {
             cpus_ctld = cpu_end_bit - cpu_start_bit + 1;
             if (cpus_ctld < 1) {
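
To make the discussion easier, here is the check the patch adds, pulled out
on its own. This is only a minimal sketch, not the slurm code: the struct
node_gres and the function gres_fits are hypothetical names made up for
illustration, and only the field names gres_cnt_avail and gres_cnt_alloc and
the use_total_gres flag come from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two node gres fields the patch reads. */
struct node_gres {
    uint32_t gres_cnt_avail;   /* gres configured on the node */
    uint32_t gres_cnt_alloc;   /* gres already allocated to jobs */
};

/*
 * Return true if the node can satisfy a job asking for job_gres_cnt gres.
 * When use_total_gres is false, gres already allocated on the node are
 * subtracted first, as in the patch. Assumes gres_cnt_alloc never
 * exceeds gres_cnt_avail.
 */
static bool gres_fits(const struct node_gres *node, uint32_t job_gres_cnt,
                      bool use_total_gres)
{
    uint32_t gres_avail = node->gres_cnt_avail;

    if (!use_total_gres)
        gres_avail -= node->gres_cnt_alloc;

    return job_gres_cnt <= gres_avail;
}
```

With 4 gres on a node and 3 already allocated, a job asking for 2 does not
fit unless use_total_gres is set, which is the case where the patch returns 0.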


-- 
Carles Fenoy
