The problem was conflicting logic in the select/cons_res plugin. Some
of the code was trying to give the job the maximum node count in the
range, while other logic was trying to minimize the job's spread
across multiple switches. As you note, this problem only occurs when a
range of node counts is specified and both the select/cons_res plugin
and the topology/tree plugin are in use, and even then it is not easy
to reproduce (you included all of the details below). This patch seems
to fix the problem and will be in SLURM v2.3.5.

diff --git a/src/plugins/select/cons_res/job_test.c b/src/plugins/select/cons_res/job_test.c
index 700bf91..e00a483 100644
--- a/src/plugins/select/cons_res/job_test.c
+++ b/src/plugins/select/cons_res/job_test.c
@@ -1570,6 +1570,12 @@ static int _eval_nodes_topo(struct job_record *job_ptr, bitstr_t *bitmap,
                 goto fini;
         }
         bit_and(avail_nodes_bitmap, switches_bitmap[best_fit_inx]);
+       if ((min_nodes <  req_nodes) &&
+           (min_nodes <= switches_node_cnt[best_fit_inx])) {
+               /* If job specifies a range of node counts, then allocate
+                * resources with a minimal switch configuration */
+               rem_nodes = switches_node_cnt[best_fit_inx];
+       }

         /* Identify usable leafs (within higher switch having best fit) */
         for (j=0; j<switch_record_cnt; j++) {


Quoting [email protected]:

> Certain combinations of topology configuration and srun -N option produce
> spurious job rejection with "Requested node configuration is not
> available" with select/cons_res. The following example illustrates the
> problem.
>
> [sulu] (slurm) etc> cat slurm.conf
> ...
> TopologyPlugin=topology/tree
> SelectType=select/cons_res
> SelectTypeParameters=CR_Core
> ...
>
> [sulu] (slurm) etc> cat topology.conf
> SwitchName=s1 Nodes=xna[13-26]
> SwitchName=s2 Nodes=xna[41-45]
> SwitchName=s3 Switches=s[1-2]
>
> [sulu] (slurm) etc> sinfo
> PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
> ...
> jkob         up   infinite      4   idle xna[14,19-20,41]
> ...
>
> [sulu] (slurm) etc> srun -N 2-4 -n 4 -p jkob hostname
> srun: Force Terminated job 79
> srun: error: Unable to allocate resources: Requested node configuration is
> not available
>
> The problem does not occur with select/linear, or topology/none, or if -N
> is omitted, or for certain other values for -N (for example, -N 4-4 and -N
> 2-3 work ok). The problem seems to be in function _eval_nodes_topo in
> src/plugins/select/cons_res/job_test.c. The srun man page states that when
> -N is used, "the job will be allocated as many nodes as possible within
> the range specified and without delaying the initiation of the job."
> Consistent with this description, the requested number of nodes in the
> above example is 4 (req_nodes=4).  However, the code that selects the
> best-fit topology switches appears to make the selection based on the
> minimum required number of nodes (min_nodes=2). It therefore selects
> switch s1.  s1 has only 3 nodes from partition jkob. Since this is fewer
> than req_nodes the job is rejected with the "node configuration" error.
>
> I'm not sure where the code is going wrong.  It could be in the
> calculation of the number of needed nodes in function _enough_nodes.  Or
> it could be in the code that initializes/updates req_nodes or rem_nodes. I
> don't feel confident that I understand the logic well enough to propose a
> fix without introducing a regression.
>
> Regards,
> Martin
>
