This patch lets us submit jobs with min_nodes < max_nodes and fewer CPUs
per task (-c) than a node has, but it breaks down when min_nodes is less
than or equal to the -c value.
Partition d1 has 16 nodes, each with 16 CPUs, and we are using:
SelectType=select/cons_res
SelectTypeParameters=CR_CPU
sbatch -p d1 -N15-16 -c 4
The above allocates 16 nodes when available but
sbatch -p d1 -N4-16 -c 4
only allocates 4 nodes even if more are available.
--- slurm-14.03.6/src/sbatch/opt.c 2014-07-17 06:48:18.000000000 +0800
+++ slurm-14.03.6.new/src/sbatch/opt.c 2014-07-17 08:16:39.000000000 +0800
@@ -2403,9 +2403,7 @@
}
/* massage the numbers */
- if ((opt.nodes_set || opt.extra_set) &&
- ((opt.min_nodes == opt.max_nodes) || (opt.max_nodes == 0)) &&
- !opt.ntasks_set) {
+ if (!opt.ntasks_set && (opt.nodes_set || opt.extra_set)) {
/* 1 proc / node default */
opt.ntasks = MAX(opt.min_nodes, 1);
diff -Nur -x .deps -x Makefile -x .libs slurm-14.03.6/src/slurmctld/node_scheduler.c slurm-14.03.6.new/src/slurmctld/node_scheduler.c
--- slurm-14.03.6/src/slurmctld/node_scheduler.c 2014-07-17 06:48:18.000000000 +0800
+++ slurm-14.03.6.new/src/slurmctld/node_scheduler.c 2014-07-17 08:11:06.000000000 +0800
@@ -843,7 +843,7 @@
}
feature_bitmap = NULL;
min_nodes = feat_ptr->count;
- req_nodes = feat_ptr->count;
+ req_nodes = MAX(feat_ptr->count, max_nodes);
job_ptr->details->min_nodes = feat_ptr->count;
job_ptr->details->min_cpus = feat_ptr->count;
if (*preemptee_job_list) {
On Wed, 2014-07-16 at 03:08 -0700, Franco Broi wrote:
> Hi
>
> Been looking into this a bit more and it seems that part of the problem
> is in sbatch where it modifies the ntasks value.
>
> src/sbatch/opt.c, line 2406:
>
>
> /* massage the numbers */
> if ((opt.nodes_set || opt.extra_set) &&
> ((opt.min_nodes == opt.max_nodes) || (opt.max_nodes == 0)) &&
> !opt.ntasks_set) {
> /* 1 proc / node default */
> opt.ntasks = MAX(opt.min_nodes, 1);
>
> If I remove the check for opt.min_nodes == opt.max_nodes, my job works.
>
> I also made a change in src/slurmctld/node_scheduler.c at line 846 to
> set req_nodes to max_nodes instead of min_nodes. I'm not sure that it
> does anything; it just looked wrong. I'll change it back tomorrow and
> see if my job still works.
>
> This is the command that would normally fail but now works. d1 has 16
> nodes, each with 16 cores, and I'm using cons_res with CR_CPU.
>
> sbatch -p d1 -N15-16 -c 4
>
> but any value of min_cpu <= num_cpus only allocates 4 nodes, while
> -N5-16 gives me 16 nodes. Weird!
>
> Cheers,
>
>
> On Mon, 2014-06-16 at 17:48 -0700, Franco Broi wrote:
> >
> > You can't currently submit a job with -Nmin<max:max and -c < all cpus,
> > you get a bad constraints error.
> >
> > A few people have reported this bug over the past several months but I
> > haven't seen any mention of a fix.
> >
> > Cheers,