I have merged your patch into the version 2.6 branch of Slurm and merged that branch forward into the newer versions. We will probably not make any more releases of version 2.6. Your commit is here:
https://github.com/SchedMD/slurm/commit/d508ea95822050c5fa255ffd01711e7272293667

Thank you for your contribution.



Quoting Hongjia Cao <[email protected]>:

I found this in a cluster running Slurm 2.6.9, using select/linear. I
think the problem exists in newer versions also.

When there are completing nodes in a partition, the backfill loop may
end early: _try_sched() thinks the job can run immediately, but
select_nodes() cannot allocate nodes for it and returns
ESLURM_NODES_BUSY. No other jobs in the queue are backfilled until
that job can be started or fails to backfill.
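
To illustrate the failure mode described above, here is a toy model in Python. It is not Slurm's code: the functions try_sched(), select_nodes() and backfill() only borrow names from the report, the Node and Job classes are invented for the example, and the stop_on_busy flag is a hypothetical switch used to contrast the two behaviours. It is not a description of the merged patch. The assumption is that the planning step counts completing nodes as usable while the allocation step refuses them.

```python
from dataclasses import dataclass

# Placeholder return codes; only the names come from the report.
SLURM_SUCCESS = "SLURM_SUCCESS"
ESLURM_NODES_BUSY = "ESLURM_NODES_BUSY"


@dataclass
class Node:
    name: str
    allocated: bool = False
    completing: bool = False  # previous job is still being cleaned up


@dataclass
class Job:
    job_id: int
    nodes_needed: int


def try_sched(job, nodes):
    """Toy planner: treats completing nodes as usable right now (assumption)."""
    usable = [n for n in nodes if not n.allocated]
    return len(usable) >= job.nodes_needed


def select_nodes(job, nodes):
    """Toy allocator: refuses completing nodes (assumption)."""
    free = [n for n in nodes if not n.allocated and not n.completing]
    if len(free) < job.nodes_needed:
        return ESLURM_NODES_BUSY
    for n in free[:job.nodes_needed]:
        n.allocated = True
    return SLURM_SUCCESS


def backfill(queue, nodes, stop_on_busy=True):
    """Walk the queue once and return the ids of the jobs that were started."""
    started = []
    for job in queue:
        if not try_sched(job, nodes):
            continue  # a real scheduler would reserve a later start time here
        rc = select_nodes(job, nodes)
        if rc == SLURM_SUCCESS:
            started.append(job.job_id)
        elif stop_on_busy:
            break  # the loop ends early; later jobs are never considered
    return started


def make_nodes():
    """Four nodes, two of them still completing."""
    return [
        Node("n1", completing=True),
        Node("n2", completing=True),
        Node("n3"),
        Node("n4"),
    ]


queue = [Job(job_id=1, nodes_needed=3), Job(job_id=2, nodes_needed=1)]

print(backfill(queue, make_nodes()))                      # []
print(backfill(queue, make_nodes(), stop_on_busy=False))  # [2]
```

In this model job 1 passes the planner (four nodes look usable) but fails allocation (only two are really free), so with stop_on_busy=True the one-node job 2 is never reached even though it would fit.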
