We're running Slurm 2.6.5 with sched/backfill and the backfill parameters set to the default values.
I'm noticing some behavior that confuses me and am hoping someone has some insight. Here's a simplified example. Suppose Slurm is managing a single 18-core machine and users submit the following jobs sequentially:

A: 12 cores, time limit of 300 minutes
B: 8 cores, time limit of 300 minutes
C: 2 cores, time limit of 500 minutes

A then starts, while B is pending because of "Resources". I would expect C to start via backfilling, since starting it won't affect when B starts: B should be expected to start after 300 minutes regardless of whether C runs. However, when I test this, Slurm keeps C in the pending state because of "Priority".

So in some sense it seems that Slurm's plan for B is to use the 6 currently free cores plus 2 of the cores being used by A, rather than recognizing that B could use only cores being used by A without any change in its expected start time, thereby allowing C to run.

Any insights that would help me understand this better, and whether there are any configuration changes that would avoid this?

thanks,
Chris

----------------------------------------------------------------------------------------------
Chris Paciorek
Statistical Computing Consultant
Statistical Computing Facility, Econometrics Laboratory, Berkeley Research Computing

Office: 495 Evans Hall
Email: pacio...@stat.berkeley.edu
Voice: 510-842-6670
Fax: 510-642-7892
Skype: cjpaciorek
WWW: www.stat.berkeley.edu/~paciorek
Mailing Address: Department of Statistics, 367 Evans Hall, University of California, Berkeley, Berkeley, CA 94720 USA
Permanent forward: pacio...@alumni.cmu.edu
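
To make the core arithmetic in the example concrete, here is a minimal Python sketch. It only counts free cores over time and is not a model of how Slurm's backfill scheduler actually decides; the function names and the (cores, start, end) job tuples are hypothetical, chosen for illustration. It assumes jobs run for their full time limit, in minutes.

```python
TOTAL_CORES = 18  # single 18-core machine from the example


def free_cores(running, t):
    """Cores not in use at time t; `running` holds (cores, start, end) tuples."""
    return TOTAL_CORES - sum(c for c, s, e in running if s <= t < e)


def earliest_start(running, cores, duration):
    """Earliest time at which `cores` cores stay free for `duration` minutes."""
    # Free cores only increase when a job ends, so try t=0 and each end time.
    candidates = sorted({0} | {e for _, _, e in running})
    for t in candidates:
        # Free cores only decrease when a job starts, so check those instants.
        checkpoints = [t] + [s for _, s, _ in running if t < s < t + duration]
        if all(free_cores(running, p) >= cores for p in checkpoints):
            return t
    return None


job_a = (12, 0, 300)  # A: 12 cores, running from t=0 to its 300-minute limit
job_c = (2, 0, 500)   # C: 2 cores, hypothetically backfilled at t=0

b_without_c = earliest_start([job_a], cores=8, duration=300)
b_with_c = earliest_start([job_a, job_c], cores=8, duration=300)

print("B start without C:", b_without_c)  # 300
print("B start with C:", b_with_c)        # 300
```

By core count alone, B's earliest start is minute 300 in both cases, which is the expectation described above. The sketch ignores which specific cores a job is assigned, so it cannot show why C is held for "Priority".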