We're running Slurm 2.6.5 with sched/backfill and the backfill
parameters set to the default values.

I'm noticing some behavior that confuses me, and I'm hoping someone has some insight.

Here's a simplified example. Suppose Slurm is managing a single
18-core machine and users submit the following jobs in this order:

A: 12 cores, time limit of 300 minutes
B: 8 cores, time limit of 300 minutes
C: 2 cores, time limit of 500 minutes

A starts, while B is pending with reason "Resources". I would expect
C to start via backfill, since starting it would not affect when B
starts: B should be expected to start after 300 minutes, when A
reaches its time limit, regardless of whether C is running. However,
when I test this, Slurm leaves C pending with reason "Priority". So it
seems that Slurm's plan for B is to use the 6 currently free cores
plus 2 of the cores being used by A, rather than recognizing that B
could run entirely on cores being used by A with no change in its
expected start time, thereby allowing C to run.
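
To make the expectation concrete, here is a minimal Python sketch of the
core-counting argument above. It is not Slurm's backfill algorithm. It
assumes only that all cores on the node are interchangeable and that every
job runs to its full time limit. The function names earliest_start and
can_backfill are hypothetical, chosen just for this illustration.

```python
def earliest_start(running, cores_needed, total_cores, now=0):
    """Earliest time at which `cores_needed` cores are free, counting cores only.

    `running` is a list of (cores, end_time) pairs. Running jobs only ever
    finish, so the free-core count never decreases over time.
    """
    active = [job for job in running if job[1] > now]
    free = total_cores - sum(cores for cores, _ in active)
    if free >= cores_needed:
        return now
    # Release cores in order of job end time until the request fits.
    for cores, end in sorted(active, key=lambda job: job[1]):
        free += cores
        if free >= cores_needed:
            return end
    return None  # the request is larger than the whole node


def can_backfill(running, blocked, candidate, total_cores, now=0):
    """True if starting `candidate` now would not delay `blocked`.

    `blocked` and `candidate` are (cores, time_limit) pairs.
    """
    cand_cores, cand_limit = candidate
    free_now = total_cores - sum(c for c, end in running if end > now)
    if cand_cores > free_now:
        return False  # the candidate does not even fit right now
    before = earliest_start(running, blocked[0], total_cores, now)
    with_candidate = running + [(cand_cores, now + cand_limit)]
    after = earliest_start(with_candidate, blocked[0], total_cores, now)
    return before is not None and after == before


# The scenario from this message: A is running, B is blocked, C is the candidate.
running = [(12, 300)]          # job A: 12 cores, ends at minute 300
job_b = (8, 300)               # 8 cores, 300-minute limit
job_c = (2, 500)               # 2 cores, 500-minute limit
print(earliest_start(running, job_b[0], 18))
print(can_backfill(running, job_b, job_c, 18))
```

Under these assumptions, B's earliest start is minute 300 with or without
C running, so the sketch reports that C can be backfilled. That is the
outcome I expected from the scheduler.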

I'd appreciate any insights that would help me understand this better,
including whether there are any configuration changes that would avoid
this behavior.

thanks,
Chris

----------------------------------------------------------------------------------------------
Chris Paciorek

Statistical Computing Consultant
Statistical Computing Facility, Econometrics Laboratory, Berkeley
Research Computing

Office: 495 Evans Hall                      Email: pacio...@stat.berkeley.edu
Mailing Address:                            Voice: 510-842-6670
Department of Statistics                    Fax:   510-642-7892
367 Evans Hall                              Skype: cjpaciorek
University of California, Berkeley          WWW: www.stat.berkeley.edu/~paciorek
Berkeley, CA 94720 USA                      Permanent forward: pacio...@alumni.cmu.edu
