All of the logic is in the backfill plugin. The job_scheduler.c module isn't used until the backfill scheduler decides to start the job.

You can probably get the information that you need by executing
"scontrol setdebugflags +backfill" and later
"scontrol setdebugflags -backfill"
Note: This will generate lots of log messages in the SlurmctldLogFile, so you don't want to keep this running. I'd suggest capturing the output of squeue and "scontrol show job" to help with the analysis.



Quoting Martins Innus <[email protected]>:


Moe,

Thanks, sorry I probably should have been more verbose. My understanding was that some amount of scheduling takes place in:

src/slurmctld/job_scheduler.c

independent of the backfill plugin.  Specifically in:

extern int schedule(uint32_t job_limit)

If we have a job, "job1", that is the highest priority job in the queue but cannot run due to a lack of available resources, the backfill scheduler should not affect the expected start time of "job1". My expectation would be that job_scheduler.c somehow communicates the reserved resources for job1 to the backfill plugin. Let me know if I am completely off base and job_scheduler.c is not involved at all.
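
The expectation described above (a blocked top-priority job keeps its expected start time while smaller jobs run around it) can be illustrated with a toy model. The sketch below is a textbook "EASY backfill" check over a single pool of interchangeable nodes. It is not taken from Slurm's backfill.c or job_scheduler.c, and every name in it (running_job, pending_job, reserve_start, can_backfill) is hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy types; none of these names come from Slurm. */
struct running_job { int nodes; long end_time; };   /* end_time: expected end */
struct pending_job { int nodes; long time_limit; }; /* time_limit: max run time */

/* qsort comparator: order running jobs by ascending end_time. */
static int cmp_end(const void *a, const void *b)
{
    const struct running_job *x = a, *y = b;
    return (x->end_time > y->end_time) - (x->end_time < y->end_time);
}

/*
 * Compute the earliest time the top-priority job can start, assuming every
 * running job ends exactly at its end_time. On return, *spare holds the
 * number of nodes left over at that time after the top job is placed.
 * Sorts run[] in place. Returns -1 if the job never fits.
 */
static long reserve_start(struct running_job *run, int n_run, int free_now,
                          long now, const struct pending_job *top, int *spare)
{
    int avail = free_now;

    assert(top->nodes > 0);
    *spare = 0;
    if (avail >= top->nodes) {
        *spare = avail - top->nodes;
        return now;
    }
    qsort(run, (size_t)n_run, sizeof(*run), cmp_end);
    for (int i = 0; i < n_run; i++) {
        avail += run[i].nodes;          /* nodes released when run[i] ends */
        if (avail >= top->nodes) {
            *spare = avail - top->nodes;
            return run[i].end_time;
        }
    }
    return -1;
}

/*
 * Decide whether a lower-priority candidate may start now without delaying
 * the top job's reserved start time (top_start) computed above.
 */
static int can_backfill(const struct pending_job *cand, int free_now,
                        long now, long top_start, int spare)
{
    if (cand->nodes > free_now)
        return 0;                       /* does not fit right now */
    if (top_start < 0)
        return 1;                       /* top job never fits: no reservation */
    if (now + cand->time_limit <= top_start)
        return 1;                       /* finishes before the reservation */
    return cand->nodes <= spare;        /* overlaps, but only on spare nodes */
}
```

For example, with 2 nodes free now, running jobs of 2 nodes ending at t=50 and 6 nodes ending at t=100, and a top job needing 9 nodes, reserve_start() places the top job at t=100 with 1 spare node. A 2-node candidate with a time limit of 60 may start at t=0 because it ends before the reservation, while a 2-node candidate with a time limit of 500 is refused because it would still hold nodes the top job needs at t=100. A real scheduler also has to account for specific nodes, partitions, limits, and jobs that end early, none of which this model covers.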

We are seeing some scheduling abnormalities that we cannot reliably reproduce, and I am just trying to instrument the code in appropriate places to try to figure out what is going on.

Thanks

Martins

On 9/3/13 12:19 PM, Moe Jette wrote:

The backfill scheduling logic is all in
src/plugins/sched/backfill/backfill.c

Most of the logic is in the function _attempt_backfill().

Quoting Martins Innus <[email protected]>:


Hello,
We have some questions on the basic functionality of the backfill scheduler. In the man page it says the following:

"Backfill scheduling will initiate lower-priority jobs if doing so does not delay the expected initiation time of any higher priority job."

Could someone point me to the code block that enforces this? Specifically, if the highest priority job cannot run currently due to a lack of available resources, is a reservation made or some sort of other limit placed on what backfill jobs can be run on the nodes that the above high priority job will be running on in the future?

Thanks

Martins


