[slurm-dev] Re: Killing the backfill...

jette Tue, 20 May 2014 08:04:35 -0700

Creating a job step modifies the job record, hence last_job_update. last_job_update is used in several ways besides scheduling (state save, updates to squeue, etc.). One way to address this might be to add a new time stamp that notes changes to the job records that might impact scheduling (e.g. new job submitted or job terminates).
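
To make the idea concrete, here is a minimal sketch of such a second time stamp. It is a simplified model, not code from the Slurm source: only last_job_update is a name from this thread, while last_sched_update and the three helper functions are hypothetical names chosen for illustration.

```c
#include <stdbool.h>
#include <time.h>

/* Simplified stand-ins for slurmctld state. last_sched_update is an
 * assumed name for the proposed second time stamp. */
time_t last_job_update = 0;   /* any change to a job record */
time_t last_sched_update = 0; /* only changes that might impact scheduling */

/* Hypothetical helper: called for any job record change, including
 * job step creation. State save and squeue would still key off
 * last_job_update. */
void note_job_record_change(time_t now)
{
    last_job_update = now;
}

/* Hypothetical helper: called when a job is submitted or terminates.
 * These events update both time stamps. */
void note_sched_relevant_change(time_t now)
{
    last_job_update = now;
    last_sched_update = now;
}

/* Hypothetical check for the backfill loop: restart only if a
 * scheduling-relevant change happened after the pass began.
 * With one-second resolution, a change in the same second as
 * pass_start is not detected in this sketch. */
bool backfill_must_restart(time_t pass_start)
{
    return last_sched_update > pass_start;
}
```

In this model a job step start would call note_job_record_change() and so would no longer interrupt a backfill pass, while a job submission or termination would call note_sched_relevant_change() and still would.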

There have been some bugs related to the bf_continue configuration parameter in recent months. I would suggest that you reconsider using it.



Quoting Magnus Jonsson <[email protected]>:

Hi!
While investigating another matter I found that if you have lots of jobs running with short job steps, they kill the backfill very effectively.
As all actions on a job step modify the last_job_update global variable, that effectively stops the backfill loop.
This can be demonstrated very simply with this batch script on a system with some jobs in the queue.
----8<----
#!/bin/bash

# Launch 120 short job steps back to back; each one updates the job record.
for n in $(seq 120); do
        srun sleep 1
done
----8<----
In version 2.6.7 I can only find a few places where last_job_update is used and only one that is directly related to job steps.
Is there a need for the code to update last_job_update for every action of a job step?
Should there also be a last_job_step_update? Are there actions of a job step that affect the queue?
Could there be another variable that could be used to trigger a reschedule of the queue based on events that actually affect the scheduling of the queue?
Best regards,
Magnus

--
Magnus Jonsson, Developer, HPC2N, Umeå Universitet
