Hi!
While investigating another matter, I found that having lots of running jobs with short job steps kills the backfill very effectively.
Every action on a job step modifies the global variable last_job_update, which effectively stops the backfill loop.
This is easy to demonstrate with the following simple batch script on a system with some jobs in the queue.
----8<----
#!/bin/bash
# Run 120 short job steps back to back.
for n in $(seq 120); do
    srun sleep 1
done
----8<----
In version 2.6.7 I can find only a few places where last_job_update is
used, and only one of them is directly related to job steps.
Does the code need to update last_job_update for every action of a job step?
Should there also be a last_job_step_update? Are there any job step actions that affect the queue?
Could another variable be used to trigger a reschedule of the queue, based only on events that actually affect scheduling?
Best regards,
Magnus

--
Magnus Jonsson, Developer, HPC2N, Umeå Universitet
