Hi!

While investigating another matter, I found that having lots of running jobs with short job steps kills the backfill scheduler very effectively.

Every action on a job step modifies the last_job_update global variable, and that effectively stops the backfill loop.

This can be demonstrated very simply with the following batch script on a system with some jobs in the queue.

----8<----
#!/bin/bash

for n in `seq 120`; do
        srun sleep 1
done
----8<----

In version 2.6.7 I can find only a few places where last_job_update is used, and only one that is directly related to job steps.

Is there a need for the code to update last_job_update for every action of a job step?

Should there also be a last_job_step_update? Are there any job step actions that affect the queue?

Could there be another variable that is used to trigger a reschedule of the queue, based only on events that actually affect the scheduling of the queue?
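
To make the idea concrete, here is a rough toy model in bash. It is not Slurm code and does not claim to match the real backfill implementation. Every name except last_job_update is made up for illustration, including last_sched_update, and a logical counter stands in for the time stamps. It only shows the difference between watching a variable that every job step touches and watching one that only queue-changing events touch.

```bash
#!/bin/bash
# Toy model only, NOT Slurm code. All names except last_job_update
# are hypothetical and exist just for this illustration.

clock=0               # logical clock standing in for time()
last_job_update=0     # bumped by every job and job step event
last_sched_update=0   # hypothetical: bumped only when the queue changes

# A job step starts or ends: nothing about the queue changes.
job_step_event() {
    clock=$((clock + 1))
    last_job_update=$clock
}

# A job is submitted, ends, or is modified: the queue does change.
queue_event() {
    clock=$((clock + 1))
    last_job_update=$clock
    last_sched_update=$clock
}

# Simulate one backfill pass over $1 queued jobs while watching the
# variable named in $2. One job step event arrives after each job is
# examined. Prints how many jobs were examined before the pass ended.
backfill_pass() {
    local queued=$1 watch=$2
    local start=${!watch}      # value of the watched variable at pass start
    local examined=0
    while [ "$examined" -lt "$queued" ]; do
        examined=$((examined + 1))
        job_step_event         # e.g. one "srun sleep 1" finishing
        if [ "${!watch}" -ne "$start" ]; then
            break              # state looks stale, abandon the pass
        fi
    done
    echo "$examined"
}

echo "watching last_job_update:   $(backfill_pass 50 last_job_update) of 50 jobs examined"
echo "watching last_sched_update: $(backfill_pass 50 last_sched_update) of 50 jobs examined"
```

In this model the first pass gives up after a single job, while the second pass gets through all 50, because job step events never touch the second variable. Whether any job step action really needs to count as a queue event is exactly the open question above.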

Best regards,
Magnus

--
Magnus Jonsson, Developer, HPC2N, Umeå Universitet
