Another option would be to use the squeue command and poll until all of the steps are complete. You could do the polling in a script, or a --wait option could be added to squeue to do it for you.
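As a rough illustration of the polling idea, something like the following could work. This is only a sketch, not an existing tool: it assumes squeue's -h (no header), -s/--steps, and -j <jobid> options, and that the step listing becomes empty once all steps have finished. The function names and the POLL_INTERVAL variable are made up for the example.

```shell
#!/bin/sh
# Hypothetical polling sketch: wait until all steps of a job have
# finished, by repeatedly listing the job's steps with squeue.

list_steps() {                      # print one line per remaining step
  squeue -h -s -j "$1" 2>/dev/null
}

wait_for_steps() {
  jobid=$1
  # loop while squeue still reports steps for this job
  while [ -n "$(list_steps "$jobid")" ]; do
    sleep "${POLL_INTERVAL:-5}"     # poll interval in seconds; tune to taste
  done
}
```

The obvious downside is the polling latency and the extra load on slurmctld from repeated squeue calls, which is presumably why a built-in --wait would be nicer.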

Quoting Yuri D'Elia <[email protected]>:

On Thu, 15 Sep 2011 20:16:02 +0200
"Yuri D'Elia" <[email protected]> wrote:

When "swait" is invoked inside the same job id as given on the command line, it should simply wait for all steps to finish, without counting the id of the allocation itself. Better yet, if SLURM_JOB_ID is defined, it should use it directly. That way "swait" would map *perfectly* onto the shell's "wait" built-in.

This way I could also implement super-easily my steps:

sbatch multi-stage.sh

# multi-stage.sh
for ...; do
  sbatch --jobid $SLURM_JOB_ID stage1-step.sh
done
swait

for ...; do
  sbatch --jobid $SLURM_JOB_ID stage2-step.sh
done
swait

echo "finished"
#####

There! The dependency problem is resolved without using dependencies in the first place. Also, managing the queue becomes *much* easier.

Actually, after thinking about it, this should be rather simple. I think I can implement "swait" entirely in user space, so to speak.

I can simply list the steps for the given job and "sattach" to the first one. Once that finishes, repeat until there are no more steps. "sattach" probably has some overhead I don't need (the I/O redirection), and implementing this outside of slurmctld won't be as efficient, but if the steps are long enough, wasting a couple of seconds over 100k scheduled jobs is nothing.
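The attach-in-a-loop idea could be sketched roughly like this. Again a hypothetical sketch, not a finished tool: it assumes squeue's -s/--steps listing with -o %i printing step ids as jobid.stepid, and that sattach blocks until the step it attaches to terminates. The helper name list_first_step is invented for the example.

```shell
#!/bin/sh
# Hypothetical user-space "swait": serially attach to each remaining step
# of a job until squeue reports no more steps.

list_first_step() {   # print the id (jobid.stepid) of the first remaining step
  squeue -h -s -j "$1" -o %i 2>/dev/null | head -n 1
}

swait() {
  jobid=${1:-$SLURM_JOB_ID}         # default to the surrounding allocation
  while :; do
    step=$(list_first_step "$jobid")
    [ -z "$step" ] && break         # no steps left: we are done
    sattach "$step" >/dev/null 2>&1 \
      || sleep 1                    # step may have exited between the two calls
  done
}
```

Note the race handled in the last line: a step can finish between the squeue listing and the sattach call, so a failed attach is treated as "already gone" rather than an error.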

I'll give it a try using the API first and then report back. I think it would make a nice addition to the SLURM system.
