Another option would be to use the squeue command and poll until all
of the steps are complete. You could either use a script or add a
--wait option to the squeue command to do the polling.
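A minimal sketch of that polling approach, assuming squeue's real -h (no header), -s (show steps), -j (job id) and -o (format string, %i = step id) options; the wait_for_steps helper name is made up here, not part of SLURM:

```shell
# Poll squeue until it no longer reports any steps for the given job.
wait_for_steps() {
    jobid="$1"
    # -h: no header, -s: list steps, -o '%i': print only step ids.
    while [ -n "$(squeue -h -s -j "$jobid" -o '%i')" ]; do
        sleep 5   # polling interval; tune to taste
    done
}
```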
Quoting Yuri D'Elia <[email protected]>:
On Thu, 15 Sep 2011 20:16:02 +0200
"Yuri D'Elia" <[email protected]> wrote:
When "swait" is invoked inside the same job id as given on the
command line, it should simply wait for all steps to finish,
without counting the id of the allocation. Better yet, if
SLURM_JOB_ID is defined, it should use it directly. That way
"swait" would map *perfectly* to the "wait" built-in.
This way I could also implement super-easily my steps:
sbatch multi-stage.sh

# multi-stage.sh
for ...; do
    sbatch --jobid $SLURM_JOB_ID stage1-step.sh
done
swait

for ...; do
    sbatch --jobid $SLURM_JOB_ID stage2-step.sh
done
swait

echo "finished"
#####
There! The dependency problem is resolved without using
dependencies in the first place. Also, managing the queue becomes
*much* easier.
Actually, after thinking about it, this should be rather simple: I
think I can implement "swait" entirely in user space, so to speak.
I can simply list the steps for the given job and "sattach" to the
first one. Once that finishes, repeat until there are no more steps.
"sattach" probably has some overhead that I don't need (the I/O
redirections), and implementing this outside of slurmctld won't be as
efficient, but if the steps are long enough, wasting a couple of
seconds over 100k scheduled jobs is nothing.
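The loop described above could be sketched roughly like this, assuming squeue prints step ids as jobid.stepid and that sattach returns once the attached step finishes; errors from attaching to an already-finished step are simply ignored:

```shell
# User-space "swait" sketch: repeatedly attach to the first remaining
# step of the job until squeue lists no more steps.
swait() {
    # Use the job id given as an argument, or fall back to SLURM_JOB_ID.
    jobid="${1:-$SLURM_JOB_ID}"
    while step=$(squeue -h -s -j "$jobid" -o '%i' | head -n 1); [ -n "$step" ]; do
        # sattach blocks until the step completes; discard its I/O.
        sattach "$step" >/dev/null 2>&1 || true
    done
}
```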
I'll give it a try using the API first and then report back. I think
it would make a nice addition to the SLURM system.