On Thu, 15 Sep 2011 20:16:02 +0200
"Yuri D'Elia" <[email protected]> wrote:

> When "swait" is invoked inside the same job id as given on the command line, 
> it should simply wait for all steps to finish, without counting the id of the 
> allocation. Better yet, if SLURM_JOB_ID is defined, it should use it 
> directly. That way "swait" would map *perfectly* to the "wait" built-in.
> 
> This way I could also implement my steps very easily:
> 
> sbatch multi-stage.sh
> 
> # multi-stage.sh
> for ...; do
>   sbatch --jobid $SLURM_JOB_ID stage1-step.sh
> done
> swait
> 
> for ...; do
>   sbatch --jobid $SLURM_JOB_ID stage2-step.sh
> done
> swait
> 
> echo "finished"
> #####
> 
> There! The dependency problem is resolved without using dependencies in the 
> first place. Also, managing the queue becomes *much* easier.

Actually, after thinking about it, this should be rather simple to implement. I 
think I can implement "swait" entirely in user space, so to speak.

I can simply list the steps for the given job and "sattach" to the first one. 
Once that finishes, repeat until there are no more steps. "sattach" probably 
has some overhead that I don't need (I/O redirection), and implementing this 
outside of slurmctld won't be as efficient, but if the steps are long enough, 
wasting a couple of seconds over 100k scheduled jobs is nothing.
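
A minimal sketch of that loop in shell, to make the idea concrete. It assumes 
that "squeue --steps --jobs <id> --noheader --format %i" prints one step id 
per line and that "sattach <step>" returns once the step ends. The helper 
names list_steps and attach_step are made up for illustration, and skipping 
ids ending in ".batch" or ".extern" (so the script does not wait on itself) 
is an assumption about how the installed version reports those steps.

```shell
#!/bin/sh
# Sketch of a user-space "swait": block until a job has no running steps.
# list_steps and attach_step are hypothetical helper names, kept separate
# so the loop can be exercised without a cluster.

# Print the ids of the running steps of job $1, one per line.
list_steps() {
    squeue --steps --jobs "$1" --noheader --format '%i'
}

# Block until step $1 (e.g. "1234.0") finishes. The step's output is
# discarded, since only its termination matters here.
attach_step() {
    sattach "$1" </dev/null >/dev/null 2>&1
}

swait() {
    # Use the job id given on the command line, else the enclosing job.
    job_id="${1:-${SLURM_JOB_ID:-}}"
    if [ -z "$job_id" ]; then
        echo "swait: no job id given and SLURM_JOB_ID is not set" >&2
        return 1
    fi
    while :; do
        # First remaining step, ignoring the batch script's own step
        # (assumption: such steps are listed with these suffixes).
        step=$(list_steps "$job_id" \
            | grep -v -e '\.batch$' -e '\.extern$' \
            | head -n 1)
        [ -n "$step" ] || break
        # If attaching fails (for instance the step ended in the
        # meantime), pause briefly and list the steps again.
        attach_step "$step" || sleep 1
    done
}
```

Inside multi-stage.sh this would be sourced and called as plain "swait" 
after each loop; from a login node, as "swait <jobid>". Each iteration costs 
one squeue call plus one sattach, which is the overhead mentioned above.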

I'll give it a try using the API first and then report back. I think it would 
make a nice addition to the SLURM system.
