Carles Fenoy <[email protected]> writes:

> I would suggest using srun inside the job, not sbatch.
>
> If you submit a job with sbatch and use srun inside it, you can easily
> use dependencies between the stages.
>
> So you will have
>
> $ sbatch --job-name=stage1 --ntasks=6 ./script_stage1.sh
> Submitted batch job 1
>
> script_stage1.sh:
>
> srun -n1 process_data.exe 1 &
> srun -n1 process_data.exe 2 &
> srun -n1 process_data.exe 3 &
> srun -n1 process_data.exe 4 &
> ...
> # You should control the number of srun running at the same time
>
> wait
>
> sbatch --job-name=collect_and_aggregate --dependency=afterok:1 \
> ./collect_and_aggregate.sh
> Submitted batch job 2
>
> sbatch --job-name=stage2 --ntasks=6 --dependency=afterok:2 ./script_stage2.sh
>
> That's how I would do it, but I don't know whether there is another
> way.  You should also keep track of which steps in each stage have
> finished, so that you can resume execution after a node failure.

If I wanted to wrap all three stages

stage1
collect_and_aggregate
stage2

in a single script, I would need to capture the job ID of stage1 so that
I could pass it as the argument to 'afterok' for collect_and_aggregate
(and likewise the ID of collect_and_aggregate for stage2).

Is there an easy way to do this, other than parsing the output of squeue
after each sbatch and using the job name to extract the job ID?

Regards

Loris

-- 
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin         Email [email protected]
