Carles Fenoy <[email protected]> writes:

> I would suggest using srun inside the job, and not sbatch.
>
> If you submit a job with sbatch and inside use srun, you can easily use
> dependencies between stages.
>
> So you will have
>
> $ sbatch --job-name=stage1 --ntasks=6 ./script_stage1.sh
> Submitted batch job 1
>
> script_stage1.sh:
>
> srun -n1 process_data.exe 1 &
> srun -n1 process_data.exe 2 &
> srun -n1 process_data.exe 3 &
> srun -n1 process_data.exe 4 &
> ...
> # You should control the number of srun running at the same time
>
> wait
>
> $ sbatch --job-name=collect_and_aggregate --dependency=afterok:1 ./collect_and_aggregate.sh
> Submitted batch job 2
>
> $ sbatch --job-name=stage2 --ntasks=6 --dependency=afterok:2 ./script_stage2.sh
>
> That's the way I'd do it, but I don't know if there is any other way to
> do it. You should also keep track of which steps in every stage have
> finished, in order to be able to resume the execution in case of a
> node failure.
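The "control the number of srun running at the same time" comment above can be sketched in plain bash. This is only a sketch: `sleep` stands in for `srun -n1 process_data.exe "$i"`, and the batch size of 2 is arbitrary. This simple variant launches a fixed-size batch and waits for the whole batch to drain before starting the next one.

```shell
#!/bin/bash
# Batch-style limit on concurrent background steps (sketch).
max_jobs=2
for i in 1 2 3 4 5 6; do
  sleep 0.1 &                        # real script: srun -n1 process_data.exe "$i" &
  if [ $(( i % max_jobs )) -eq 0 ]; then
    wait                             # block until the current batch finishes
  fi
done
wait                                 # catch any leftover background jobs
echo "all steps finished"
```

A more fine-grained variant could use `wait -n` (bash 4.3+) to refill the pool as soon as any one job exits, instead of waiting for a whole batch.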
If I wanted to wrap all three stages (stage1, collect_and_aggregate,
stage2) in a single script, I would need to capture the job ID of stage1
so that I could add it as the argument to 'afterok' for
collect_and_aggregate (and similarly for collect_and_aggregate and
stage2). Is there an easy way to do this, other than parsing the output
of squeue after each sbatch and using the job name to extract the job
ID?

Regards

Loris

-- 
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin         Email [email protected]
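A sketch of the job-ID capture being asked about, so squeue is not needed: `sbatch --parsable` prints just the job ID (on a non-federated cluster), which can be captured directly; alternatively, the "Submitted batch job <id>" line that sbatch normally prints can be parsed. The wrapper lines are hypothetical; the job and script names are taken from the quoted mail, and the literal string below only stands in for real sbatch output.

```shell
#!/bin/bash
# Option 1: --parsable makes sbatch print only the job ID, e.g.:
#   jid1=$(sbatch --parsable --job-name=stage1 --ntasks=6 ./script_stage1.sh)
#
# Option 2: parse sbatch's normal "Submitted batch job <id>" message:
parse_jobid() {
  awk '{print $4}'   # 4th field of "Submitted batch job 123"
}

# With a real sbatch this would be a command substitution around sbatch
# itself; the echo only illustrates the parsing.
jid1=$(echo "Submitted batch job 1" | parse_jobid)
echo "sbatch --job-name=collect_and_aggregate --dependency=afterok:${jid1} ./collect_and_aggregate.sh"
```

The same pattern repeats for the second dependency: capture the collect_and_aggregate job ID and pass it via `--dependency=afterok:...` to the stage2 sbatch.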
