We have discussed adding support for job arrays, which is a way to group jobs, but there are no current plans to do that.

--
Sent from my Android phone. Please excuse my brevity and typos.
Yuri D'Elia <[email protected]> wrote:

On Thu, 15 Sep 2011 14:34:38 -0600
Moe Jette <[email protected]> wrote:

> Another option would be to use the squeue command and poll until all
> of the steps are complete. You could either use a script or add a
> --wait option to the squeue command to do the polling.

To comment further on this, after some testing, it doesn't really work
the way I want it to:

> >> sbatch multi-stage.sh
> >>
> >> # multi-stage.sh
> >> for ...; do
> >>   sbatch --jobid $SLURM_JOB_ID stage1-step.sh
> >> done
> >> swait

Each job "step" here has the same allocation as the parent, meaning that
you have to request all of the cluster's resources in advance to have
the steps scheduled on all the available resources. Uncool.

It doesn't seem to be possible to create a "step" with an independent
allocation either. Also, steps don't seem to be managed/queued (i.e. I
can actually run as many steps as I want within a previous allocation).

Using dependencies with 100k jobs is prohibitive.

Right now I'm submitting job groups with a specific name. Then I'm
polling the queue by job name: the single script responsible for
polling is then used as a dependency for stage 2, and so forth.

I really wish I could set up job "groups" to handle dependencies, and
also to inspect the queue in a more sensible way. Similarly, I'm
interested in whether the whole job group has finished, not just a
single job.
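The name-based polling workaround described above could be wrapped in a small helper. This is a minimal sketch: `wait_for_group` is a hypothetical function name, and the 60-second default interval is an arbitrary choice; `--name` and `--noheader` are standard squeue options for filtering by job name and suppressing the header line.

```shell
# wait_for_group NAME [INTERVAL]
# Hypothetical helper: block until no jobs with the given name remain
# in the queue, by polling squeue every INTERVAL seconds (default 60).
wait_for_group() {
    name=$1
    interval=${2:-60}
    # squeue --name filters on the job name set at submission time
    # (sbatch --job-name); --noheader drops the header so empty output
    # means no matching jobs are left.
    while [ -n "$(squeue --noheader --name="$name" 2>/dev/null)" ]; do
        sleep "$interval"
    done
}
```

One way to use this for the stage chaining above would be to submit the stage-1 group with `sbatch --job-name=stage1`, run the helper in a single "poller" job, and make that poller job the dependency for stage 2 via `sbatch --dependency` (the `stage1` name and the poller-job arrangement are assumptions for illustration, not a prescribed workflow).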
