Perhaps what you want is the srun --exclusive option, with the srun
commands run in the background (followed by a wait so the batch script
does not exit before they finish).
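
A rough sketch of that idea, not a tested recipe for your cluster. It
assumes a 2-node, 2-task allocation as in your description; the
run_steps helper and the echo payload are hypothetical stand-ins for
your real commands, and the exact behaviour of --exclusive for job
steps can differ between Slurm versions:

```shell
#!/bin/bash
#SBATCH -N 2    # two nodes, as in the question
#SBATCH -n 2    # two task slots in total, so at most two steps run at once

# run_steps COUNT LAUNCHER...: start COUNT job steps in the background using
# the given launcher prefix, then block until all of them have exited.
# (Hypothetical helper, named here only for illustration.)
run_steps() {
    local count=$1 i
    shift
    for i in $(seq 1 "$count"); do
        # The echo stands in for the real per-step command (placeholder).
        "$@" sh -c "echo step $i done" &
    done
    wait    # without this the batch script would exit with steps still pending
}

# Only launch through srun when actually inside a Slurm allocation.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    run_steps 10 srun --exclusive -N1 -n1
fi
```

With only two task slots in the allocation, the remaining steps should
be held back until a slot frees up. Note that this gives "at most two
at a time" rather than strict pairs: step 3 starts as soon as either of
the first two finishes.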

Slurm v2.6 has native job array support.
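
If job arrays fit your workload better, a minimal sketch might look
like the following. The echo is a placeholder for the real work, and
the fallback value of 1 is only an assumption to let the script run by
hand outside Slurm:

```shell
#!/bin/bash
#SBATCH --array=1-10    # ten array tasks, each scheduled independently
#SBATCH -n 1            # one task slot per array task

# Slurm sets SLURM_ARRAY_TASK_ID for each array task; default to 1 so the
# script can also be run by hand outside Slurm.
task_id=${SLURM_ARRAY_TASK_ID:-1}

# The echo stands in for the real work (hypothetical placeholder).
echo "array task ${task_id} starting"
```

Array tasks that cannot start right away simply stay queued, which
avoids the "no available slots" failures you saw with separate srun
commands.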

Quoting "Alan V. Cowles" <[email protected]>:

>
> Hey guys,
>
> We are new to slurm, hoping to use some of its advanced parallel
> features over what is offered in older versions of SGE.
>
> We have written various sbatch scripts to test out methods of submitting
> jobs, and we are not finding a way to have it perform as intended.
>
> We have spent many hours looking over the man pages and resubmitting
> jobs but haven't found one that works just yet so I'm hoping another
> user can help us out.
>
> Here is a simple example of what we are attempting to do:
>
> We have an sbatch script that in turn should call out 10 consecutive
> srun commands.
>
> We have it spread across 2 nodes of our cluster with -N 2, and what we
> would like is for srun1,srun2 to run at the same time, then 3,4 once the
> first two are finished, and so on until all 10 jobs are finished.
>
> What we are finding is that the first srun is running in parallel on 2
> nodes, then it's proceeding to the next sequentially, until it finishes
> all 10. Obviously this is not ideal.
>
> We have looked into the options for -n and -c, and haven't found either
> to do what we were expecting; they just extrapolate out the running of
> each srun to multiple cores/machines.
>
> One workaround we have found is to just submit all 10 jobs as separate
> srun commands. This works in theory until we try to scale up to say
> 200 jobs, we run out of available slots, and with srun, jobs will
> terminate without available slots to receive them, which is why we
> really want to get this running as intended in an sbatch.
>
> Any help that can be provided in how to correctly modify the sbatch
> script would be most helpful.
>
> Thanks in advance.
>
> Alan Cowles
>
