Alan,

You should use something like this:

srun --exclusive -c 4 BINARY1 &
srun --exclusive -c 4 BINARY2 &
wait
srun --exclusive -c 4 BINARY3 &
srun --exclusive -c 4 BINARY4 &
wait
...
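For illustration, the effect of the backgrounding and the wait can be sketched in plain shell, with sleep/touch standing in for the real job steps (the temp dir and .done file names are made up for the demo):

```shell
#!/bin/bash
# Demo of the pattern above: each pair of steps runs concurrently,
# and wait blocks until both finish before the next pair starts.
# In the real batch script each backgrounded line would be
#   srun --exclusive -c 4 BINARYn &
tmpdir=$(mktemp -d)
( sleep 0.2; touch "$tmpdir/step1.done" ) &   # stand-in for BINARY1
( sleep 0.2; touch "$tmpdir/step2.done" ) &   # stand-in for BINARY2
wait   # both steps of the pair are guaranteed finished here
ls "$tmpdir"
```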

Or you can simply submit 200 separate sbatch jobs, each with its own srun
command inside:
sbatch -c 4 --wrap="srun BINARY1"
...
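If the binary names follow a pattern, the 200 submissions can be generated in a loop; a sketch (the echo makes it a dry run that just prints the commands, and BINARY$i is a placeholder for the real binary names):

```shell
#!/bin/bash
# Print one sbatch submission per job; remove the echo to actually
# submit. BINARY$i is a placeholder for the real binary names.
for i in $(seq 1 200); do
    echo sbatch -c 4 --wrap="srun BINARY$i"
done
```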

Regards,
Carles Fenoy


On Thu, Jun 6, 2013 at 5:45 PM, Moe Jette <[email protected]> wrote:

>
> End of month, but pre-releases are running on some really large systems
> now.
>
> Quoting Paul Edmon <[email protected]>:
>
> >
> > Side question, when is the stable release of 2.6 projected to be
> > available?
> >
> > -Paul Edmon-
> >
> > On 06/06/2013 11:40 AM, Moe Jette wrote:
> >> Perhaps what you want is the srun --exclusive option and running the
> >> srun commands in the background.
> >>
> >> Slurm v2.6 has native job array support.
> >>
> >> Quoting "Alan V. Cowles" <[email protected]>:
> >>
> >>> Hey guys,
> >>>
> >>> We are new to Slurm, hoping to use some of its advanced parallel
> >>> features over what is offered in older versions of SGE.
> >>>
> >>> We have written various sbatch scripts to test out methods of
> >>> submitting jobs, and we are not finding a way to have them perform as
> >>> intended.
> >>>
> >>> We have spent many hours looking over the man pages and resubmitting
> >>> jobs but haven't found one that works just yet so I'm hoping another
> >>> user can help us out.
> >>>
> >>> Here is a simple example of what we are attempting to do:
> >>>
> >>> We have an sbatch script that in turn should call out 10 consecutive
> >>> srun commands.
> >>>
> >>> We have it spread across 2 nodes of our cluster with -N 2, and what we
> >>> would like is for srun1 and srun2 to run at the same time, then 3 and 4
> >>> once the first two are finished, and so on until all 10 jobs are done.
> >>>
> >>> What we are finding is that the first srun is running in parallel on 2
> >>> nodes, then it's proceeding to the next sequentially, until it finishes
> >>> all 10. Obviously this is not ideal.
> >>>
> >>> We have looked into the -n and -c options, but neither does what we
> >>> expected; they just spread the run of each srun across multiple
> >>> cores/machines.
> >>>
> >>> One workaround we have found is to submit all 10 jobs as separate srun
> >>> commands. That works until we try to scale up to, say, 200 jobs: we run
> >>> out of available slots, and srun jobs terminate when there are no slots
> >>> to receive them. That is why we really want to get this running as
> >>> intended inside an sbatch script.
> >>>
> >>> Any help that can be provided in how to correctly modify the sbatch
> >>> script would be most helpful.
> >>>
> >>> Thanks in advance.
> >>>
> >>> Alan Cowles
> >>>
> >
>
>


-- 
Carles Fenoy