Carles,

Both you and Moe seem to be on the same track here. I will investigate this further; thanks for your time.

AC

On 06/06/2013 11:56 AM, Carles Fenoy wrote:
Re: [slurm-dev] Re: correct formatting for sbatch jobs
Alan,

You should use something like this:

srun --exclusive -c 4 BINARY1 &
srun --exclusive -c 4 BINARY2
wait
srun --exclusive -c 4 BINARY3 &
srun --exclusive -c 4 BINARY4
wait
...
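(Putting that pattern into a full batch script might look like the sketch below; the `#SBATCH` header values and BINARY names are placeholders taken from this thread, not a tested configuration, so adjust them to your cluster.)

```shell
#!/bin/bash
#SBATCH -N 2            # two nodes, as in the original example
#SBATCH -c 4            # CPUs per task, matching the srun lines below

# Run the jobs in pairs: '&' puts each srun in the background, and
# 'wait' blocks until both steps of a pair finish before starting
# the next pair. --exclusive keeps the two steps from sharing CPUs.
srun --exclusive -c 4 BINARY1 &
srun --exclusive -c 4 BINARY2 &
wait
srun --exclusive -c 4 BINARY3 &
srun --exclusive -c 4 BINARY4 &
wait
```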

Or you can simply submit 200 sbatch jobs, each with its own srun command inside:
sbatch -c 4 --wrap="srun BINARY1"
...
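(For the 200-job case, those submission lines can be generated in a loop. This is a hypothetical sketch, not from the thread: the `gen_submit` helper and the `binary_N` naming are illustrative, and the commands are printed rather than executed so you can inspect them before piping the output to a shell on a real cluster.)

```shell
#!/bin/sh
# Print one sbatch submission command per binary. On a real cluster you
# would pipe this script's output to sh, or call sbatch directly here.
gen_submit() {
    # $1: name of the binary to wrap in its own single job
    printf 'sbatch -c 4 --wrap="srun %s"\n' "$1"
}

for i in 1 2 3; do          # extend the range up to 200 as needed
    gen_submit "binary_$i"
done
```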

Regards,
Carles Fenoy


On Thu, Jun 6, 2013 at 5:45 PM, Moe Jette <[email protected]> wrote:


    End of month, but pre-releases are running on some really large
    systems now.

    Quoting Paul Edmon <[email protected]>:

    >
    > Side question, when is the stable release of 2.6 projected to be
    > available?
    >
    > -Paul Edmon-
    >
    > On 06/06/2013 11:40 AM, Moe Jette wrote:
    >> Perhaps what you want is the srun --exclusive option and running
    >> the srun commands in the background.
    >>
    >> Slurm v2.6 has native job array support.
    >>
    >> Quoting "Alan V. Cowles" <[email protected]>:
    >>
    >>> Hey guys,
    >>>
    >>> We are new to Slurm, hoping to use some of its advanced parallel
    >>> features over what is offered in older versions of SGE.
    >>>
    >>> We have written various sbatch scripts to test out methods of
    >>> submitting jobs, and we have not found a way to make them perform
    >>> as intended.
    >>>
    >>> We have spent many hours looking over the man pages and
    >>> resubmitting jobs, but we haven't found a combination that works
    >>> yet, so I'm hoping another user can help us out.
    >>>
    >>> Here is a simple example of what we are attempting to do:
    >>>
    >>> We have an sbatch script that in turn should run 10 consecutive
    >>> srun commands.
    >>>
    >>> We have it spread across 2 nodes of our cluster with -N 2, and
    >>> what we would like is for srun 1 and srun 2 to run at the same
    >>> time, then 3 and 4 once the first two are finished, and so on
    >>> until all 10 jobs are finished.
    >>>
    >>> What we are finding is that the first srun runs in parallel on 2
    >>> nodes, then it proceeds to the next sequentially, until it
    >>> finishes all 10. Obviously this is not ideal.
    >>>
    >>> We have looked into the -n and -c options, but neither does what
    >>> we expected; they just extrapolate the running of each srun out
    >>> to multiple cores/machines.
    >>>
    >>> One workaround we have found is to just submit all 10 jobs as
    >>> separate srun commands. This works in theory, but when we try to
    >>> scale up to, say, 200 jobs, we run out of available slots, and
    >>> with srun, jobs will terminate when there are no slots available
    >>> to receive them, which is why we really want to get this running
    >>> as intended in an sbatch script.
    >>>
    >>> Any help on how to correctly modify the sbatch script would be
    >>> most appreciated.
    >>>
    >>> Thanks in advance.
    >>>
    >>> Alan Cowles
    >>>
    >




--
Carles Fenoy
