Carles,
Both you and Moe seem to be on the same track here. I will investigate
this further; thanks for your time.
AC
On 06/06/2013 11:56 AM, Carles Fenoy wrote:
Re: [slurm-dev] Re: correct formatting for sbatch jobs
Alan,
You should use something like this:
srun --exclusive -c 4 BINARY1 &
srun --exclusive -c 4 BINARY2
wait
srun --exclusive -c 4 BINARY3 &
srun --exclusive -c 4 BINARY4
wait
...
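Put together for the 10-binary case discussed in this thread, that pattern would look something like the sbatch script below. This is a sketch only: the #SBATCH resource lines and the per-step -N/-n flags are assumptions chosen to match the 2-node, 4-cores-per-task setup described further down, and it of course needs a Slurm cluster to run.

```shell
#!/bin/bash
#SBATCH -N 2          # two nodes, as in the original question
#SBATCH -n 2          # two tasks running at a time
#SBATCH -c 4          # four cores per task

# Launch the binaries two at a time: both sruns of a pair go to the
# background, then `wait` blocks until the pair has finished before
# the next pair starts.
for i in 1 3 5 7 9; do
    srun --exclusive -N 1 -n 1 -c 4 BINARY$i &
    srun --exclusive -N 1 -n 1 -c 4 BINARY$((i + 1)) &
    wait
done
```

Backgrounding both sruns of a pair before `wait` is equivalent to backgrounding only the first, as above: `wait` with no arguments blocks on all background jobs of the script.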
Or you can simply submit 200 sbatch jobs, each with its own srun command inside:
sbatch -c 4 --wrap="srun BINARY1"
...
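From the login node, that second approach could be scripted along these lines. This is a dry-run sketch: BINARY1 through BINARY200 are placeholder program names, and the echo only prints each submission command; remove the echo to actually submit.

```shell
#!/bin/sh
# Print one sbatch submission per binary (dry run). Each job asks for
# 4 cores; Slurm queues them all and runs as many concurrently as free
# slots allow, rather than failing the way a bare srun does when no
# slots are free. Remove `echo` to actually submit.
for i in $(seq 1 200); do
    echo sbatch -c 4 --wrap="srun BINARY$i"
done
```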
Regards,
Carles Fenoy
On Thu, Jun 6, 2013 at 5:45 PM, Moe Jette <[email protected]> wrote:
End of month, but pre-releases are running on some really large
systems now.
Quoting Paul Edmon <[email protected]>:
>
> Side question, when is the stable release of 2.6 projected to be
> available?
>
> -Paul Edmon-
>
> On 06/06/2013 11:40 AM, Moe Jette wrote:
>> Perhaps what you want is the srun --exclusive option and running the
>> srun commands in the background.
>>
>> Slurm v2.6 has native job array support.
>>
>> Quoting "Alan V. Cowles" <[email protected]>:
>>
>>> Hey guys,
>>>
>>> We are new to Slurm, hoping to use some of its advanced parallel
>>> features over what is offered in older versions of SGE.
>>>
>>> We have written various sbatch scripts to test out methods of
>>> submitting jobs, but we have not found a way to make them perform as
>>> intended.
>>>
>>> We have spent many hours looking over the man pages and resubmitting
>>> jobs, but haven't found a version that works just yet, so I'm hoping
>>> another user can help us out.
>>>
>>> Here is a simple example of what we are attempting to do:
>>>
>>> We have an sbatch script that in turn should call out 10 consecutive
>>> srun commands.
>>>
>>> We have it spread across 2 nodes of our cluster with -N 2, and what
>>> we would like is for sruns 1 and 2 to run at the same time, then 3
>>> and 4 once the first two are finished, and so on until all 10 jobs
>>> are finished.
>>>
>>> What we are finding is that the first srun runs in parallel on 2
>>> nodes, then it proceeds to the next one sequentially, until it
>>> finishes all 10. Obviously this is not ideal.
>>>
>>> We have looked into the -n and -c options, but neither does what we
>>> were expecting; they just extrapolate the running of each srun out
>>> to multiple cores/machines.
>>>
>>> One workaround we have found is to submit all 10 jobs as separate
>>> srun commands. This works in theory until we try to scale up to,
>>> say, 200 jobs: we run out of available slots, and with srun, jobs
>>> will terminate when there are no available slots to receive them,
>>> which is why we really want to get this running as intended in an
>>> sbatch script.
>>>
>>> Any help with correctly modifying the sbatch script would be most
>>> appreciated.
>>>
>>> Thanks in advance.
>>>
>>> Alan Cowles
>>>
>
--
Carles Fenoy