Alan,

You should use something like this:
srun --exclusive -c 4 BINARY1 &
srun --exclusive -c 4 BINARY2
wait
srun --exclusive -c 4 BINARY3 &
srun --exclusive -c 4 BINARY4
wait
...

Or you can simply submit 200 sbatch jobs, each with its srun command inside:

sbatch -c 4 --wrap="srun BINARY1"
...

Regards,
Carles Fenoy

On Thu, Jun 6, 2013 at 5:45 PM, Moe Jette <[email protected]> wrote:
>
> End of month, but pre-releases are running on some really large systems
> now.
>
> Quoting Paul Edmon <[email protected]>:
>
> > Side question, when is the stable release of 2.6 projected to be
> > available?
> >
> > -Paul Edmon-
> >
> > On 06/06/2013 11:40 AM, Moe Jette wrote:
> >> Perhaps what you want is the srun --exclusive option and running the
> >> srun commands in the background.
> >>
> >> Slurm v2.6 has native job array support.
> >>
> >> Quoting "Alan V. Cowles" <[email protected]>:
> >>
> >>> Hey guys,
> >>>
> >>> We are new to Slurm, hoping to use some of its advanced parallel
> >>> features over what is offered in older versions of SGE.
> >>>
> >>> We have written various sbatch scripts to test out methods of
> >>> submitting jobs, and we are not finding a way to have them perform as
> >>> intended.
> >>>
> >>> We have spent many hours looking over the man pages and resubmitting
> >>> jobs, but haven't found one that works just yet, so I'm hoping another
> >>> user can help us out.
> >>>
> >>> Here is a simple example of what we are attempting to do:
> >>>
> >>> We have an sbatch script that in turn should call out 10 consecutive
> >>> srun commands.
> >>>
> >>> We have it spread across 2 nodes of our cluster with -N 2, and what we
> >>> would like is for srun1 and srun2 to run at the same time, then 3 and 4
> >>> once the first two are finished, and so on until all 10 jobs are
> >>> finished.
> >>>
> >>> What we are finding is that the first srun is running in parallel on 2
> >>> nodes, then it's proceeding to the next sequentially, until it finishes
> >>> all 10. Obviously this is not ideal.
> >>>
> >>> We have looked into the options for -n and -c, and haven't found
> >>> either to do what we were expecting; they just extrapolate out the
> >>> running of each srun to multiple cores/machines.
> >>>
> >>> One workaround we have found is to just submit all 10 jobs as separate
> >>> srun commands. This works in theory, but when we try to scale up to,
> >>> say, 200 jobs, we run out of available slots, and with srun, jobs will
> >>> terminate without available slots to receive them, which is why we
> >>> really want to get this running as intended in an sbatch.
> >>>
> >>> Any help that can be provided in how to correctly modify the sbatch
> >>> script would be most helpful.
> >>>
> >>> Thanks in advance.
> >>>
> >>> Alan Cowles

--
Carles Fenoy
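[Editor's note] Putting Carles's suggestion together into a complete batch script, a minimal sketch for Alan's 10-binary case might look like this. The BINARY1..BINARY10 names and the -N/-n/-c values are illustrative assumptions, not taken from the thread, and would need to match the site's actual node sizes:

```shell
#!/bin/bash
#SBATCH -N 2        # two nodes, as in Alan's test
#SBATCH -n 2        # room for two simultaneous job steps (assumption)
#SBATCH -c 4        # 4 CPUs per step, matching Carles's example

# Run the (hypothetical) binaries two at a time. --exclusive makes each
# srun job step claim only its own CPUs within the allocation, so the two
# backgrounded steps run concurrently; wait blocks until both finish
# before the loop launches the next pair.
for i in 1 3 5 7 9; do
    srun --exclusive -N 1 -n 1 -c 4 ./BINARY$i &
    srun --exclusive -N 1 -n 1 -c 4 ./BINARY$((i+1)) &
    wait
done
```

For the 200-job case, Moe's note that Slurm v2.6 has native job array support suggests submitting the set as an array (roughly `sbatch --array=... -c 4 --wrap="srun ..."`), with the queue rather than the script handling the pacing; the exact flags should be checked against the 2.6 documentation once it is released.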
