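srun has many options for controlling where job steps run. The batch
script itself runs only on the first allocated node, so anything it
starts without srun stays there. Perhaps you just want
"srun --exclusive ...".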
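
As a rough sketch of that suggestion (not a tested recipe for your cluster), the two workloads could be launched as separate job steps, one node each. Assumptions here: ./my_job_1 and ./my_job_2 are placeholder executables standing in for whatever submit.sh starts, step_cmd is a hypothetical helper that only builds the command line, and each workload really wants 24 tasks. How --exclusive behaves for job steps depends on the Slurm version, so check the srun man page on your system.

```shell
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=24

# Hypothetical helper: print the srun command line for one workload.
# --exclusive asks for CPUs not shared with other steps of this job,
# --nodes=1 keeps the step on a single node of the allocation.
step_cmd() {
    local workload=$1 ntasks=$2
    echo "srun --exclusive --nodes=1 --ntasks=$ntasks $workload"
}

# Launch the steps only where srun exists (i.e. on the cluster).
if command -v srun >/dev/null 2>&1; then
    for w in ./my_job_1 ./my_job_2; do   # placeholder executables
        $(step_cmd "$w" 24) &            # one background job step per workload
    done
    wait                                 # block until both steps finish
fi
```

Note that --ntasks=24 makes srun start 24 copies of the executable. If each workload spawns its own 24 processes, something like --ntasks=1 --cpus-per-task=24 may be closer to what you need.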

Quoting Kasra Hosseini <[email protected]>:

> Dear SLURMers!
>
> I want to run a program with the following configuration:
> - One Batch script which runs two or more embarrassingly parallel workloads
> - Each of these embarrassingly parallel workloads needs one node (24 CPUs)
> - Therefore, I need two or more nodes, each node takes care of one
> embarrassingly parallel workload
>
> Problem:
> - All tasks will be distributed on just one node! (the first one)
>
> Here is my BATCH script: (Each node in our cluster has 24 CPUs. Since I
> want to run two jobs, I requested 2 nodes and for each node, 24 tasks)
>
> #!/bin/bash
> #SBATCH --partition=....
> #SBATCH --output=...
> #SBATCH --workdir=...
> #SBATCH --job-name="testing"
> #SBATCH --tasks-per-node=24
> #SBATCH --nodes=2
> #SBATCH --mail-type=all
> #SBATCH --time=71:59:59
>
> # run compute job
> source ..../.bashrc
> ./submit.sh my_job_1 24 &
> ./submit.sh my_job_2 24 &
>
> wait
>
> Result:
> - 48 tasks on the first node, nothing on the second one!
>
> Any idea?
>
> Thank you very much in advance for your time and kind attention to my
> request.
>
> Best,
> Kasra
>
