Dear SLURMers!

I want to run a program with the following configuration:
- One batch script that runs two or more embarrassingly parallel workloads
- Each of these embarrassingly parallel workloads needs one full node (24 CPUs)
- Therefore, I need two or more nodes, with each node taking care of one
embarrassingly parallel workload

Problem:
- All tasks end up on just one node (the first one)!

Here is my batch script. (Each node in our cluster has 24 CPUs. Since I
want to run two jobs, I requested 2 nodes with 24 tasks per node.)

#!/bin/bash
#SBATCH --partition=....
#SBATCH --output=...
#SBATCH --workdir=...
#SBATCH --job-name="testing"
#SBATCH --tasks-per-node=24
#SBATCH --nodes=2
#SBATCH --mail-type=all
#SBATCH --time=71:59:59

# run compute job
source ..../.bashrc
./submit.sh my_job_1 24 &
./submit.sh my_job_2 24 &

wait

Result:
- 48 tasks on the first node, nothing on the second one!
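
One variant I could try, as an untested sketch: start each workload as its
own job step through srun, restricted to a single node, so that Slurm rather
than the batch shell decides where it runs. This assumes that submit.sh can be
launched through srun and that its second argument is the number of CPUs to
use; the SRUN variable and the run_on_one_node helper are hypothetical names
introduced only for this illustration.

```shell
#!/bin/bash
# Sketch only. SRUN is a hypothetical override hook: it defaults to srun,
# but SRUN=echo prints the command lines instead of executing them.
SRUN="${SRUN:-srun}"

# Launch one workload as its own job step, confined to a single node.
# $1 = job name, $2 = CPU count (assumed meaning of submit.sh's arguments).
run_on_one_node() {
    local job_name="$1" cpus="$2"
    "$SRUN" --nodes=1 --ntasks=1 --cpus-per-task="$cpus" --exclusive \
        ./submit.sh "$job_name" "$cpus"
}

# Two steps in the background; each asks for a full node's worth of CPUs,
# so they should not fit on the same 24-CPU node at the same time.
run_on_one_node my_job_1 24 &
run_on_one_node my_job_2 24 &

wait
```

Running this with SRUN=echo set in the environment prints the two srun
command lines, which makes it easy to check the flags before submitting.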

Any ideas?

Thank you very much in advance for your time and kind attention to my
request.

Best,
Kasra
