Dear all,

I have a set of simulation runs, each consisting of running a certain executable with a certain set of parameters. Each simulation run uses two cores. Different simulation runs are independent of each other.

I have about 20 nodes with between 20 and 40 cores each. My problem is that I'm using a proprietary programming language whose license only allows me to run 8 parallel processes per node.

My question is how to handle this additional resource "license" using slurm. Some approaches I tried:

*1.* Each simulation run is a job. This leads to crashes because more than 8 jobs can be allocated to the same node.
*2.* The set of all simulation runs forms one job with sbatch --tasks-per-node=8. One simulation run is a parallel srun --exclusive call.
This should work, but I see an efficiency problem (please correct me if I'm wrong):
I'm basically creating my own private "pool" whose size is set by the value of --ntasks. It's not clear what that value should be (the optimal value would depend on the current usage of the cluster), and I also shouldn't have to worry about it: the job scheduler, not the user, should decide which tasks to allocate where, and it should do so dynamically rather than statically.
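To make approach 2 concrete, here is a minimal sketch in Python that only generates the text of such a batch script; it does not submit anything. The executable name `./simulate`, the parameter lists, and the helper name `pool_batch_script` are hypothetical placeholders, and the script uses the documented spelling `--ntasks-per-node`. The pool size is `nodes * 8`, which is exactly the value that has to be guessed up front.

```python
import shlex


def pool_batch_script(param_sets, nodes, exe="./simulate"):
    """Return batch script text for approach 2: one job, a fixed pool of tasks.

    `exe` and the entries of `param_sets` are hypothetical placeholders.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --nodes={nodes}",
        "#SBATCH --ntasks-per-node=8",  # license limit: 8 processes per node
        "#SBATCH --cpus-per-task=2",    # each simulation run uses two cores
        "",
    ]
    for params in param_sets:
        args = " ".join(shlex.quote(p) for p in params)
        # Each run is one job step, started in the background; a step waits
        # inside the job until one of the pool's task slots is free.
        lines.append(f"srun --exclusive -N1 -n1 -c2 {shlex.quote(exe)} {args} &")
    lines.append("wait")  # keep the job alive until every step has finished
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(pool_batch_script([["--seed", "1"], ["--seed", "2"]], nodes=2))
```

The resulting text could be saved to a file and passed to sbatch. The number of nodes stays fixed for the lifetime of the job, which is the static behaviour described above.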
*3.* 8 simulation runs form a job with sbatch --exclusive -N 1-1. One simulation run is a parallel srun --exclusive call.
This should work as well, but has a similar efficiency problem: I'm allocating a full node per job, but each job can only use 8*2=16 cores out of the 20-40 available.
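The cost of approach 3 can be put into numbers. The following is a small sketch, assuming the figures from this message (8 runs per node, 2 cores per run, 20 to 40 cores per node); the function names and the run labels are made up for illustration.

```python
def chunk_runs(param_sets, per_node=8):
    """Split the runs into groups of at most `per_node`, one group per job."""
    return [param_sets[i:i + per_node] for i in range(0, len(param_sets), per_node)]


def core_utilization(cores_per_node, runs_per_node=8, cores_per_run=2):
    """Fraction of an exclusively allocated node's cores that do any work."""
    return runs_per_node * cores_per_run / cores_per_node


if __name__ == "__main__":
    runs = [f"run{i}" for i in range(20)]  # hypothetical run labels
    print([len(chunk) for chunk in chunk_runs(runs)])  # [8, 8, 4]
    for cores in (20, 40):
        print(f"{cores} cores: {core_utilization(cores):.0%} used")
```

With these numbers a 20-core node is 80% used and a 40-core node only 40% used, and a final group with fewer than 8 runs wastes even more of its node.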
What would be ideal is alternative 1 or 3, but with an option like --exclusive-among-jobs-of-the-same-kind (whatever that means).
Any ideas?

Thanks a lot,
Steffen

-- 
Steffen Schuldenzucker
Ph.D. Student
Department of Informatics
University of Zurich
Binzmühlestrasse 14
CH-8050 Zürich
Room BIN 2.A.25
Tel +41 44 635 45 82
Mob +49 176 5337 8181
Email [email protected]
Web http://www.ifi.uzh.ch/ce/people/schuldenzucker.html
