Hello,

After reading a whole bunch of web pages about node and core allocation in Slurm, I still don't understand the logic of how it is done, or what is wrong with my script.
We have 5 nodes, with 64 cores on each. The test script I am trying to run is very simple: it writes the identifier of the current array task to a text file. Here is the script:

#########
#!/bin/bash
#SBATCH --job-name=test_sbatch
#SBATCH --output=res_test.stdout
#SBATCH --ntasks=5
#SBATCH --ntasks-per-node=5
#SBATCH --array=1-5
#SBATCH -p shortq

sleep 10
echo $SLURM_ARRAY_TASK_ID > test$SLURM_ARRAY_TASK_ID.txt
#########

This version of the script runs perfectly well (5 cores used on one single node), but here is what happens when I change the parameter values (--array is always set equal to --ntasks):

--ntasks   --ntasks-per-node   #_of_nodes_used   #_of_simultaneous_tasks_on_1_node
    5              5                  1                         5
   10             10                  2                        6/4
   20             20                  5                         3
   64             64                  5                         1
   70             64                  2                         1

Do you know why my tasks are spread across several nodes (even when I add the --nodes=1 parameter; that variant is sketched at the end of this message) instead of using all the resources of one single node before moving on to the next?

One other thing I can't understand is the squeue output. In the --ntasks=20 --ntasks-per-node=20 case, the output looks like this:

NODES   NODELIST
1       node001
1       node001
1       node002
...

whereas in the --ntasks=20 --ntasks-per-node=20 case, it looks like:

NODES   NODELIST
2       node[001-002]
2       node[003-004]

Could you explain this difference?
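For reference, here is roughly how I query the queue; the -o format string below is just one way to limit the output to the two columns quoted above (%i is the job ID, %D the node count, %R the node list):

#########
# list my jobs in the shortq partition: job id, node count, node list
squeue -u $USER -p shortq -o "%.12i %.6D %R"
#########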
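And here, for completeness, is the header variant with --nodes=1 that I mentioned above (a minimal sketch for the --ntasks=20 case; the body of the script is unchanged):

#########
#!/bin/bash
#SBATCH --job-name=test_sbatch
#SBATCH --output=res_test.stdout
#SBATCH --nodes=1              # added in the hope of keeping each job on one node
#SBATCH --ntasks=20
#SBATCH --ntasks-per-node=20
#SBATCH --array=1-20
#SBATCH -p shortq

sleep 10
echo $SLURM_ARRAY_TASK_ID > test$SLURM_ARRAY_TASK_ID.txt
#########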
Thanks in advance,
Pierre-François