We have a cluster of dual-socket nodes with 10-core CPUs (hyperthreading off), and nodes are shared using SelectType=select/cons_res. Before (or after) running an MPI task, I'd like to run some pre- (and post-) processing tasks, one per node, but I am having trouble finding documentation on how to do this. I expected to submit a job with sbatch using --nodes=N --tasks-per-node=20, where N is an integer, to get multiple whole nodes, and then run srun --tasks-per-node=1 for the per-node tasks. This does not work: I get one task for each core.
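To make the attempt concrete, here is a minimal sketch of the kind of batch script described above. The program names (./preprocess, ./mpi_app, ./postprocess), the value N=4, and the small step helper are all hypothetical and only there for illustration. The active per-node lines show one untested variant: passing --ntasks explicitly alongside --ntasks-per-node=1, on the assumption that srun otherwise inherits the job's total task count (20 per node) and treats the per-node value only as a maximum.

```shell
#!/bin/bash
#SBATCH --nodes=4                # N = 4 is an arbitrary example value
#SBATCH --ntasks-per-node=20     # 2 sockets x 10 cores, so whole nodes are allocated

# Hypothetical program names; substitute the real executables.
PRE=./preprocess
APP=./mpi_app
POST=./postprocess

# Slurm sets SLURM_JOB_NUM_NODES inside a job; the fallback of 4 only
# lets the script be dry-run outside Slurm.
NODES=${SLURM_JOB_NUM_NODES:-4}

# Launch a job step with srun, or just print the command if srun is absent.
step() {
    if command -v srun >/dev/null 2>&1; then
        srun "$@"
    else
        echo "srun $*"
    fi
}

# Original attempt (starts one task per core rather than one per node):
#   step --ntasks-per-node=1 "$PRE"

# Untested assumption: also state the total number of tasks explicitly.
step --nodes="$NODES" --ntasks="$NODES" --ntasks-per-node=1 "$PRE"
step "$APP"    # MPI step using all 20 tasks per node
step --nodes="$NODES" --ntasks="$NODES" --ntasks-per-node=1 "$POST"
```

Run outside Slurm, the script only prints the three srun command lines, which makes it easy to check what would be launched before submitting it with sbatch.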
I'd also like any solution to work with hybrid MPI/OpenMP, with one OpenMP task per node or per socket.

Thanks,
Gareth
