Hello Ulf,

I have updated the --overcommit documentation as shown below. The --ntasks-per-core and --ntasks-per-socket options apply only to the job allocation and are ignored for job steps. Probably the best solution would be to create a job allocation of the desired size using the salloc command and then execute srun with the desired task count. The environment variable SLURM_JOB_CPUS_PER_NODE is set to the CPU count on each node, but the definition of "CPU" depends upon your configuration and could be a core or hyperthread count. Something like this should work in your environment:
salloc -n8 -N1 srun -n16 -O a.out
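The same approach can avoid hard-coding 16. The sketch that follows is meant to run inside an allocation such as "salloc -n8 -N1". It reads SLURM_JOB_CPUS_PER_NODE, whose value looks like "16" or "72(x2),36", takes the first node's CPU count, and launches that many times TASKS_PER_CPU tasks with -O. It assumes a single-node allocation. TASKS_PER_CPU is a hypothetical knob of this script, not a Slurm variable, and ./a.out stands for your program. Because "CPU" may mean a core or a hyperthread, two tasks per CPU equals two tasks per core only when CPUs are cores.

```shell
#!/bin/sh
# Sketch: launch TASKS_PER_CPU tasks per allocated CPU on a one-node
# allocation (e.g. one created with "salloc -n8 -N1").
# Assumptions: POSIX sh, a single-node allocation, "./a.out" as the program.
set -eu

# Print the CPU count of the first node in a SLURM_JOB_CPUS_PER_NODE value.
# Values look like "16" or "72(x2),36": keep the text before the first
# comma, then drop any "(xN)" repeat suffix.
first_node_cpus() {
    v=${1%%,*}
    c=${v%%[(]*}
    printf '%s\n' "$c"
}

# Print the srun task count: first-node CPUs times tasks per CPU.
overcommit_ntasks() {
    echo $(( $(first_node_cpus "$1") * $2 ))
}

# Hypothetical knob for this script; not a Slurm variable.
TASKS_PER_CPU=${TASKS_PER_CPU:-2}

# Launch only when actually running inside a Slurm allocation.
if [ -n "${SLURM_JOB_CPUS_PER_NODE:-}" ] && command -v srun >/dev/null 2>&1; then
    srun -O -n "$(overcommit_ntasks "$SLURM_JOB_CPUS_PER_NODE" "$TASKS_PER_CPU")" ./a.out
fi
```

Depending on the select plugin configuration, the CPU count reported for the node may be larger than the -n value given to salloc, so the launched task count follows what Slurm actually allocated rather than what was requested.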

       --ntasks-per-core=<ntasks>
              Request the maximum ntasks be invoked on each core. This option
              applies to the job allocation, but not to step allocations.
              Meant to be used with the --ntasks option. Related to
              --ntasks-per-node except at the core level instead of the node
              level. Masks will automatically be generated to bind the tasks
              to specific cores unless --cpu_bind=none is specified. NOTE:
              This option is not supported unless SelectTypeParameters=CR_Core
              or SelectTypeParameters=CR_Core_Memory is configured.
       -O, --overcommit
              Overcommit resources. When applied to a job allocation, only one
              CPU is allocated to the job per node and options used to specify
              the number of tasks per node, socket, core, etc. are ignored.
              When applied to job step allocations (the srun command when
              executed within an existing job allocation), this option can be
              used to launch more than one task per CPU. Normally, srun will
              not allocate more than one process per CPU. By specifying
              --overcommit you are explicitly allowing more than one process
              per CPU. However, no more than MAX_TASKS_PER_NODE tasks are
              permitted to execute per node. NOTE: MAX_TASKS_PER_NODE is
              defined in the file slurm.h and is not a variable; it is set at
              SLURM build time.
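Both NOTEs in the excerpt depend on how Slurm was configured and built. A minimal sketch for checking them from a shell follows. It assumes that "scontrol show config" prints SelectTypeParameters as a "Name = value" line and that slurm.h sits at /usr/include/slurm/slurm.h with a plain "#define MAX_TASKS_PER_NODE <n>" line; the path and that line layout are assumptions about a typical installation, so pass your own header path to max_tasks_per_node if it differs.

```shell
#!/bin/sh
# Sketch: pre-flight checks for the two NOTEs in the man page excerpt.
# Assumptions: POSIX sh and awk; "scontrol show config" prints
# "Name = value" lines; slurm.h lives at the assumed default path below.
set -eu

# Succeed if the "scontrol show config" text on stdin sets
# SelectTypeParameters to CR_Core or CR_Core_Memory (case-insensitive,
# extra comma-separated options allowed).
supports_ntasks_per_core() {
    awk -F'=' '
        $1 ~ /^SelectTypeParameters[ \t]*$/ {
            v = toupper($2)
            gsub(/[ \t]/, "", v)
            n = split(v, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "CR_CORE" || opts[i] == "CR_CORE_MEMORY") found = 1
        }
        END { exit (found ? 0 : 1) }'
}

# Print MAX_TASKS_PER_NODE, assuming slurm.h has a line of the form
# "#define MAX_TASKS_PER_NODE <n>".
max_tasks_per_node() {
    header=${1:-/usr/include/slurm/slurm.h}   # assumed install location
    awk '$1 == "#define" && $2 == "MAX_TASKS_PER_NODE" { print $3; exit }' "$header"
}

# Live checks, run only where the Slurm tools and header are present.
if command -v scontrol >/dev/null 2>&1; then
    if scontrol show config | supports_ntasks_per_core; then
        echo "SelectTypeParameters permits --ntasks-per-core"
    else
        echo "SelectTypeParameters does not permit --ntasks-per-core"
    fi
fi
if [ -r /usr/include/slurm/slurm.h ]; then
    echo "MAX_TASKS_PER_NODE = $(max_tasks_per_node)"
fi
```

The select-parameter check mirrors only the two values named in the NOTE; other Slurm versions may accept further settings.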

Quoting Ulf Markwardt <[email protected]>:

Hello Moe,

That is exactly what the overcommit option is designed to do.

   when I do
salloc -p sandy --overcommit --ntasks-per-node=16 --ntasks-per-core=2 -t 10
   all tasks run on a single core:

Hm, that's a bit confusing for me. It looks like "--overcommit" and "--ntasks-per-core=2" do not work together.

Is there a way to tell SLURM that I want to run exactly two processes per core?

Thanks
Ulf


--
___________________________________________________________________
Dr. Ulf Markwardt

Technische Universität Dresden
Center for Information Services and High Performance Computing (ZIH)
01062 Dresden, Germany

Phone: (+49) 351/463-33640      WWW:  http://www.tu-dresden.de/zih


