We're having an issue with CPU binding when two jobs land on the same node:
some cores are shared by the two jobs while others are left idle. Below is
output from "top" after pressing 'f' and then 'j' to display the processor
each task is running on (the P column):
  PID USER  PR NI VIRT RES  SHR  S %CPU %MEM TIME+   P COMMAND
 5577 bacon 20  0 3916 368  300  R 49.9  0.0 0:34.86 0 calcpi-parallel
 5578 bacon 20  0 3916 372  300  R 49.9  0.0 0:34.89 2 calcpi-parallel
 5609 bacon 20  0 410m 108m 3836 R 49.9  0.7 0:12.52 0 mpi_bench
 5610 bacon 20  0 410m 110m 3836 R 49.9  0.7 0:12.52 2 mpi_bench
As you can see above, both jobs are using cores 0 and 2, while cores 1
and 3 are unused.
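(For reference, the affinity masks can also be read directly, outside of top;
taskset here is the stock util-linux tool, and the PIDs are the ones from the
output above, so they would need adjusting for another run:

  taskset -cp 5577    # show the CPU list calcpi-parallel is bound to
  taskset -cp 5609    # same for mpi_bench
  grep Cpus_allowed_list /proc/5577/status /proc/5609/status

That should confirm whether the processes are genuinely restricted to cores 0
and 2, or are just happening to run there, since the P column only shows the
last-used CPU.)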
Here's what I think might be relevant from our slurm.conf:
MpiDefault=none
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
TaskPlugin=task/affinity
TaskPluginParam=cores,verbose
FastSchedule=1
NodeName=compute-001 RealMemory=8000 Sockets=2 CoresPerSocket=2 State=UNKNOWN
NodeName=compute-002 RealMemory=15946 Sockets=2 CoresPerSocket=2 State=UNKNOWN
PartitionName=batch Nodes=compute-[001-002] Default=YES MaxTime=INFINITE State=UP
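(If it matters, the Sockets/CoresPerSocket values above can be checked against
what the nodes actually report; these are the stock commands, run on
compute-001 itself apart from scontrol:

  slurmd -C                        # hardware configuration as slurmd detects it
  lscpu                            # sockets/cores/threads as the kernel sees them
  scontrol show node compute-001   # what slurmctld currently has recorded

I can post that output too if a topology mismatch could explain the binding
behavior.)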
This is a small test cluster running CentOS and SLURM 14.11.6.
Any suggestions would be appreciated.
Thanks,
Jason
--
All wars are civil wars, because all men are brothers ... Each one owes
infinitely more to the human race than to the particular country in
which he was born.
-- Francois Fenelon