[slurm-dev] Node Oversubscription with Shared=Force

Paul Edmon Tue, 24 Sep 2013 10:34:37 -0700

We are running SLURM 2.6.1. So far it's been working great. However, we ran into a bug recently. We wanted to prevent users from using --exclusive because many of our users were using it when they didn't actually need it. So we used the SHARED=FORCE option for the queue. We have this configured too:


SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory

So that should prevent collisions and everyone should get the resources they asked for. However, it appears not to work that way. I submitted the following job:


#!/bin/sh
#SBATCH -n 64
#SBATCH --ntasks-per-node=64
#SBATCH -t 20
#SBATCH --exclusive
#SBATCH --mem=10000
#SBATCH -p general

echo "Hello, World"

echo start
hostname
sleep 10m
echo end

Which ended up looking like this:

[pedmon@itc011 slurm-testing]$ scontrol -dd show job 986116
JobId=986116 Name=sleep-test
   UserId=pedmon(56483) GroupId=rc_admin(40273)
   Priority=199305409 Account=cluster_users QOS=normal
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 ExitCode=0:0
   DerivedExitCode=0:0
   RunTime=00:04:54 TimeLimit=00:20:00 TimeMin=N/A
   SubmitTime=2013-09-24T12:57:50 EligibleTime=2013-09-24T12:57:50
   StartTime=2013-09-24T12:58:01 EndTime=2013-09-24T13:18:11
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   Partition=general AllocNode:Sid=itc011:33593
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=holy2a02205
   BatchHost=holy2a02205
   NumNodes=1 NumCPUs=64 CPUs/Task=1 ReqS:C:T=*:*:*
     Nodes=holy2a02205 CPU_IDs=0-31 Mem=10000
   MinCPUsNode=64 MinMemoryNode=10000M MinTmpDiskNode=0
   Features=(null) Gres=(null) Reservation=(null)
   Shared=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=/n/home_rc/pedmon/slurm-testing/sleep-test
   WorkDir=/n/home_rc/pedmon/slurm-testing
   BatchScript=
#!/bin/sh
#SBATCH -n 64
#SBATCH --ntasks-per-node=64
#SBATCH -t 20
#SBATCH --exclusive
#SBATCH --mem=10000
#SBATCH -p general

echo "Hello,  World"

echo start
hostname
sleep 10m
echo end

Each one of our nodes has 64 cores and 256 GB of RAM. With SHARED=FORCE on, it should disable exclusive but still obey the other options. Thus I should get the whole node, as I requested all 64 cores and for them all to land on the same node. However, when I look at the node I landed on I get:


Tasks: 1704 total,   14 running,  1689 sleeping,    0 stopped,    1 zombie
Cpu(s): 28.6%us,   0.9%sy,   0.0%ni,  70.0%id,   0.5%wa,   0.0%hi,   0.0%si,   0.0%st
Mem:  264498760k total,  90909832k used,  173588928k free,     81228k buffers
Swap:  8388600k total,    120688k used,   8267912k free,  30757332k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
17952 jgarcia   20   0 1344m 757m 3000 R 96.0  0.3   3273:04 xillver-HR.x
18024 jgarcia   20   0 1344m 759m 3000 R 96.0  0.3   3269:57 xillver-HR.x
25336 chaoye    20   0  516m 495m 1656 R 96.0  0.2 174:31.83 dynslicing
31573 chaoye    20   0  226m 205m 1656 R 96.0  0.1  50:09.53 dynslicing
31574 chaoye    20   0  231m 210m 1656 R 96.0  0.1  50:09.47 dynslicing
53832 sglee     20   0 1314m 1.1g 4152 R 96.0  0.4 857:59.02 R
53894 sglee     20   0 1502m 1.3g 4084 R 96.0  0.5 857:42.19 R
53933 sglee     20   0 1486m 1.3g 4084 R 96.0  0.5 857:30.79 R
17890 jgarcia   20   0 1344m 759m 3008 R 94.3  0.3   3274:48 xillver-HR.x
18218 jgarcia   20   0 1344m 759m 3008 R 94.3  0.3   3268:29 xillver-HR.x
25337 chaoye    20   0  509m 488m 1656 R 94.3  0.2 174:31.79 dynslicing
29758 chaoye    20   0  358m 337m 1632 R 94.3  0.1 103:26.06 dynslicing
29759 chaoye    20   0  364m 343m 1632 R 94.3  0.1 103:26.09 dynslicing
21775 sstokes   20   0 28.9g  20g  41m S 82.0  8.3 153:33.34 MATLAB
37767 root      20   0 27124 2564  980 R  7.0  0.0   0:00.10 top
30910 root      20   0     0    0    0 S  1.7  0.0   1:14.49 ldlm_poold

If I look at what jobs are there I get:


[root@holy-slurm01 log]# scontrol -dd show job | grep holy2a02205
   NodeList=holy2a02205
   BatchHost=holy2a02205
     Nodes=holy2a02205 CPU_IDs=0 Mem=8000
   NodeList=holy2a02205
   BatchHost=holy2a02205
     Nodes=holy2a02205 CPU_IDs=1 Mem=8000
   NodeList=holy2a02205
   BatchHost=holy2a02205
     Nodes=holy2a02205 CPU_IDs=4 Mem=8000
   NodeList=holy2a02205
   BatchHost=holy2a02205
     Nodes=holy2a02205 CPU_IDs=5 Mem=8000
   NodeList=holy2a02205
   BatchHost=holy2a02205
     Nodes=holy2a02205 CPU_IDs=12 Mem=4096
   NodeList=holy2a02205
   BatchHost=holy2a02205
     Nodes=holy2a02205 CPU_IDs=13 Mem=4096
   NodeList=holy2a02205
   BatchHost=holy2a02205
     Nodes=holy2a02205 CPU_IDs=14 Mem=4096
   NodeList=holy2a02205
   BatchHost=holy2a02205
     Nodes=holy2a02205 CPU_IDs=7 Mem=30000
   NodeList=holy2a02205
   BatchHost=holy2a02205
     Nodes=holy2a02205 CPU_IDs=6 Mem=30000
   NodeList=holy2a02205
   BatchHost=holy2a02205
     Nodes=holy2a02205 CPU_IDs=8 Mem=30000
   NodeList=holy2a02205
   BatchHost=holy2a02205
     Nodes=holy2a02205 CPU_IDs=10 Mem=16000
   NodeList=holy2a02205
   BatchHost=holy2a02205
     Nodes=holy2a02205 CPU_IDs=11 Mem=16000
   NodeList=holy2a02205
   BatchHost=holy2a02205
     Nodes=holy2a02205 CPU_IDs=9 Mem=16000
   NodeList=holy2a02205
   BatchHost=holy2a02205
     Nodes=holy2a02205 CPU_IDs=15 Mem=16000
   NodeList=holy2a02205
   BatchHost=holy2a02205
     Nodes=holy2a02205 CPU_IDs=2 Mem=16000
   NodeList=holy2a02205
   BatchHost=holy2a02205
     Nodes=holy2a02205 CPU_IDs=16 Mem=16000
   NodeList=holy2a02205
   BatchHost=holy2a02205
     Nodes=holy2a02205 CPU_IDs=29 Mem=200
   NodeList=holy2a02205
   BatchHost=holy2a02205
     Nodes=holy2a02205 CPU_IDs=0-31 Mem=10000

As you can see this is oversubscribed.  Looks like it is not oversubscribed in 
memory space but it is in core space.  This is not good.  We do not want 
oversubscription in any space.  This seems to be a bug in the code.  Unless of 
course there is something about the behavior of SHARED=FORCE we aren't 
understanding.
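
For anyone who wants to check their own nodes for the same thing, here is a rough Python sketch that tallies how many job allocations claim each CPU ID in `scontrol -dd show job` output. The function names and the SAMPLE text are made up for illustration (SAMPLE is three lines copied from the grep output above), and it assumes each `Nodes=` entry names a single host rather than a bracketed host list, so treat it as a starting point rather than a finished tool:

```python
import re
from collections import Counter

# Three allocation lines copied from the grep output above, for illustration.
SAMPLE = """\
     Nodes=holy2a02205 CPU_IDs=0 Mem=8000
     Nodes=holy2a02205 CPU_IDs=29 Mem=200
     Nodes=holy2a02205 CPU_IDs=0-31 Mem=10000
"""

def expand_cpu_ids(spec):
    """Expand a CPU_IDs string such as '0-3,7' into a list of ints."""
    ids = []
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            ids.extend(range(int(lo), int(hi) + 1))
        else:
            ids.append(int(part))
    return ids

def core_usage(scontrol_text, node):
    """Count how many allocations claim each CPU ID on `node`.

    Assumption: Nodes= holds one plain hostname per line (no host lists).
    """
    pattern = re.compile(r"Nodes=(\S+)\s+CPU_IDs=(\S+)")
    usage = Counter()
    for line in scontrol_text.splitlines():
        match = pattern.search(line)
        if match and match.group(1) == node:
            usage.update(expand_cpu_ids(match.group(2)))
    return usage

def oversubscribed(usage):
    """Return {cpu_id: count} for CPU IDs claimed by more than one allocation."""
    return {cpu: n for cpu, n in sorted(usage.items()) if n > 1}

if __name__ == "__main__":
    for cpu, n in oversubscribed(core_usage(SAMPLE, "holy2a02205")).items():
        print(f"CPU {cpu} is allocated to {n} jobs")
```

Run against the three sample lines, this flags CPUs 0 and 29 as each being allocated to two jobs. To use it on a live system, pipe the real `scontrol -dd show job` output into `core_usage` instead of SAMPLE.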


Is this fixed in the newer version of SLURM?  Anyone have any ideas?

-Paul Edmon-
