Hello list!

I asked this question in #slurm yesterday but didn't receive a
response, and I also wasn't able to find any insight via Google or the
Slurm site.

Anyway, to the point!

How does Slurm (14.03) determine when a node should be placed in a
"drain" state with the reason "Low RealMemory"?  I'm asking because I
have three nodes, each with between 12 and 14 GB of RAM in total, and
"free" reports between 7 and 10 GB free on each of them.

I'll paste some scontrol output below and corresponding entries from slurm.conf.

NodeName=sanitized_hostname[1] Arch=x86_64 CoresPerSocket=4
   CPUAlloc=0 CPUErr=0 CPUTot=8 CPULoad=0.53 Features=ib_ddr,ib_ofa,sse4,sse41,tpa,cpu_xeon
   Gres=(null)
   NodeAddr=sanitized_hostname[1] NodeHostName=sanitized_hostname[1] Version=(null)
   OS=Linux RealMemory=12929 AllocMem=0 Sockets=2 Boards=1
   State=IDLE+DRAIN ThreadsPerCore=1 TmpDisk=0 Weight=1
   BootTime=2014-03-08T20:15:30 SlurmdStartTime=2014-07-02T12:29:17
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
   Reason=Low RealMemory [root@2014-07-01T14:48:44]

NodeName=sanitized_hostname[2] Arch=x86_64 CoresPerSocket=4
   CPUAlloc=0 CPUErr=0 CPUTot=8 CPULoad=0.54 Features=ib_ddr,ib_ofa,sse4,sse41,tpa,cpu_xeon
   Gres=(null)
   NodeAddr=sanitized_hostname[2] NodeHostName=sanitized_hostname[2] Version=(null)
   OS=Linux RealMemory=10909 AllocMem=0 Sockets=2 Boards=1
   State=IDLE+DRAIN ThreadsPerCore=1 TmpDisk=0 Weight=1
   BootTime=2014-03-08T20:15:02 SlurmdStartTime=2014-07-02T12:29:17
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
   Reason=Low RealMemory [root@2014-07-01T14:48:44]

NodeName=sanitized_hostname[3] Arch=x86_64 CoresPerSocket=4
   CPUAlloc=0 CPUErr=0 CPUTot=8 CPULoad=0.71 Features=ib_ddr,ib_ofa,sse4,sse41,tpa,cpu_xeon
   Gres=(null)
   NodeAddr=sanitized_hostname[3] NodeHostName=sanitized_hostname[3] Version=(null)
   OS=Linux RealMemory=10909 AllocMem=0 Sockets=2 Boards=1
   State=IDLE+DRAIN ThreadsPerCore=1 TmpDisk=0 Weight=1
   BootTime=2014-03-08T20:14:55 SlurmdStartTime=2014-07-02T12:29:17
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
   Reason=Low RealMemory [root@2014-07-01T14:48:44]

NodeName=sanitized_hostname[1] CPUs=8 CoresPerSocket=4 Sockets=2 RealMemory=12929 Feature=ib_ddr,ib_ofa,sse4,sse41,tpa,cpu_xeon
NodeName=sanitized_hostname[2-3] CPUs=8 CoresPerSocket=4 Sockets=2 RealMemory=10909 Feature=ib_ddr,ib_ofa,sse4,sse41,tpa,cpu_xeon
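
A minimal Python sketch of one comparison that may be relevant here. It
assumes, without confirmation from the Slurm documentation, that the drain
is triggered when the total memory detected on the node is smaller than the
RealMemory value (in MB) configured in slurm.conf, rather than by the amount
that "free" reports as available. The helper names mem_total_mb and
below_configured are hypothetical and are not part of Slurm.

```python
import re


def mem_total_mb(meminfo_text):
    """Extract MemTotal from /proc/meminfo-style text and convert kB to MB."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemTotal line not found")
    # Integer division: a partial megabyte is dropped (an assumption about
    # how the detected value would be rounded).
    return int(match.group(1)) // 1024


def below_configured(detected_mb, configured_mb):
    """Return True when detected memory is smaller than configured RealMemory.

    Assumption: this is the condition behind "Low RealMemory"; it has not
    been verified against the Slurm 14.03 source.
    """
    return detected_mb < configured_mb
```

On a node, the contents of /proc/meminfo could be passed to mem_total_mb
and the result compared against the configured value, for example
below_configured(detected, 12929) for the first node above.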

Thanks for any help and/or insight!

John DeSantis
