Hello list! I asked this question in #slurm yesterday but didn't receive a response, and I also wasn't able to find any insight via Google or the Slurm site.
Anyways, to the point! How does Slurm (14.03) determine when a node should be placed in a "drain" state with the reason "Low RealMemory"?

I'm asking this question because I have three nodes, each having between 12 and 14 GB of RAM in total, with "free" reporting between 7 and 10 GB as free. I'll paste some scontrol output below, followed by the corresponding entries from slurm.conf.

NodeName=sanitized_hostname[1] Arch=x86_64 CoresPerSocket=4
   CPUAlloc=0 CPUErr=0 CPUTot=8 CPULoad=0.53
   Features=ib_ddr,ib_ofa,sse4,sse41,tpa,cpu_xeon
   Gres=(null)
   NodeAddr=sanitized_hostname[1] NodeHostName=sanitized_hostname[1] Version=(null)
   OS=Linux RealMemory=12929 AllocMem=0 Sockets=2 Boards=1
   State=IDLE+DRAIN ThreadsPerCore=1 TmpDisk=0 Weight=1
   BootTime=2014-03-08T20:15:30 SlurmdStartTime=2014-07-02T12:29:17
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
   Reason=Low RealMemory [root@2014-07-01T14:48:44]

NodeName=sanitized_hostname[2] Arch=x86_64 CoresPerSocket=4
   CPUAlloc=0 CPUErr=0 CPUTot=8 CPULoad=0.54
   Features=ib_ddr,ib_ofa,sse4,sse41,tpa,cpu_xeon
   Gres=(null)
   NodeAddr=sanitized_hostname[2] NodeHostName=sanitized_hostname[2] Version=(null)
   OS=Linux RealMemory=10909 AllocMem=0 Sockets=2 Boards=1
   State=IDLE+DRAIN ThreadsPerCore=1 TmpDisk=0 Weight=1
   BootTime=2014-03-08T20:15:02 SlurmdStartTime=2014-07-02T12:29:17
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
   Reason=Low RealMemory [root@2014-07-01T14:48:44]

NodeName=sanitized_hostname[3] Arch=x86_64 CoresPerSocket=4
   CPUAlloc=0 CPUErr=0 CPUTot=8 CPULoad=0.71
   Features=ib_ddr,ib_ofa,sse4,sse41,tpa,cpu_xeon
   Gres=(null)
   NodeAddr=sanitized_hostname[3] NodeHostName=sanitized_hostname[3] Version=(null)
   OS=Linux RealMemory=10909 AllocMem=0 Sockets=2 Boards=1
   State=IDLE+DRAIN ThreadsPerCore=1 TmpDisk=0 Weight=1
   BootTime=2014-03-08T20:14:55 SlurmdStartTime=2014-07-02T12:29:17
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
   Reason=Low RealMemory [root@2014-07-01T14:48:44]

NodeName=sanitized_hostname[1] CPUs=8 CoresPerSocket=4 Sockets=2 RealMemory=12929 Feature=ib_ddr,ib_ofa,sse4,sse41,tpa,cpu_xeon
NodeName=sanitized_hostname[2-3] CPUs=8 CoresPerSocket=4 Sockets=2 RealMemory=10909 Feature=ib_ddr,ib_ofa,sse4,sse41,tpa,cpu_xeon

Thanks for any help and/or insight!

John DeSantis
