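Did you check slurmd.log on the nodes to see whether the RealMemory they report at startup is less than what's defined in slurm.conf? A node gets drained with "Low RealMemory" when slurmd registers with less memory than the RealMemory configured for it.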
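
If it helps, here is a rough Python sketch of that comparison which you could run on each node. It assumes that MemTotal from /proc/meminfo, converted to whole MB, is close to the figure slurmd detects; I have not checked that slurmd computes it exactly this way, so treat the number as an approximation. The helper names and the hard-coded 12929 (taken from your slurm.conf line for the first node) are only for illustration.

```python
import re


def mem_total_mb(meminfo_text):
    """Return MemTotal from /proc/meminfo-style text, in whole MB (kB // 1024)."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemTotal line not found")
    return int(match.group(1)) // 1024


def is_low_real_memory(detected_mb, configured_mb):
    """True when the node has less memory than its configured RealMemory."""
    return detected_mb < configured_mb


if __name__ == "__main__":
    # Assumption: RealMemory value copied by hand from slurm.conf for this node.
    configured_mb = 12929
    with open("/proc/meminfo") as fh:
        detected_mb = mem_total_mb(fh.read())
    print("detected=%d MB, configured=%d MB" % (detected_mb, configured_mb))
    if is_low_real_memory(detected_mb, configured_mb):
        print("Detected memory is below RealMemory; expect 'Low RealMemory'.")
    else:
        print("Detected memory meets or exceeds RealMemory.")
```

Running `slurmd -C` on the node should also print the configuration slurmd itself detects, which is the more authoritative number to compare against.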

On Wed, Jul 2, 2014 at 12:45 PM, John Desantis <[email protected]> wrote:
>
> Hello list!
>
> I asked this question in #slurm yesterday but didn't receive a
> response, and I also wasn't able to find any insight via Google or the
> Slurm site.
>
> Anyways, to the point!
>
> How does Slurm (14.03) determine when a node should be placed in a
> "drain" state with the reason "Low RealMemory"? I'm asking this
> question because I have three nodes each having between 12-14 GB RAM
> total, with "free" reporting between 7-10 GB as free.
>
> I'll paste some scontrol output below and corresponding entries from
> slurm.conf.
>
> NodeName=sanitized_hostname[1] Arch=x86_64 CoresPerSocket=4
>    CPUAlloc=0 CPUErr=0 CPUTot=8 CPULoad=0.53
>    Features=ib_ddr,ib_ofa,sse4,sse41,tpa,cpu_xeon
>    Gres=(null)
>    NodeAddr=sanitized_hostname[1] NodeHostName=sanitized_hostname[1]
>    Version=(null)
>    OS=Linux RealMemory=12929 AllocMem=0 Sockets=2 Boards=1
>    State=IDLE+DRAIN ThreadsPerCore=1 TmpDisk=0 Weight=1
>    BootTime=2014-03-08T20:15:30 SlurmdStartTime=2014-07-02T12:29:17
>    CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
>    ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
>    Reason=Low RealMemory [root@2014-07-01T14:48:44]
>
> NodeName=sanitized_hostname[2] Arch=x86_64 CoresPerSocket=4
>    CPUAlloc=0 CPUErr=0 CPUTot=8 CPULoad=0.54
>    Features=ib_ddr,ib_ofa,sse4,sse41,tpa,cpu_xeon
>    Gres=(null)
>    NodeAddr=sanitized_hostname[2] NodeHostName=sanitized_hostname[2]
>    Version=(null)
>    OS=Linux RealMemory=10909 AllocMem=0 Sockets=2 Boards=1
>    State=IDLE+DRAIN ThreadsPerCore=1 TmpDisk=0 Weight=1
>    BootTime=2014-03-08T20:15:02 SlurmdStartTime=2014-07-02T12:29:17
>    CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
>    ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
>    Reason=Low RealMemory [root@2014-07-01T14:48:44]
>
> NodeName=sanitized_hostname[3] Arch=x86_64 CoresPerSocket=4
>    CPUAlloc=0 CPUErr=0 CPUTot=8 CPULoad=0.71
>    Features=ib_ddr,ib_ofa,sse4,sse41,tpa,cpu_xeon
>    Gres=(null)
>    NodeAddr=sanitized_hostname[3] NodeHostName=sanitized_hostname[3]
>    Version=(null)
>    OS=Linux RealMemory=10909 AllocMem=0 Sockets=2 Boards=1
>    State=IDLE+DRAIN ThreadsPerCore=1 TmpDisk=0 Weight=1
>    BootTime=2014-03-08T20:14:55 SlurmdStartTime=2014-07-02T12:29:17
>    CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
>    ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
>    Reason=Low RealMemory [root@2014-07-01T14:48:44]
>
> NodeName=sanitized_hostname[1] CPUs=8 CoresPerSocket=4 Sockets=2
> RealMemory=12929 Feature=ib_ddr,ib_ofa,sse4,sse41,tpa,cpu_xeon
> NodeName=sanitized_hostname[2-3] CPUs=8 CoresPerSocket=4 Sockets=2
> RealMemory=10909 Feature=ib_ddr,ib_ofa,sse4,sse41,tpa,cpu_xeon
>
> Thanks for any help and/or insight!
>
> John DeSantis
