Mike,

Thanks for the reply!

Yes, I had found that posting previously and 'scontrol' did work as it
did in the post. The problem was a configuration issue on my end; I'm
sure you'll see my reply to EV. The nodes in question actually have less
memory than their neighbors, and due to user error (mine!), I didn't
configure them properly.

John DeSantis

2014-07-02 14:17 GMT-04:00 Michael Robbert <[email protected]>:
> John,
> Did you find and read this thread from 2011 that appears to discuss this
> issue?
>
> http://comments.gmane.org/gmane.comp.distributed.slurm.devel/669
>
> Do you have RealMemory set in your slurm.conf? If so what is it set to?
> Have you tried manually updating the node to Idle? Something like:
> scontrol update NodeName=sanitized_hostname State=IDLE
>
> Mike
>
> On Jul 2, 2014, at 10:45 AM, John Desantis <[email protected]> wrote:
>
> > Hello list!
> >
> > I asked this question in #slurm yesterday but didn't receive a
> > response, and I also wasn't able to find any insight via Google or the
> > Slurm site.
> >
> > Anyways, to the point!
> >
> > How does Slurm (14.03) determine when a node should be placed in a
> > "drain" state with the reason "Low RealMemory"? I'm asking this
> > question because I have three nodes each having between 12-14 GB RAM
> > total, with "free" reporting between 7-10 GB as free.
> >
> > I'll paste some scontrol output below and corresponding entries from
> > slurm.conf.
> >
> > NodeName=sanitized_hostname[1] Arch=x86_64 CoresPerSocket=4
> >    CPUAlloc=0 CPUErr=0 CPUTot=8 CPULoad=0.53
> >    Features=ib_ddr,ib_ofa,sse4,sse41,tpa,cpu_xeon
> >    Gres=(null)
> >    NodeAddr=sanitized_hostname[1] NodeHostName=sanitized_hostname[1]
> >    Version=(null)
> >    OS=Linux RealMemory=12929 AllocMem=0 Sockets=2 Boards=1
> >    State=IDLE+DRAIN ThreadsPerCore=1 TmpDisk=0 Weight=1
> >    BootTime=2014-03-08T20:15:30 SlurmdStartTime=2014-07-02T12:29:17
> >    CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
> >    ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
> >    Reason=Low RealMemory [root@2014-07-01T14:48:44]
> >
> > NodeName=sanitized_hostname[2] Arch=x86_64 CoresPerSocket=4
> >    CPUAlloc=0 CPUErr=0 CPUTot=8 CPULoad=0.54
> >    Features=ib_ddr,ib_ofa,sse4,sse41,tpa,cpu_xeon
> >    Gres=(null)
> >    NodeAddr=sanitized_hostname[2] NodeHostName=sanitized_hostname[2]
> >    Version=(null)
> >    OS=Linux RealMemory=10909 AllocMem=0 Sockets=2 Boards=1
> >    State=IDLE+DRAIN ThreadsPerCore=1 TmpDisk=0 Weight=1
> >    BootTime=2014-03-08T20:15:02 SlurmdStartTime=2014-07-02T12:29:17
> >    CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
> >    ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
> >    Reason=Low RealMemory [root@2014-07-01T14:48:44]
> >
> > NodeName=sanitized_hostname[3] Arch=x86_64 CoresPerSocket=4
> >    CPUAlloc=0 CPUErr=0 CPUTot=8 CPULoad=0.71
> >    Features=ib_ddr,ib_ofa,sse4,sse41,tpa,cpu_xeon
> >    Gres=(null)
> >    NodeAddr=sanitized_hostname[3] NodeHostName=sanitized_hostname[3]
> >    Version=(null)
> >    OS=Linux RealMemory=10909 AllocMem=0 Sockets=2 Boards=1
> >    State=IDLE+DRAIN ThreadsPerCore=1 TmpDisk=0 Weight=1
> >    BootTime=2014-03-08T20:14:55 SlurmdStartTime=2014-07-02T12:29:17
> >    CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
> >    ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
> >    Reason=Low RealMemory [root@2014-07-01T14:48:44]
> >
> > NodeName=sanitized_hostname[1] CPUs=8 CoresPerSocket=4 Sockets=2
> >    RealMemory=12929 Feature=ib_ddr,ib_ofa,sse4,sse41,tpa,cpu_xeon
> > NodeName=sanitized_hostname[2-3] CPUs=8 CoresPerSocket=4 Sockets=2
> >    RealMemory=10909 Feature=ib_ddr,ib_ofa,sse4,sse41,tpa,cpu_xeon
> >
> > Thanks for any help and/or insight!
> >
> > John DeSantis
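
For anyone who lands on this thread with the same drain reason: as far as I understand it, the controller compares the memory each node reports when slurmd registers against the RealMemory value (in megabytes) configured for that node in slurm.conf, and drains the node with "Low RealMemory" when the node reports less than what is configured. Below is a minimal Python sketch of that comparison, not Slurm's actual code. The function names are hypothetical, and it assumes MemTotal from /proc/meminfo is a close enough stand-in for what slurmd reports.

```python
import re


def meminfo_total_mib(meminfo_text: str) -> int:
    """Return MemTotal from /proc/meminfo-style text, in MiB, rounded down."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemTotal line not found")
    return int(match.group(1)) // 1024


def configured_real_memory(node_line: str) -> int:
    """Return the RealMemory value (MB) from a slurm.conf NodeName line."""
    match = re.search(r"\bRealMemory=(\d+)\b", node_line)
    if match is None:
        raise ValueError("RealMemory not set on this node line")
    return int(match.group(1))


def is_overconfigured(node_line: str, meminfo_text: str) -> bool:
    """True when slurm.conf claims more memory than the node really has."""
    return configured_real_memory(node_line) > meminfo_total_mib(meminfo_text)


if __name__ == "__main__":
    # On a compute node, read the real file and paste in its slurm.conf line.
    with open("/proc/meminfo") as handle:
        text = handle.read()
    line = "NodeName=sanitized_hostname[2-3] CPUs=8 RealMemory=10909"
    print("over-configured:", is_overconfigured(line, text))
```

If the check comes back true, lowering RealMemory in slurm.conf to at most what the node reports (running `slurmd -C` on the node prints the values it detects) and then setting the node back to IDLE with the scontrol command quoted above should clear the drain.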
