John,
Did you find and read this thread from 2011 that appears to discuss this issue?

http://comments.gmane.org/gmane.comp.distributed.slurm.devel/669

Do you have RealMemory set in your slurm.conf? If so, what is it set to?
Have you tried manually updating the node to Idle? Something like:
scontrol update NodeName=sanitized_hostname State=IDLE
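
As far as I understand it, the controller compares the memory that slurmd
reports at registration against the RealMemory value in slurm.conf, and drains
the node with "Low RealMemory" when the reported figure is lower. Running
"slurmd -C" on the node should print what slurmd itself detects. Here is a rough
Python sketch of that comparison, not Slurm's actual code. It uses MemTotal from
/proc/meminfo as a stand-in for what slurmd reports, which is an assumption. The
function names and the MemTotal figure in the usage lines are made up for
illustration.

```python
import re


def configured_real_memory(conf_line):
    """Return the RealMemory value (MB) from a slurm.conf NodeName line, or None."""
    match = re.search(r"\bRealMemory=(\d+)", conf_line)
    return int(match.group(1)) if match else None


def meminfo_total_mb(meminfo_text):
    """Return MemTotal from /proc/meminfo-style text, converted from kB to MB."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemTotal not found")
    return int(match.group(1)) // 1024


def is_below_configured(conf_line, meminfo_text):
    """True if the detected memory is lower than the configured RealMemory.

    Assumption: this mirrors the kind of check that leads to a
    "Low RealMemory" drain; the real check lives in slurmctld.
    """
    configured = configured_real_memory(conf_line)
    if configured is None:
        return False  # nothing configured, so nothing to compare against
    return meminfo_total_mb(meminfo_text) < configured


if __name__ == "__main__":
    # Config line taken from the post; the MemTotal value is hypothetical.
    conf = "NodeName=sanitized_hostname[1] CPUs=8 CoresPerSocket=4 Sockets=2 RealMemory=12929"
    sample_meminfo = "MemTotal:       13200000 kB\n"
    print(is_below_configured(conf, sample_meminfo))
```

On a real node you would pass in the contents of /proc/meminfo instead of the
sample string. If the check comes back true, lowering RealMemory in slurm.conf
to at or below the detected value is the usual thing to try.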

Mike

On Jul 2, 2014, at 10:45 AM, John Desantis <[email protected]> wrote:

> 
> Hello list!
> 
> I asked this question in #slurm yesterday but didn't receive a
> response, and I also wasn't able to find any insight via Google or the
> Slurm site.
> 
> Anyways, to the point!
> 
> How does Slurm (14.03) determine when a node should be placed in a
> "drain" state with the reason "Low RealMemory"?  I'm asking this
> question because I have three nodes each having between 12-14 GB RAM
> total, with "free" reporting between 7-10 GB as free.
> 
> I'll paste some scontrol output below and corresponding entries from 
> slurm.conf.
> 
> NodeName=sanitized_hostname[1] Arch=x86_64 CoresPerSocket=4
>   CPUAlloc=0 CPUErr=0 CPUTot=8 CPULoad=0.53
> Features=ib_ddr,ib_ofa,sse4,sse41,tpa,cpu_xeon
>   Gres=(null)
>   NodeAddr=sanitized_hostname[1] NodeHostName=sanitized_hostname[1]
> Version=(null)
>   OS=Linux RealMemory=12929 AllocMem=0 Sockets=2 Boards=1
>   State=IDLE+DRAIN ThreadsPerCore=1 TmpDisk=0 Weight=1
>   BootTime=2014-03-08T20:15:30 SlurmdStartTime=2014-07-02T12:29:17
>   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
>   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
>   Reason=Low RealMemory [root@2014-07-01T14:48:44]
> 
> NodeName=sanitized_hostname[2] Arch=x86_64 CoresPerSocket=4
>   CPUAlloc=0 CPUErr=0 CPUTot=8 CPULoad=0.54
> Features=ib_ddr,ib_ofa,sse4,sse41,tpa,cpu_xeon
>   Gres=(null)
>   NodeAddr=sanitized_hostname[2] NodeHostName=sanitized_hostname[2]
> Version=(null)
>   OS=Linux RealMemory=10909 AllocMem=0 Sockets=2 Boards=1
>   State=IDLE+DRAIN ThreadsPerCore=1 TmpDisk=0 Weight=1
>   BootTime=2014-03-08T20:15:02 SlurmdStartTime=2014-07-02T12:29:17
>   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
>   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
>   Reason=Low RealMemory [root@2014-07-01T14:48:44]
> 
> NodeName=sanitized_hostname[3] Arch=x86_64 CoresPerSocket=4
>   CPUAlloc=0 CPUErr=0 CPUTot=8 CPULoad=0.71
> Features=ib_ddr,ib_ofa,sse4,sse41,tpa,cpu_xeon
>   Gres=(null)
>   NodeAddr=sanitized_hostname[3] NodeHostName=sanitized_hostname[3]
> Version=(null)
>   OS=Linux RealMemory=10909 AllocMem=0 Sockets=2 Boards=1
>   State=IDLE+DRAIN ThreadsPerCore=1 TmpDisk=0 Weight=1
>   BootTime=2014-03-08T20:14:55 SlurmdStartTime=2014-07-02T12:29:17
>   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
>   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
>   Reason=Low RealMemory [root@2014-07-01T14:48:44]
> 
> NodeName=sanitized_hostname[1] CPUs=8 CoresPerSocket=4 Sockets=2
> RealMemory=12929 Feature=ib_ddr,ib_ofa,sse4,sse41,tpa,cpu_xeon
> NodeName=sanitized_hostname[2-3] CPUs=8 CoresPerSocket=4 Sockets=2
> RealMemory=10909 Feature=ib_ddr,ib_ofa,sse4,sse41,tpa,cpu_xeon
> 
> Thanks for any help and/or insight!
> 
> John DeSantis
