Mike,

Thanks for the reply!

Yes, I had found that post previously, and 'scontrol' did work as described
there.  The problem was a configuration issue on my end; I'm sure
you'll see my reply to EV.

The nodes in question actually have less memory than their neighbors, and
due to user error (mine!), I didn't configure them properly.
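
To illustrate the mismatch for anyone reading the archives: as far as I
understand it, a node ends up drained with "Low RealMemory" when the memory
slurmd detects on the node is lower than the RealMemory value configured for
it in slurm.conf. The Python below is only a rough sketch of that comparison,
not Slurm's own code. The helper names are made up for the example, and the
detected value is a hypothetical number standing in for whatever
'slurmd -C' reports on the node itself.

```python
import re


def configured_real_memory(conf_line):
    """Return the RealMemory value (in MB) from a slurm.conf node line, or None."""
    match = re.search(r"\bRealMemory=(\d+)", conf_line)
    return int(match.group(1)) if match else None


def is_low_real_memory(configured_mb, detected_mb):
    """True when the node reports less memory than slurm.conf claims it has."""
    return detected_mb < configured_mb


# Node line modeled on the slurm.conf entries quoted later in this thread.
conf_line = (
    "NodeName=sanitized_hostname[1] CPUs=8 CoresPerSocket=4 Sockets=2 "
    "RealMemory=12929"
)
configured = configured_real_memory(conf_line)

# Hypothetical value; on a real node, take it from the output of 'slurmd -C'.
detected = 12000

if is_low_real_memory(configured, detected):
    print(f"configured RealMemory={configured} exceeds detected {detected}; "
          "lower RealMemory in slurm.conf before resuming the node")
```

Once slurm.conf carries a RealMemory value no larger than what the node
reports, the 'scontrol update' command from Mike's message should return the
node to service.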

John DeSantis


2014-07-02 14:17 GMT-04:00 Michael Robbert <[email protected]>:

> John,
> Did you find and read this thread from 2011 that appears to discuss this
> issue?
>
> http://comments.gmane.org/gmane.comp.distributed.slurm.devel/669
>
> Do you have RealMemory set in your slurm.conf? If so what is it set to?
> Have you tried manually updating the node to Idle? Something like:
> scontrol update NodeName=sanitized_hostname State=IDLE
>
> Mike
>
> On Jul 2, 2014, at 10:45 AM, John Desantis <[email protected]> wrote:
>
>
> Hello list!
>
> I asked this question in #slurm yesterday but didn't receive a
> response, and I also wasn't able to find any insight via Google or the
> Slurm site.
>
> Anyways, to the point!
>
> How does Slurm (14.03) determine when a node should be placed in a
> "drain" state with the reason "Low RealMemory"?  I'm asking because I
> have three nodes, each with 12-14 GB of RAM in total, and "free"
> reports 7-10 GB free on each.
>
> I'll paste some scontrol output below and corresponding entries from
> slurm.conf.
>
> NodeName=sanitized_hostname[1] Arch=x86_64 CoresPerSocket=4
>   CPUAlloc=0 CPUErr=0 CPUTot=8 CPULoad=0.53
> Features=ib_ddr,ib_ofa,sse4,sse41,tpa,cpu_xeon
>   Gres=(null)
>   NodeAddr=sanitized_hostname[1] NodeHostName=sanitized_hostname[1]
> Version=(null)
>   OS=Linux RealMemory=12929 AllocMem=0 Sockets=2 Boards=1
>   State=IDLE+DRAIN ThreadsPerCore=1 TmpDisk=0 Weight=1
>   BootTime=2014-03-08T20:15:30 SlurmdStartTime=2014-07-02T12:29:17
>   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
>   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
>   Reason=Low RealMemory [root@2014-07-01T14:48:44]
>
> NodeName=sanitized_hostname[2] Arch=x86_64 CoresPerSocket=4
>   CPUAlloc=0 CPUErr=0 CPUTot=8 CPULoad=0.54
> Features=ib_ddr,ib_ofa,sse4,sse41,tpa,cpu_xeon
>   Gres=(null)
>   NodeAddr=sanitized_hostname[2] NodeHostName=sanitized_hostname[2]
> Version=(null)
>   OS=Linux RealMemory=10909 AllocMem=0 Sockets=2 Boards=1
>   State=IDLE+DRAIN ThreadsPerCore=1 TmpDisk=0 Weight=1
>   BootTime=2014-03-08T20:15:02 SlurmdStartTime=2014-07-02T12:29:17
>   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
>   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
>   Reason=Low RealMemory [root@2014-07-01T14:48:44]
>
> NodeName=sanitized_hostname[3] Arch=x86_64 CoresPerSocket=4
>   CPUAlloc=0 CPUErr=0 CPUTot=8 CPULoad=0.71
> Features=ib_ddr,ib_ofa,sse4,sse41,tpa,cpu_xeon
>   Gres=(null)
>   NodeAddr=sanitized_hostname[3] NodeHostName=sanitized_hostname[3]
> Version=(null)
>   OS=Linux RealMemory=10909 AllocMem=0 Sockets=2 Boards=1
>   State=IDLE+DRAIN ThreadsPerCore=1 TmpDisk=0 Weight=1
>   BootTime=2014-03-08T20:14:55 SlurmdStartTime=2014-07-02T12:29:17
>   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
>   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
>   Reason=Low RealMemory [root@2014-07-01T14:48:44]
>
> NodeName=sanitized_hostname[1] CPUs=8 CoresPerSocket=4 Sockets=2
> RealMemory=12929 Feature=ib_ddr,ib_ofa,sse4,sse41,tpa,cpu_xeon
> NodeName=sanitized_hostname[2-3] CPUs=8 CoresPerSocket=4 Sockets=2
> RealMemory=10909 Feature=ib_ddr,ib_ofa,sse4,sse41,tpa,cpu_xeon
>
> Thanks for any help and/or insight!
>
> John DeSantis
>
>
>
