Note the timestamp showing when the node was set DOWN. It may never have been restored to service since then. Take a look at the ReturnToService configuration parameter in "man slurm.conf" and set it appropriately. You can manually return the node to service with "scontrol update NodeName=ClusterNode0 State=Resume".
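
As a rough sketch of those steps (not a tested recipe for your cluster), something like the following could be used. The helper names down_reason and resume_node are made up for illustration, and the sed pattern assumes the "Reason=... [user@time]" layout seen in the node record quoted in this thread. The scontrol calls only run if scontrol is on the PATH, and the resume step is left commented out so nothing changes state by accident.

```shell
#!/bin/sh
# Sketch only. NODE is the node name taken from the record in this thread.
NODE=ClusterNode0

# Hypothetical helper: read a node record on stdin and print the reason,
# the user, and the timestamp from its "Reason=... [user@time]" line.
down_reason() {
    sed -n 's/^ *Reason=\(.*\) \[\(.*\)@\(.*\)] *$/reason="\1" by=\2 at=\3/p'
}

# Hypothetical helper: return a node to service by hand.
resume_node() {
    scontrol update NodeName="$1" State=RESUME
}

if command -v scontrol >/dev/null 2>&1; then
    # Show the ReturnToService value the controller is currently using.
    scontrol show config | grep -i ReturnToService
    # Show why and when the node was set DOWN.
    scontrol show node "$NODE" | down_reason
    # Uncomment once the cause of the DOWN state is understood:
    # resume_node "$NODE"
fi
```

Fed the record quoted below, down_reason would pick out the "Low RealMemory" reason along with the slurm user and the 2011-07-31T21:30:51 timestamp.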

 NodeName=ClusterNode0 Arch=i686 CoresPerSocket=1
 CPUAlloc=0 CPUErr=0 CPUTot=1 Features=(null)
 OS=Linux RealMemory=2 Sockets=1
 State=DOWN ThreadsPerCore=1 TmpDisk=0 Weight=1
 Reason=Low RealMemory [slurm@2011-07-31T21:30:51]
I don't understand why the compute node is reporting low memory. Running "scontrol show slurm" reports that the node has 1018 MB of memory available to it and 480 MB of disk space.

