Hi André,

As to why nodes aren't automatically returned to service: think of a case where a node up and reboots unexpectedly, possibly due to a hardware issue. Personally, I'd want to check it out before I returned it to service. If SLURM didn't mark it as down, I likely wouldn't know, because my nodes reboot so quickly that my monitoring system wouldn't catch it. Granted, this scenario isn't really applicable to the situation you described, but you get the idea.
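
If you do want nodes to come back on their own, the ReturnToService parameter mentioned earlier in the thread is the knob for that. Here is a rough sketch of how I understand the values; the comments are my paraphrase rather than the official wording, and the value shown is only an example, so please check "man slurm.conf" for your version before relying on it:

```
# slurm.conf (excerpt) -- illustrative only, verify against "man slurm.conf"
#
# ReturnToService=0  a DOWN node stays DOWN until an administrator
#                    resumes it by hand (scontrol update ... State=Resume)
# ReturnToService=1  a DOWN node returns to service when slurmd registers,
#                    but only if it was set DOWN for being non-responsive
# ReturnToService=2  a DOWN node returns to service when slurmd registers
#                    with a valid configuration
ReturnToService=1
```

After changing slurm.conf you would need to get the daemons to pick up the new setting (for example with "scontrol reconfigure"), and I would still keep an eye on why nodes went down in the first place.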
Enjoy SLURM :) I think you'll be very happy with it.

Sent from my iPhone

On Aug 1, 2011, at 8:06 PM, "andre roy" <[email protected]> wrote:

> Updating the node has changed its status to IDLE!
>
> thanks a bunch :)
>
> I'm surprised it doesn't automatically try to return to service when the
> slurmd daemon starts up.
>
> --
> André
>
>
>> ----- Original Message -----
>> From: [email protected]
>> Sent: 08/01/11 04:58 PM
>> To: [email protected]
>> Subject: Re: [slurm-dev] Node state always down: low RealMemory
>>
>>
>> Note the time stamp when the node was set down. It may never have been
>> restored to service since then. Take a look at the configuration
>> parameter ReturnToService in "man slurm.conf" and set appropriately.
>> You can manually return the node to service with "scontrol update
>> NodeName=ClusterNode0 State=Resume"
>>
>> > NodeName=ClusterNode0 Arch=i686 CoresPerSocket=1
>> > CPUAlloc=0 CPUErr=0 CPUTot=1 Features=(null)
>> > OS=Linux RealMemory=2 Sockets=1
>> > State=DOWN ThreadsPerCore=1 TmpDisk=0 Weight=1
>> > Reason=Low RealMemory [slurm@2011-07-31T21:30:51]
>> > I don't understand why the compute node is reporting low memory...
>> > running >scontrol show slurm reports that the node has 1018 Mb
>> > available to it and 480 Mb of disk space.
>
