Your compute node has fewer CPUs than are configured for it in slurm.conf. If you run /usr/sbin/slurmd -C on the compute node, it will tell you exactly how many sockets, cores and threads, and how much memory and temporary disk space, are found on the node. If slurm.conf has higher values configured than what slurmd finds, the node will be marked DOWN, as you see. You do not need to configure sockets, cores and threads; the CPU count (Procs=) alone is enough if you only care about allocating CPUs and not about the task topology.
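The comparison described above can be sketched in a few lines of Python. This is only a simplified illustration, not Slurm's actual logic: the function names are hypothetical, the key names (CPUs, Sockets, and so on) are assumptions that may differ between Slurm versions (older releases use Procs=), and the sample values are made up. Paste in the line that slurmd -C prints on your node and the NodeName line from your slurm.conf.

```python
# Simplified sketch: flag slurm.conf values that exceed what the node reports.
# Key names are assumptions; adjust them to match your Slurm version.
DEFAULT_KEYS = ("CPUs", "Sockets", "CoresPerSocket",
                "ThreadsPerCore", "RealMemory", "TmpDisk")


def parse_node_line(line):
    """Split a 'Key=Value Key=Value ...' line into a dict of strings."""
    return dict(tok.split("=", 1) for tok in line.split() if "=" in tok)


def overconfigured(detected_line, configured_line, keys=DEFAULT_KEYS):
    """Return {key: (configured, detected)} where configured > detected.

    Keys missing from either line are skipped, since unset values are
    not compared in this sketch.
    """
    detected = parse_node_line(detected_line)
    configured = parse_node_line(configured_line)
    problems = {}
    for key in keys:
        if key in detected and key in configured:
            conf_val, det_val = int(configured[key]), int(detected[key])
            if conf_val > det_val:
                problems[key] = (conf_val, det_val)
    return problems


if __name__ == "__main__":
    # Hypothetical example values, not taken from a real node.
    detected = "NodeName=cnlnx03 CPUs=2 Sockets=2 CoresPerSocket=1 ThreadsPerCore=1"
    configured = "NodeName=cnlnx03 CPUs=4"
    print(overconfigured(detected, configured))
```

Any key this reports is a candidate for lowering in slurm.conf. After correcting the file and restarting the daemons, a node that is already DOWN typically still has to be returned to service by hand, for example with scontrol update NodeName=cnlnx03 State=RESUME.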
Quoting Fred Liu <[email protected]>:
Hi,

What does "Low CPUS" mean? How can I make my node not in DOWN stat?

scontrol show node cnlnx03
NodeName=cnlnx03 Arch=x86_64 CoresPerSocket=1
   CPUAlloc=0 CPUErr=0 CPUTot=2 Features=(null)
   Gres=(null)
   OS=Linux RealMemory=1 Sockets=2
   State=DOWN ThreadsPerCore=1 TmpDisk=0 Weight=1
   BootTime=2010-12-27T09:33:49 SlurmdStartTime=2011-03-15T13:54:59
   Reason=Low CPUs [slurm@2011-03-15T13:41:38]

Thanks.
Fred
