You probably don't have SLURM configured properly. Check the controller log (the file named by SlurmctldLogFile in slurm.conf) for a message of the form "Node %s has low cpu count %u". That will tell you how many CPUs the node reported that it has. Also double-check your slurm.conf file.

________________________________________
From: [email protected] [[email protected]] On Behalf Of Fred Liu [[email protected]]
Sent: Tuesday, March 15, 2011 6:44 AM
To: [email protected]; [email protected]
Subject: RE: [slurm-dev] low socket*core*thread count 1?
Thanks. But I set Procs=1. I have also found that if I make this node the controller node, it works well. But another node, running CentOS 5.5, shows DOWN. BTW, this node runs CentOS 3.9, and SLURM 2.2.3 is installed on all nodes. It is really weird.

Thanks.
Fred

-----Original Message-----
From: [email protected] [mailto:[email protected]]
Sent: Tuesday, March 15, 2011 9:33 PM
To: [email protected]; Fred Liu
Subject: Re: [slurm-dev] low socket*core*thread count 1?

Your compute node has fewer CPUs than are configured in slurm.conf. If you execute /usr/sbin/slurmd -C on a compute node, it will tell you exactly how many sockets, cores, threads, memory and temporary disk space are found on the node. If the slurm.conf file has higher values configured, then the node will be marked DOWN, as you see. You do not need to configure sockets, cores and threads, only CPUs (Procs=), if you are only concerned with allocating CPUs and not with the task topology.

Quoting Fred Liu <[email protected]>:

> Hi,
>
> What does "Low CPUS" mean?
> How can I make my node not be in the DOWN state?
>
> scontrol show node cnlnx03
> NodeName=cnlnx03 Arch=x86_64 CoresPerSocket=1
>    CPUAlloc=0 CPUErr=0 CPUTot=2 Features=(null)
>    Gres=(null)
>    OS=Linux RealMemory=1 Sockets=2
>    State=DOWN ThreadsPerCore=1 TmpDisk=0 Weight=1
>    BootTime=2010-12-27T09:33:49 SlurmdStartTime=2011-03-15T13:54:59
>    Reason=Low CPUs [slurm@2011-03-15T13:41:38]
>
> Thanks.
>
> Fred
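The two checks suggested in this thread (compare what slurm.conf configures against what `/usr/sbin/slurmd -C` detects, and search the controller log for "Node %s has low cpu count %u") can be sketched in Python as below. This is a minimal sketch, not part of SLURM: it assumes `slurmd -C` prints its findings as space-separated Key=Value pairs like a slurm.conf NodeName line, and that newer releases spell Procs= as CPUs=. The helper names (`parse_key_values`, `check_node`, `low_cpu_reports`) and the example values are hypothetical.

```python
import re
import sys

# The controller log message quoted in this thread is
# "Node %s has low cpu count %u"; capture the node name and CPU count.
LOW_CPU_RE = re.compile(r"Node (\S+) has low cpu count (\d+)")

# Assumption: newer SLURM releases write CPUs= where 2.2 uses Procs=.
ALIASES = {"CPUs": "Procs"}

# Numeric node fields compared between slurm.conf and slurmd -C output.
FIELDS = ("Procs", "Sockets", "CoresPerSocket", "ThreadsPerCore",
          "RealMemory", "TmpDisk")


def parse_key_values(line):
    """Split a 'Key=Value Key=Value ...' line into a dict.

    Tokens without '=' are ignored; CPUs= is stored under Procs.
    """
    pairs = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:
            pairs[ALIASES.get(key, key)] = value
    return pairs


def check_node(configured_line, detected_line):
    """Return one message per field where slurm.conf asks for more than
    the node reports. Fields missing from either line are skipped, so a
    NodeName line that sets only Procs= is checked on Procs alone."""
    configured = parse_key_values(configured_line)
    detected = parse_key_values(detected_line)
    problems = []
    for field in FIELDS:
        want = configured.get(field, "")
        have = detected.get(field, "")
        if want.isdigit() and have.isdigit() and int(want) > int(have):
            problems.append(f"{field}: configured {want} > detected {have}")
    return problems


def low_cpu_reports(log_lines):
    """Yield (node_name, cpu_count) for each low-CPU message in the log."""
    for line in log_lines:
        match = LOW_CPU_RE.search(line)
        if match:
            yield match.group(1), int(match.group(2))


if __name__ == "__main__":
    # Hypothetical example lines; replace them with your own NodeName
    # line from slurm.conf and the output of /usr/sbin/slurmd -C.
    conf = "NodeName=cnlnx03 Procs=4 Sockets=2 CoresPerSocket=2 State=UNKNOWN"
    hw = ("NodeName=cnlnx03 Procs=2 Sockets=2 CoresPerSocket=1 "
          "ThreadsPerCore=1 RealMemory=1 TmpDisk=0")
    for problem in check_node(conf, hw):
        print(problem)
    # Optionally pass the controller log (the SlurmctldLogFile path).
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as log:
            for node, cpus in low_cpu_reports(log):
                print(f"{node} reported {cpus} CPUs")
```

With the hypothetical lines above, the script reports that Procs (4 vs. 2) and CoresPerSocket (2 vs. 1) are over-configured, which is the situation that leaves a node DOWN with "Low CPUs". Comparing field by field, rather than only a total, mirrors the advice above: if slurm.conf sets only Procs=, only Procs is checked.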
