Quoting Davis Ford <davisf...@gmail.com>:

Hi Moe, I have tried this. For example, the node ORL-APP5 is one of the
nodes that is down.

If your node has Sockets=1 CoresPerSocket=1 ThreadsPerCore=1, then how can Procs=4?

[root@ORL-APP5 ~]# slurmd -C
NodeName=ORL-APP5 Procs=4 Sockets=1 CoresPerSocket=1 ThreadsPerCore=1
RealMemory=629 TmpDisk=3852
[root@ORL-APP5 ~]# scontrol show node ORL-APP5
NodeName=ORL-APP5 Arch=i686 CoresPerSocket=1
   CPUAlloc=0 CPUErr=0 CPUTot=4 Features=(null)
   Gres=(null)
   NodeAddr=192.168.206.47 NodeHostName=ORL-APP5
   OS=Linux RealMemory=629 Sockets=4
   State=DOWN* ThreadsPerCore=1 TmpDisk=3852 Weight=1
   BootTime=2011-10-31T20:23:06 SlurmdStartTime=2012-01-17T17:42:55
   Reason=Low socket*core*thread count [slurm@2012-01-17T17:38:27]

ORL-APP5 is defined by this NodeName line in slurm.conf:

NodeName=ORL-APP[3,5-6] NodeAddr=192.168.206.[45,47-48] Procs=4 CoresPerSocket=1 ThreadsPerCore=1 RealMemory=300 TmpDisk=200 State=UNKNOWN

Unless I'm missing something, the hardware matches the config.  In
the slurmctld logfile, I see this stated about ORL-APP5:

[2012-01-17T17:39:12] error: Node ORL-APP5 has low socket*core*thread count (1 < 4)
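
The arithmetic behind that message can be sketched roughly as follows. This is only an illustration, not Slurm's actual code: the helper names are made up, and it assumes the controller compares the product of the Sockets, CoresPerSocket and ThreadsPerCore values reported by slurmd against the configured Procs count.

```python
def parse_node_line(line):
    """Split a slurm.conf-style node line into a dict of KEY=VALUE tokens."""
    return dict(tok.split("=", 1) for tok in line.split() if "=" in tok)


def topology_product(fields):
    """Sockets * CoresPerSocket * ThreadsPerCore; missing fields count as 1 (assumption)."""
    return (int(fields.get("Sockets", 1))
            * int(fields.get("CoresPerSocket", 1))
            * int(fields.get("ThreadsPerCore", 1)))


# Values reported by `slurmd -C` on the node (copied from the output above).
reported = parse_node_line(
    "NodeName=ORL-APP5 Procs=4 Sockets=1 CoresPerSocket=1 ThreadsPerCore=1")

# Values from the slurm.conf NodeName line (Sockets is not given there).
configured = parse_node_line(
    "NodeName=ORL-APP[3,5-6] Procs=4 CoresPerSocket=1 ThreadsPerCore=1")

reported_count = topology_product(reported)   # 1 * 1 * 1
expected_count = int(configured["Procs"])     # 4

# Assumed form of the check that yields "(1 < 4)" in the log.
is_low = reported_count < expected_count
print(f"reported={reported_count} expected={expected_count} low={is_low}")
```

With the numbers from this node, the reported product is 1 while the configuration expects 4, which lines up with the "(1 < 4)" in the log line.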

Maybe I'm missing something here?  The node has 4 sockets, 4 CPUs, and 1
core per socket.  The only thing I didn't specify on the NodeName line was
the 4 sockets.  Is this misconfigured?

Thanks,
Davis



