I'm setting up a new test cluster with SLURM 15.08.5 (we run Torque/Maui
on our production cluster). We have a SLURM master server running
CentOS 7.2 and two compute nodes on separate subnets (10.1.. and
10.2..). I'm writing a SLURM installation HowTo page as I go along:
https://wiki.fysik.dtu.dk/niflheim/SLURM
I'm now facing a problem running a trivial test:
# srun -N1 --constraint="opteron4" /bin/hostname
srun: error: Unable to allocate resources: Requested node configuration is not available
Question: What could be causing the available node with the feature
"opteron4" to reject jobs?
The other partition works just fine:
# srun -N1 --constraint="xeon8" /bin/hostname
a012.dcsc.fysik.dtu.dk
FYI, the node status is:
# scontrol show nodes
NodeName=a012 Arch=x86_64 CoresPerSocket=4
   CPUAlloc=8 CPUErr=0 CPUTot=8 CPULoad=0.01
   Features=xeon5570,hp5412e,ethernet,xeon8
   Gres=(null)
   NodeAddr=a012 NodeHostName=a012 Version=15.08
   OS=Linux RealMemory=23900 AllocMem=0 FreeMem=22859 Sockets=2 Boards=1
   State=IDLE+COMPLETING ThreadsPerCore=1 TmpDisk=32752 Weight=1 Owner=N/A
   BootTime=2015-09-08T16:25:29 SlurmdStartTime=2015-12-16T15:29:32
   CapWatts=n/a
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s

NodeName=q007 Arch=x86_64 CoresPerSocket=2
   CPUAlloc=0 CPUErr=0 CPUTot=4 CPULoad=0.01
   Features=opteron2218,hp5412b,ethernet,opteron4
   Gres=(null)
   NodeAddr=q007 NodeHostName=q007 Version=15.08
   OS=Linux RealMemory=7820 AllocMem=0 FreeMem=7584 Sockets=2 Boards=1
   State=IDLE ThreadsPerCore=1 TmpDisk=32752 Weight=1 Owner=N/A
   BootTime=2015-12-17T08:40:49 SlurmdStartTime=2015-12-17T08:41:03
   CapWatts=n/a
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
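As a sanity check that the string passed to --constraint matches, character for character, a feature that a node actually advertises, the node listing above can be filtered with a small helper. This is only a sketch. It assumes the 15.08 `scontrol show nodes` layout shown above, with a comma-separated `Features=` field. Later SLURM releases may label that field differently. The function name `nodes_with_feature` is hypothetical.

```shell
# nodes_with_feature FEATURE   (hypothetical helper, not part of SLURM)
# Reads "scontrol show nodes" output on stdin and prints the NodeName of
# every node whose Features= list contains FEATURE as an exact entry.
# Assumes the 15.08 output format with a "Features=" field.
nodes_with_feature() {
    awk -v want="$1" '
    {
        # Scan every whitespace-separated key=value token on the line.
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^NodeName=/) {
                # Remember the node this record belongs to.
                node = substr($i, 10)
            } else if ($i ~ /^Features=/) {
                # Split the comma-separated feature list and compare exactly.
                n = split(substr($i, 10), feats, ",")
                for (j = 1; j <= n; j++)
                    if (feats[j] == want) print node
            }
        }
    }'
}
```

Piping `scontrol show nodes` into `nodes_with_feature opteron4` should print q007 if the feature name is spelled as the node reports it. Exact matching is used deliberately, so that "xeon" does not match "xeon5570".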
I believe that the nodes are configured identically, except for their
hardware differences.
--
Ole Holm Nielsen
Department of Physics, Technical University of Denmark