Hi!
I noticed that some nodes are shown as ALLOCATED even though no jobs
are actually running on them. Because they look full, no new jobs get assigned to them.
Restarting slurmctld or slurmd doesn't help. The only workaround I've found
is to put a node into DOWN state and then resume it; after that it accepts jobs again.
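For reference, the workaround looks roughly like this (the reason text is just an example):
# scontrol update NodeName=wn060 State=DOWN Reason="stuck in ALLOCATED"
# scontrol update NodeName=wn060 State=RESUME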
I use Slurm 2.6.5 on all nodes.
I added a new node today and the same problem appeared:
# scontrol show node wn060
NodeName=wn060 Arch=x86_64 CoresPerSocket=12
CPUAlloc=24 CPUErr=0 CPUTot=24 CPULoad=0.00 Features=(null)
Gres=(null)
NodeAddr=wn060 NodeHostName=wn060
OS=Linux RealMemory=129035 AllocMem=108000 Sockets=2 Boards=1
State=ALLOCATED ThreadsPerCore=1 TmpDisk=0 Weight=1
BootTime=2014-03-10T17:53:13 SlurmdStartTime=2014-03-12T09:04:28
CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
# squeue | grep wn060
returns nothing, and I don't see anything relevant in the slurmd log.
The slurmctld log only shows:
[2014-03-12T08:56:14.775] update_node: node wn060 reason set to: test
[2014-03-12T08:56:14.775] update_node: node wn060 state set to DRAINING
[2014-03-12T09:18:38.721] update_node: node wn060 state set to ALLOCATED
# sacct -N wn060
JobID JobName Partition Account AllocCPUS State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
So sacct doesn't show any jobs on that node either.
Is there an easy way to detect nodes that are empty but shown as allocated?
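The only check I could come up with is something along these lines (a rough,
untested sketch that compares sinfo's allocated nodes with the nodes squeue
reports running jobs on, and prints the difference):
# sinfo -h -N -t alloc -o "%N" | sort -u > /tmp/alloc_nodes
# squeue -h -t running -o "%N" | xargs -r -n1 scontrol show hostnames | sort -u > /tmp/busy_nodes
# comm -23 /tmp/alloc_nodes /tmp/busy_nodes
but that feels clumsy.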
Has anyone experienced this before?
Thanks for your help.
Regards,
Barbara