It didn't show anything.
I checked sinfo -l, which showed the node in the mixed state, while scontrol showed it as allocated, yet nothing was running on it (checked on the node itself and with squeue). I found 8 nodes with the same problem. I restarted slurmctld, set the problematic nodes to state=down, and then enabled them again. Now it seems to work, but I would still like to find the reason for this behaviour.
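
For reference, the down/resume workaround described above could be scripted roughly as follows. This is only a sketch: the function name reset_nodes, the DRY_RUN variable and the reason text stale_alloc are made up for illustration, and it assumes bash plus admin rights for scontrol. Note that setting a node to DOWN terminates any job that really is running there, so it prints the commands by default instead of executing them.

```shell
#!/usr/bin/env bash
# Sketch: cycle nodes through DOWN and RESUME to clear a stale allocation.
# reset_nodes, DRY_RUN and the reason "stale_alloc" are illustrative names.

reset_nodes() {
    local node run
    # Execute for real only when DRY_RUN=0; otherwise just print each command.
    if [ "${DRY_RUN:-1}" = "0" ]; then run=""; else run="echo"; fi
    for node in "$@"; do
        # Mark the node DOWN (a reason is required), then return it to service.
        $run scontrol update NodeName="$node" State=DOWN Reason=stale_alloc
        $run scontrol update NodeName="$node" State=RESUME
    done
}
```

Calling reset_nodes wn060 only prints the two scontrol commands; DRY_RUN=0 reset_nodes wn060 would actually run them.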

Thanks,
Barbara

On 03/12/2014 12:49 PM, Andy Riebs wrote:

Hi Barbara,

The output of "sinfo -l" and "sinfo -R" may be helpful to figure out what's going on.

Andy

On 03/12/2014 04:29 AM, Barbara Krasovec wrote:

Hi!

I noticed that some nodes are shown as allocated although no jobs are actually running on them. As a result, no new jobs are assigned to those nodes. Restarting slurmctld or slurmd doesn't help. I have to put a node into the down state and then resume it before it will accept jobs again.

I use Slurm 2.6.5 on all nodes.

I added a new node today and the same problem appeared:

# scontrol show node wn060
NodeName=wn060 Arch=x86_64 CoresPerSocket=12
   CPUAlloc=24 CPUErr=0 CPUTot=24 CPULoad=0.00 Features=(null)
   Gres=(null)
   NodeAddr=wn060 NodeHostName=wn060
   OS=Linux RealMemory=129035 AllocMem=108000 Sockets=2 Boards=1
   State=ALLOCATED ThreadsPerCore=1 TmpDisk=0 Weight=1
   BootTime=2014-03-10T17:53:13 SlurmdStartTime=2014-03-12T09:04:28
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s

# squeue | grep wn060
(shows nothing)

And I don't see anything helpful in the slurmd logs:
[2014-03-12T08:56:14.775] update_node: node wn060 reason set to: test
[2014-03-12T08:56:14.775] update_node: node wn060 state set to DRAINING
[2014-03-12T09:18:38.721] update_node: node wn060 state set to ALLOCATED

# sacct -N wn060
       JobID    JobName  Partition    Account  AllocCPUS State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------

Is there an easy way to detect nodes that are empty but shown as allocated?
Has anyone experienced this before?
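
One possible way to check for such nodes is to compare the nodes that sinfo reports as allocated or mixed with the nodes that squeue lists for running jobs. The following is a rough sketch, not a tested tool: the function name stale_nodes and the temporary files are made up for illustration, and it assumes bash and that sinfo, squeue and scontrol are in the PATH. Nodes whose jobs are still completing may show up as false positives.

```shell
#!/usr/bin/env bash
# Sketch: list nodes that Slurm reports as allocated/mixed but that host no running job.
# stale_nodes is an illustrative helper name, not part of Slurm.

# stale_nodes ALLOC_FILE BUSY_FILE
# Prints the names found in ALLOC_FILE but not in BUSY_FILE, one per line.
stale_nodes() {
    comm -23 <(sort -u "$1") <(sort -u "$2")
}

# Query Slurm only if the client tools are installed.
if command -v sinfo >/dev/null 2>&1; then
    alloc=$(mktemp)
    busy=$(mktemp)
    # Nodes the controller believes are in use, one name per line.
    sinfo -h -N -t alloc,mix -o '%N' > "$alloc"
    # Nodes that host a running job; expand compressed lists such as wn[001-003].
    squeue -h -t RUNNING -o '%N' | while read -r nodelist; do
        scontrol show hostnames "$nodelist"
    done > "$busy"
    stale_nodes "$alloc" "$busy"
    rm -f "$alloc" "$busy"
fi
```

Any node name this prints is marked as in use by the controller although no running job is listed on it, which is the situation shown for wn060 above.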

Thanks for your help.
Regards,
Barbara


