Hi,
I have a job shown as running by 'squeue':
$ squeue -w node086
JOBID   PARTITION NAME   USER     ST TIME        NODES NODELIST(REASON)
1234567 main      abcdef user1234 R  10-09:32:34 1     node086
However, with 'sinfo' I can see that the node has been powered off:
$ sinfo
PARTITION AVAIL TIMELIMIT  NODES STATE NODELIST
test      up    3:00:00    2     idle~ node[001-002]
main*     up    14-00:00:0 1     idle~ node086
...
This is the second time I have seen this phenomenon since updating to
version 15.08.8 a month ago.
Is this a bug, or can this simply happen if a job crashes in an odd
enough way?
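In case it helps anyone check for the same mismatch, here is a minimal
Python sketch of how it could be detected automatically. It assumes that
'sinfo -h -N -o "%N %t"' prints one "node state" pair per line with a
trailing '~' marking a powered-down node, and that
'squeue -h -t R -o "%i %N"' prints one "jobid nodelist" pair per running
job. The function names are made up for illustration, and only
single-node jobs are handled (hostlist expressions such as node[001-002]
are not expanded).

```python
import subprocess


def powered_down_nodes(sinfo_text):
    """Return the set of node names whose compact state ends in '~'."""
    nodes = set()
    for line in sinfo_text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[1].endswith("~"):
            nodes.add(fields[0])
    return nodes


def suspicious_jobs(squeue_text, down_nodes):
    """Return (jobid, node) pairs for running jobs on powered-down nodes."""
    hits = []
    for line in squeue_text.splitlines():
        fields = line.split()
        # Assumption: only plain single-node entries are matched; hostlist
        # expressions such as node[001-002] are not expanded here.
        if len(fields) == 2 and fields[1] in down_nodes:
            hits.append((fields[0], fields[1]))
    return hits


def run(cmd):
    """Run a command and return its standard output as text."""
    return subprocess.run(
        cmd, capture_output=True, text=True, check=True
    ).stdout


if __name__ == "__main__":
    sinfo_text = run(["sinfo", "-h", "-N", "-o", "%N %t"])
    squeue_text = run(["squeue", "-h", "-t", "R", "-o", "%i %N"])
    down = powered_down_nodes(sinfo_text)
    for jobid, node in suspicious_jobs(squeue_text, down):
        print(f"job {jobid} is marked running on powered-down node {node}")
```

Run on the state shown above, this should report job 1234567 on node086,
provided the output formats match the assumptions.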
Cheers,
Loris
--
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin