Loris Bennett <loris.benn...@fu-berlin.de> writes:

> Hi,
>
> I have a node which is powered on and to which I have sent a job.  The
> output of sinfo is
>
>   PARTITION   AVAIL  TIMELIMIT  NODES  STATE NODELIST
>   test           up 7-00:00:00      1   mix~ node001
>
> The output of squeue is
>
>     JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
>   1795993      test 7_single    loris CF      24:29      1 node001
>
> I don't understand the node state 'mix~'.  If it appears at all, I would
> expect it to exist only very briefly between 'idle~' and 'mix#'.  The '~'
> is certainly incorrect, as the node is not in a power-saving state, which
> in our case means powered off.
>
> This problem may have existed in 16.05.10-2, but currently we are using
> 17.02.7. All other nodes in the cluster apart from one are functioning
> normally.
>
> Does anyone have any idea what we might be doing wrong?

I still don't know what the problem was, but I got the node back into a
sensible state by setting the state to FAIL, rebooting the node, and
then setting the state to RESUME.
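The message does not give the exact commands, so the following is only a sketch, assuming the FAIL and RESUME steps map onto `scontrol update NodeName=... State=...`. The helper name `recovery_commands` and the default reason text are hypothetical. The commands are built but not executed, and the reboot itself (done outside Slurm, e.g. over IPMI) is left as a comment:

```python
from __future__ import annotations

import shlex


def recovery_commands(node: str, reason: str = "stuck in mix~ state") -> list[list[str]]:
    """Build (but do not run) the scontrol calls for FAIL -> reboot -> RESUME.

    Hypothetical helper: the reason string is an illustrative placeholder.
    """
    return [
        # Take the node out of service; a Reason is supplied because scontrol
        # generally expects one when a node is marked unavailable.
        ["scontrol", "update", f"NodeName={node}", "State=FAIL", f"Reason={reason}"],
        # ... reboot the node here, outside of Slurm ...
        # Return the node to service once it is back up.
        ["scontrol", "update", f"NodeName={node}", "State=RESUME"],
    ]


if __name__ == "__main__":
    # Print shell-quoted commands so they can be reviewed before running them.
    for cmd in recovery_commands("node001"):
        print(shlex.join(cmd))
```

Printing rather than executing the commands keeps the sketch safe to run on a machine without Slurm and lets an administrator check the node name before anything changes.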

Cheers,

Loris

-- 
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin         Email loris.benn...@fu-berlin.de
