Hi,

We are running Slurm 16.05.10-2 with power saving enabled.  Recently we
have noticed a problem when nodes are woken up in order to start a job.
The node will go from 'idle~' to, say, 'mixed#', but then the job will
fail and the node will be put in 'down*'.  We have turned up the log
level to 'debug' and set the DebugFlags value 'Power', but this hasn't
produced anything relevant.  The problem is, however, resolved if the
node is rebooted.
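
For anyone not familiar with the suffixes on the node states above: as
far as I understand the sinfo man page, '~' marks a node in power-saving
mode, '#' a node that is being powered up or configured, and '*' a node
that is not responding.  A minimal Python sketch that decodes them (the
helper name describe_state is purely illustrative and not part of Slurm):

```python
# Suffix meanings as I read them in the sinfo man page; only the three
# suffixes mentioned in this mail are covered.
STATE_SUFFIXES = {
    "~": "in power-saving mode",
    "#": "being powered up or configured",
    "*": "not responding",
}


def describe_state(state):
    """Split a compact sinfo state such as 'idle~' into (base state, flag meaning)."""
    if state and state[-1] in STATE_SUFFIXES:
        return state[:-1], STATE_SUFFIXES[state[-1]]
    # No known suffix: return the state unchanged with no flag.
    return state, None


if __name__ == "__main__":
    # The sequence of states we observe on an affected node.
    for s in ("idle~", "mixed#", "down*"):
        base, flag = describe_state(s)
        print(f"{s}: base={base}, flag={flag}")
```

So the sequence we see is: powered down and idle, then powering up with
a job allocated, then down and not responding.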

Thus, there seems to be a communication problem between the slurmd on
the woken node and the slurmctld on the administration node.  Does
anyone have any idea what might be going on?

Cheers,

Loris

-- 
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin         Email [email protected]
