I have SLURM set up on a cluster of two nodes, qdr[3-4].
Running sinfo shows both nodes in a perpetual drain state.

sinfo -R yields the following:
REASON          USER    TIMESTAMP             NODELIST
Epilog error    root    2014-02-03T15:53:40   qdr3
Epilog error    root    2014-02-03T15:52:42   qdr4

The epilog error occurred on 3rd February (more than 4 months ago)!

Why is this happening?
