I have SLURM set up on a cluster of two nodes, qdr[3-4]. Running sinfo shows both nodes in a perpetual drain state.
sinfo -R yields the following:

REASON        USER  TIMESTAMP            NODELIST
Epilog error  root  2014-02-03T15:53:40  qdr3
Epilog error  root  2014-02-03T15:52:42  qdr4

The epilog error occurred on 3 February, more than 4 months ago. Why is this happening?
