I didn't mark the node as "drained" myself.
But after issuing the command scontrol update NodeName="qdr3" State="IDLE",
sinfo showed both nodes to be idle and usable.
I was also able to execute MPI jobs.
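
For anyone finding this thread later, here is a minimal sketch of how the clearing step could be scripted. It only prints the scontrol commands (a dry run) rather than running them. The helper name resume_cmds is made up for illustration and is not part of Slurm. It assumes the node list is the last column of the `sinfo -R` output, as in the listing quoted below, and it uses State=RESUME, one of the two states Paddy mentions. It does nothing about whatever made the epilog fail in the first place.

```shell
#!/bin/sh
# Sketch: turn `sinfo -R`-style output into scontrol commands.
# Dry run only: the commands are printed, not executed.
# resume_cmds is a hypothetical helper name, not a Slurm tool.
resume_cmds() {
    # Skip the header line. The node list is taken from the last
    # whitespace-separated field, because the REASON column can
    # itself contain spaces (e.g. "Epilog error").
    awk 'NR > 1 && NF > 0 { print "scontrol update NodeName=" $NF " State=RESUME" }'
}

# Intended use on a real cluster (review the output before running it
# with sufficient privileges):
#   sinfo -R | resume_cmds
```

Fed the two data lines from the listing below, this would print one `scontrol update NodeName=... State=RESUME` line each for qdr3 and qdr4.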

Thanks.


On Fri, Jun 27, 2014 at 2:35 PM, Paddy Doyle <[email protected]> wrote:

>
> Hi Arjun,
>
> On Fri, Jun 27, 2014 at 12:25:58AM -0700, Arjun J Rao wrote:
>
> > Have SLURM set up on a cluster of 2 nodes qdr[3-4]
> > Running sinfo shows the two nodes to be in a perpetual drain state.
> >
> > sinfo -R yields the following :
> > REASON           USER   TIMESTAMP             NODELIST
> > Epilog error     root   2014-02-03T15:53:40   qdr3
> > Epilog error     root   2014-02-03T15:52:42   qdr4
> >
> > The epilog error occurred on 3rd February! (More than 4 months ago)
> >
> > Why is this happening ?
>
> Maybe an obvious question, but have you set the nodes to be 'resume' or 'idle'
> using scontrol since then? In our setup at least, once a node is marked 'down',
> we have to manually clear it to either 'resume' or 'idle'.
>
> Paddy
>
> --
> Paddy Doyle
> Trinity Centre for High Performance Computing,
> Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
> Phone: +353-1-896-3725
> http://www.tchpc.tcd.ie/
>
