[slurm-dev] Re: the nodes in state down*

2015-10-09 Thread James Oguya
Sounds like slurmd isn't running on the other nodes and/or the compute
nodes can't contact the controller. I think you should attach your slurmd
logs to this thread.
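
To see at a glance which nodes the controller considers unresponsive, something like the following Python sketch may help. It assumes the node states were captured as text with `sinfo -N -h -o "%N %T"` (one node name and state per line); a trailing `*` on the state marks a node that is not responding. The function name and the sample text are hypothetical, not taken from the poster's cluster output.

```python
def unresponsive_nodes(sinfo_text):
    """Return names of nodes whose state ends with '*' (not responding)."""
    nodes = []
    for line in sinfo_text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        name, state = parts
        if state.endswith("*"):
            nodes.append(name)
    return nodes


# Hypothetical sample mirroring the situation described in this thread.
sample = """\
compute-0-0 idle
compute-0-1 down*
compute-0-2 down*
"""
print(unresponsive_nodes(sample))
```

The nodes it reports are the ones whose slurmd logs are worth reading first.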

On Thu, Oct 8, 2015 at 7:23 PM, Novosielski, Ryan <novos...@ca.rutgers.edu>
wrote:

> Well, it's not going to work if you don't have that. The log files should
> tell you why it won't start. My guess is that you don't have munged running.
>
>  *Note: UMDNJ is now Rutgers-Biomedical and Health Sciences*
> || \\UTGERS  |-*O*-
> ||_// Biomedical | Ryan Novosielski - Senior Technologist
> || \\ and Health | novos...@rutgers.edu- 973/972.0922 (2x0922)
> ||  \\  Sciences | OIRT/High Perf & Res Comp - MSB C630, Newark
> `'
>
> On Oct 8, 2015, at 11:25, Fany Pagés Díaz <fpa...@citi.cu> wrote:
>
> The node compute-0-0 is OK, but the slurmd daemon has failed on nodes
> compute-0-1 and compute-0-2.
>
>
>
> *From:* Paul Edmon [mailto:ped...@cfa.harvard.edu <ped...@cfa.harvard.edu>]
> *Sent:* Thursday, October 8, 2015 11:07
> *To:* slurm-dev
> *Subject:* [slurm-dev] Re: the nodes in state down*
>
>
>
> The only other thing I can think of is to check that the node daemons are
> up and okay.
>
> -Paul Edmon-
>
> On 10/08/2015 11:04 AM, Fany Pagés Díaz wrote:
>
> My network looks fine, and so does communication with the nodes. The
> problem is with Slurm: the slurmd daemon has failed. I have never
> configured any ACLs, but I will check. If you have another idea, please
> let me know.
>
> thanks
>
> *From:* Paul Edmon [mailto:ped...@cfa.harvard.edu <ped...@cfa.harvard.edu>]
> *Sent:* Thursday, October 8, 2015 10:20
> *To:* slurm-dev
> *Subject:* [slurm-dev] Re: the nodes in state down*
>
>
>
> Typically that means that the master is having problems communicating with
> the nodes.  I would check your networking, especially your ACLs.
>
> -Paul Edmon-
>
> On 10/08/2015 10:15 AM, Fany Pagés Díaz wrote:
>
> I have a cluster with 3 nodes, and yesterday it was shut down improperly
> because of electrical problems. When I started the cluster again, Slurm
> did not work correctly: the nodes appear in state down*. I set them to
> idle, but they went back to down*. Can anyone help me?
>
> Thanks,
>
> Ing. Fany Pages Diaz
>


-- 
*James Oguya*


[slurm-dev] Re: the nodes in state down*

2015-10-08 Thread Paul Edmon
Typically that means that the master is having problems communicating 
with the nodes.  I would check your networking, especially your ACLs.


-Paul Edmon-

On 10/08/2015 10:15 AM, Fany Pagés Díaz wrote:


I have a cluster with 3 nodes, and yesterday it was shut down improperly
because of electrical problems. When I started the cluster again, Slurm
did not work correctly: the nodes appear in state down*. I set them to
idle, but they went back to down*. Can anyone help me?


Thanks,

Ing. Fany Pages Diaz





[slurm-dev] Re: the nodes in state down*

2015-10-08 Thread Fany Pagés Díaz
My network looks fine, and so does communication with the nodes. The
problem is with Slurm: the slurmd daemon has failed. I have never
configured any ACLs, but I will check. If you have another idea, please
let me know.

thanks

From: Paul Edmon [mailto:ped...@cfa.harvard.edu]
Sent: Thursday, October 8, 2015 10:20
To: slurm-dev
Subject: [slurm-dev] Re: the nodes in state down*

 

Typically that means that the master is having problems communicating with the 
nodes.  I would check your networking, especially your ACLs.

-Paul Edmon-

On 10/08/2015 10:15 AM, Fany Pagés Díaz wrote:

I have a cluster with 3 nodes, and yesterday it was shut down improperly
because of electrical problems. When I started the cluster again, Slurm
did not work correctly: the nodes appear in state down*. I set them to
idle, but they went back to down*. Can anyone help me?

Thanks,

Ing. Fany Pages Diaz

 



[slurm-dev] Re: the nodes in state down*

2015-10-08 Thread Paul Edmon
The only other thing I can think of is to check that the node daemons are
up and okay.
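
One quick way to check this from the controller is to try a TCP connection to the port slurmd listens on. The sketch below is a minimal illustration, assuming the default `SlurmdPort` of 6818 (your slurm.conf may set a different value); the function name is hypothetical. An open port only shows that something is listening there, not that slurmd is healthy, so the daemon's log is still the place to look for the actual error.

```python
import socket


def port_open(host, port=6818, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


# Example use (node names taken from this thread):
#   for node in ("compute-0-0", "compute-0-1", "compute-0-2"):
#       print(node, "reachable" if port_open(node) else "unreachable")
```

A node that answers ping but fails this check most likely has no slurmd process listening, which matches the "down*" state.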


-Paul Edmon-

On 10/08/2015 11:04 AM, Fany Pagés Díaz wrote:


My network looks fine, and so does communication with the nodes. The
problem is with Slurm: the slurmd daemon has failed. I have never
configured any ACLs, but I will check. If you have another idea, please
let me know.


thanks

*From:* Paul Edmon [mailto:ped...@cfa.harvard.edu]
*Sent:* Thursday, October 8, 2015 10:20
*To:* slurm-dev
*Subject:* [slurm-dev] Re: the nodes in state down*

Typically that means that the master is having problems communicating 
with the nodes.  I would check your networking, especially your ACLs.


-Paul Edmon-

On 10/08/2015 10:15 AM, Fany Pagés Díaz wrote:

I have a cluster with 3 nodes, and yesterday it was shut down improperly
because of electrical problems. When I started the cluster again, Slurm
did not work correctly: the nodes appear in state down*. I set them to
idle, but they went back to down*. Can anyone help me?

Thanks,

Ing. Fany Pages Diaz





[slurm-dev] Re: the nodes in state down*

2015-10-08 Thread Fany Pagés Díaz
The node compute-0-0 is OK, but the slurmd daemon has failed on nodes
compute-0-1 and compute-0-2.

 

From: Paul Edmon [mailto:ped...@cfa.harvard.edu]
Sent: Thursday, October 8, 2015 11:07
To: slurm-dev
Subject: [slurm-dev] Re: the nodes in state down*

 

The only other thing I can think of is to check that the node daemons are
up and okay.

-Paul Edmon-

On 10/08/2015 11:04 AM, Fany Pagés Díaz wrote:

My network looks fine, and so does communication with the nodes. The
problem is with Slurm: the slurmd daemon has failed. I have never
configured any ACLs, but I will check. If you have another idea, please
let me know.

thanks

From: Paul Edmon [mailto:ped...@cfa.harvard.edu]
Sent: Thursday, October 8, 2015 10:20
To: slurm-dev
Subject: [slurm-dev] Re: the nodes in state down*

 

Typically that means that the master is having problems communicating with the 
nodes.  I would check your networking, especially your ACLs.

-Paul Edmon-

On 10/08/2015 10:15 AM, Fany Pagés Díaz wrote:

I have a cluster with 3 nodes, and yesterday it was shut down improperly
because of electrical problems. When I started the cluster again, Slurm
did not work correctly: the nodes appear in state down*. I set them to
idle, but they went back to down*. Can anyone help me?

Thanks,

Ing. Fany Pages Diaz