[slurm-dev] Re: the nodes in state down*
Sounds like slurmd isn't running on the other nodes and/or the compute nodes can't contact the controller. I think you should attach your slurmd logs to this thread.

-- James Oguya

On Thu, Oct 8, 2015 at 7:23 PM, Novosielski, Ryan <novos...@ca.rutgers.edu> wrote:
> Well, it's not going to work if you don't have that. The log files should
> tell you why it won't start. My guess is that you don't have munged running.
>
> Note: UMDNJ is now Rutgers-Biomedical and Health Sciences
> Ryan Novosielski - Senior Technologist
> novos...@rutgers.edu - 973/972.0922 (2x0922)
> OIRT/High Perf & Res Comp - MSB C630, Newark
>
> On Oct 8, 2015, at 11:25, Fany Pagés Díaz <fpa...@citi.cu> wrote:
>> The node compute-0-0 is OK, but the slurmd daemon has failed on nodes
>> compute-0-1 and compute-0-2.
[slurm-dev] Re: the nodes in state down*
Typically that means that the master is having problems communicating with the nodes. I would check your networking, especially your ACLs.

-Paul Edmon-

On 10/08/2015 10:15 AM, Fany Pagés Díaz wrote:
> I have a cluster with 3 nodes, and yesterday it was shut down improperly because of electrical problems. When I started the cluster again, Slurm did not work correctly: the nodes appear in state down*. I set them to idle, but they went back to state down*. Can anyone help me?
>
> Thanks,
> Ing. Fany Pages Diaz
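One quick way to test the "master cannot reach the nodes" theory is to try opening a TCP connection from the controller to each node's slurmd port. What follows is a minimal sketch, not an official Slurm tool. It assumes slurmd listens on the default port 6818 (check `SlurmdPort` in your slurm.conf), and the helper names `can_connect` and `check_nodes` are made up for this illustration.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_nodes(nodes, port, timeout=3.0):
    """Map each node name to whether its port accepted a connection."""
    return {node: can_connect(node, port, timeout) for node in nodes}
```

Run from the controller, a call such as `check_nodes(["compute-0-0", "compute-0-1", "compute-0-2"], 6818)` would show which nodes accept connections. A refused connection usually suggests that nothing is listening (slurmd is not running), while a timeout is more suggestive of a firewall or ACL dropping traffic; this sketch does not distinguish between the two.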
[slurm-dev] Re: the nodes in state down*
My network looks fine, and so does communication with the nodes. The problem is with Slurm: the slurmd daemon has failed. I have never configured any ACLs, but I am going to check. If you have another idea, please let me know.

Thanks

From: Paul Edmon [mailto:ped...@cfa.harvard.edu]
Sent: Thursday, October 8, 2015 10:20
To: slurm-dev
Subject: [slurm-dev] Re: the nodes in state down*

> Typically that means that the master is having problems communicating with the nodes. I would check your networking, especially your ACLs.
>
> -Paul Edmon-
[slurm-dev] Re: the nodes in state down*
The only other thing I can think of is to check that the node daemons are up and okay.

-Paul Edmon-

On 10/08/2015 11:04 AM, Fany Pagés Díaz wrote:
> My network looks fine, and so does communication with the nodes. The problem is with Slurm: the slurmd daemon has failed. I have never configured any ACLs, but I am going to check. If you have another idea, please let me know.
>
> Thanks
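To see which node daemons the controller considers unresponsive, look at the node states that `sinfo` reports: a trailing `*` on a state (as in `down*`) marks a node that is not responding. A minimal sketch follows, assuming `sinfo -h -N -o "%N %T"` prints one `name state` pair per line. The helper names and the sample text are illustrative, not output captured from the poster's cluster.

```python
import subprocess


def unresponsive_nodes(sinfo_text):
    """Return sorted node names whose state ends in '*' (not responding)."""
    nodes = set()
    for line in sinfo_text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        name, state = parts
        if state.endswith("*"):
            nodes.add(name)  # a set removes repeats across partitions
    return sorted(nodes)


def query_sinfo():
    """Run sinfo; requires the Slurm client tools on this host."""
    result = subprocess.run(
        ["sinfo", "-h", "-N", "-o", "%N %T"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Made-up sample text in the assumed "name state" layout.
sample = "compute-0-0 idle\ncompute-0-1 down*\ncompute-0-2 down*\n"
print(unresponsive_nodes(sample))
```

On a real cluster, `unresponsive_nodes(query_sinfo())` would give the list of nodes whose slurmd service and log are worth inspecting first.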
[slurm-dev] Re: the nodes in state down*
The node compute-0-0 is OK, but the slurmd daemon has failed on nodes compute-0-1 and compute-0-2.

From: Paul Edmon [mailto:ped...@cfa.harvard.edu]
Sent: Thursday, October 8, 2015 11:07
To: slurm-dev
Subject: [slurm-dev] Re: the nodes in state down*

> The only other thing I can think of is to check that the node daemons are up and okay.
>
> -Paul Edmon-