VM-667, where slurmctld is running, is your master, so you don't need the agent part (slurmd) on it. As I understand your setup, VM-[669-671] are your actual compute nodes, so you need to check whether slurmd is running on those three and start it if needed.
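To make the per-node check concrete, here is a minimal sketch that turns the sinfo output quoted below into a list of hosts to visit. The helper names (expand_nodelist, down_nodes) are made up for illustration and are not part of Slurm; the parser only handles a single simple range such as VM-[669-671], and the two commands it prints are the ones already mentioned in this thread.

```python
import re


def expand_nodelist(expr):
    """Expand a simple range such as 'VM-[669-671]' into individual hostnames.

    Hypothetical helper: handles only one 'prefix[start-end]suffix' range,
    not the full Slurm hostlist syntax.
    """
    match = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", expr)
    if match is None:
        return [expr]  # plain hostname, nothing to expand
    prefix, start, end, suffix = match.groups()
    width = len(start)  # keep zero padding if the range uses it
    return [f"{prefix}{n:0{width}d}{suffix}" for n in range(int(start), int(end) + 1)]


def down_nodes(sinfo_output):
    """Return the hostnames that sinfo reports in a down or drained state.

    Assumes the default six-column layout shown in this thread:
    PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
    """
    hosts = []
    for line in sinfo_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 6 and fields[4].startswith(("down", "drain")):
            hosts.extend(expand_nodelist(fields[5]))
    return hosts


# sinfo output as posted in the original message
SINFO = """PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
SLURM-de* up infinite 3 down VM-[669-671]"""

for host in down_nodes(SINFO):
    print(f"{host}: run 'ps -el | grep slurmd'; "
          f"if nothing is listed, run '/etc/init.d/slurm-llnl start'")
```

The point of the sketch is only that the check has to happen on each of the three compute nodes rather than on VM-667, where the init script just reports that slurmctld is already running.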
--
Nikita Burtsev

On Thursday, August 22, 2013 at 12:02 PM, Sivasangari Nandy wrote:
> That's what I did yesterday, actually:
>
> /etc/init.d/slurm-llnl start
>
> [ ok ] Starting slurm central management daemon: slurmctld.
> /usr/sbin/slurmctld already running.
>
> > From: "Nikita Burtsev" <[email protected]>
> > To: "slurm-dev" <[email protected]>
> > Sent: Thursday, 22 August 2013 09:59:52
> > Subject: [slurm-dev] Re: Required node not available (down or drained)
> >
> > You need to have slurmd running on all nodes that will execute jobs, so you
> > should start it with the init script.
> >
> > --
> > Nikita Burtsev
> >
> > On Thursday, August 22, 2013 at 11:55 AM, Sivasangari Nandy wrote:
> > > "check if the slurmd daemon is running with the command "ps -el | grep
> > > slurmd"."
> > >
> > > Nothing happened with ps -el ...
> > >
> > > root@VM-667:~# ps -el | grep slurmd
> > >
> > > > From: "Nikita Burtsev" <[email protected]>
> > > > To: "slurm-dev" <[email protected]>
> > > > Sent: Wednesday, 21 August 2013 18:58:52
> > > > Subject: [slurm-dev] Re: Required node not available (down or drained)
> > > >
> > > > slurmctld is the management process, and since you have access to
> > > > squeue/sinfo information it is running just fine. You need to check if
> > > > slurmd (which is the agent part) is running on your nodes, i.e.
> > > > VM-[669-671]
> > > >
> > > > --
> > > > Nikita Burtsev
> > > >
> > > > On Wednesday, August 21, 2013 at 8:13 PM, Sivasangari Nandy wrote:
> > > > > I have tried:
> > > > >
> > > > > /etc/init.d/slurm-llnl start
> > > > >
> > > > > [ ok ] Starting slurm central management daemon: slurmctld.
> > > > > /usr/sbin/slurmctld already running.
> > > > >
> > > > > And:
> > > > >
> > > > > scontrol show slurmd
> > > > >
> > > > > scontrol: error: slurm_slurmd_info: Connection refused
> > > > > slurm_load_slurmd_status: Connection refused
> > > > >
> > > > > How should I proceed to fix this problem?
> > > > >
> > > > > > From: "Danny Auble" <[email protected]>
> > > > > > To: "slurm-dev" <[email protected]>
> > > > > > Sent: Wednesday, 21 August 2013 15:36:53
> > > > > > Subject: [slurm-dev] Re: Required node not available (down or drained)
> > > > > >
> > > > > > Check your slurmd log. It doesn't appear the slurmd is running.
> > > > > >
> > > > > > Sivasangari Nandy <[email protected]> wrote:
> > > > > > > > > Hello,
> > > > > > > > >
> > > > > > > > > I'm trying to use Slurm for the first time, and I think I have a
> > > > > > > > > problem with the nodes.
> > > > > > > > > I get this message when I use squeue:
> > > > > > > > >
> > > > > > > > > root@VM-667:~# squeue
> > > > > > > > > JOBID PARTITION    NAME USER ST  TIME NODES NODELIST(REASON)
> > > > > > > > >    50 SLURM-deb test.sh root PD  0:00     1 (ReqNodeNotAvail)
> > > > > > > > >
> > > > > > > > > or this one with another squeue:
> > > > > > > > >
> > > > > > > > > root@VM-671:~# squeue
> > > > > > > > > JOBID PARTITION    NAME USER ST  TIME NODES NODELIST(REASON)
> > > > > > > > >    50 SLURM-deb test.sh root PD  0:00     1 (Resources)
> > > > > > > > >
> > > > > > > > > sinfo gives me:
> > > > > > > > >
> > > > > > > > > PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
> > > > > > > > > SLURM-de*    up  infinite     3  down VM-[669-671]
> > > > > > > > >
> > > > > > > > > I have already used Slurm once with the same configuration and
> > > > > > > > > I was able to run my job.
> > > > > > > > > But now, the second time, I always get:
> > > > > > > > >
> > > > > > > > > srun: Required node not available (down or drained)
> > > > > > > > > srun: job 51 queued and waiting for resources
> > > > > > > > >
> > > > > > > > > Thanks in advance for your help,
> > > > > > > > > Siva
> > > > >
> > > > > --
> > > > > Sivasangari NANDY - Plate-forme GenOuest
> > > > > IRISA-INRIA, Campus de Beaulieu
> > > > > 263 Avenue du Général Leclerc
> > > > > 35042 Rennes cedex, France
> > > > > Tel: +33 (0) 2 99 84 25 69
> > > > > Office: D152
