And the log file is not informative:

tail -f /var/log/slurm-llnl/slurmd.log
...
[2013-08-26T11:52:16] Slurmd shutdown completing
[2013-08-26T11:52:56] slurmd version 2.3.4 started
[2013-08-26T11:52:56] slurmd started on Mon 26 Aug 2013 11:52:56 +0200
[2013-08-26T11:52:56] Procs=1 Sockets=1 Cores=1 Threads=1 Memory=2012 TmpDisk=9069 Uptime=1122626

----- Original message -----
From: "Sivasangari Nandy" <[email protected]>
To: "slurm-dev" <[email protected]>
Sent: Monday, 26 August 2013 14:28:28
Subject: Re: [slurm-dev] Re: Required node not available (down or drained)

Hi,

I have checked a few things. slurmctld and slurmd now run on a single machine (using just one node), so testing is easier. For that I modified the configuration file:

vi /etc/slurm-llnl/slurm.conf

slurmctld and slurmd are both running; here is my ps output:

root@VM-667:/etc/slurm-llnl# ps -ef | grep slurm
root 31712 31706 0 11:44 pts/1 00:00:00 tail -f /var/log/slurm-llnl/slurmd.log
slurm 31990 1 0 11:52 ? 00:00:00 /usr/sbin/slurmctld
root 32103 1 0 11:52 ? 00:00:00 /usr/sbin/slurmd -c
root 32125 30346 0 11:53 pts/0 00:00:00 grep slurm

So I tried srun again, but I still get this error:

!srun
srun /omaha-beach/test.sh
srun: Required node not available (down or drained)
srun: job 64 queued and waiting for resources

Do you have any idea what the problem is?

Thanks,
Siva

----- Original message -----
From: "Nikita Burtsev" <[email protected]>
To: "slurm-dev" <[email protected]>
Sent: Thursday, 22 August 2013 09:59:52
Subject: [slurm-dev] Re: Required node not available (down or drained)

You need to have slurmd running on all nodes that will execute jobs, so you should start it with the init script.

--
Nikita Burtsev

On Thursday, August 22, 2013 at 11:55 AM, Sivasangari Nandy wrote:

> Check if the slurmd daemon is running with the command "ps -el | grep slurmd".

ps -el returns nothing:

root@VM-667:~# ps -el | grep slurmd

----- Original message -----
From: "Nikita Burtsev" <[email protected]>
To: "slurm-dev" <[email protected]>
Sent: Wednesday, 21 August 2013 18:58:52
Subject: [slurm-dev] Re: Required node not available (down or drained)

slurmctld is the management process, and since you have access to squeue/sinfo information, it is running just fine. You need to check whether slurmd (the agent part) is running on your nodes, i.e. VM-[669-671].

--
Nikita Burtsev

On Wednesday, August 21, 2013 at 8:13 PM, Sivasangari Nandy wrote:

I have tried:

/etc/init.d/slurm-llnl start
[ ok ] Starting slurm central management daemon: slurmctld.
/usr/sbin/slurmctld already running.

And:

scontrol show slurmd
scontrol: error: slurm_slurmd_info: Connection refused
slurm_load_slurmd_status: Connection refused

Hmm, how do I go about repairing this problem?

----- Original message -----
From: "Danny Auble" <[email protected]>
To: "slurm-dev" <[email protected]>
Sent: Wednesday, 21 August 2013 15:36:53
Subject: [slurm-dev] Re: Required node not available (down or drained)

Check your slurmd log. It doesn't appear that slurmd is running.
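A side note for anyone checking the same thing: `ps -ef | grep slurm`, as in the ps output quoted in this thread, also matches the `grep` process itself and the `tail -f /var/log/slurm-llnl/slurmd.log` command, so a plain grep can look "positive" even when no daemon is up. The bash sketch below matches the exact command name instead. The function name `has_daemon` is hypothetical (it is not part of Slurm), and the usage line assumes a `ps` that supports `-e -o comm=` (e.g. procps on Linux).

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of Slurm.
# Reads one process command name per line on stdin and succeeds only if
# a line is exactly NAME ($1). Unlike `ps -ef | grep slurmd`, it cannot
# match the grep process, a `tail -f .../slurmd.log` line, or slurmctld.
has_daemon() {
  # awk field splitting drops leading/trailing blanks before comparing.
  awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# Usage on a node (assumes procps `ps`; `comm=` prints names without a header):
#   if ps -e -o comm= | has_daemon slurmd; then
#     echo "slurmd is running"
#   else
#     echo "slurmd is not running; start it and check its log" >&2
#   fi
```

An empty result from `ps -el | grep slurmd`, as reported above, means the same thing as `has_daemon` failing: no slurmd process exists on that node, which is consistent with `scontrol show slurmd` getting "Connection refused".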
Sivasangari Nandy <[email protected]> wrote:

Hello,

I'm trying to use Slurm for the first time, and I think I have a problem with the nodes.

I get this message when I use squeue:

root@VM-667:~# squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
50 SLURM-deb test.sh root PD 0:00 1 (ReqNodeNotAvail)

or this one with another squeue:

root@VM-671:~# squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
50 SLURM-deb test.sh root PD 0:00 1 (Resources)

sinfo gives me:

PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
SLURM-de* up infinite 3 down VM-[669-671]

I have already used Slurm once with the same configuration and was able to run my job.

But now, the second time, I always get:

srun: Required node not available (down or drained)
srun: job 51 queued and waiting for resources

Thanks in advance for your help,

Siva

--
Siva sangari NANDY - Plate-forme GenOuest
IRISA-INRIA, Campus de Beaulieu
263 Avenue du Général Leclerc
35042 Rennes cedex, France
Tel.: +33 (0) 2 99 84 25 69
Office: D152
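A follow-up sketch for the `sinfo` output in the original message: in Slurm, a node that was marked DOWN (for example because its slurmd stopped responding) can stay DOWN even after slurmd is running again, depending on the `ReturnToService` setting in slurm.conf; an administrator returns it to service with `scontrol update NodeName=<nodes> State=RESUME`. This only helps once slurmd is actually running on those nodes. The bash function below is a hypothetical helper (the name `resume_cmds` is not part of Slurm): it reads `sinfo -h -o "%N %t"` output (node list plus compact state) and only prints the corresponding `scontrol` commands, so they can be reviewed before anything is run as root.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of Slurm (dry run: prints commands only).
# Input lines look like "<nodelist> <compact state>", e.g. "VM-[669-671] down*",
# as produced by: sinfo -h -o "%N %t"
# States starting with down, drain or drng (draining) get a RESUME command;
# a trailing "*" (node not responding) still matches the prefix.
resume_cmds() {
  awk '$2 ~ /^(down|drain|drng)/ {
    printf "scontrol update NodeName=%s State=RESUME\n", $1
  }'
}

# Usage on the control machine (needs the Slurm client tools):
#   sinfo -h -o "%N %t" | resume_cmds
# then run the printed lines as root after checking slurmd is up on each node.
```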
