https://computing.llnl.gov/linux/slurm/troubleshoot.html#nodes  
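
As a rough sketch of the first check, assuming the default six-column `sinfo` layout quoted further down this thread: the helper name `down_nodes` is made up for illustration and is not part of Slurm. It lists the nodes that `sinfo` reports as down or drained, so you know which hosts to inspect.

```shell
#!/bin/sh
# down_nodes (hypothetical helper, not a Slurm command):
# read `sinfo` output on stdin and print the NODELIST column of every row
# whose STATE column starts with "down" or "drain" (down, down*, drained...).
# Assumes the default layout: PARTITION AVAIL TIMELIMIT NODES STATE NODELIST.
down_nodes() {
    awk 'NR > 1 && ($5 ~ /^down/ || $5 ~ /^drain/) { print $6 }'
}

# On a live cluster (not executed here):
#   sinfo | down_nodes
#   scontrol show node VM-669        # inspect the State= and Reason= fields
#   scontrol update NodeName=VM-669 State=RESUME
```

Resuming a node only helps once slurmd on that node is running and reachable by slurmctld; otherwise the node is likely to be marked down again.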

--  
Nikita Burtsev


On Monday, August 26, 2013 at 7:43 PM, Sivasangari Nandy wrote:

> And the log file is not informative  
>  
> tail -f /var/log/slurm-llnl/slurmd.log
>  
> ...
> [2013-08-26T11:52:16] Slurmd shutdown completing
> [2013-08-26T11:52:56] slurmd version 2.3.4 started
> [2013-08-26T11:52:56] slurmd started on Mon 26 Aug 2013 11:52:56 +0200
> [2013-08-26T11:52:56] Procs=1 Sockets=1 Cores=1 Threads=1 Memory=2012 TmpDisk=9069 Uptime=1122626
>  
>  
> > From: "Sivasangari Nandy" <[email protected]>
> > To: "slurm-dev" <[email protected]>
> > Sent: Monday, 26 August 2013 14:28:28
> > Subject: Re: [slurm-dev] Re: Required node not available (down or drained)
> >  
> > Hi,  
> >  
> > I have checked a few things. My slurmctld and slurmd now run on a single 
> > machine (just one node), so testing is easier.
> > To do that I modified the configuration file: vi /etc/slurm-llnl/slurm.conf
> >  
> > slurmctld and slurmd are both running; here is my ps output:  
> >  
> > root@VM-667:/etc/slurm-llnl# ps -ef | grep slurm  
> > root     31712 31706  0 11:44 pts/1    00:00:00 tail -f /var/log/slurm-llnl/slurmd.log
> > slurm    31990     1  0 11:52 ?        00:00:00 /usr/sbin/slurmctld
> > root     32103     1  0 11:52 ?        00:00:00 /usr/sbin/slurmd -c
> > root     32125 30346  0 11:53 pts/0    00:00:00 grep slurm
> >  
> > So I tried srun again, but I still get this error:  
> >  
> > !srun
> > srun /omaha-beach/test.sh
> > srun: Required node not available (down or drained)
> > srun: job 64 queued and waiting for resources
> >  
> >  
> > Do you have any idea what the problem is?
> > Thanks,
> >  
> > Siva
> >  
> > > From: "Nikita Burtsev" <[email protected]>
> > > To: "slurm-dev" <[email protected]>
> > > Sent: Thursday, 22 August 2013 09:59:52
> > > Subject: [slurm-dev] Re: Required node not available (down or drained)
> > >  
> > > You need to have slurmd running on every node that will execute jobs, so 
> > > you should start it with the init script.   
> > >  
> > > --  
> > > Nikita Burtsev
> > >  
> > >  
> > > On Thursday, August 22, 2013 at 11:55 AM, Sivasangari Nandy wrote:
> > >  
> > > > "check if the slurmd daemon is running with the command "ps -el | grep 
> > > > slurmd"."
> > > >  
> > > > ps -el returns nothing:
> > > >  
> > > > root@VM-667:~# ps -el | grep slurmd
> > > >  
> > > > > From: "Nikita Burtsev" <[email protected]>
> > > > > To: "slurm-dev" <[email protected]>
> > > > > Sent: Wednesday, 21 August 2013 18:58:52
> > > > > Subject: [slurm-dev] Re: Required node not available (down or drained)
> > > > >  
> > > > > slurmctld is the management process, and since you have access to 
> > > > > squeue/sinfo information, it is running just fine. You need to check 
> > > > > whether slurmd (the agent part) is running on your nodes, i.e. 
> > > > > VM-[669-671].  
> > > > >  
> > > > > --  
> > > > > Nikita Burtsev
> > > > >  
> > > > >  
> > > > > On Wednesday, August 21, 2013 at 8:13 PM, Sivasangari Nandy wrote:
> > > > >  
> > > > > > I have tried :  
> > > > > >  
> > > > > > /etc/init.d/slurm-llnl start
> > > > > >  
> > > > > > [ ok ] Starting slurm central management daemon: slurmctld.
> > > > > > /usr/sbin/slurmctld already running.
> > > > > >  
> > > > > > And :  
> > > > > >  
> > > > > > scontrol show slurmd
> > > > > >  
> > > > > > scontrol: error: slurm_slurmd_info: Connection refused
> > > > > > slurm_load_slurmd_status: Connection refused
> > > > > >  
> > > > > >  
> > > > > > Hmm, how should I proceed to fix this problem?
> > > > > >  
> > > > > >  
> > > > > > > From: "Danny Auble" <[email protected]>
> > > > > > > To: "slurm-dev" <[email protected]>
> > > > > > > Sent: Wednesday, 21 August 2013 15:36:53
> > > > > > > Subject: [slurm-dev] Re: Required node not available (down or drained)
> > > > > > >  
> > > > > > > Check your slurmd log. It doesn't appear the slurmd is running.
> > > > > > >  
> > > > > > > Sivasangari Nandy <[email protected] 
> > > > > > > (mailto:[email protected])> wrote:
> > > > > > > > > > Hello,  
> > > > > > > > > >  
> > > > > > > > > > > I'm trying to use Slurm for the first time, and I think I have a 
> > > > > > > > > > > problem with the nodes.
> > > > > > > > > > > I get this message when I run squeue:
> > > > > > > > > >  
> > > > > > > > > > root@VM-667:~# squeue
> > > > > > > > > > >   JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)
> > > > > > > > > > >      50 SLURM-deb  test.sh     root  PD       0:00      1 (ReqNodeNotAvail)
> > > > > > > > > >  
> > > > > > > > > > > or this one from another squeue:
> > > > > > > > > >  
> > > > > > > > > > root@VM-671:~# squeue
> > > > > > > > > > >   JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)
> > > > > > > > > > >      50 SLURM-deb  test.sh     root  PD       0:00      1 (Resources)
> > > > > > > > > >  
> > > > > > > > > > sinfo gives me :
> > > > > > > > > >  
> > > > > > > > > > PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
> > > > > > > > > > SLURM-de*    up   infinite      3   down VM-[669-671]
> > > > > > > > > >  
> > > > > > > > > >  
> > > > > > > > > > > I have already used Slurm once with the same 
> > > > > > > > > > > configuration and I was able to run my job.
> > > > > > > > > > > But now, the second time, I always get:  
> > > > > > > > > >  
> > > > > > > > > > srun: Required node not available (down or drained)
> > > > > > > > > > srun: job 51 queued and waiting for resources
> > > > > > > > > >  
> > > > > > > > > >  
> > > > > > > > > > > Thanks in advance for your help,  
> > > > > > > > > > Siva
> > > > > > > > > >  
> > > > > > > > > >  
> > > > > > > > >  
> > > > > > > > >  
> > > > > > > >  
> > > > > > > >  
> > > > > > > >  
> > > > > > >  
> > > > > > >  
> > > > > >  
> > > > > >  
> > > > > >  
> > > > > >  
> > > > > > --  
> > > > > > Sivasangari NANDY -  Plate-forme GenOuest
> > > > > > IRISA-INRIA, Campus de Beaulieu
> > > > > > 263 Avenue du Général Leclerc
> > > > > > 35042 Rennes cedex, France
> > > > > > Tél: +33 (0) 2 99 84 25 69
> > > > > > Bureau :  D152
> > > > > >  
> > > > >  
> > > >  
> > > >  
> > > >  
> > > >  
> > >  
> >  
> >  
> >  
> >  
>  
>  
>  
>  
