This link points to the SLURM 2.3 documentation. For newer versions, including
the current release (2.6.1), you may want to use this documentation instead:
http://slurm.schedmd.com/troubleshoot.html#nodes
On 08/26/2013 10:10 AM, Nikita Burtsev wrote:
https://computing.llnl.gov/linux/slurm/troubleshoot.html#nodes
--
Nikita Burtsev
On Monday, August 26, 2013 at 7:43 PM, Sivasangari Nandy wrote:
And the log file is not informative:
tail -f /var/log/slurm-llnl/slurmd.log
...
[2013-08-26T11:52:16] Slurmd shutdown completing
[2013-08-26T11:52:56] slurmd version 2.3.4 started
[2013-08-26T11:52:56] slurmd started on Mon 26 Aug 2013 11:52:56 +0200
[2013-08-26T11:52:56] Procs=1 Sockets=1 Cores=1 Threads=1 Memory=2012 TmpDisk=9069 Uptime=1122626
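Since the slurmd log above only shows a clean startup, one way to look for actual problems is to filter both daemons' logs for error lines. The reason a node was marked down is typically logged by slurmctld rather than slurmd. A minimal sketch: slurm_log_problems is a hypothetical helper, and the slurmctld.log path is an assumption (check SlurmctldLogFile in slurm.conf).

```shell
#!/bin/sh
# Sketch: show only the lines of a Slurm log that mention errors.
# slurm_log_problems is a hypothetical helper, not part of Slurm.

# Read a log on stdin and print lines containing "error" or "fatal"
# (case-insensitive). Returns 0 even when nothing matches.
slurm_log_problems() {
    grep -iE 'error|fatal' || true
}

# Usage (the slurmctld.log path is an assumption; see SlurmctldLogFile):
#   slurm_log_problems < /var/log/slurm-llnl/slurmd.log
#   slurm_log_problems < /var/log/slurm-llnl/slurmctld.log
#
# For more detail, stop the daemon and run it in the foreground with
# extra verbosity, so its messages go to the terminal:
#   /usr/sbin/slurmd -D -vvv
```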
------------------------------------------------------------------------
*From: *"Sivasangari Nandy" <[email protected]>
*To: *"slurm-dev" <[email protected]>
*Sent: *Monday, August 26, 2013 14:28:28
*Subject: *Re: [slurm-dev] Re: Required node not available (down or drained)
Hi,
I have checked a few things. slurmctld and slurmd now run on a single
machine (using just one node), so testing is easier.
To do that, I modified the configuration file: vi /etc/slurm-llnl/slurm.conf
slurmctld and slurmd are both running; here is my ps output:
root@VM-667:/etc/slurm-llnl# ps -ef | grep slurm
root 31712 31706 0 11:44 pts/1 00:00:00 tail -f /var/log/slurm-llnl/slurmd.log
slurm 31990 1 0 11:52 ? 00:00:00 /usr/sbin/slurmctld
root 32103 1 0 11:52 ? 00:00:00 /usr/sbin/slurmd -c
root 32125 30346 0 11:53 pts/0 00:00:00 grep slurm
So I tried srun again, but I still got this error:
!srun
srun /omaha-beach/test.sh
srun: Required node not available (down or drained)
srun: job 64 queued and waiting for resources
Do you have any idea what the problem is?
Thanks,
Siva
------------------------------------------------------------------------
*From: *"Nikita Burtsev" <[email protected]>
*To: *"slurm-dev" <[email protected]>
*Sent: *Thursday, August 22, 2013 09:59:52
*Subject: *[slurm-dev] Re: Required node not available (down or drained)
You need to have slurmd running on all nodes that will execute
jobs, so you should start it with the init script.
--
Nikita Burtsev
On Thursday, August 22, 2013 at 11:55 AM, Sivasangari Nandy wrote:
"Check whether the slurmd daemon is running with the command
'ps -el | grep slurmd'."
Nothing shows up with ps -el:
root@VM-667:~# ps -el | grep slurmd
------------------------------------------------------------------------
*From: *"Nikita Burtsev" <[email protected]>
*To: *"slurm-dev" <[email protected]>
*Sent: *Wednesday, August 21, 2013 18:58:52
*Subject: *[slurm-dev] Re: Required node not available (down or drained)
slurmctld is the management process, and since you have access to
squeue/sinfo information, it is running just fine. You need to check
whether slurmd (which is the agent part) is running on your nodes,
i.e. VM-[669-671].
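The check described above can be run against every compute node from one machine. A minimal sketch, assuming passwordless root SSH to VM-669..VM-671 and that pgrep is installed on the nodes; check_slurmd_on_nodes and the RUN variable are hypothetical names introduced here for illustration.

```shell
#!/bin/sh
# Sketch: report whether slurmd is running on each named node.
# Assumes passwordless SSH to every node and pgrep on the nodes.
# check_slurmd_on_nodes and RUN are hypothetical, for illustration only.

# RUN is the command used to reach a node; it defaults to ssh.
# Returns non-zero if slurmd is missing on at least one node.
check_slurmd_on_nodes() {
    run=${RUN:-ssh}
    missing=0
    for node in "$@"; do
        # pgrep -x matches the process name exactly ("slurmd", not
        # "slurmctld"). An SSH failure is also reported as NOT running.
        if $run "$node" pgrep -x slurmd >/dev/null 2>&1; then
            echo "$node: slurmd running"
        else
            echo "$node: slurmd NOT running"
            missing=1
        fi
    done
    return "$missing"
}

# Usage: `scontrol show hostnames` expands a Slurm hostlist expression.
#   check_slurmd_on_nodes $(scontrol show hostnames 'VM-[669-671]')
# On a node reported as NOT running, start the daemon with its init script:
#   ssh VM-669 /etc/init.d/slurm-llnl start
```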
--
Nikita Burtsev
On Wednesday, August 21, 2013 at 8:13 PM, Sivasangari Nandy wrote:
I have tried:
/etc/init.d/slurm-llnl start
[ ok ] Starting slurm central management daemon: slurmctld.
/usr/sbin/slurmctld already running.
And:
scontrol show slurmd
scontrol: error: slurm_slurmd_info: Connection refused
slurm_load_slurmd_status: Connection refused
Hmm, how should I go about fixing this problem?
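`scontrol show slurmd` queries the slurmd on the local host, so "Connection refused" means nothing was listening on the slurmd port there, which is consistent with Danny's diagnosis below. A sketch for finding which port slurmd should be listening on: it reads `scontrol show config` output and prints SlurmdPort (6818 unless slurm.conf overrides it). slurmd_port is a hypothetical helper, and the `Key = value` layout it parses is an assumption about the output format.

```shell
#!/bin/sh
# Sketch: extract SlurmdPort from `scontrol show config` output on stdin.
# slurmd_port is hypothetical; it assumes lines of the form
#   "SlurmdPort              = 6818".
slurmd_port() {
    # Split on "=", keep the value of the SlurmdPort line, strip blanks.
    awk -F'=' '/^SlurmdPort[ \t]*=/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Usage:
#   port=$(scontrol show config | slurmd_port)
#   # Is anything listening on that TCP port on this host?
#   netstat -ltn | grep ":$port "
# If nothing is listening, slurmd is not running on this host (or exited
# at startup); start it and check its log before retrying srun.
```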
------------------------------------------------------------------------
*From: *"Danny Auble" <[email protected]>
*To: *"slurm-dev" <[email protected]>
*Sent: *Wednesday, August 21, 2013 15:36:53
*Subject: *[slurm-dev] Re: Required node not available (down or drained)
Check your slurmd log. It doesn't appear that slurmd is running.
Sivasangari Nandy <[email protected]> wrote:
Hello,
I'm trying to use Slurm for the first time, and I think I have a
problem with the nodes.
I get this message when I run squeue:
root@VM-667:~# squeue
  JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)
     50 SLURM-deb  test.sh     root  PD       0:00      1 (ReqNodeNotAvail)
or this one when I run squeue on another machine:
root@VM-671:~# squeue
  JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)
     50 SLURM-deb  test.sh     root  PD       0:00      1 (Resources)
sinfo gives me:
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
SLURM-de*    up   infinite      3   down VM-[669-671]
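All three nodes are reported as down. With the default ReturnToService=0 in slurm.conf, a node that slurmctld has set DOWN stays DOWN until an administrator changes its state, even after slurmd is running again, which would also explain why the single-node test still gets "Required node not available". A minimal sketch for spotting such nodes and returning them to service; unavailable_nodes is a hypothetical helper, and the commands in the comments are the standard sinfo/scontrol invocations.

```shell
#!/bin/sh
# Sketch: list node groups that are down or drained, from
# `sinfo -h -o "%N %T"` output on stdin (one "NODELIST STATE" per line).
# unavailable_nodes is a hypothetical helper, not a Slurm command.
unavailable_nodes() {
    # Matches states such as "down", "down*", "drained", "draining".
    awk '$2 ~ /^(down|drain)/ { print $1 }'
}

# Usage, as root (or SlurmUser):
#   sinfo -h -o "%N %T" | unavailable_nodes
#   sinfo -R    # the reason each node was set down or drained
# Once slurmd is running on the nodes, return them to service:
#   scontrol update NodeName='VM-[669-671]' State=RESUME
```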
I have already used Slurm once with the same configuration, and I
was able to run my job.
But now, the second time, I always get:
srun: Required node not available (down or drained)
srun: job 51 queued and waiting for resources
Thanks in advance for your help,
Siva
--
Sivasangari NANDY - Plate-forme *GenOuest*
IRISA-INRIA, Campus de Beaulieu
263 Avenue du Général Leclerc
35042 Rennes cedex, France
Tel: +33 (0) 2 99 84 25 69
Office: D152
--
*Siva*sangari NANDY- Plate-forme *GenOuest*
IRISA-INRIA, Campus de Beaulieu
263 Avenue du Général Leclerc
35042 Rennes cedex, France
Tél: +33 (0) 2 99 84 25 69
Bureau : D152
--
*Siva*sangari NANDY- Plate-forme *GenOuest*
IRISA-INRIA, Campus de Beaulieu
263 Avenue du Général Leclerc
35042 Rennes cedex, France
Tél: +33 (0) 2 99 84 25 69
Bureau : D152
--
*Siva*sangari NANDY- Plate-forme *GenOuest*
IRISA-INRIA, Campus de Beaulieu
263 Avenue du Général Leclerc
35042 Rennes cedex, France
Tél: +33 (0) 2 99 84 25 69
Bureau : D152
--
Thanks,
/David