This link points to the SLURM 2.3 documentation. For the currently released version (2.6.1) and other recent versions, you may want to use this documentation instead:

http://slurm.schedmd.com/troubleshoot.html#nodes
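
For reference, here is a minimal sketch of the kind of node check this thread is about. It assumes the default sinfo column layout shown in the original message below; down_nodes is a hypothetical helper name of mine, not a Slurm command, and the scontrol line in the comments is only a suggestion to try once slurmd is confirmed running.

```shell
#!/bin/sh
# Sketch only: list node sets that sinfo reports in a "down" state.
# Reads sinfo's default tabular output on stdin, so it can be tried on
# saved text without a running cluster.
# down_nodes is a hypothetical helper name, not part of Slurm.
down_nodes() {
    # Assumed default columns: PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
    # Skip the header line (NR > 1); match states such as "down" or "down*".
    awk 'NR > 1 && $5 ~ /^down/ { print $6 }'
}

# Example using the sinfo output quoted in this thread:
printf '%s\n' \
    'PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST' \
    'SLURM-de*    up   infinite      3   down VM-[669-671]' | down_nodes

# On a live system one would pipe the real command instead:
#   sinfo | down_nodes
# and, after confirming slurmd is running on a node, return it to service:
#   scontrol update NodeName=<node> State=RESUME
```

The helper only reads text, so it does not change any node state by itself.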


On 08/26/2013 10:10 AM, Nikita Burtsev wrote:
https://computing.llnl.gov/linux/slurm/troubleshoot.html#nodes

--
Nikita Burtsev

On Monday, August 26, 2013 at 7:43 PM, Sivasangari Nandy wrote:

And the log file is not informative

tail -f /var/log/slurm-llnl/slurmd.log

...
[2013-08-26T11:52:16] Slurmd shutdown completing
[2013-08-26T11:52:56] slurmd version 2.3.4 started
[2013-08-26T11:52:56] slurmd started on Mon 26 Aug 2013 11:52:56 +0200
[2013-08-26T11:52:56] Procs=1 Sockets=1 Cores=1 Threads=1 Memory=2012 TmpDisk=9069 Uptime=1122626


------------------------------------------------------------------------

    *From: *"Sivasangari Nandy" <[email protected]>
    *To: *"slurm-dev" <[email protected]>
    *Sent: *Monday, 26 August 2013 14:28:28
    *Subject: *Re: [slurm-dev] Re: Required node not available (down or
    drained)

    Hi,

    I have checked a few things. My slurmctld and slurmd now run on a
    single machine (using just one node), so testing is easier.
    To do that I modified the conf file: vi /etc/slurm-llnl/slurm.conf

    slurmctld and slurmd are both running; here is my ps result:

    root@VM-667:/etc/slurm-llnl# ps -ef | grep slurm
    root     31712 31706  0 11:44 pts/1    00:00:00 tail -f /var/log/slurm-llnl/slurmd.log
    slurm    31990     1  0 11:52 ?        00:00:00 /usr/sbin/slurmctld
    root     32103     1  0 11:52 ?        00:00:00 /usr/sbin/slurmd -c
    root     32125 30346  0 11:53 pts/0    00:00:00 grep slurm

    So I tried srun again but still got this error:

    !srun
    srun /omaha-beach/test.sh
    srun: Required node not available (down or drained)
    srun: job 64 queued and waiting for resources

    Do you have any idea what the problem is?
    thanks,

    Siva

    ------------------------------------------------------------------------

        *From: *"Nikita Burtsev" <[email protected]>
        *To: *"slurm-dev" <[email protected]>
        *Sent: *Thursday, 22 August 2013 09:59:52
        *Subject: *[slurm-dev] Re: Required node not available (down or
        drained)

        You need to have slurmd running on all nodes that will execute
        jobs, so you should start it with the init script.

        --
        Nikita Burtsev

        On Thursday, August 22, 2013 at 11:55 AM, Sivasangari Nandy wrote:

            "check if the slurmd daemon is running with the command
            "/ps -el | grep slurmd/"."

            ps -el returns nothing:

            root@VM-667:~# ps -el | grep slurmd

            
            ------------------------------------------------------------------------

                *From: *"Nikita Burtsev" <[email protected]>
                *To: *"slurm-dev" <[email protected]>
                *Sent: *Wednesday, 21 August 2013 18:58:52
                *Subject: *[slurm-dev] Re: Required node not available
                (down or drained)

                slurmctld is the management process, and since you
                have access to squeue/sinfo information it is running
                just fine. You need to check whether slurmd (the
                agent part) is running on your nodes, i.e. VM-[669-671].

                --
                Nikita Burtsev

                On Wednesday, August 21, 2013 at 8:13 PM, Sivasangari
                Nandy wrote:

                    I have tried :

                    /etc/init.d/slurm-llnl start

                    [ ok ] Starting slurm central management daemon: slurmctld.
                    /usr/sbin/slurmctld already running.

                    And :

                    scontrol show slurmd

                    scontrol: error: slurm_slurmd_info: Connection refused
                    slurm_load_slurmd_status: Connection refused

                    Hmm, how should I proceed to fix this problem?


                    
                    ------------------------------------------------------------------------

                        *From: *"Danny Auble" <[email protected]>
                        *To: *"slurm-dev" <[email protected]>
                        *Sent: *Wednesday, 21 August 2013 15:36:53
                        *Subject: *[slurm-dev] Re: Required node not
                        available (down or drained)

                        Check your slurmd log. It doesn't appear that
                        slurmd is running.

                        Sivasangari Nandy <[email protected]> wrote:

                                    Hello,

                                    I'm trying to use Slurm for the
                                    first time, and I think I have a
                                    problem with the nodes.
                                    I get this message when I run
                                    squeue:

                                    root@VM-667:~# squeue
                                      JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)
                                         50 SLURM-deb  test.sh     root  PD       0:00      1 (ReqNodeNotAvail)

                                    or this one with another squeue:

                                    root@VM-671:~# squeue
                                      JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)
                                         50 SLURM-deb  test.sh     root  PD       0:00      1 (Resources)

                                    sinfo gives me :

                                    PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
                                    SLURM-de*    up   infinite      3   down VM-[669-671]

                                    I have already used Slurm once
                                    with the same configuration and I
                                    was able to run my job.
                                    But now, the second time, I always
                                    get:

                                    srun: Required node not available (down or drained)
                                    srun: job 51 queued and waiting for resources

                                    Thanks in advance for your help,
                                    Siva




                    --
                    *Siva*sangari NANDY-  Plate-forme *GenOuest*
                    IRISA-INRIA, Campus de Beaulieu
                    263 Avenue du Général Leclerc
                    35042 Rennes cedex, France
                    Tél: +33 (0) 2 99 84 25 69
                    Bureau :  D152

















--

Thanks,
      /David
