[slurm-dev] Re: Problem with PMI2 in Slurm 14.03.10

Andy Riebs Fri, 21 Nov 2014 13:27:43 -0800
   Thanks Artem! We'll keep you posted on what we find.
 
 Andy
 
 On 11/21/2014 12:06 PM, Artem Polyakov wrote:
   Re: [slurm-dev] Problem with PMI2 in Slurm 14.03.10
      2014-11-21 21:43 GMT+06:00 Artem Polyakov <[email protected]>:
             Hello, Andy.
             I am not a SLURM expert, so please consider the
               following advice optional.
             According to the sources, PMI2 fails when it tries to
               broadcast the cumulative database to the nodes. It tries 5
               times, with delays that increase as powers of two:
             1, 2, 4, 8, 16 seconds.
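The retry behaviour described above can be sketched in Python. This is a minimal illustration, not Slurm's actual C code: the `send` callable is a hypothetical stand-in for the broadcast step, and the sketch assumes five retries after the first attempt, each preceded by a delay that doubles, which reproduces the 1, 2, 4, 8, 16 second sequence. The `sleep` parameter is injectable so the schedule can be checked without actually waiting.

```python
import time
from typing import Callable


def send_with_backoff(send: Callable[[], bool],
                      max_retries: int = 5,
                      initial_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call send() until it succeeds, doubling the delay before each retry.

    `send` is a hypothetical stand-in for the broadcast step: it returns
    True on success and False on failure. With the defaults, the delays
    before the five retries are 1, 2, 4, 8 and 16 seconds.
    """
    if send():                 # first attempt, no delay
        return True
    delay = initial_delay
    for _ in range(max_retries):
        sleep(delay)           # wait before retrying
        delay *= 2             # powers of two: 1, 2, 4, 8, 16
        if send():
            return True
    return False               # give up; the caller reports the error
```

Under these assumptions a broadcast that keeps failing waits 1 + 2 + 4 + 8 + 16 = 31 seconds in total before the error is reported, so the backoff only helps with transient failures, not with one that recurs on every attempt.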
             The cumulative size of the database should be on the order
               of 1534*24*437 = 16,088,592 bytes (about 16 MB), which is
               not that much. I attached a small patch that prints the
               exact DB size right before it is broadcast to the compute
               nodes (I assume that you can modify the sources).
             One thing that comes to mind is to try half as many
               tasks with messages twice as large. 2x437 = 874 is still
               less than 1024 (PMI2_MAX_VALLEN), and the cumulative DB
               size stays the same, so if that run passes, we can rule
               out the cumulative DB size as the cause.
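The size estimate and the proposed halved-task experiment are simple arithmetic; this sketch makes them explicit. It counts only the value payload (one value per rank), so real KVS traffic will be somewhat larger once keys and framing are included. The halved configuration is assumed here to keep all 1534 nodes at 12 tasks per node; halving the node count instead gives the same total.

```python
PMI2_MAX_VALLEN = 1024  # per-value limit quoted in the thread


def kvs_payload_bytes(nodes: int, ppn: int, msg_bytes: int) -> int:
    """Estimate of the cumulative KVS value payload: one value per rank.

    Keys and message framing are ignored, so this is a lower bound.
    """
    return nodes * ppn * msg_bytes


failing = kvs_payload_bytes(1534, 24, 437)      # configuration that fails
halved = kvs_payload_bytes(1534, 12, 2 * 437)   # half the tasks, double the message

print(f"{failing} bytes = {failing / 1e6:.1f} MB")      # 16088592 bytes = 16.1 MB
print(halved == failing, 2 * 437 < PMI2_MAX_VALLEN)    # True True
```

The passing (436-byte) and failing (437-byte) runs differ by only one byte per rank, i.e. 36,816 bytes in total, which is why holding the total fixed while changing the per-rank size helps separate a per-message limit from a cumulative-size limit.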
             Most probably the problem is in
               slurm_forward_data. I would do the following two
               things:
             1. Enable debug level 3 on the slurmds
               (SlurmdDebug=3 configuration option) to see what is
               failing there. You have probably done that already but
               didn't mention it in your mail.
         Here I was wrong. Since it is the "srun" process that does
           the final broadcast of the DB, you need to launch your
           application with the -vvv option to increase srun's verbosity
           (srun -vvv pmi2_allgather). I think you'll see a more
           detailed error report.
             2. Optionally, you can experiment with the Slurm fanout
               tree width (TreeWidth configuration option, e.g. 10, 50,
               100, or other values).
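To see what TreeWidth changes, here is a rough model of a fanout tree under idealized assumptions: srun sends the full database to `width` nodes, each of which forwards it to `width` more, and so on. The helper names are hypothetical, and Slurm's real forwarding logic (how it splits the host list, per-hop timeouts) is not modelled. The sketch only shows the trade-off: a smaller width means more hops, while a larger width means srun itself sends more full copies of the database.

```python
def tree_depth(nodes: int, width: int) -> int:
    """Forwarding hops needed to reach `nodes` nodes in an idealized fanout tree.

    Hop 1: srun sends to `width` nodes; at every later hop, each node reached
    in the previous hop forwards to `width` further nodes.
    """
    if nodes < 1 or width < 1:
        raise ValueError("nodes and width must be positive")
    depth, covered, frontier = 0, 0, 1
    while covered < nodes:
        frontier *= width   # nodes first reached at this hop
        covered += frontier
        depth += 1
    return depth


def root_egress_bytes(width: int, nodes: int, payload: int) -> int:
    """Bytes srun sends itself: one full copy to each direct child."""
    return min(width, nodes) * payload


payload = 1534 * 24 * 437  # value payload estimate from the thread
for width in (10, 50, 100):
    hops = tree_depth(1534, width)
    egress_mb = root_egress_bytes(width, 1534, payload) / 1e6
    print(f"TreeWidth={width}: {hops} hops, srun sends ~{egress_mb:.0f} MB")
```

For 1534 nodes this model gives 4 hops at width 10 and 2 hops at widths 50 and 100, while srun's own egress for a ~16 MB payload grows from about 161 MB at width 10 to about 1.6 GB at width 100.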
                   2014-11-21 19:20 GMT+06:00 Andy Riebs <[email protected]>:
                     Hi Slurm gurus! We are seeing an issue when
                       launching jobs with large rank counts on our IB
                       cluster using PMI2, and could use your help.
                       When the jobs fail, the first line of output
                       seems to have the most useful information:
                       
                       srun: error: mpi/pmi2: failed to send temp kvs to compute nodes
                       
                       One of our Mellanox friends stripped the test
                       case down to just the PMI2 startup code from
                       MPI to help isolate the issue further. This
                       code takes a single argument: the number of
                       bytes to put in the message. When we start
                       this test code on 1534 nodes with PPN=24,
                       436-byte messages pass, but 437-byte messages
                       fail. (Maybe those numbers will help someone
                       figure this out!)
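A 436/437 boundary like this is what a binary search over message sizes finds in a handful of launches (about 10 probes for the range 1 to 1024). A minimal sketch, assuming the test program is invoked as `./pmi2_allgather <bytes>` (the path is hypothetical), that a zero exit status means a pass, and that every size up to the threshold passes while every size above it fails. The srun options used (`--mpi=pmi2`, `-N`, `--ntasks-per-node`, `-vvv`) are standard srun flags; `-vvv` follows Artem's suggestion for more detailed error output.

```python
import subprocess
from typing import Callable, List


def srun_argv(nodes: int, ppn: int, msg_bytes: int,
              program: str = "./pmi2_allgather") -> List[str]:
    """Build the srun command line for one probe (program path is hypothetical)."""
    return ["srun", "--mpi=pmi2", "-vvv", "-N", str(nodes),
            f"--ntasks-per-node={ppn}", program, str(msg_bytes)]


def run_probe(msg_bytes: int, nodes: int = 1534, ppn: int = 24) -> bool:
    """Launch the test once; treat a zero exit status as a pass (assumption)."""
    return subprocess.run(srun_argv(nodes, ppn, msg_bytes)).returncode == 0


def largest_passing_size(passes: Callable[[int], bool], lo: int, hi: int) -> int:
    """Binary search for the largest message size that passes.

    Requires passes(lo) to be True and passes(hi) to be False, and assumes
    a single threshold: every size <= threshold passes, every larger one fails.
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if passes(mid):
            lo = mid
        else:
            hi = mid
    return lo


# On the cluster (launches real jobs):
# largest_passing_size(run_probe, lo=1, hi=1024)
```

Injecting the `passes` callable keeps the search logic testable without a cluster, and the same search can be rerun with other node counts or PPN values to see how the threshold scales.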
                       
                       The only notable Slurm configuration option
                       that we have changed is MessageTimeout=60, but
                       it had no effect on this issue. We are looking
                       for advice on how to proceed with debugging and
                       troubleshooting this issue further.
                       
                       I have attached the test program to this
                       message.
                       
                       We are using RHEL 6.5 x86_64 and Slurm
                       14.03.10 on this system.
                           
                           Andy
                           
                           -- 
                           Andy Riebs
                           Hewlett-Packard Company
                           High Performance Computing
                           +1 404 648 9024
                           My opinions are not necessarily those of
                           HP
               -- 
                   Best regards, Artem Y. Polyakov
       -- 
       Best regards, Artem Y. Polyakov