Sorry, I should have done a better job of vetting my concern before
sending the earlier mail.
If I have a script, say runner.sh, as follows:
-----
#!/bin/bash
srun --ntasks-per-node=4 hostname
-----
And execute it with
$ salloc -N2 ./runner.sh
Under Slurm 14.11.2 and earlier, the result is
-----
node01
node01
node01
node01
node02
node02
node02
node02
-----
But under Slurm 14.11.3, the result is
-----
node01
node02
-----
According to the salloc and srun man pages, --ntasks-per-node
"Request[s] that ntasks be invoked on each node. If used with the
--ntasks option, the --ntasks option will take precedence and the
--ntasks-per-node will be treated as a maximum count of tasks per
node."
If srun is executed on its own, the correct thing happens. If it is
invoked inside an existing allocation (e.g., under salloc, as above),
it seems to get confused.
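In the meantime, a possible workaround (just a sketch, assuming
SLURM_JOB_NUM_NODES is exported to the script, which salloc normally
does) is to compute --ntasks explicitly inside runner.sh:
-----
#!/bin/bash
# Workaround sketch: derive the task count from the allocation size so
# srun does not rely on the (apparently broken) default expansion of
# --ntasks-per-node. salloc exports SLURM_JOB_NUM_NODES to the job.
srun --ntasks=$(( SLURM_JOB_NUM_NODES * 4 )) --ntasks-per-node=4 hostname
-----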
Again, apologies for the misinformation in the earlier note.
Andy
On 02/24/15 09:12, Andy Riebs wrote:
Serves me right for always running a version behind -- thanks for
the info!
Andy
On 02/24/15 09:10, CB wrote:
Re: [slurm-dev] Problem with --nnodes, --ntasks, and
--ntasks-per-node?
It seems that it's fixed in Slurm 14.11.4:
-----
$ sinfo -V
slurm 14.11.4
$ srun -N2 --ntasks-per-node=2 hostname
compute-1
compute-0
compute-1
compute-0
-----
On Tue, Feb 24, 2015 at 8:51 AM, Andy Riebs <[email protected]> wrote:
When we moved from Slurm 14.11.2 to 14.11.3, a bunch of
our Slurm scripts broke!
In the past, if --nnodes and --ntasks-per-node were
specified, --ntasks would default to
(nnodes * ntasks-per-node); i.e.,
$ srun -N2 --ntasks-per-node=2 hostname
hadesn02
hadesn02
hadesn01
hadesn01
$
With Slurm 14.11.3, we see
$ srun -N2 --ntasks-per-node=2 hostname
hadesn02
hadesn01
$
Was this change intentional?
Andy
--
Andy Riebs
Hewlett-Packard Company
High Performance Computing
+1 404 648 9024
My opinions are not necessarily those of HP