 As a workaround - can you test
 srun --cpu_bind=verbose,map_cpu:
 mpirun -slot-list $SBATCH_CPU_BIND_LIST
 
 I'm thinking -slot-list doesn't handle CPU masks, so SLURM needs to
 provide an explicit list of core IDs instead.
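
 Something like this inside the batch script, just to illustrate (the core
 IDs 0,1 and the binary name ./hello_mpi are placeholders, and whether
 SBATCH_CPU_BIND_LIST then comes out as a plain comma-separated list rather
 than a hex mask is exactly what needs testing):

 srun --cpu_bind=verbose,map_cpu:0,1 ./hello_mpi
 mpirun -slot-list $SBATCH_CPU_BIND_LIST ./hello_mpi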
 
 On 06/07/2016 04:14 PM, Jason Bacon wrote:
   Thanks for the tip, but does OpenMPI not use SBATCH_CPU_BIND_*
   when SLURM integration is compiled in?
   printenv in the sbatch script produces the following:
    Linux login.finch bacon ~/Data/Testing/Facil/Software/Src/Bench/MPI 379: grep SBATCH slurm-5*
   
    slurm-579.out:SBATCH_CPU_BIND_LIST=0x3
    slurm-579.out:SBATCH_CPU_BIND_VERBOSE=verbose
    slurm-579.out:SBATCH_CPU_BIND_TYPE=mask_cpu:
    slurm-579.out:SBATCH_CPU_BIND=verbose,mask_cpu:0x3
    slurm-580.out:SBATCH_CPU_BIND_LIST=0xC
    slurm-580.out:SBATCH_CPU_BIND_VERBOSE=verbose
    slurm-580.out:SBATCH_CPU_BIND_TYPE=mask_cpu:
    slurm-580.out:SBATCH_CPU_BIND=verbose,mask_cpu:0xC
    All OpenMPI jobs are using cores 0 and 2, even though SLURM assigned
    cores 0 and 1 to job 579 and cores 2 and 3 to job 580.
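    (Decoding those masks: 0x3 is binary 0011, i.e. cores 0 and 1, while
    0xC is binary 1100, i.e. cores 2 and 3.)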
   Regards,
       Jason
   On 06/06/16 21:11, Ralph Castain wrote:
   
   Running two jobs across the same nodes is
     indeed an issue. Regardless of which MPI you use, the second
     mpiexec has no idea that the first one exists. Thus, the
      bindings applied to the second job will be computed as if the
      first job doesn’t exist - and thus the procs will land on
      top of each other.
     The way you solve this with OpenMPI is by using the -slot-list
     <foo> option. This tells each mpiexec which cores are
     allocated to it, and it will constrain its binding calculation
     within that envelope. Thus, if you start the first job with
     -slot-list 0-2, and the second with -slot-list 3-5, the two jobs
     will be isolated from each other.
      You can use any specification for the slot-list - it takes a
      comma-separated list of cores, and ranges like 0-2 also work.
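
      To make that concrete (the binary name ./hello_mpi is just a
      placeholder), the two submissions would look roughly like:

      mpirun -slot-list 0-2 ./hello_mpi
      mpirun -slot-list 3-5 ./hello_mpi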
     HTH
     
     Ralph
      On Jun 6, 2016, at 6:08 PM, Jason Bacon <[email protected]> wrote:
       Actually, --bind-to core is the default for most OpenMPI jobs
       now, so adding this flag has no effect.  It refers to the
       processes within the job.
       I'm thinking this is an MPI-SLURM integration issue.
       Embarrassingly parallel SLURM jobs are binding properly, but
       MPI jobs are ignoring the SLURM environment and choosing their
       own cores.
       OpenMPI was built with --with-slurm and it appears from
       config.log that it located everything it needed.
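        (For reference, that means a configure line of roughly this shape,
        with the install prefix being a placeholder:

        ./configure --prefix=/usr/local/openmpi --with-slurm

        and config.log confirming the SLURM support was detected.)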
       I can work around the problem with "mpirun --bind-to none",
       which I'm guessing will impact performance slightly for
       memory-intensive apps.
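        For what it's worth, the workaround amounts to a batch script along
        these lines (the task count and the binary name ./hello_mpi are
        placeholders):

        #!/bin/sh
        #SBATCH --ntasks=2
        mpirun --bind-to none ./hello_mpi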
       We're still digging on this one and may be for a while...
          Jason
       On 06/03/16 15:48, Benjamin Redling wrote:
       
        On 2016-06-03 21:25, Jason Bacon wrote:
         
          It might be worth mentioning that the calcpi-parallel jobs are
          run with --array (no srun). Disabling the task/affinity plugin
          and using "mpirun --bind-to core" works around the issue. The
          MPI processes bind to specific cores and the embarrassingly
          parallel jobs kindly move over and stay out of the way.
          Are the mpirun --bind-to core child processes the same as a
          slurm task?
          I have no experience at all with MPI jobs -- just trying to
          understand task/affinity and params.
          As far as I understand, when you let mpirun do the binding it
          handles the binding differently:
          https://www.open-mpi.org/doc/v1.8/man1/mpirun.1.php
          That is, if I grok the

          % mpirun ... --map-by core --bind-to core

          example in the "Mapping, Ranking, and Binding: Oh My!"
          section right.
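          (Spelled out with placeholder values, that example would be
          something like

          mpirun -np 4 --map-by core --bind-to core ./a.out

          i.e. each of the 4 ranks is mapped to its own core and then
          bound to it.)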
          On 06/03/16 10:18, Jason Bacon wrote:

            We're having an issue with CPU binding when two jobs land on
            the same node. Some cores are shared by the 2 jobs while
            others are left idle. Below
         [...]
           TaskPluginParam=cores,verbose
          don't you bind each _job_ to a single core because you override
          automatic binding and thus prevent binding each child process
          to a different core?
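          (For context, I'm reading the affinity setup in question as
          slurm.conf lines along the lines of

          TaskPlugin=task/affinity
          TaskPluginParam=cores,verbose

          per the task/affinity plugin mentioned earlier.)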
         Regards,
         
         Benjamin
        --

        All wars are civil wars, because all men are brothers ... Each
        one owes infinitely more to the human race than to the particular
        country in which he was born.

                       -- Francois Fenelon
