What happens if you use srun instead of mpirun? I would expect that to work.
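A minimal sbatch script along those lines would look like the sketch below. The two-task geometry and the ./calcpi-mpi binary name are illustrative assumptions, not taken from the thread:

    #!/bin/sh
    #SBATCH --ntasks=2
    # Launch through srun so the tasks inherit the core binding SLURM
    # computed for this job, rather than letting mpirun compute its own
    # bindings without knowledge of other jobs on the node.
    srun ./calcpi-mpi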
On June 7, 2016 6:32:17 AM MST, Ralph Castain <[email protected]> wrote:
> No, we don’t pick that up - suppose we could try. Those envars have a
> history of changing, though, and it gets difficult to match the
> version with the var.
>
> I can put this on my “nice to do someday” list and see if/when we can
> get to it. Just so I don’t have to parse around more - what version
> of slurm are you using?
>
>> On Jun 7, 2016, at 6:15 AM, Jason Bacon <[email protected]> wrote:
>>
>> Thanks for the tip, but does OpenMPI not use SBATCH_CPU_BIND_* when
>> SLURM integration is compiled in?
>>
>> printenv in the sbatch script produces the following:
>>
>> Linux login.finch bacon ~/Data/Testing/Facil/Software/Src/Bench/MPI
>> 379: grep SBATCH slurm-5*
>> slurm-579.out:SBATCH_CPU_BIND_LIST=0x3
>> slurm-579.out:SBATCH_CPU_BIND_VERBOSE=verbose
>> slurm-579.out:SBATCH_CPU_BIND_TYPE=mask_cpu:
>> slurm-579.out:SBATCH_CPU_BIND=verbose,mask_cpu:0x3
>> slurm-580.out:SBATCH_CPU_BIND_LIST=0xC
>> slurm-580.out:SBATCH_CPU_BIND_VERBOSE=verbose
>> slurm-580.out:SBATCH_CPU_BIND_TYPE=mask_cpu:
>> slurm-580.out:SBATCH_CPU_BIND=verbose,mask_cpu:0xC
>>
>> All OpenMPI jobs are using cores 0 and 2, although SLURM has
>> assigned 0 and 1 to job 579 and 2 and 3 to 580.
>>
>> Regards,
>>
>> Jason
>>
>> On 06/06/16 21:11, Ralph Castain wrote:
>>> Running two jobs across the same nodes is indeed an issue.
>>> Regardless of which MPI you use, the second mpiexec has no idea
>>> that the first one exists. Thus, the bindings applied to the second
>>> job will be computed as if the first job doesn’t exist - and thus,
>>> the procs will overload on top of each other.
>>>
>>> The way you solve this with OpenMPI is by using the -slot-list
>>> <foo> option. This tells each mpiexec which cores are allocated to
>>> it, and it will constrain its binding calculation within that
>>> envelope. Thus, if you start the first job with -slot-list 0-2, and
>>> the second with -slot-list 3-5, the two jobs will be isolated from
>>> each other.
>>>
>>> You can use any specification for the slot-list - it takes a
>>> comma-separated list of cores.
>>>
>>> HTH
>>> Ralph
>>>
>>>> On Jun 6, 2016, at 6:08 PM, Jason Bacon <[email protected]> wrote:
>>>>
>>>> Actually, --bind-to core is the default for most OpenMPI jobs now,
>>>> so adding this flag has no effect. It refers to the processes
>>>> within the job.
>>>>
>>>> I'm thinking this is an MPI-SLURM integration issue.
>>>> Embarrassingly parallel SLURM jobs are binding properly, but MPI
>>>> jobs are ignoring the SLURM environment and choosing their own
>>>> cores.
>>>>
>>>> OpenMPI was built with --with-slurm and it appears from config.log
>>>> that it located everything it needed.
>>>>
>>>> I can work around the problem with "mpirun --bind-to none", which
>>>> I'm guessing will impact performance slightly for memory-intensive
>>>> apps.
>>>>
>>>> We're still digging on this one and may be for a while...
>>>>
>>>> Jason
>>>>
>>>> On 06/03/16 15:48, Benjamin Redling wrote:
>>>>> On 2016-06-03 21:25, Jason Bacon wrote:
>>>>>> It might be worth mentioning that the calcpi-parallel jobs are
>>>>>> run with --array (no srun).
>>>>>>
>>>>>> Disabling the task/affinity plugin and using "mpirun --bind-to
>>>>>> core" works around the issue. The MPI processes bind to specific
>>>>>> cores and the embarrassingly parallel jobs kindly move over and
>>>>>> stay out of the way.
>>>>> Are the mpirun --bind-to core child processes the same as a slurm
>>>>> task?
>>>>>
>>>>> I have no experience at all with MPI jobs -- just trying to
>>>>> understand task/affinity and params.
>>>>>
>>>>> As far as I understand, when you let mpirun do the binding, it
>>>>> handles the binding differently:
>>>>> https://www.open-mpi.org/doc/v1.8/man1/mpirun.1.php
>>>>>
>>>>> If I grok the
>>>>>   % mpirun ... --map-by core --bind-to core
>>>>> example in the "Mapping, Ranking, and Binding: Oh My!" section
>>>>> right.
>>>>>
>>>>>> On 06/03/16 10:18, Jason Bacon wrote:
>>>>>>> We're having an issue with CPU binding when two jobs land on
>>>>>>> the same node.
>>>>>>>
>>>>>>> Some cores are shared by the 2 jobs while others are left idle.
>>>>>>> Below
>>>>> [...]
>>>>>>> TaskPluginParam=cores,verbose
>>>>> Don't you bind each _job_ to a single core because you override
>>>>> automatic binding and thus prevent binding each child process to
>>>>> a different core?
>>>>>
>>>>> Regards,
>>>>> Benjamin
>>>>
>>>> --
>>>> All wars are civil wars, because all men are brothers ... Each one
>>>> owes infinitely more to the human race than to the particular
>>>> country in which he was born.
>>>> -- Francois Fenelon
>>
>> --
>> All wars are civil wars, because all men are brothers ... Each one
>> owes infinitely more to the human race than to the particular
>> country in which he was born.
>> -- Francois Fenelon
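Concretely, Ralph's -slot-list suggestion would put something like the following in the two job scripts, using the core ranges from his example (the ./calcpi-mpi binary name is a placeholder, not from the thread):

    # First job's script: constrain mpirun's binding calculation to cores 0-2
    mpirun -slot-list 0-2 ./calcpi-mpi

    # Second job's script: constrain it to cores 3-5, isolating it from the first
    mpirun -slot-list 3-5 ./calcpi-mpi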

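One way to check what each job actually received is to compare the mask SLURM reports with the kernel's view of the task affinity. A sketch, assuming a Linux compute node:

    # Inside the sbatch script: print the cores each task may run on.
    # Cpus_allowed_list in /proc reflects the effective affinity mask.
    srun sh -c 'grep Cpus_allowed_list /proc/self/status'

    # Compare against the mask SLURM assigned to the job:
    echo "$SBATCH_CPU_BIND"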