Yes, it should - provided the job step executing each mpirun has been given a 
unique binding. I suspect this is the problem you are encountering, but can’t 
know for certain. You could run an app that prints out its binding and then see 
if two parallel executions of srun yield different values.
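
In case it helps, here is a minimal sketch of such a binding-printing app in Python. It assumes a Linux node (os.sched_getaffinity is not available elsewhere) and that SLURM_JOB_ID is set in the job environment. The function names and the script name mentioned below are made up for illustration and are not part of OpenMPI or SLURM. The mask is printed in the same hex form as SBATCH_CPU_BIND_LIST so the two can be compared directly.

```python
import os
import socket


def cpus_to_mask(cpus):
    """Return a hex CPU mask where bit i set means logical CPU i is allowed."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return hex(mask)


def mask_to_cpus(mask):
    """Decode a hex mask such as '0xC' into a sorted list of CPU indices."""
    value = int(mask, 16)
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


def current_binding():
    """Return the sorted CPUs this process may run on, or None if unsupported."""
    if hasattr(os, "sched_getaffinity"):  # Linux only
        return sorted(os.sched_getaffinity(0))
    return None


def report():
    """Build a one-line summary of this process's binding."""
    cpus = current_binding()
    job = os.environ.get("SLURM_JOB_ID", "unknown")  # assumed to be set by SLURM
    host = socket.gethostname()
    prefix = f"job {job} on {host} pid {os.getpid()}"
    if cpus is None:
        return f"{prefix}: affinity not available on this platform"
    return f"{prefix}: cpus {cpus} mask {cpus_to_mask(cpus)}"


if __name__ == "__main__":
    print(report())
```

Saved under a hypothetical name such as show_binding.py and launched from each of the two job scripts, it should report disjoint masks (for example 0x3 for one job and 0xc for the other) if each job step really received its own binding. Identical or overlapping masks would point at the problem described above.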


> On Jun 7, 2016, at 5:26 PM, Jason Bacon <[email protected]> wrote:
> 
> 
> So this *should* work even for two separate MPI jobs sharing a node?
> 
> Thanks much,
> 
>     Jason
> On 06/07/2016 09:09, Ralph Castain wrote:
>> Yes, it should. What’s odd is that mpirun launches its daemons using srun 
>> under the covers, and the daemon should therefore be bound. We detect that 
>> and use it, but I’m not sure why this isn’t working here.
>> 
>> 
>>> On Jun 7, 2016, at 6:52 AM, Bruce Roberts <[email protected]> wrote:
>>> 
>>> What happens if you use srun instead of mpirun? I would expect that to work 
>>> correctly. 
>>> 
>>> On June 7, 2016 6:31:27 AM MST, Ralph Castain <[email protected]> wrote:
>>> No, we don’t pick that up - suppose we could try. Those envars have a 
>>> history of changing, though, and it gets difficult to match the version 
>>> with the var.
>>> 
>>> I can put this on my “nice to do someday” list and see if/when we can get 
>>> to it. Just so I don’t have to parse around more - what version of slurm 
>>> are you using?
>>> 
>>> 
>>>> On Jun 7, 2016, at 6:15 AM, Jason Bacon <[email protected]> wrote:
>>>> 
>>>> 
>>>> 
>>>> Thanks for the tip, but does OpenMPI not use SBATCH_CPU_BIND_* when SLURM 
>>>> integration is compiled in?
>>>> 
>>>> printenv in the sbatch script produces the following:
>>>> 
>>>> Linux login.finch bacon ~/Data/Testing/Facil/Software/Src/Bench/MPI 379: grep SBATCH slurm-5*
>>>> slurm-579.out:SBATCH_CPU_BIND_LIST=0x3
>>>> slurm-579.out:SBATCH_CPU_BIND_VERBOSE=verbose
>>>> slurm-579.out:SBATCH_CPU_BIND_TYPE=mask_cpu:
>>>> slurm-579.out:SBATCH_CPU_BIND=verbose,mask_cpu:0x3
>>>> slurm-580.out:SBATCH_CPU_BIND_LIST=0xC
>>>> slurm-580.out:SBATCH_CPU_BIND_VERBOSE=verbose
>>>> slurm-580.out:SBATCH_CPU_BIND_TYPE=mask_cpu:
>>>> slurm-580.out:SBATCH_CPU_BIND=verbose,mask_cpu:0xC
>>>> 
>>>> All OpenMPI jobs are using cores 0 and 2, although SLURM has assigned 0 
>>>> and 1 to job 579 and 2 and 3 to 580.
>>>> 
>>>> Regards,
>>>> 
>>>>    Jason
>>>> 
>>>> On 06/06/16 21:11, Ralph Castain wrote:
>>>>> Running two jobs across the same nodes is indeed an issue. Regardless of 
>>>>> which MPI you use, the second mpiexec has no idea that the first one 
>>>>> exists. Thus, the bindings applied to the second job will be computed as 
>>>>> if the first job doesn’t exist - and thus, the procs will overload on top 
>>>>> of each other.
>>>>> 
>>>>> The way you solve this with OpenMPI is by using the -slot-list <foo> 
>>>>> option. This tells each mpiexec which cores are allocated to it, and it 
>>>>> will constrain its binding calculation within that envelope. Thus, if you 
>>>>> start the first job with -slot-list 0-2, and the second with -slot-list 
>>>>> 3-5, the two jobs will be isolated from each other.
>>>>> 
>>>>> You can use any specification for the slot-list - it takes a 
>>>>> comma-separated list of cores.
>>>>> 
>>>>> HTH
>>>>> Ralph
>>>>> 
>>>>>> On Jun 6, 2016, at 6:08 PM, Jason Bacon <[email protected]> wrote:
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Actually, --bind-to core is the default for most OpenMPI jobs now, so 
>>>>>> adding this flag has no effect.  It refers to the processes within the 
>>>>>> job.
>>>>>> 
>>>>>> I'm thinking this is an MPI-SLURM integration issue. Embarrassingly 
>>>>>> parallel SLURM jobs are binding properly, but MPI jobs are ignoring the 
>>>>>> SLURM environment and choosing their own cores.
>>>>>> 
>>>>>> OpenMPI was built with --with-slurm and it appears from config.log that 
>>>>>> it located everything it needed.
>>>>>> 
>>>>>> I can work around the problem with "mpirun --bind-to none", which I'm 
>>>>>> guessing will impact performance slightly for memory-intensive apps.
>>>>>> 
>>>>>> We're still digging on this one and may be for a while...
>>>>>> 
>>>>>>   Jason
>>>>>> 
>>>>>> On 06/03/16 15:48, Benjamin Redling wrote:
>>>>>>> On 2016-06-03 21:25, Jason Bacon wrote:
>>>>>>>> It might be worth mentioning that the calcpi-parallel jobs are run with
>>>>>>>> --array (no srun).
>>>>>>>> 
>>>>>>>> Disabling the task/affinity plugin and using "mpirun --bind-to core"
>>>>>>>> works around the issue.  The MPI processes bind to specific cores and
>>>>>>>> the embarrassingly parallel jobs kindly move over and stay out of the 
>>>>>>>> way.
>>>>>>> Are the mpirun --bind-to core child processes the same as a slurm task?
>>>>>>> I have no experience at all with MPI jobs -- just trying to understand
>>>>>>> task/affinity and params.
>>>>>>> 
>>>>>>> As far as I understand, when you let mpirun do the binding, it handles the
>>>>>>> binding differently:
>>>>>>> https://www.open-mpi.org/doc/v1.8/man1/mpirun.1.php
>>>>>>> 
>>>>>>> If I grok the
>>>>>>> % mpirun ... --map-by core --bind-to core
>>>>>>> example in the "Mapping, Ranking, and Binding: Oh My!" section right.
>>>>>>> 
>>>>>>>> On 06/03/16 10:18, Jason Bacon wrote:
>>>>>>>>> We're having an issue with CPU binding when two jobs land on the same
>>>>>>>>> node.
>>>>>>>>> 
>>>>>>>>> Some cores are shared by the 2 jobs while others are left idle. Below
>>>>>>> [...]
>>>>>>>>> TaskPluginParam=cores,verbose
>>>>>>> Don't you bind each _job_ to a single core, because you override
>>>>>>> automatic binding and thus prevent binding each child process to a
>>>>>>> different core?
>>>>>>> 
>>>>>>> 
>>>>>>> Regards,
>>>>>>> Benjamin
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> All wars are civil wars, because all men are brothers ... Each one owes
>>>>>> infinitely more to the human race than to the particular country in
>>>>>> which he was born.
>>>>>>               -- Francois Fenelon
>>>>> 
>>>> 
>>>> 
>>>> -- 
>>>> All wars are civil wars, because all men are brothers ... Each one owes
>>>> infinitely more to the human race than to the particular country in
>>>> which he was born.
>>>>                -- Francois Fenelon
>>> 
>> 
> 
