No, we don’t pick those up - though I suppose we could try. Those envars have a 
history of changing, and it gets difficult to match the version with the variable.

I can put this on my “nice to do someday” list and see if/when we can get to 
it. Just so I don’t have to guess at the matching - what version of SLURM are 
you using?
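
(If you don’t have it handy, running

% sbatch --version

will print it.)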


> On Jun 7, 2016, at 6:15 AM, Jason Bacon <[email protected]> wrote:
> 
> 
> 
> Thanks for the tip, but does OpenMPI not use SBATCH_CPU_BIND_* when SLURM 
> integration is compiled in?
> 
> printenv in the sbatch script produces the following:
> 
> Linux login.finch bacon ~/Data/Testing/Facil/Software/Src/Bench/MPI 379: grep 
> SBATCH slurm-5*
> slurm-579.out:SBATCH_CPU_BIND_LIST=0x3
> slurm-579.out:SBATCH_CPU_BIND_VERBOSE=verbose
> slurm-579.out:SBATCH_CPU_BIND_TYPE=mask_cpu:
> slurm-579.out:SBATCH_CPU_BIND=verbose,mask_cpu:0x3
> slurm-580.out:SBATCH_CPU_BIND_LIST=0xC
> slurm-580.out:SBATCH_CPU_BIND_VERBOSE=verbose
> slurm-580.out:SBATCH_CPU_BIND_TYPE=mask_cpu:
> slurm-580.out:SBATCH_CPU_BIND=verbose,mask_cpu:0xC
> 
> All OpenMPI jobs are using cores 0 and 2, although SLURM has assigned cores 0 
> and 1 (mask 0x3 = binary 0011) to job 579 and cores 2 and 3 (mask 0xC = 
> binary 1100) to 580.
> 
> Regards,
> 
>    Jason
> 
> On 06/06/16 21:11, Ralph Castain wrote:
>> Running two jobs across the same nodes is indeed an issue. Regardless of 
>> which MPI you use, the second mpiexec has no idea that the first one exists. 
>> The bindings applied to the second job are therefore computed as if the 
>> first job didn’t exist - and so the procs from the two jobs pile up on top 
>> of each other.
>> 
>> The way you solve this with OpenMPI is by using the -slot-list <foo> option. 
>> This tells each mpiexec which cores are allocated to it, and it will 
>> constrain its binding calculation within that envelope. Thus, if you start 
>> the first job with -slot-list 0-2, and the second with -slot-list 3-5, the 
>> two jobs will be isolated from each other.
>> 
>> You can use any specification for the slot-list - it takes a comma-separated 
>> list of cores and/or core ranges.
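>> 
>> As a concrete sketch (./a.out and -np 3 are just placeholders for your real 
>> binary and proc count), the two submissions would look something like:
>> 
>> % mpirun -slot-list 0-2 -np 3 ./a.out
>> % mpirun -slot-list 3-5 -np 3 ./a.out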
>> 
>> HTH
>> Ralph
>> 
>>> On Jun 6, 2016, at 6:08 PM, Jason Bacon <[email protected]> wrote:
>>> 
>>> 
>>> 
>>> Actually, --bind-to core is the default for most OpenMPI jobs now, so 
>>> adding this flag has no effect.  It refers to the processes within the job.
>>> 
>>> I'm thinking this is an MPI-SLURM integration issue. Embarrassingly 
>>> parallel SLURM jobs are binding properly, but MPI jobs are ignoring the 
>>> SLURM environment and choosing their own cores.
>>> 
>>> OpenMPI was built with --with-slurm and it appears from config.log that it 
>>> located everything it needed.
>>> 
>>> I can work around the problem with "mpirun --bind-to none", which I'm 
>>> guessing will impact performance slightly for memory-intensive apps.
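>>> 
>>> That is, something along these lines (./a.out and the proc count are just 
>>> placeholders):
>>> 
>>> % mpirun --bind-to none -np 4 ./a.out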
>>> 
>>> We're still digging on this one and may be for a while...
>>> 
>>>   Jason
>>> 
>>> On 06/03/16 15:48, Benjamin Redling wrote:
>>>> On 2016-06-03 21:25, Jason Bacon wrote:
>>>>> It might be worth mentioning that the calcpi-parallel jobs are run with
>>>>> --array (no srun).
>>>>> 
>>>>> Disabling the task/affinity plugin and using "mpirun --bind-to core"
>>>>> works around the issue.  The MPI processes bind to specific cores and
>>>>> the embarrassingly parallel jobs kindly move over and stay out of the way.
>>>> Are the mpirun --bind-to core child processes the same as a slurm task?
>>>> I have no experience at all with MPI jobs -- just trying to understand
>>>> task/affinity and params.
>>>> 
>>>> As far as I understand, when you let mpirun do the binding, it handles the
>>>> binding differently: https://www.open-mpi.org/doc/v1.8/man1/mpirun.1.php
>>>> 
>>>> That is, if I grok the
>>>> % mpirun ... --map-by core --bind-to core
>>>> example in the "Mapping, Ranking, and Binding: Oh My!" section correctly.
>>>> 
>>>> 
>>>>> On 06/03/16 10:18, Jason Bacon wrote:
>>>>>> We're having an issue with CPU binding when two jobs land on the same
>>>>>> node.
>>>>>> 
>>>>>> Some cores are shared by the 2 jobs while others are left idle. Below
>>>> [...]
>>>>>> TaskPluginParam=cores,verbose
>>>> Don't you bind each _job_ to a single core, because you override the
>>>> automatic binding and thus prevent binding each child process to a
>>>> different core?
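>>>> 
>>>> (I'm assuming that parameter sits in slurm.conf alongside the plugin
>>>> selection mentioned above, i.e. something like:
>>>> 
>>>>   TaskPlugin=task/affinity
>>>>   TaskPluginParam=cores,verbose
>>>> )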
>>>> 
>>>> 
>>>> Regards,
>>>> Benjamin
>>> 
>>> 
>> 
> 
> 
> -- 
> All wars are civil wars, because all men are brothers ... Each one owes
> infinitely more to the human race than to the particular country in
> which he was born.
>                -- Francois Fenelon
