What happens if you use srun instead of mpirun? I would expect that to work.
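A minimal sbatch script along those lines would look like the sketch below. The two-task geometry and the ./calcpi-mpi binary name are illustrative assumptions, not taken from the thread:

    #!/bin/sh
    #SBATCH --ntasks=2
    # Launch through srun so the tasks inherit the core binding SLURM
    # computed for this job, rather than letting mpirun compute its own
    # bindings without knowledge of other jobs on the node.
    srun ./calcpi-mpi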
On June 7, 2016 6:32:17 AM MST, Ralph Castain <[email protected]> wrote:
> No, we don’t pick that up - suppose we could try. Those envars have a
> history of changing, though, and it gets difficult to match the
> version with the var.
>
> I can put this on my “nice to do someday” list and see if/when we can
> get to it. Just so I don’t have to parse around more - what version
> of slurm are you using?
>
>> On Jun 7, 2016, at 6:15 AM, Jason Bacon <[email protected]> wrote:
>>
>> Thanks for the tip, but does OpenMPI not use SBATCH_CPU_BIND_* when
>> SLURM integration is compiled in?
>>
>> printenv in the sbatch script produces the following:
>>
>> Linux login.finch bacon ~/Data/Testing/Facil/Software/Src/Bench/MPI
>> 379: grep SBATCH slurm-5*
>> slurm-579.out:SBATCH_CPU_BIND_LIST=0x3
>> slurm-579.out:SBATCH_CPU_BIND_VERBOSE=verbose
>> slurm-579.out:SBATCH_CPU_BIND_TYPE=mask_cpu:
>> slurm-579.out:SBATCH_CPU_BIND=verbose,mask_cpu:0x3
>> slurm-580.out:SBATCH_CPU_BIND_LIST=0xC
>> slurm-580.out:SBATCH_CPU_BIND_VERBOSE=verbose
>> slurm-580.out:SBATCH_CPU_BIND_TYPE=mask_cpu:
>> slurm-580.out:SBATCH_CPU_BIND=verbose,mask_cpu:0xC
>>
>> All OpenMPI jobs are using cores 0 and 2, although SLURM has
>> assigned 0 and 1 to job 579 and 2 and 3 to 580.
>>
>> Regards,
>>
>> Jason
>>
>> On 06/06/16 21:11, Ralph Castain wrote:
>>> Running two jobs across the same nodes is indeed an issue.
>>> Regardless of which MPI you use, the second mpiexec has no idea
>>> that the first one exists. Thus, the bindings applied to the second
>>> job will be computed as if the first job doesn’t exist - and thus,
>>> the procs will overload on top of each other.
>>>
>>> The way you solve this with OpenMPI is by using the -slot-list
>>> <foo> option. This tells each mpiexec which cores are allocated to
>>> it, and it will constrain its binding calculation within that
>>> envelope. Thus, if you start the first job with -slot-list 0-2, and
>>> the second with -slot-list 3-5, the two jobs will be isolated from
>>> each other.
>>>
>>> You can use any specification for the slot-list - it takes a
>>> comma-separated list of cores.
>>>
>>> HTH
>>> Ralph
>>>
>>>> On Jun 6, 2016, at 6:08 PM, Jason Bacon <[email protected]> wrote:
>>>>
>>>> Actually, --bind-to core is the default for most OpenMPI jobs now,
>>>> so adding this flag has no effect. It refers to the processes
>>>> within the job.
>>>>
>>>> I'm thinking this is an MPI-SLURM integration issue.
>>>> Embarrassingly parallel SLURM jobs are binding properly, but MPI
>>>> jobs are ignoring the SLURM environment and choosing their own
>>>> cores.
>>>>
>>>> OpenMPI was built with --with-slurm and it appears from config.log
>>>> that it located everything it needed.
>>>>
>>>> I can work around the problem with "mpirun --bind-to none", which
>>>> I'm guessing will impact performance slightly for memory-intensive
>>>> apps.
>>>>
>>>> We're still digging on this one and may be for a while...
>>>>
>>>> Jason
>>>>
>>>> On 06/03/16 15:48, Benjamin Redling wrote:
>>>>> On 2016-06-03 21:25, Jason Bacon wrote:
>>>>>> It might be worth mentioning that the calcpi-parallel jobs are
>>>>>> run with --array (no srun).
>>>>>>
>>>>>> Disabling the task/affinity plugin and using "mpirun --bind-to
>>>>>> core" works around the issue. The MPI processes bind to specific
>>>>>> cores and the embarrassingly parallel jobs kindly move over and
>>>>>> stay out of the way.
>>>>> Are the mpirun --bind-to core child processes the same as a slurm
>>>>> task?
>>>>>
>>>>> I have no experience at all with MPI jobs -- just trying to
>>>>> understand task/affinity and params.
>>>>>
>>>>> As far as I understand, when you let mpirun do the binding, it
>>>>> handles the binding differently:
>>>>> https://www.open-mpi.org/doc/v1.8/man1/mpirun.1.php
>>>>>
>>>>> If I grok the
>>>>>   % mpirun ... --map-by core --bind-to core
>>>>> example in the "Mapping, Ranking, and Binding: Oh My!" section
>>>>> right.
>>>>>
>>>>>> On 06/03/16 10:18, Jason Bacon wrote:
>>>>>>> We're having an issue with CPU binding when two jobs land on
>>>>>>> the same node.
>>>>>>>
>>>>>>> Some cores are shared by the 2 jobs while others are left idle.
>>>>>>> Below
>>>>> [...]
>>>>>>> TaskPluginParam=cores,verbose
>>>>> Don't you bind each _job_ to a single core because you override
>>>>> automatic binding and thus prevent binding each child process to
>>>>> a different core?
>>>>>
>>>>> Regards,
>>>>> Benjamin
>>>>
>>>> --
>>>> All wars are civil wars, because all men are brothers ... Each one
>>>> owes infinitely more to the human race than to the particular
>>>> country in which he was born.
>>>> -- Francois Fenelon
>>
>> --
>> All wars are civil wars, because all men are brothers ... Each one
>> owes infinitely more to the human race than to the particular
>> country in which he was born.
>> -- Francois Fenelon
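Concretely, Ralph's -slot-list suggestion would put something like the following in the two job scripts, using the core ranges from his example (the ./calcpi-mpi binary name is a placeholder, not from the thread):

    # First job's script: constrain mpirun's binding calculation to cores 0-2
    mpirun -slot-list 0-2 ./calcpi-mpi

    # Second job's script: constrain it to cores 3-5, isolating it from the first
    mpirun -slot-list 3-5 ./calcpi-mpi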

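One way to check what each job actually received is to compare the mask SLURM reports with the kernel's view of the task affinity. A sketch, assuming a Linux compute node:

    # Inside the sbatch script: print the cores each task may run on.
    # Cpus_allowed_list in /proc reflects the effective affinity mask.
    srun sh -c 'grep Cpus_allowed_list /proc/self/status'

    # Compare against the mask SLURM assigned to the job:
    echo "$SBATCH_CPU_BIND"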