Re: [OMPI users] Quality and details of implementation for Neighborhood collective operations

George Bosilca via users Wed, 08 Jun 2022 14:24:33 -0700

There is a lot of FUD regarding the so-called optimizations for
neighborhood collectives. In general, they all converge toward creating a
globally consistent communication order. If the neighborhood topology is
regular, some parts of this order can be inferred. For general graph
topologies, which must be assumed irregular, the overhead of creating the
order is significant and can only be hidden if the collective pattern is
reused multiple times (as with persistent communications). So, while there
are some opportunities for optimization in specific cases, the
implementations we provide in OMPI (both basic and libnbc), despite their
apparent simplicity, should perform reasonably well in most cases.
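
To make "globally consistent communication order" concrete, here is a
minimal sketch of one way such an order could be built: a greedy edge
coloring that assigns every neighbor pair to a round, so that no rank takes
part in two exchanges in the same round. This is an illustration only, not
what basic or libnbc do. The names build_schedule and rank_view are
hypothetical, and the sketch assumes the full edge list is available in one
place. In a real run the graph is distributed across ranks, and gathering or
agreeing on this information is the creation overhead in question.

```python
from collections import defaultdict


def build_schedule(edges):
    """Greedy edge coloring (illustrative, hypothetical helper).

    Assigns each undirected edge (a, b) to the first round in which
    neither a nor b is already busy. Returns a list of rounds, each a
    list of (a, b) pairs with a < b.
    """
    busy = defaultdict(set)  # rank -> set of rounds in which it is busy
    rounds = []              # rounds[r] = edges exchanged in round r
    # Normalize and deduplicate so every rank would see the same order.
    for a, b in sorted({tuple(sorted(e)) for e in edges}):
        r = 0
        while r in busy[a] or r in busy[b]:
            r += 1
        if r == len(rounds):
            rounds.append([])
        rounds[r].append((a, b))
        busy[a].add(r)
        busy[b].add(r)
    return rounds


def rank_view(rounds, rank):
    """Peer that `rank` exchanges with in each round (None if idle)."""
    view = []
    for round_edges in rounds:
        peer = None
        for a, b in round_edges:
            if rank == a:
                peer = b
            elif rank == b:
                peer = a
        view.append(peer)
    return view


if __name__ == "__main__":
    # Small irregular graph: rank 0 has three neighbors, rank 4 has one.
    graph = [(0, 1), (0, 2), (0, 3), (1, 2), (3, 4)]
    schedule = build_schedule(graph)  # built once, reused for every call
    for rank in range(5):
        print(rank, rank_view(schedule, rank))
```

Because every rank derives its peer list from the same schedule, both ends
of an edge agree on the round in which they talk to each other. The
schedule depends only on the graph, so its cost can be paid once and
amortized over repeated calls with the same pattern.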


George.


On Wed, Jun 8, 2022 at 3:58 PM Michael Thomadakis <drmichaelt7...@gmail.com>
wrote:

> I see, thanks....
>
> Is there any plan to apply any optimizations on the Neighbor collectives
> at some point?
>
> regards
> Michael
>
> On Wed, Jun 8, 2022 at 1:29 PM George Bosilca <bosi...@icl.utk.edu> wrote:
>
>> Michael,
>>
>> As far as I know, none of the implementations of the
>> neighborhood collectives in OMPI are architecture-aware. The only two
>> components that provide support for neighborhood collectives are basic (for
>> the blocking versions) and libnbc (for the non-blocking versions).
>>
>>   George.
>>
>>
>> On Wed, Jun 8, 2022 at 1:27 PM Michael Thomadakis via users <
>> users@lists.open-mpi.org> wrote:
>>
>>> Hello OpenMPI
>>>
>>> I was wondering if the MPI_Neighbor_xxxxx calls have received any
>>> special design and optimizations in OpenMPI 4.1.x+ for these patterns of
>>> communication.
>>>
>>> For instance, these could benefit from proximity awareness and from
>>> distinguishing intra- vs inter-node communication. However, even
>>> single-node communication has a hierarchical structure due to the
>>> increased number of NUMA domains, larger L3 caches and so on.
>>>
>>> Is OpenMPI 4.1.x+ leveraging any special logic to optimize these calls?
>>> Is UCX or UCC/HCOLL doing anything special or is OpenMPI using these lower
>>> layers in a more "intelligent" way to provide
>>> optimized neighborhood collectives?
>>>
>>> Thank you very much
>>> Michael
>>>
>>
