Hello OpenMPI,

I was wondering whether the MPI_Neighbor_xxxxx calls have received any special design work or optimizations in OpenMPI 4.1.x+ for neighborhood communication patterns.
For instance, these calls could benefit from proximity awareness and from distinguishing intra-node from inter-node communication. However, even single-node communication has a hierarchical structure, due to the growing number of NUMA domains, larger L3 caches, and so on. Is OpenMPI 4.1.x+ using any special logic to optimize these calls? Is UCX or UCC/HCOLL doing anything special, or is OpenMPI using these lower layers in a more "intelligent" way to provide optimized neighborhood collectives?

Thank you very much,
Michael