Hi!
I'm wondering about the details of the Bcast implementation in OpenMPI. I'm
specifically interested in IB interconnects, but information about other
architectures (and OpenMPI in general) would also be very useful.
I am working with a code that sends the same (large) message to a
bunch of
If you have support for FCA, the collective may use the hardware
support. In any case, most of the bcast algorithms have logarithmic
behavior, so there will be at most O(log(P)) memory accesses on the root,
where P is the number of processes.
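
To make the logarithmic claim concrete, here is a small sketch that simulates a binomial-tree broadcast, one common logarithmic scheme. It is only an illustration: it does not call MPI, and it is not taken from the OMPI sources. Which algorithm OMPI actually selects for a given message size and communicator is not assumed here. The function names binomial_bcast_schedule and sends_per_rank are made up for this example.

```python
def binomial_bcast_schedule(num_procs, root=0):
    """Return the rounds of a binomial-tree broadcast.

    Each round is a list of (sender, receiver) rank pairs. Ranks are
    handled relative to the root, so relative rank 0 is the root itself.
    Illustrative only; not the OMPI implementation.
    """
    rounds = []
    have = 1  # relative ranks 0 .. have-1 already hold the message
    while have < num_procs:
        sends = []
        for rel in range(have):
            peer = rel + have  # each holder forwards to one new rank
            if peer < num_procs:
                sends.append(((rel + root) % num_procs,
                              (peer + root) % num_procs))
        rounds.append(sends)
        have *= 2  # the set of holders doubles every round
    return rounds


def sends_per_rank(rounds, num_procs):
    """Count how many times each rank sends the message."""
    counts = [0] * num_procs
    for sends in rounds:
        for sender, _receiver in sends:
            counts[sender] += 1
    return counts


if __name__ == "__main__":
    for p in (8, 64, 1024):
        rounds = binomial_bcast_schedule(p)
        print(p, "ranks:", len(rounds), "rounds,",
              sends_per_rank(rounds, p)[0], "sends from the root")
```

In this scheme the root sends once per round, and the number of rounds is ceil(log2(P)), so the root reads its buffer about log2(P) times instead of P-1 times as it would with a flat loop of sends.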
If you want to take a look at the code in OMPI to understand wh