If your system supports FCA (Mellanox Fabric Collective Accelerator), the
collective may be offloaded to the hardware. In any case, most of the bcast
algorithms have logarithmic behavior, so there will be at most O(log(P))
memory accesses on the root, where P is the number of processes.
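
To illustrate the logarithmic bound, here is a minimal sketch of a generic
binomial-tree broadcast schedule. It is a simulation in plain Python, not
Open MPI's code, and not necessarily the algorithm Open MPI selects for a
given message size. The function name binomial_bcast_sends and its
parameters are made up for this example.

```python
def binomial_bcast_sends(num_ranks, root=0):
    """Simulate a binomial-tree broadcast and return {rank: [destinations]}.

    Hypothetical helper for illustration only; it models the communication
    pattern and does not call any MPI library.
    """
    sends = {rank: [] for rank in range(num_ranks)}
    mask = 1
    # In the round with a given mask, every virtual rank below mask already
    # holds the data and forwards it to virtual rank (vrank + mask).
    while mask < num_ranks:
        for vrank in range(mask):
            peer = vrank + mask
            if peer < num_ranks:
                # Shift virtual ranks so that the root maps to virtual rank 0.
                src = (vrank + root) % num_ranks
                dst = (peer + root) % num_ranks
                sends[src].append(dst)
        mask <<= 1
    return sends


if __name__ == "__main__":
    for p in (2, 5, 8, 64, 1000):
        root_sends = len(binomial_bcast_sends(p)[0])
        # (p - 1).bit_length() equals ceil(log2(p)) for p >= 1.
        print(p, root_sends, (p - 1).bit_length())
```

In this schedule the root sends once per round, so its buffer is read
ceil(log2(P)) times, and every other rank receives the message exactly once.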
If you want to take a look at the code in OMPI to understand
Hi!
I'm wondering about the details of the Bcast implementation in OpenMPI. I'm
specifically interested in IB interconnects, but information about other
architectures (and OpenMPI in general) would also be very useful.
I am working with a code that sends the same (large) message to a
bunch