Hi Folks,
I can run benchmarks to find the pml+btl combination (ob1, ucx, uct, vader,
etc.) that gives the best performance,
but I wanted to hear from the community about what is generally used in
"high core count intra-node" cases before jumping to conclusions.
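
A benchmark sweep of this kind could be scripted roughly as follows. This is only a minimal sketch: `--mca pml` and `--mca btl` are the standard mpirun options for forcing a component, but the process count `NP`, the benchmark binary `BENCH`, and the helper `build_cmd` are hypothetical placeholders that do not come from this thread. The script prints the command lines instead of running them.

```shell
#!/bin/sh
# Sketch: print one mpirun command line per pml/btl choice to compare.
# NP and BENCH are placeholders (assumptions), not values from the thread.
NP=${NP:-64}
BENCH=${BENCH:-./bench}

# build_cmd PML [BTL_LIST]: echo the mpirun command line for that selection.
# The btl list is only added when one is given (it matters for pml/ob1).
build_cmd() {
    if [ -n "${2:-}" ]; then
        echo "mpirun -np $NP --mca pml $1 --mca btl $2 $BENCH"
    else
        echo "mpirun -np $NP --mca pml $1 $BENCH"
    fi
}

build_cmd ucx                # force pml/ucx
build_cmd ob1 vader,self     # force pml/ob1 with the shared-memory btl
```

Piping the output to `sh` would launch the runs instead of just listing them.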
As I am a
Arun,
First, Open MPI selects a pml for **all** the MPI tasks (for example,
pml/ucx or pml/ob1).
Then, if pml/ob1 ends up being selected, a btl component (e.g. btl/uct,
btl/vader) is used for each pair of MPI tasks
(tasks on the same node will use btl/vader, tasks on different nodes will
use