Fairly frequently, but not every time, I'm bumping into the following error when trying to run xhpl on a new machine. It happens with a single node or with multiple nodes:
node1 selected pml ob1, but peer on node1 selected pml ucx

If I rerun the exact same command a few minutes later, it works fine. The machine is new and I'm the only one using it, so there are no user conflicts.

The software stack is Slurm 21.8.2.1, OMPI 4.1.1, PMIx 3.2.3, and UCX 1.9.0. The hardware is HPE with Mellanox EDR cards (but I doubt that matters).

Any thoughts?