Re: [OMPI users] Openmpi5.0.7 causes fatal timeout on last rank

2025-07-02 Thread 'George Bosilca' via Open MPI users
OMPI 5.x has no support for the openib BTL; all IB traffic now goes through the UCX PML. This means that `-mca btl_openib_if_include XXX` is meaningless, but you can use UCX_NET_DEVICES to direct UCX to a specific device. As the error happens for UD, you can switch to a different transport
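A minimal sketch of what that advice could look like on the command line. It is not taken from the thread: the device name `mlx5_0:1`, the transport list, the rank count, and the binary name are placeholders and would need to match the actual cluster. `UCX_NET_DEVICES` and `UCX_TLS` are UCX environment variables, and `-x` asks `mpirun` to forward them to the ranks.

```shell
# Sketch only: steer Open MPI 5.x InfiniBand traffic through UCX settings.

# ASSUMPTION: "mlx5_0:1" is a placeholder device:port pair.
# List the devices actually present with `ucx_info -d`.
export UCX_NET_DEVICES="mlx5_0:1"

# ASSUMPTION: rc (reliable connected) works on this fabric.
# Naming the transports explicitly leaves ud out of the selection.
export UCX_TLS="rc,sm,self"

# Hypothetical binary name and rank count, for illustration.
APP=./mpi_hello_world_barrier
NP=4

if command -v mpirun >/dev/null 2>&1 && [ -x "$APP" ]; then
    # --mca pml ucx selects the UCX PML explicitly;
    # -x forwards the exported variables to every rank.
    mpirun -np "$NP" --mca pml ucx -x UCX_NET_DEVICES -x UCX_TLS "$APP"
else
    echo "mpirun or $APP not found; UCX settings exported only"
fi
```

If the timeout disappears with `ud` excluded, that would point at the UD transport rather than at the application, which is the distinction the reply is trying to draw.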

Re: [OMPI users] Openmpi5.0.7 causes fatal timeout on last rank

2025-07-02 Thread Achilles Vassilicos
I checked further using a modification to mpi_hello_world.c (that includes MPI_Barrier) and a test code that checks connectivity between all processes.
1. On the mpi_hello_world_barrier.c case, openmpi5 failed the same way as before. mpich-ofi completed without error.
2. On the connectivity_c.c c

Re: [OMPI users] Openmpi5.0.7 causes fatal timeout on last rank

2025-07-02 Thread Achilles Vassilicos
UCX 1.18.0
On Wednesday, July 2, 2025 at 3:38:57 PM UTC-4 Achilles Vassilicos wrote:
> I checked further using a modification to mpi_hello_world.c (that includes MPI_Barrier) and a test code that checks connectivity between all processes.
> 1. On the mpi_hello_world_barrier.c case, openmpi5 fai

Re: [OMPI users] Openmpi5.0.7 causes fatal timeout on last rank

2025-07-02 Thread Achilles Vassilicos
A lot to chew on. Thanks.
Achilles
On Jul 2, 2025, at 5:40 PM, George Bosilca wrote:
> OMPI 5.x has no support for the openib BTL, all IB traffic is now going through the UCX PML. This means that `-mca btl_openib_if_include XXX` is meaningless, but you can use the UCX_NET_DEVICES to

Re: [OMPI users] Openmpi5.0.7 causes fatal timeout on last rank

2025-07-02 Thread 'George Bosilca' via Open MPI users
UCX 1.8 or UCX 1.18? Your application does not exchange any data, so it is possible that MPICH's behavior differs from OMPI's (i.e., not creating connections vs. creating them during MPI_Init). That's why running a slightly different version of the hello_world with a barrier would clarify the connection'