Jorge,
pml/ucx used to be selected when no fast interconnect was detected
(since UCX provides drivers for both TCP and shared memory).
These providers are now disabled by default, so unless your machine
has a supported fast interconnect (such as InfiniBand),
pml/ucx cannot be used out of the box anymore.
If you really want to use pml/ucx on your notebook, you need to
manually re-enable these providers.
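For instance, something along these lines should do it. This is only a sketch: I am assuming the `pml_ucx_tls` and `pml_ucx_devices` MCA parameters of the 4.1 series, with the special value `any` meaning "no restriction", so please double check the names and defaults on your build with `ompi_info --param pml ucx --level 9`. Any MCA parameter can be given either as `--mca name value` on the mpirun command line or as an `OMPI_MCA_name` environment variable; the environment form is used here.

```shell
#!/bin/sh
# Sketch, assuming Open MPI 4.1.x exposes pml_ucx_tls and pml_ucx_devices
# (verify with: ompi_info --param pml ucx --level 9)

# Force the UCX PML
export OMPI_MCA_pml=ucx
# Do not restrict UCX to its default list of "fast" transports and devices,
# so that its TCP and shared-memory transports become eligible again
export OMPI_MCA_pml_ucx_tls=any
export OMPI_MCA_pml_ucx_devices=any

# Then launch as usual, e.g.:
# mpirun --machinefile ~/machi-openmpi.dat --np 2 hello_usempi_f08.exe
```

The same settings can be passed inline instead, as `--mca pml ucx --mca pml_ucx_tls any --mca pml_ucx_devices any`.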
That being said, your best choice here is really not to force any pml,
and to let Open MPI use pml/ob1
(which supports both TCP and shared memory).
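Concretely, this is what your first (working) run already does. A minimal sketch using environment variables instead of `--mca` flags, assuming the same machine file and binary as in your test:

```shell
#!/bin/sh
# Do not force any PML: drop a possibly inherited setting so that
# Open MPI selects one itself (ob1 on a machine without a fast interconnect)
unset OMPI_MCA_pml
# Use the loopback, shared-memory and TCP BTLs, as in your first run
export OMPI_MCA_btl=self,vader,tcp

# Then launch as usual, e.g.:
# mpirun --machinefile ~/machi-openmpi.dat --np 2 hello_usempi_f08.exe
```

Note the difference with your second command: `--mca btl ^self,vader,tcp` excludes these three BTLs instead of selecting them, which is probably why the btl framework reported that no component could be opened.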
Cheers,
Gilles
On Sat, May 29, 2021 at 11:19 AM Jorge D'Elia via users wrote:
>
> Hi,
>
> We routinely build OpenMPI on x86_64-pc-linux-gnu machines from
> the sources using gcc and usually everything works fine.
>
> In one case we recently installed Fedora 34 from scratch on an
> ASUS G53SX notebook (Intel Core i7-2630QM CPU 2.00GHz ×4 cores,
> without any IB device). Next, we built OpenMPI using the file
> openmpi-4.1.1.tar.gz and the GCC 12.0.0 20210524 (experimental)
> compiler.
>
> However, when trying out OpenMPI with UCX
> on a simple test, we get the following runtime errors:
>
> No components were able to be opened in the btl framework.
> PML ucx cannot be selected
>
> whereas the same test worked fine up to Fedora 33 on the same
> machine, with the same OpenMPI configuration.
>
> Below is some information about a simple test run.
>
> Any clues on where to look, or is something missing?
> Thanks in advance.
>
> Regards
> Jorge.
>
> --
> $ cat /proc/version
> Linux version 5.12.7-300.fc34.x86_64 (mockbu...@bkernel01.iad2.fedoraproject.org) (gcc (GCC) 11.1.1 20210428 (Red Hat 11.1.1-1), GNU ld version 2.35.1-41.fc34) #1 SMP Wed May 26 12:58:58 UTC 2021
>
> $ mpifort --version
> GNU Fortran (GCC) 12.0.0 20210524 (experimental)
> Copyright (C) 2021 Free Software Foundation, Inc.
>
> $ which mpifort
> /usr/beta/openmpi/bin/mpifort
>
> $ mpifort -o hello_usempi_f08.exe hello_usempi_f08.f90
>
> $ mpirun --mca orte_base_help_aggregate 0 --mca btl self,vader,tcp --map-by node --report-bindings --machinefile ~/machi-openmpi.dat --np 2 hello_usempi_f08.exe
> [verne:200650] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././.]
> [verne:200650] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/./.]
> Hello, world, I am 0 of 2: Open MPI v4.1.1, package: Open MPI bigpack@verne Distribution, ident: 4.1.1, repo rev: v4.1.1, Apr 24, 2021
> Hello, world, I am 1 of 2: Open MPI v4.1.1, package: Open MPI bigpack@verne Distribution, ident: 4.1.1, repo rev: v4.1.1, Apr 24, 2021
>
> $ mpirun --mca orte_base_help_aggregate 0 --mca pml ucx --mca btl ^self,vader,tcp --map-by node --report-bindings --machinefile ~/machi-openmpi.dat --np 2 hello_usempi_f08.exe
> [verne:200772] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././.]
> [verne:200772] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/./.]
> --
> No components were able to be opened in the btl framework.
>
> This typically means that either no components of this type were
> installed, or none of the installed components can be loaded.
> Sometimes this means that shared libraries required by these
> components are unable to be found/loaded.
>
> Host: verne
> Framework: btl
> --
> --
> No components were able to be opened in the btl framework.
>
> This typically means that either no components of this type were
> installed, or none of the installed components can be loaded.
> Sometimes this means that shared libraries required by these
> components are unable to be found/loaded.
>
> Host: verne
> Framework: btl
> --
> --
> No components were able to be opened in the pml framework.
>
> This typically means that either no components of this type were
> installed, or none of the installed components can be loaded.
> Sometimes this means that shared libraries required by these
> components are unable to be found/loaded.
>
> Host: verne
> Framework: pml
> --
> [verne:200777] PML ucx cannot be selected
> --
> No components were able to be opened in the pml framework.
>
> This typically means that either no components of this type were
> installed, or none of the installed components can be loaded.
> Sometimes this means that shared libraries required by these
> components are unable to be found/loaded.
>
> Host: verne
> Framework: pml
> --
> [verne:200772] PMIX ERROR: