Re: [OMPI users] (Fedora 34, x86_64-pc-linux-gnu, openmpi-4.1.1.tar.gz): PML ucx cannot be selected

2021-09-14 Thread Jorge D'Elia via users
Dear Gilles,

Great! By the way, the second flag you suggested was 
just missing a value for the UCX devices to use, i.e. 
I had to use:

mpirun --mca pml_ucx_tls any --mca pml_ucx_devices any --mca pml ucx ...

with release 4.1.1 to recover the performance of the 
previous 4.1.0 release.
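
For the record, the same settings can also be exported as environment 
variables with the usual OMPI_MCA_ prefix instead of passing --mca flags 
(just a sketch, not tested here):

  export OMPI_MCA_pml=ucx
  export OMPI_MCA_pml_ucx_tls=any
  export OMPI_MCA_pml_ucx_devices=any
  mpirun ...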

Thank you very much for taking the time to respond 
and for the link:

https://github.com/open-mpi/ompi/pull/8549

where the changes were introduced.

Perhaps it would help to add a short note (or 
links to official or unofficial pages) about the 
optional UCX-related parameters to the manual 
page for the mpirun command?
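
In the meantime, a quick way to list the available pml/ucx parameters 
(and their current defaults) seems to be, e.g.:

  ompi_info --param pml ucx --level 9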


Greetings. 
Jorge.

----- Original Message -----
> From: "Gilles Gouaillardet" 
> To: "Jorge D'Elia" , "Open MPI Users" 
> Sent: Monday, September 13, 2021 9:35:34
> Subject: Re: [OMPI users] (Fedora 34, x86_64-pc-linux-gnu, 
> openmpi-4.1.1.tar.gz): PML ucx cannot be selected
>
> Jorge,
> 
> I am not that familiar with UCX, but I hope this will help:
> 
> The changes I mentioned were introduced by
> https://github.com/open-mpi/ompi/pull/8549
> 
> I suspect mpirun --mca pml_ucx_tls any --mca pml_ucx_devices --mca pml ucx
> ...
> 
> will do what you expect
> 
> Cheers,
> 
> Gilles
>
> On Mon, Sep 13, 2021 at 9:05 PM Jorge D'Elia via users <
> users@lists.open-mpi.org> wrote:
> 
>> Dear Gilles,
>>
>> Despite my last answer (see below), I am noticing that
>> some tests with a coarray Fortran code on a laptop show a
>> performance drop of the order of 20% using the 4.1.1 version
>> (with --mca pml ucx disabled), versus the 4.1.0 one
>> (with --mca pml ucx enabled).
>>
>> I would like to experiment with the pml/ucx framework using the 4.1.0
>> version on that laptop. Then, please, how do I manually re-enable
>> those providers (e.g. perhaps at the build/configure stage?), or
>> where can I find out how to do it? Thanks in advance.
>>
>> Regards.
>> Jorge.
>>
>> ----- Original Message -----
>> > From: "Open MPI Users" 
>> > To: "Open MPI Users" 
>> > CC: "Jorge D'Elia"
>> > Sent: Saturday, May 29, 2021 7:18:23
>> > Subject: Re: [OMPI users] (Fedora 34, x86_64-pc-linux-gnu,
>> openmpi-4.1.1.tar.gz): PML ucx cannot be selected
>> >
>> > Dear Gilles,
>> >
>> > Ahhh ... now the new behavior is better understood.
>> > The intention of using pml/ucx was simply preliminary
>> > testing, which does not merit re-enabling these providers
>> > on this notebook.
>> >
>> > Thank you very much for the clarification.
>> >
>> > Regards,
>> > Jorge.
>> >
>> > ----- Original Message -----
>> >> From: "Gilles Gouaillardet"
>> >> To: "Jorge D'Elia" , "Open MPI Users" 
>> >> Sent: Friday, May 28, 2021 23:35:37
>> >> Subject: Re: [OMPI users] (Fedora 34, x86_64-pc-linux-gnu,
>> openmpi-4.1.1.tar.gz):
>> >> PML ucx cannot be selected
>> >>
>> >> Jorge,
>> >>
>> >> pml/ucx used to be selected when no fast interconnect was detected
>> >> (since UCX provides drivers for both TCP and shared memory).
>> >> These providers are now disabled by default, so unless your machine
>> >> has a supported fast interconnect (such as InfiniBand),
>> >> pml/ucx cannot be used out of the box anymore.
>> >>
>> >> If you really want to use pml/ucx on your notebook, you need to
>> >> manually re-enable these providers.
>> >>
>> >> That being said, your best choice here is really not to force any pml
>> >> and to let Open MPI use pml/ob1
>> >> (which supports both TCP and shared memory).
>> >>
>> >> Cheers,
>> >>
>> >> Gilles
>> >>
>> >> On Sat, May 29, 2021 at 11:19 AM Jorge D'Elia via users
>> >>  wrote:
>> >>>
>> >>> Hi,
>> >>>
>> >>> We routinely build OpenMPI on x86_64-pc-linux-gnu machines from
>> >>> the sources using gcc and usually everything works fine.
>> >>>
>> >>> In one case we recently installed Fedora 34 from scratch on an
>> >>> ASUS G53SX notebook (Intel Core i7-2630QM CPU 2.00GHz ×4 cores,
>> >>> without any IB device). Next we built OpenMPI using the file
>> >>> openmpi-4.1.1.tar.gz and the GCC 12.0.0 20210524 (experimental)
>> >>> compiler.
>> >>>

Re: [OMPI users] (Fedora 34, x86_64-pc-linux-gnu, openmpi-4.1.1.tar.gz): PML ucx cannot be selected

2021-09-13 Thread Gilles Gouaillardet via users
Jorge,

I am not that familiar with UCX, but I hope this will help:

The changes I mentioned were introduced by
https://github.com/open-mpi/ompi/pull/8549

I suspect mpirun --mca pml_ucx_tls any --mca pml_ucx_devices --mca pml ucx
...

will do what you expect
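
Adding some verbosity should also show whether pml/ucx actually gets
selected, for example (an untested sketch):

mpirun --mca pml ucx --mca pml_base_verbose 10 ...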


Cheers,

Gilles

On Mon, Sep 13, 2021 at 9:05 PM Jorge D'Elia via users <
users@lists.open-mpi.org> wrote:

> Dear Gilles,
>
> Despite my last answer (see below), I am noticing that
> some tests with a coarray Fortran code on a laptop show a
> performance drop of the order of 20% using the 4.1.1 version
> (with --mca pml ucx disabled), versus the 4.1.0 one
> (with --mca pml ucx enabled).
>
> I would like to experiment with the pml/ucx framework using the 4.1.0
> version on that laptop. Then, please, how do I manually re-enable
> those providers (e.g. perhaps at the build/configure stage?), or
> where can I find out how to do it? Thanks in advance.
>
> Regards.
> Jorge.
>
> ----- Original Message -----
> > From: "Open MPI Users" 
> > To: "Open MPI Users" 
> > CC: "Jorge D'Elia"
> > Sent: Saturday, May 29, 2021 7:18:23
> > Subject: Re: [OMPI users] (Fedora 34, x86_64-pc-linux-gnu,
> openmpi-4.1.1.tar.gz): PML ucx cannot be selected
> >
> > Dear Gilles,
> >
> > Ahhh ... now the new behavior is better understood.
> > The intention of using pml/ucx was simply preliminary
> > testing, which does not merit re-enabling these providers
> > on this notebook.
> >
> > Thank you very much for the clarification.
> >
> > Regards,
> > Jorge.
> >
> > ----- Original Message -----
> >> From: "Gilles Gouaillardet"
> >> To: "Jorge D'Elia" , "Open MPI Users" 
> >> Sent: Friday, May 28, 2021 23:35:37
> >> Subject: Re: [OMPI users] (Fedora 34, x86_64-pc-linux-gnu,
> openmpi-4.1.1.tar.gz):
> >> PML ucx cannot be selected
> >>
> >> Jorge,
> >>
> >> pml/ucx used to be selected when no fast interconnect was detected
> >> (since UCX provides drivers for both TCP and shared memory).
> >> These providers are now disabled by default, so unless your machine
> >> has a supported fast interconnect (such as InfiniBand),
> >> pml/ucx cannot be used out of the box anymore.
> >>
> >> If you really want to use pml/ucx on your notebook, you need to
> >> manually re-enable these providers.
> >>
> >> That being said, your best choice here is really not to force any pml
> >> and to let Open MPI use pml/ob1
> >> (which supports both TCP and shared memory).
> >>
> >> Cheers,
> >>
> >> Gilles
> >>
> >> On Sat, May 29, 2021 at 11:19 AM Jorge D'Elia via users
> >>  wrote:
> >>>
> >>> Hi,
> >>>
> >>> We routinely build OpenMPI on x86_64-pc-linux-gnu machines from
> >>> the sources using gcc and usually everything works fine.
> >>>
> >>> In one case we recently installed Fedora 34 from scratch on an
> >>> ASUS G53SX notebook (Intel Core i7-2630QM CPU 2.00GHz ×4 cores,
> >>> without any IB device). Next we built OpenMPI using the file
> >>> openmpi-4.1.1.tar.gz and the GCC 12.0.0 20210524 (experimental)
> >>> compiler.
> >>>
> >>> However, when trying to experiment with OpenMPI using UCX
> >>> in a simple test, we get the runtime errors:
> >>>
> >>>   No components were able to be opened in the btl framework.
> >>>   PML ucx cannot be selected
> >>>
> >>> whereas the test worked fine up to Fedora 33 on the same
> >>> machine using the same OpenMPI configuration.
> >>>
> >>> We attach below some info about a simple test run.
> >>>
> >>> Please, any clues about where to check, or whether something is missing?
> >>> Thanks in advance.
> >>>
> >>> Regards
> >>> Jorge.
> >>>
> >>> --
> >>> $ cat /proc/version
> >>> Linux version 5.12.7-300.fc34.x86_64
> >>> (mockbu...@bkernel01.iad2.fedoraproject.org) (gcc (GCC) 11.1.1
> 20210428 (Red
> >>> Hat 11.1.1-1), GNU ld version 2.35.1-41.fc34) #1 SMP Wed May 26
> 12:58:58 UTC
> >>> 2021
> >>>
> >>> $ mpifort --version
> >>> GNU Fortran (GCC) 12.0.0 20210524 (experimental)
> >>> Copyright (C) 2021 Free Software Foundation, Inc.
> >>>
> >>> $ which mpifort
> >>> /usr/beta/openmpi

Re: [OMPI users] (Fedora 34, x86_64-pc-linux-gnu, openmpi-4.1.1.tar.gz): PML ucx cannot be selected

2021-09-13 Thread Jorge D'Elia via users
Dear Gilles,

Despite my last answer (see below), I am noticing that 
some tests with a coarray Fortran code on a laptop show a 
performance drop of the order of 20% using the 4.1.1 version 
(with --mca pml ucx disabled), versus the 4.1.0 one 
(with --mca pml ucx enabled).

I would like to experiment with the pml/ucx framework using the 4.1.0 
version on that laptop. Then, please, how do I manually re-enable 
those providers (e.g. perhaps at the build/configure stage?), or 
where can I find out how to do it? Thanks in advance.

Regards.
Jorge.

----- Original Message -----
> From: "Open MPI Users" 
> To: "Open MPI Users" 
> CC: "Jorge D'Elia" 
> Sent: Saturday, May 29, 2021 7:18:23
> Subject: Re: [OMPI users] (Fedora 34, x86_64-pc-linux-gnu, 
> openmpi-4.1.1.tar.gz): PML ucx cannot be selected
>
> Dear Gilles,
> 
> Ahhh ... now the new behavior is better understood.
> The intention of using pml/ucx was simply preliminary
> testing, which does not merit re-enabling these providers
> on this notebook.
> 
> Thank you very much for the clarification.
> 
> Regards,
> Jorge.
> 
> ----- Original Message -----
>> From: "Gilles Gouaillardet"
>> To: "Jorge D'Elia" , "Open MPI Users" 
>> Sent: Friday, May 28, 2021 23:35:37
>> Subject: Re: [OMPI users] (Fedora 34, x86_64-pc-linux-gnu, 
>> openmpi-4.1.1.tar.gz):
>> PML ucx cannot be selected
>>
>> Jorge,
>> 
>> pml/ucx used to be selected when no fast interconnect was detected
>> (since UCX provides drivers for both TCP and shared memory).
>> These providers are now disabled by default, so unless your machine
>> has a supported fast interconnect (such as InfiniBand),
>> pml/ucx cannot be used out of the box anymore.
>>
>> If you really want to use pml/ucx on your notebook, you need to
>> manually re-enable these providers.
>>
>> That being said, your best choice here is really not to force any pml
>> and to let Open MPI use pml/ob1
>> (which supports both TCP and shared memory).
>> 
>> Cheers,
>> 
>> Gilles
>> 
>> On Sat, May 29, 2021 at 11:19 AM Jorge D'Elia via users
>>  wrote:
>>>
>>> Hi,
>>>
>>> We routinely build OpenMPI on x86_64-pc-linux-gnu machines from
>>> the sources using gcc and usually everything works fine.
>>>
>>> In one case we recently installed Fedora 34 from scratch on an
>>> ASUS G53SX notebook (Intel Core i7-2630QM CPU 2.00GHz ×4 cores,
>>> without any IB device). Next we built OpenMPI using the file
>>> openmpi-4.1.1.tar.gz and the GCC 12.0.0 20210524 (experimental)
>>> compiler.
>>>
>>> However, when trying to experiment with OpenMPI using UCX
>>> in a simple test, we get the runtime errors:
>>>
>>>   No components were able to be opened in the btl framework.
>>>   PML ucx cannot be selected
>>>
>>> whereas the test worked fine up to Fedora 33 on the same
>>> machine using the same OpenMPI configuration.
>>>
>>> We attach below some info about a simple test run.
>>>
>>> Please, any clues about where to check, or whether something is missing?
>>> Thanks in advance.
>>>
>>> Regards
>>> Jorge.
>>>
>>> --
>>> $ cat /proc/version
>>> Linux version 5.12.7-300.fc34.x86_64
>>> (mockbu...@bkernel01.iad2.fedoraproject.org) (gcc (GCC) 11.1.1 20210428 (Red
>>> Hat 11.1.1-1), GNU ld version 2.35.1-41.fc34) #1 SMP Wed May 26 12:58:58 UTC
>>> 2021
>>>
>>> $ mpifort --version
>>> GNU Fortran (GCC) 12.0.0 20210524 (experimental)
>>> Copyright (C) 2021 Free Software Foundation, Inc.
>>>
>>> $ which mpifort
>>> /usr/beta/openmpi/bin/mpifort
>>>
>>> $ mpifort -o hello_usempi_f08.exe hello_usempi_f08.f90
>>>
>>> $ mpirun --mca orte_base_help_aggregate 0 --mca btl self,vader,tcp --map-by 
>>> node
>>> --report-bindings --machinefile ~/machi-openmpi.dat --np 2
>>> hello_usempi_f08.exe
>>> [verne:200650] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././.]
>>> [verne:200650] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/./.]
>>> Hello, world, I am  0 of  2: Open MPI v4.1.1, package: Open MPI 
>>> bigpack@verne
>>> Distribution, ident: 4.1.1, repo rev: v4.1.1, Apr 24, 2021
>>> Hello, world, I am  1 of  2: Open MPI v4.1.1, package: Open MPI 
>>> bigpack@verne
>>> Distribution, ident: 4.1.1, repo rev: 

Re: [OMPI users] (Fedora 34, x86_64-pc-linux-gnu, openmpi-4.1.1.tar.gz): PML ucx cannot be selected

2021-05-29 Thread Jorge D'Elia via users
Dear Gilles,

Ahhh ... now the new behavior is better understood. 
The intention of using pml/ucx was simply preliminary 
testing, which does not merit re-enabling these providers 
on this notebook. 

Thank you very much for the clarification. 

Regards,
Jorge.

----- Original Message -----
> From: "Gilles Gouaillardet" 
> To: "Jorge D'Elia" , "Open MPI Users" 
> Sent: Friday, May 28, 2021 23:35:37
> Subject: Re: [OMPI users] (Fedora 34, x86_64-pc-linux-gnu, 
> openmpi-4.1.1.tar.gz): PML ucx cannot be selected
>
> Jorge,
> 
> pml/ucx used to be selected when no fast interconnect was detected
> (since UCX provides drivers for both TCP and shared memory).
> These providers are now disabled by default, so unless your machine
> has a supported fast interconnect (such as InfiniBand),
> pml/ucx cannot be used out of the box anymore.
>
> If you really want to use pml/ucx on your notebook, you need to
> manually re-enable these providers.
>
> That being said, your best choice here is really not to force any pml
> and to let Open MPI use pml/ob1
> (which supports both TCP and shared memory).
> 
> Cheers,
> 
> Gilles
> 
> On Sat, May 29, 2021 at 11:19 AM Jorge D'Elia via users
>  wrote:
>>
>> Hi,
>>
>> We routinely build OpenMPI on x86_64-pc-linux-gnu machines from
>> the sources using gcc and usually everything works fine.
>>
>> In one case we recently installed Fedora 34 from scratch on an
>> ASUS G53SX notebook (Intel Core i7-2630QM CPU 2.00GHz ×4 cores,
>> without any IB device). Next we built OpenMPI using the file
>> openmpi-4.1.1.tar.gz and the GCC 12.0.0 20210524 (experimental)
>> compiler.
>>
>> However, when trying to experiment with OpenMPI using UCX
>> in a simple test, we get the runtime errors:
>>
>>   No components were able to be opened in the btl framework.
>>   PML ucx cannot be selected
>>
>> whereas the test worked fine up to Fedora 33 on the same
>> machine using the same OpenMPI configuration.
>>
>> We attach below some info about a simple test run.
>>
>> Please, any clues about where to check, or whether something is missing?
>> Thanks in advance.
>>
>> Regards
>> Jorge.
>>
>> --
>> $ cat /proc/version
>> Linux version 5.12.7-300.fc34.x86_64
>> (mockbu...@bkernel01.iad2.fedoraproject.org) (gcc (GCC) 11.1.1 20210428 (Red
>> Hat 11.1.1-1), GNU ld version 2.35.1-41.fc34) #1 SMP Wed May 26 12:58:58 UTC
>> 2021
>>
>> $ mpifort --version
>> GNU Fortran (GCC) 12.0.0 20210524 (experimental)
>> Copyright (C) 2021 Free Software Foundation, Inc.
>>
>> $ which mpifort
>> /usr/beta/openmpi/bin/mpifort
>>
>> $ mpifort -o hello_usempi_f08.exe hello_usempi_f08.f90
>>
>> $ mpirun --mca orte_base_help_aggregate 0 --mca btl self,vader,tcp --map-by 
>> node
>> --report-bindings --machinefile ~/machi-openmpi.dat --np 2
>> hello_usempi_f08.exe
>> [verne:200650] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././.]
>> [verne:200650] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/./.]
>> Hello, world, I am  0 of  2: Open MPI v4.1.1, package: Open MPI bigpack@verne
>> Distribution, ident: 4.1.1, repo rev: v4.1.1, Apr 24, 2021
>> Hello, world, I am  1 of  2: Open MPI v4.1.1, package: Open MPI bigpack@verne
>> Distribution, ident: 4.1.1, repo rev: v4.1.1, Apr 24, 2021
>>
>> $ mpirun --mca orte_base_help_aggregate 0 --mca pml ucx --mca btl
>> ^self,vader,tcp --map-by node --report-bindings --machinefile
>> ~/machi-openmpi.dat --np 2  hello_usempi_f08.exe
>> [verne:200772] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././.]
>> [verne:200772] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/./.]
>> --
>> No components were able to be opened in the btl framework.
>>
>> This typically means that either no components of this type were
>> installed, or none of the installed components can be loaded.
>> Sometimes this means that shared libraries required by these
>> components are unable to be found/loaded.
>>
>>   Host:  verne
>>   Framework: btl
>> --
>> --
>> No components were able to be opened in the btl framework.
>>
>> This typically means that either no components of this type were
>> installed, or none of the installed components can be loaded.
>> Sometimes this means that shared libraries required by these
>> component

Re: [OMPI users] (Fedora 34, x86_64-pc-linux-gnu, openmpi-4.1.1.tar.gz): PML ucx cannot be selected

2021-05-28 Thread Gilles Gouaillardet via users
Jorge,

pml/ucx used to be selected when no fast interconnect was detected
(since UCX provides drivers for both TCP and shared memory).
These providers are now disabled by default, so unless your machine
has a supported fast interconnect (such as InfiniBand),
pml/ucx cannot be used out of the box anymore.

If you really want to use pml/ucx on your notebook, you need to
manually re-enable these providers.

That being said, your best choice here is really not to force any pml
and to let Open MPI use pml/ob1
(which supports both TCP and shared memory).
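
For example, on such a notebook one can either just run

mpirun -np 2 ./a.out

and let Open MPI pick pml/ob1 by itself, or force it explicitly with
something like (a sketch, ./a.out being a placeholder for the application):

mpirun --mca pml ob1 --mca btl self,vader,tcp -np 2 ./a.out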

Cheers,

Gilles

On Sat, May 29, 2021 at 11:19 AM Jorge D'Elia via users
 wrote:
>
> Hi,
>
> We routinely build OpenMPI on x86_64-pc-linux-gnu machines from
> the sources using gcc and usually everything works fine.
>
> In one case we recently installed Fedora 34 from scratch on an
> ASUS G53SX notebook (Intel Core i7-2630QM CPU 2.00GHz ×4 cores,
> without any IB device). Next we built OpenMPI using the file
> openmpi-4.1.1.tar.gz and the GCC 12.0.0 20210524 (experimental)
> compiler.
>
> However, when trying to experiment with OpenMPI using UCX
> in a simple test, we get the runtime errors:
>
>   No components were able to be opened in the btl framework.
>   PML ucx cannot be selected
>
> whereas the test worked fine up to Fedora 33 on the same
> machine using the same OpenMPI configuration.
>
> We attach below some info about a simple test run.
>
> Please, any clues about where to check, or whether something is missing?
> Thanks in advance.
>
> Regards
> Jorge.
>
> --
> $ cat /proc/version
> Linux version 5.12.7-300.fc34.x86_64 
> (mockbu...@bkernel01.iad2.fedoraproject.org) (gcc (GCC) 11.1.1 20210428 (Red 
> Hat 11.1.1-1), GNU ld version 2.35.1-41.fc34) #1 SMP Wed May 26 12:58:58 UTC 
> 2021
>
> $ mpifort --version
> GNU Fortran (GCC) 12.0.0 20210524 (experimental)
> Copyright (C) 2021 Free Software Foundation, Inc.
>
> $ which mpifort
> /usr/beta/openmpi/bin/mpifort
>
> $ mpifort -o hello_usempi_f08.exe hello_usempi_f08.f90
>
> $ mpirun --mca orte_base_help_aggregate 0 --mca btl self,vader,tcp --map-by 
> node --report-bindings --machinefile ~/machi-openmpi.dat --np 2  
> hello_usempi_f08.exe
> [verne:200650] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././.]
> [verne:200650] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/./.]
> Hello, world, I am  0 of  2: Open MPI v4.1.1, package: Open MPI bigpack@verne 
> Distribution, ident: 4.1.1, repo rev: v4.1.1, Apr 24, 2021
> Hello, world, I am  1 of  2: Open MPI v4.1.1, package: Open MPI bigpack@verne 
> Distribution, ident: 4.1.1, repo rev: v4.1.1, Apr 24, 2021
>
> $ mpirun --mca orte_base_help_aggregate 0 --mca pml ucx --mca btl 
> ^self,vader,tcp --map-by node --report-bindings --machinefile 
> ~/machi-openmpi.dat --np 2  hello_usempi_f08.exe
> [verne:200772] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././.]
> [verne:200772] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/./.]
> --
> No components were able to be opened in the btl framework.
>
> This typically means that either no components of this type were
> installed, or none of the installed components can be loaded.
> Sometimes this means that shared libraries required by these
> components are unable to be found/loaded.
>
>   Host:  verne
>   Framework: btl
> --
> --
> No components were able to be opened in the btl framework.
>
> This typically means that either no components of this type were
> installed, or none of the installed components can be loaded.
> Sometimes this means that shared libraries required by these
> components are unable to be found/loaded.
>
>   Host:  verne
>   Framework: btl
> --
> --
> No components were able to be opened in the pml framework.
>
> This typically means that either no components of this type were
> installed, or none of the installed components can be loaded.
> Sometimes this means that shared libraries required by these
> components are unable to be found/loaded.
>
>   Host:  verne
>   Framework: pml
> --
> [verne:200777] PML ucx cannot be selected
> --
> No components were able to be opened in the pml framework.
>
> This typically means that either no components of this type were
> installed, or none of the installed components can be loaded.
> Sometimes this means that shared libraries required by these
> components are unable to be found/loaded.
>
>   Host:  verne
>   Framework: pml
> --
> [verne:200772] PMIX ERROR: 

[OMPI users] (Fedora 34, x86_64-pc-linux-gnu, openmpi-4.1.1.tar.gz): PML ucx cannot be selected

2021-05-28 Thread Jorge D'Elia via users
Hi,

We routinely build OpenMPI on x86_64-pc-linux-gnu machines from 
the sources using gcc and usually everything works fine.

In one case we recently installed Fedora 34 from scratch on an
ASUS G53SX notebook (Intel Core i7-2630QM CPU 2.00GHz ×4 cores, 
without any IB device). Next we built OpenMPI using the file 
openmpi-4.1.1.tar.gz and the GCC 12.0.0 20210524 (experimental) 
compiler.
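
Roughly, the build went along these lines (the full configure options 
are listed in the ompi_info output below; a sketch only):

  $ tar xzf openmpi-4.1.1.tar.gz
  $ cd openmpi-4.1.1
  $ ./configure --prefix=/usr/beta/openmpi --with-ucx ...
  $ make -j 4 all && make install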

However, when trying to experiment with OpenMPI using UCX 
in a simple test, we get the runtime errors:

  No components were able to be opened in the btl framework.
  PML ucx cannot be selected

whereas the test worked fine up to Fedora 33 on the same 
machine using the same OpenMPI configuration.

We attach below some info about a simple test run.

Please, any clues about where to check, or whether something is missing?
Thanks in advance.

Regards
Jorge.

--
$ cat /proc/version
Linux version 5.12.7-300.fc34.x86_64 
(mockbu...@bkernel01.iad2.fedoraproject.org) (gcc (GCC) 11.1.1 20210428 (Red 
Hat 11.1.1-1), GNU ld version 2.35.1-41.fc34) #1 SMP Wed May 26 12:58:58 UTC 
2021

$ mpifort --version
GNU Fortran (GCC) 12.0.0 20210524 (experimental)
Copyright (C) 2021 Free Software Foundation, Inc.

$ which mpifort
/usr/beta/openmpi/bin/mpifort

$ mpifort -o hello_usempi_f08.exe hello_usempi_f08.f90

$ mpirun --mca orte_base_help_aggregate 0 --mca btl self,vader,tcp --map-by 
node --report-bindings --machinefile ~/machi-openmpi.dat --np 2  
hello_usempi_f08.exe
[verne:200650] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././.]
[verne:200650] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/./.]
Hello, world, I am  0 of  2: Open MPI v4.1.1, package: Open MPI bigpack@verne 
Distribution, ident: 4.1.1, repo rev: v4.1.1, Apr 24, 2021
Hello, world, I am  1 of  2: Open MPI v4.1.1, package: Open MPI bigpack@verne 
Distribution, ident: 4.1.1, repo rev: v4.1.1, Apr 24, 2021

$ mpirun --mca orte_base_help_aggregate 0 --mca pml ucx --mca btl 
^self,vader,tcp --map-by node --report-bindings --machinefile 
~/machi-openmpi.dat --np 2  hello_usempi_f08.exe
[verne:200772] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././.]
[verne:200772] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/./.]
--
No components were able to be opened in the btl framework.

This typically means that either no components of this type were
installed, or none of the installed components can be loaded.
Sometimes this means that shared libraries required by these
components are unable to be found/loaded.

  Host:      verne
  Framework: btl
--
--
No components were able to be opened in the btl framework.

This typically means that either no components of this type were
installed, or none of the installed components can be loaded.
Sometimes this means that shared libraries required by these
components are unable to be found/loaded.

  Host:      verne
  Framework: btl
--
--
No components were able to be opened in the pml framework.

This typically means that either no components of this type were
installed, or none of the installed components can be loaded.
Sometimes this means that shared libraries required by these
components are unable to be found/loaded.

  Host:      verne
  Framework: pml
--
[verne:200777] PML ucx cannot be selected
--
No components were able to be opened in the pml framework.

This typically means that either no components of this type were
installed, or none of the installed components can be loaded.
Sometimes this means that shared libraries required by these
components are unable to be found/loaded.

  Host:      verne
  Framework: pml
--
[verne:200772] PMIX ERROR: UNREACHABLE in file 
../../../../../../../opal/mca/pmix/pmix3x/pmix/src/server/pmix_server.c at line 
2198


$ ompi_info | grep ucx
  Configure command line: '--enable-ipv6' '--enable-sparse-groups' 
'--enable-mpi-ext' '--enable-mpi-cxx' '--enable-oshmem' 
'--with-libevent=internal' '--with-ucx' '--with-pmix=internal' 
'--without-libfabric' '--prefix=/usr/beta/openmpi'
                 MCA osc: ucx (MCA v2.1.0, API v3.0.0, Component v4.1.1)
                 MCA pml: ucx (MCA v2.1.0, API v2.0.0, Component v4.1.1)

$ ompi_info --param all all --level 9 | grep ucx
                 MCA osc: ucx (MCA v2.1.0, API v3.0.0, Component v4.1.1)
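
To double-check what UCX itself sees on this machine, the ucx_info tool 
shipped with UCX can also be used to list the available transports and 
devices, e.g. (assuming the UCX command-line tools are installed):

$ ucx_info -v
$ ucx_info -d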