Re: [OMPI packagers] Recommended combinations of pmix/prrte/openmpi

2023-03-20 Thread Ralph Castain
FWIW: I was told today that PMIx v4.2.4 and PRRTE v3.0.1 would be the minimum 
versions for OMPI v5.0. We are working to coordinate the releases of those 
three packages in the fairly near future.
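
For packagers who want to script this check, the minimums quoted above can be sketched as a small comparison. This is only an illustration: the table name `OMPI5_MINIMUMS` and both helper functions are hypothetical, not part of any of the three projects, and release-candidate suffixes such as "rc1" are not handled.

```python
from typing import Dict, List, Tuple

# Minimum versions for OMPI v5 as stated in this thread (hypothetical table name).
OMPI5_MINIMUMS: Dict[str, Tuple[int, int, int]] = {
    "pmix": (4, 2, 4),
    "prrte": (3, 0, 1),
}


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '4.2.4' or 'v4.2.4' into (4, 2, 4). Suffixes like 'rc1' are not supported."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def unmet_minimums(installed: Dict[str, str]) -> List[str]:
    """Return one message per package that is missing or older than the minimum."""
    problems = []
    for name, minimum in OMPI5_MINIMUMS.items():
        have = installed.get(name)
        required = ".".join(str(n) for n in minimum)
        if have is None:
            problems.append(f"{name} is not installed (need {required})")
        elif parse_version(have) < minimum:
            problems.append(f"{name} {have} is older than required {required}")
    return problems


if __name__ == "__main__":
    # Versions reported for Fedora Rawhide at the start of this thread.
    for line in unmet_minimums({"pmix": "4.1.2", "prrte": "2.0.2"}):
        print(line)
```

Comparing tuples of integers rather than raw strings avoids the usual trap where "4.10.0" sorts before "4.2.4".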


> On Mar 20, 2023, at 4:35 PM, Orion Poplawski wrote:
> 
> Thanks for the information.  I'll start testing with the various RCs for pmix 
> 4.2.4, prrte 3.0.1, and openmpi 5.0.0.
> 
> On 3/18/23 20:14, Ralph Castain wrote:
>> First, sorry for all the confusion! I know it is a bit to parse as we 
>> transition from OMPI v4 and earlier (which had its own embedded runtime 
>> called ORTE) to OMPI v5, which includes a 3rd-party runtime called PRRTE. 
>> You never split ORTE out of those earlier OMPI versions and so these 
>> compatibility issues didn't previously exist. But now they will, starting 
>> with OMPI v5.
>> If it is any consolation, we are also trying to work our way thru all these 
>> combinations. It places constraints on multiple parties, especially as PRRTE 
>> and PMIx are used by a number of other parties not named OMPI. So none of us 
>> involved in the different projects enjoy total freedom, and we are still 
>> figuring out how to manage all the moving pieces.
>> Bottom line is that you don't need to be worrying about PRRTE for OMPI v4 
>> and earlier. It is orthogonal to those releases.
>> We have previously identified a problem with OMPI's PMIx integration in the 
>> v4.x series and a patch has been filed for it. This is the source of the 
>> issue you cite. It isn't a PMIx problem, but rather a case of replacing a 
>> previously non-standard function call with the standardized macro. The patch 
>> (see https://github.com/open-mpi/ompi/pull/11472) fixes it for all PMIx 
>> releases (both looking backwards and going forwards).
>> Looking forward to OMPI v5:
>> * PRRTE 2.x is not something that will be supported. Frankly, we do not 
>> recommend anyone use that series.
>> * OMPI v5 is going to require a minimum of PRRTE v3.0.1, although OMPI v5 
>> might include v3.1.0 (the precise release to be included in the OMPI v5 code 
>> remains TBD). Either PRRTE version will require a minimum of PMIx v4.2.4 to 
>> properly function
>> * OMPI v5 will require a minimum of PMIx v4.2.4 - OMPI might ship with the 
>> PMIx v5.0.0 release, but it will still work with v4.2.4
>> The problems you cite are unfortunately expected and part of the learning 
>> process. Part of the delay in releasing OMPI v5 has been a result of trying 
>> to resolve the logistics - hopefully, we are converging on a stable system.
>> Ralph
>>> On Mar 18, 2023, at 4:24 PM, Orion Poplawski wrote:
>>> 
>>> What are the current recommendations for compatible combinations of 
>>> pmix/prrte/openmpi?
>>> 
>>> I'm looking into updating each of these in Fedora and running into a couple 
>>> of issues.  Currently in Fedora Rawhide we have:
>>> 
>>> - pmix 4.1.2
>>> - prrte 2.0.2
>>> - openmpi 4.1.5
>>> 
>>> After updating to pmix 4.2.3 I see the following:
>>> 
>>> - openmpi programs fail to run with:
>>> 
>>> mca_base_component_repository_open: unable to open mca_pmix_ext3x: 
>>> /usr/lib64/openmpi/lib/openmpi/mca_pmix_ext3x.so: undefined symbol: 
>>> pmix_value_load (ignored)
>>> [[3260,0],0] ORTE_ERROR_LOG: Not found in file ess_hnp_module.c at line 320
>>> --
>>> It looks like orte_init failed for some reason; your parallel process is
>>> likely to abort.  There are many reasons that a parallel process can
>>> fail during orte_init; some of which are due to configuration or
>>> environment problems.  This failure appears to be an internal failure;
>>> here's some additional information (which may only be relevant to an
>>> Open MPI developer):
>>> 
>>>  opal_pmix_base_select failed
>>>  --> Returned value Not found (-13) instead of ORTE_SUCCESS
>>> 
>>> 
>>> - prrte 2.0.2 fails to build with:
>>> 
>>> make[2]: Entering directory 
>>> '/builddir/build/BUILD/prrte-2.0.2/src/tools/prted'
>>> /bin/sh ../../../libtool  --tag=CC   --mode=link gcc 
>>> -DPRTE_CONFIGURE_USER="\"mockbuild\"" 
>>> -DPRTE_CONFIGURE_HOST="\"2188cae5486f485888615d01f56cd6c9\"" 
>>> -DPRTE_CONFIGURE_DATE="\"Fri Jan 20 00:00:00 UTC 2023\"" 
>>> -DPRTE_BUILD_USER="\"$USER\"" -DPRTE_BUILD_HOST="\"${HOSTNAME:-`(hostname 
>>> || uname -n) | sed 1q`}\"" 
>>> -DPRTE_BUILD_DATE="\"`../../../config/getdate.sh`\"" 
>>> -DPRTE_BUILD_CFLAGS="\"-DNDEBUG -O2 -flto=auto -ffat-lto-objects 
>>> -fexceptions -g -grecord-gcc-switches -pipe 
>>> -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong 
>>> -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic 
>>> -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection 
>>> -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -finline-functions 
>>> -Wall -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 
>>> -Wp,-D_GLIBCXX_ASSERTIONS\"" 
>>> 

Re: [OMPI packagers] Recommended combinations of pmix/prrte/openmpi

2023-03-20 Thread Orion Poplawski
Thanks for the information.  I'll start testing with the various RCs for 
pmix 4.2.4, prrte 3.0.1, and openmpi 5.0.0.


On 3/18/23 20:14, Ralph Castain wrote:
First, sorry for all the confusion! I know it is a bit to parse as we 
transition from OMPI v4 and earlier (which had its own embedded runtime 
called ORTE) to OMPI v5, which includes a 3rd-party runtime called 
PRRTE. You never split ORTE out of those earlier OMPI versions and so 
these compatibility issues didn't previously exist. But now they will, 
starting with OMPI v5.


If it is any consolation, we are also trying to work our way thru all 
these combinations. It places constraints on multiple parties, 
especially as PRRTE and PMIx are used by a number of other parties not 
named OMPI. So none of us involved in the different projects enjoy total 
freedom, and we are still figuring out how to manage all the moving pieces.


Bottom line is that you don't need to be worrying about PRRTE for OMPI 
v4 and earlier. It is orthogonal to those releases.


We have previously identified a problem with OMPI's PMIx integration in 
the v4.x series and a patch has been filed for it. This is the source of 
the issue you cite. It isn't a PMIx problem, but rather a case of 
replacing a previously non-standard function call with the standardized 
macro. The patch (see https://github.com/open-mpi/ompi/pull/11472) fixes 
it for all PMIx releases (both looking backwards and going forwards).


Looking forward to OMPI v5:

* PRRTE 2.x is not something that will be supported. Frankly, we do not 
recommend anyone use that series.


* OMPI v5 is going to require a minimum of PRRTE v3.0.1, although OMPI 
v5 might include v3.1.0 (the precise release to be included in the OMPI 
v5 code remains TBD). Either PRRTE version will require a minimum of 
PMIx v4.2.4 to properly function


* OMPI v5 will require a minimum of PMIx v4.2.4 - OMPI might ship with 
the PMIx v5.0.0 release, but it will still work with v4.2.4


The problems you cite are unfortunately expected and part of the 
learning process. Part of the delay in releasing OMPI v5 has been a 
result of trying to resolve the logistics - hopefully, we are converging 
on a stable system.


Ralph




On Mar 18, 2023, at 4:24 PM, Orion Poplawski wrote:

What are the current recommendations for compatible combinations of 
pmix/prrte/openmpi?


I'm looking into updating each of these in Fedora and running into a 
couple of issues.  Currently in Fedora Rawhide we have:


- pmix 4.1.2
- prrte 2.0.2
- openmpi 4.1.5

After updating to pmix 4.2.3 I see the following:

- openmpi programs fail to run with:

mca_base_component_repository_open: unable to open mca_pmix_ext3x: 
/usr/lib64/openmpi/lib/openmpi/mca_pmix_ext3x.so: undefined symbol: 
pmix_value_load (ignored)
[[3260,0],0] ORTE_ERROR_LOG: Not found in file ess_hnp_module.c at 
line 320

--
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

 opal_pmix_base_select failed
 --> Returned value Not found (-13) instead of ORTE_SUCCESS


- prrte 2.0.2 fails to build with:

make[2]: Entering directory 
'/builddir/build/BUILD/prrte-2.0.2/src/tools/prted'
/bin/sh ../../../libtool  --tag=CC   --mode=link gcc 
-DPRTE_CONFIGURE_USER="\"mockbuild\"" 
-DPRTE_CONFIGURE_HOST="\"2188cae5486f485888615d01f56cd6c9\"" 
-DPRTE_CONFIGURE_DATE="\"Fri Jan 20 00:00:00 UTC 2023\"" 
-DPRTE_BUILD_USER="\"$USER\"" 
-DPRTE_BUILD_HOST="\"${HOSTNAME:-`(hostname || uname -n) | sed 1q`}\"" 
-DPRTE_BUILD_DATE="\"`../../../config/getdate.sh`\"" 
-DPRTE_BUILD_CFLAGS="\"-DNDEBUG -O2 -flto=auto -ffat-lto-objects 
-fexceptions -g -grecord-gcc-switches -pipe 
-specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 
-fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 
-m64 -mtune=generic -fasynchronous-unwind-tables 
-fstack-clash-protection -fcf-protection -fno-omit-frame-pointer 
-mno-omit-leaf-frame-pointer -finline-functions -Wall 
-Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 
-Wp,-D_GLIBCXX_ASSERTIONS\"" 
-DPRTE_BUILD_CPPFLAGS="\"-iquote/builddir/build/BUILD/prrte-2.0.2 
-iquote/builddir/build/BUILD/prrte-2.0.2/src/include\"" 
-DPRTE_BUILD_LDFLAGS="\"-Wl,-z,relro -Wl,--as-needed -Wl,-z,now 
-specs=/usr/lib/rpm/redhat/redhat-hardened-ld 
-specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -Wl,--build-id=sha1 
-specs=/usr/lib/rpm/redhat/redhat-package-notes\"" 
-DPRTE_BUILD_LIBS="\"-lm -levent_core -levent_pthreads -lpmix 
-lhwloc\"" -DPRTE_CC_ABSOLUTE="\"/usr/bin/gcc\"" 
-DPRTE_GREEK_VERSION="\"\"" -DPRTE_REPO_REV="\"v2.0.1-8-gaa57929\"" 

Re: [OMPI packagers] Recommended combinations of pmix/prrte/openmpi

2023-03-18 Thread Ralph Castain
There is nothing wrong with PMIx 4.2.3 - as noted in the issue you cite, the 
problem is that OMPI used an internal function instead of the public API. You 
fixed that, and I added some definitions to point the old internal functions 
back to the official public APIs to help others who also made that mistake.

Let's not further confuse people into thinking there is a PMIx release that 
should be shunned because OMPI incorrectly used a non-public function.
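
When triaging reports like this one, it can help to pull the failing component and the missing symbol out of the loader message automatically. The following is a rough sketch under the assumption that the message keeps the "unable to open ...: undefined symbol: ..." shape shown in the quoted report; the function name and the regular expression are hypothetical and are not part of OMPI or PMIx.

```python
import re
from typing import Optional, Tuple

# Matches the loader message shape seen in the quoted report (assumed stable).
_UNDEFINED_SYMBOL = re.compile(
    r"unable to open (?P<component>\S+):\s+(?P<path>\S+):\s+"
    r"undefined symbol:\s+(?P<symbol>\w+)"
)


def find_undefined_symbol(log: str) -> Optional[Tuple[str, str]]:
    """Return (component, symbol) from the first undefined-symbol message, or None."""
    # Mail clients wrap long lines, so collapse all whitespace runs first.
    flat = " ".join(log.split())
    match = _UNDEFINED_SYMBOL.search(flat)
    if match is None:
        return None
    return match.group("component"), match.group("symbol")


if __name__ == "__main__":
    report = (
        "mca_base_component_repository_open: unable to open mca_pmix_ext3x: \n"
        "/usr/lib64/openmpi/lib/openmpi/mca_pmix_ext3x.so: undefined symbol: \n"
        "pmix_value_load (ignored)\n"
    )
    print(find_undefined_symbol(report))
```

A lowercase symbol such as pmix_value_load in the result is a hint that the caller reached for an internal function rather than the public API, which is the situation described above.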


> On Mar 18, 2023, at 10:19 PM, Gilles Gouaillardet wrote:
> 
> Orion,
> 
> PMIx 4.2.3 should be avoided since there is a bad interaction with Open MPI 
> 4.1.x
> (as reported in https://github.com/open-mpi/ompi/issues/10416 and 
> https://github.com/open-mpi/ompi/issues/11478)
> 
> The issue occurs with this specific version: 4.2.2 works fine and the bug is 
> expected to be fixed in 4.2.4
> 
> 
> Cheers,
> 
> Gilles
> 
> 
> 
> On Sun, Mar 19, 2023 at 8:26 AM Orion Poplawski wrote:
>> What are the current recommendations for compatible combinations of 
>> pmix/prrte/openmpi?
>> 
>> I'm looking into updating each of these in Fedora and running into a 
>> couple of issues.  Currently in Fedora Rawhide we have:
>> 
>> - pmix 4.1.2
>> - prrte 2.0.2
>> - openmpi 4.1.5
>> 
>> After updating to pmix 4.2.3 I see the following:
>> 
>> - openmpi programs fail to run with:
>> 
>> mca_base_component_repository_open: unable to open mca_pmix_ext3x: 
>> /usr/lib64/openmpi/lib/openmpi/mca_pmix_ext3x.so: undefined symbol: 
>> pmix_value_load (ignored)
>> [[3260,0],0] ORTE_ERROR_LOG: Not found in file ess_hnp_module.c at line 320
>> --
>> It looks like orte_init failed for some reason; your parallel process is
>> likely to abort.  There are many reasons that a parallel process can
>> fail during orte_init; some of which are due to configuration or
>> environment problems.  This failure appears to be an internal failure;
>> here's some additional information (which may only be relevant to an
>> Open MPI developer):
>> 
>>  opal_pmix_base_select failed
>>  --> Returned value Not found (-13) instead of ORTE_SUCCESS
>> 
>> 
>> - prrte 2.0.2 fails to build with:
>> 
>> make[2]: Entering directory 
>> '/builddir/build/BUILD/prrte-2.0.2/src/tools/prted'
>> /bin/sh ../../../libtool  --tag=CC   --mode=link gcc 
>> -DPRTE_CONFIGURE_USER="\"mockbuild\"" 
>> -DPRTE_CONFIGURE_HOST="\"2188cae5486f485888615d01f56cd6c9\"" 
>> -DPRTE_CONFIGURE_DATE="\"Fri Jan 20 00:00:00 UTC 2023\"" 
>> -DPRTE_BUILD_USER="\"$USER\"" 
>> -DPRTE_BUILD_HOST="\"${HOSTNAME:-`(hostname || uname -n) | sed 1q`}\"" 
>> -DPRTE_BUILD_DATE="\"`../../../config/getdate.sh`\"" 
>> -DPRTE_BUILD_CFLAGS="\"-DNDEBUG -O2 -flto=auto -ffat-lto-objects 
>> -fexceptions -g -grecord-gcc-switches -pipe 
>> -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong 
>> -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic 
>> -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection 
>> -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -finline-functions 
>> -Wall -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 
>> -Wp,-D_GLIBCXX_ASSERTIONS\"" 
>> -DPRTE_BUILD_CPPFLAGS="\"-iquote/builddir/build/BUILD/prrte-2.0.2 
>> -iquote/builddir/build/BUILD/prrte-2.0.2/src/include\"" 
>> -DPRTE_BUILD_LDFLAGS="\"-Wl,-z,relro -Wl,--as-needed -Wl,-z,now 
>> -specs=/usr/lib/rpm/redhat/redhat-hardened-ld 
>> -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -Wl,--build-id=sha1 
>> -specs=/usr/lib/rpm/redhat/redhat-package-notes\"" 
>> -DPRTE_BUILD_LIBS="\"-lm -levent_core -levent_pthreads -lpmix -lhwloc\"" 
>> -DPRTE_CC_ABSOLUTE="\"/usr/bin/gcc\"" -DPRTE_GREEK_VERSION="\"\"" 
>> -DPRTE_REPO_REV="\"v2.0.1-8-gaa57929\"" -DPRTE_RELEASE_DATE="\"Feb 10, 
>> 2022\"" -DNDEBUG -O2 -flto=auto -ffat-lto-objects -fexceptions -g 
>> -grecord-gcc-switches -pipe 
>> -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong 
>> -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic 
>> -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection 
>> -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -finline-functions 
>> -Wall -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 
>> -Wp,-D_GLIBCXX_ASSERTIONS  -Wl,-z,relro -Wl,--as-needed -Wl,-z,now 
>> -specs=/usr/lib/rpm/redhat/redhat-hardened-ld 
>> -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -Wl,--build-id=sha1 
>> -specs=/usr/lib/rpm/redhat/redhat-package-notes -o prted prted.o 
>> ../../../src/libprrte.la  -lm -levent_core 
>> -levent_pthreads -lpmix -lhwloc
>> libtool: link: gcc -DPRTE_CONFIGURE_USER=\"mockbuild\" 
>> -DPRTE_CONFIGURE_HOST=\"2188cae5486f485888615d01f56cd6c9\" 
>> "-DPRTE_CONFIGURE_DATE=\"Fri Jan 20 00:00:00 UTC 2023\"" 
>> -DPRTE_BUILD_USER=\"mockbuild\" 
>> -DPRTE_BUILD_HOST=\"2188cae5486f485888615d01f56cd6c9\" 
>> "-DPRTE_BUILD_DATE=\"Fri Jan 20 00:00:00 UTC 

Re: [OMPI packagers] Recommended combinations of pmix/prrte/openmpi

2023-03-18 Thread Gilles Gouaillardet
Orion,

PMIx 4.2.3 should be avoided since there is a bad interaction with Open MPI
4.1.x
(as reported in https://github.com/open-mpi/ompi/issues/10416 and
https://github.com/open-mpi/ompi/issues/11478)

The issue occurs with this specific version: 4.2.2 works fine and the bug
is expected to be fixed in 4.2.4


Cheers,

Gilles



On Sun, Mar 19, 2023 at 8:26 AM Orion Poplawski wrote:

> What are the current recommendations for compatible combinations of
> pmix/prrte/openmpi?
>
> I'm looking into updating each of these in Fedora and running into a
> couple of issues.  Currently in Fedora Rawhide we have:
>
> - pmix 4.1.2
> - prrte 2.0.2
> - openmpi 4.1.5
>
> After updating to pmix 4.2.3 I see the following:
>
> - openmpi programs fail to run with:
>
> mca_base_component_repository_open: unable to open mca_pmix_ext3x:
> /usr/lib64/openmpi/lib/openmpi/mca_pmix_ext3x.so: undefined symbol:
> pmix_value_load (ignored)
> [[3260,0],0] ORTE_ERROR_LOG: Not found in file ess_hnp_module.c at line 320
> --
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems.  This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
>  opal_pmix_base_select failed
>  --> Returned value Not found (-13) instead of ORTE_SUCCESS
>
>
> - prrte 2.0.2 fails to build with:
>
> make[2]: Entering directory
> '/builddir/build/BUILD/prrte-2.0.2/src/tools/prted'
> /bin/sh ../../../libtool  --tag=CC   --mode=link gcc
> -DPRTE_CONFIGURE_USER="\"mockbuild\""
> -DPRTE_CONFIGURE_HOST="\"2188cae5486f485888615d01f56cd6c9\""
> -DPRTE_CONFIGURE_DATE="\"Fri Jan 20 00:00:00 UTC 2023\""
> -DPRTE_BUILD_USER="\"$USER\""
> -DPRTE_BUILD_HOST="\"${HOSTNAME:-`(hostname || uname -n) | sed 1q`}\""
> -DPRTE_BUILD_DATE="\"`../../../config/getdate.sh`\""
> -DPRTE_BUILD_CFLAGS="\"-DNDEBUG -O2 -flto=auto -ffat-lto-objects
> -fexceptions -g -grecord-gcc-switches -pipe
> -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong
> -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic
> -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection
> -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -finline-functions
> -Wall -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3
> -Wp,-D_GLIBCXX_ASSERTIONS\""
> -DPRTE_BUILD_CPPFLAGS="\"-iquote/builddir/build/BUILD/prrte-2.0.2
> -iquote/builddir/build/BUILD/prrte-2.0.2/src/include\""
> -DPRTE_BUILD_LDFLAGS="\"-Wl,-z,relro -Wl,--as-needed -Wl,-z,now
> -specs=/usr/lib/rpm/redhat/redhat-hardened-ld
> -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -Wl,--build-id=sha1
> -specs=/usr/lib/rpm/redhat/redhat-package-notes\""
> -DPRTE_BUILD_LIBS="\"-lm -levent_core -levent_pthreads -lpmix -lhwloc\""
> -DPRTE_CC_ABSOLUTE="\"/usr/bin/gcc\"" -DPRTE_GREEK_VERSION="\"\""
> -DPRTE_REPO_REV="\"v2.0.1-8-gaa57929\"" -DPRTE_RELEASE_DATE="\"Feb 10,
> 2022\"" -DNDEBUG -O2 -flto=auto -ffat-lto-objects -fexceptions -g
> -grecord-gcc-switches -pipe
> -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong
> -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic
> -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection
> -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -finline-functions
> -Wall -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3
> -Wp,-D_GLIBCXX_ASSERTIONS  -Wl,-z,relro -Wl,--as-needed -Wl,-z,now
> -specs=/usr/lib/rpm/redhat/redhat-hardened-ld
> -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -Wl,--build-id=sha1
> -specs=/usr/lib/rpm/redhat/redhat-package-notes -o prted prted.o
> ../../../src/libprrte.la -lm -levent_core -levent_pthreads -lpmix -lhwloc
> libtool: link: gcc -DPRTE_CONFIGURE_USER=\"mockbuild\"
> -DPRTE_CONFIGURE_HOST=\"2188cae5486f485888615d01f56cd6c9\"
> "-DPRTE_CONFIGURE_DATE=\"Fri Jan 20 00:00:00 UTC 2023\""
> -DPRTE_BUILD_USER=\"mockbuild\"
> -DPRTE_BUILD_HOST=\"2188cae5486f485888615d01f56cd6c9\"
> "-DPRTE_BUILD_DATE=\"Fri Jan 20 00:00:00 UTC 2023\""
> "-DPRTE_BUILD_CFLAGS=\"-DNDEBUG -O2 -flto=auto -ffat-lto-objects
> -fexceptions -g -grecord-gcc-switches -pipe
> -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong
> -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic
> -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection
> -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -finline-functions
> -Wall -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3
> -Wp,-D_GLIBCXX_ASSERTIONS\""
> "-DPRTE_BUILD_CPPFLAGS=\"-iquote/builddir/build/BUILD/prrte-2.0.2
> -iquote/builddir/build/BUILD/prrte-2.0.2/src/include\""
> "-DPRTE_BUILD_LDFLAGS=\"-Wl,-z,relro -Wl,--as-needed -Wl,-z,now
> 

Re: [OMPI packagers] Recommended combinations of pmix/prrte/openmpi

2023-03-18 Thread Ralph Castain
First, sorry for all the confusion! I know it is a bit to parse as we 
transition from OMPI v4 and earlier (which had its own embedded runtime called 
ORTE) to OMPI v5, which includes a 3rd-party runtime called PRRTE. You never 
split ORTE out of those earlier OMPI versions and so these compatibility issues 
didn't previously exist. But now they will, starting with OMPI v5.

If it is any consolation, we are also trying to work our way thru all these 
combinations. It places constraints on multiple parties, especially as PRRTE 
and PMIx are used by a number of other parties not named OMPI. So none of us 
involved in the different projects enjoy total freedom, and we are still 
figuring out how to manage all the moving pieces.

Bottom line is that you don't need to be worrying about PRRTE for OMPI v4 and 
earlier. It is orthogonal to those releases.

We have previously identified a problem with OMPI's PMIx integration in the 
v4.x series and a patch has been filed for it. This is the source of the issue 
you cite. It isn't a PMIx problem, but rather a case of replacing a previously 
non-standard function call with the standardized macro. The patch (see 
https://github.com/open-mpi/ompi/pull/11472) fixes it for all PMIx releases 
(both looking backwards and going forwards).

Looking forward to OMPI v5:

* PRRTE 2.x is not something that will be supported. Frankly, we do not 
recommend anyone use that series.

* OMPI v5 is going to require a minimum of PRRTE v3.0.1, although OMPI v5 might 
include v3.1.0 (the precise release to be included in the OMPI v5 code remains 
TBD). Either PRRTE version will require a minimum of PMIx v4.2.4 to properly 
function

* OMPI v5 will require a minimum of PMIx v4.2.4 - OMPI might ship with the PMIx 
v5.0.0 release, but it will still work with v4.2.4

The problems you cite are unfortunately expected and part of the learning 
process. Part of the delay in releasing OMPI v5 has been a result of trying to 
resolve the logistics - hopefully, we are converging on a stable system.

Ralph



> On Mar 18, 2023, at 4:24 PM, Orion Poplawski wrote:
> 
> What are the current recommendations for compatible combinations of 
> pmix/prrte/openmpi?
> 
> I'm looking into updating each of these in Fedora and running into a couple 
> of issues.  Currently in Fedora Rawhide we have:
> 
> - pmix 4.1.2
> - prrte 2.0.2
> - openmpi 4.1.5
> 
> After updating to pmix 4.2.3 I see the following:
> 
> - openmpi programs fail to run with:
> 
> mca_base_component_repository_open: unable to open mca_pmix_ext3x: 
> /usr/lib64/openmpi/lib/openmpi/mca_pmix_ext3x.so: undefined symbol: 
> pmix_value_load (ignored)
> [[3260,0],0] ORTE_ERROR_LOG: Not found in file ess_hnp_module.c at line 320
> --
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems.  This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
> 
>  opal_pmix_base_select failed
>  --> Returned value Not found (-13) instead of ORTE_SUCCESS
> 
> 
> - prrte 2.0.2 fails to build with:
> 
> make[2]: Entering directory 
> '/builddir/build/BUILD/prrte-2.0.2/src/tools/prted'
> /bin/sh ../../../libtool  --tag=CC   --mode=link gcc 
> -DPRTE_CONFIGURE_USER="\"mockbuild\"" 
> -DPRTE_CONFIGURE_HOST="\"2188cae5486f485888615d01f56cd6c9\"" 
> -DPRTE_CONFIGURE_DATE="\"Fri Jan 20 00:00:00 UTC 2023\"" 
> -DPRTE_BUILD_USER="\"$USER\"" -DPRTE_BUILD_HOST="\"${HOSTNAME:-`(hostname || 
> uname -n) | sed 1q`}\"" -DPRTE_BUILD_DATE="\"`../../../config/getdate.sh`\"" 
> -DPRTE_BUILD_CFLAGS="\"-DNDEBUG -O2 -flto=auto -ffat-lto-objects -fexceptions 
> -g -grecord-gcc-switches -pipe -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 
> -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 
> -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection 
> -fcf-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer 
> -finline-functions -Wall -Werror=format-security 
> -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS\"" 
> -DPRTE_BUILD_CPPFLAGS="\"-iquote/builddir/build/BUILD/prrte-2.0.2 
> -iquote/builddir/build/BUILD/prrte-2.0.2/src/include\"" 
> -DPRTE_BUILD_LDFLAGS="\"-Wl,-z,relro -Wl,--as-needed -Wl,-z,now 
> -specs=/usr/lib/rpm/redhat/redhat-hardened-ld 
> -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -Wl,--build-id=sha1 
> -specs=/usr/lib/rpm/redhat/redhat-package-notes\"" -DPRTE_BUILD_LIBS="\"-lm 
> -levent_core -levent_pthreads -lpmix -lhwloc\"" 
> -DPRTE_CC_ABSOLUTE="\"/usr/bin/gcc\"" -DPRTE_GREEK_VERSION="\"\"" 
> -DPRTE_REPO_REV="\"v2.0.1-8-gaa57929\"" -DPRTE_RELEASE_DATE="\"Feb 10, 
> 2022\"" -DNDEBUG -O2 -flto=auto -ffat-lto-objects -fexceptions -g 
> -grecord-gcc-switches -pipe 

[OMPI packagers] Recommended combinations of pmix/prrte/openmpi

2023-03-18 Thread Orion Poplawski
What are the current recommendations for compatible combinations of 
pmix/prrte/openmpi?


I'm looking into updating each of these in Fedora and running into a 
couple of issues.  Currently in Fedora Rawhide we have:


- pmix 4.1.2
- prrte 2.0.2
- openmpi 4.1.5

After updating to pmix 4.2.3 I see the following:

- openmpi programs fail to run with:

mca_base_component_repository_open: unable to open mca_pmix_ext3x: 
/usr/lib64/openmpi/lib/openmpi/mca_pmix_ext3x.so: undefined symbol: 
pmix_value_load (ignored)

[[3260,0],0] ORTE_ERROR_LOG: Not found in file ess_hnp_module.c at line 320
--
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  opal_pmix_base_select failed
  --> Returned value Not found (-13) instead of ORTE_SUCCESS


- prrte 2.0.2 fails to build with:

make[2]: Entering directory 
'/builddir/build/BUILD/prrte-2.0.2/src/tools/prted'
/bin/sh ../../../libtool  --tag=CC   --mode=link gcc 
-DPRTE_CONFIGURE_USER="\"mockbuild\"" 
-DPRTE_CONFIGURE_HOST="\"2188cae5486f485888615d01f56cd6c9\"" 
-DPRTE_CONFIGURE_DATE="\"Fri Jan 20 00:00:00 UTC 2023\"" 
-DPRTE_BUILD_USER="\"$USER\"" 
-DPRTE_BUILD_HOST="\"${HOSTNAME:-`(hostname || uname -n) | sed 1q`}\"" 
-DPRTE_BUILD_DATE="\"`../../../config/getdate.sh`\"" 
-DPRTE_BUILD_CFLAGS="\"-DNDEBUG -O2 -flto=auto -ffat-lto-objects 
-fexceptions -g -grecord-gcc-switches -pipe 
-specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong 
-specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic 
-fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection 
-fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -finline-functions 
-Wall -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 
-Wp,-D_GLIBCXX_ASSERTIONS\"" 
-DPRTE_BUILD_CPPFLAGS="\"-iquote/builddir/build/BUILD/prrte-2.0.2 
-iquote/builddir/build/BUILD/prrte-2.0.2/src/include\"" 
-DPRTE_BUILD_LDFLAGS="\"-Wl,-z,relro -Wl,--as-needed -Wl,-z,now 
-specs=/usr/lib/rpm/redhat/redhat-hardened-ld 
-specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -Wl,--build-id=sha1 
-specs=/usr/lib/rpm/redhat/redhat-package-notes\"" 
-DPRTE_BUILD_LIBS="\"-lm -levent_core -levent_pthreads -lpmix -lhwloc\"" 
-DPRTE_CC_ABSOLUTE="\"/usr/bin/gcc\"" -DPRTE_GREEK_VERSION="\"\"" 
-DPRTE_REPO_REV="\"v2.0.1-8-gaa57929\"" -DPRTE_RELEASE_DATE="\"Feb 10, 
2022\"" -DNDEBUG -O2 -flto=auto -ffat-lto-objects -fexceptions -g 
-grecord-gcc-switches -pipe 
-specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong 
-specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic 
-fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection 
-fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -finline-functions 
-Wall -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 
-Wp,-D_GLIBCXX_ASSERTIONS  -Wl,-z,relro -Wl,--as-needed -Wl,-z,now 
-specs=/usr/lib/rpm/redhat/redhat-hardened-ld 
-specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -Wl,--build-id=sha1 
-specs=/usr/lib/rpm/redhat/redhat-package-notes -o prted prted.o 
../../../src/libprrte.la -lm -levent_core -levent_pthreads -lpmix -lhwloc
libtool: link: gcc -DPRTE_CONFIGURE_USER=\"mockbuild\" 
-DPRTE_CONFIGURE_HOST=\"2188cae5486f485888615d01f56cd6c9\" 
"-DPRTE_CONFIGURE_DATE=\"Fri Jan 20 00:00:00 UTC 2023\"" 
-DPRTE_BUILD_USER=\"mockbuild\" 
-DPRTE_BUILD_HOST=\"2188cae5486f485888615d01f56cd6c9\" 
"-DPRTE_BUILD_DATE=\"Fri Jan 20 00:00:00 UTC 2023\"" 
"-DPRTE_BUILD_CFLAGS=\"-DNDEBUG -O2 -flto=auto -ffat-lto-objects 
-fexceptions -g -grecord-gcc-switches -pipe 
-specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong 
-specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic 
-fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection 
-fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -finline-functions 
-Wall -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 
-Wp,-D_GLIBCXX_ASSERTIONS\"" 
"-DPRTE_BUILD_CPPFLAGS=\"-iquote/builddir/build/BUILD/prrte-2.0.2 
-iquote/builddir/build/BUILD/prrte-2.0.2/src/include\"" 
"-DPRTE_BUILD_LDFLAGS=\"-Wl,-z,relro -Wl,--as-needed -Wl,-z,now 
-specs=/usr/lib/rpm/redhat/redhat-hardened-ld 
-specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -Wl,--build-id=sha1 
-specs=/usr/lib/rpm/redhat/redhat-package-notes\"" 
"-DPRTE_BUILD_LIBS=\"-lm -levent_core -levent_pthreads -lpmix -lhwloc\"" 
-DPRTE_CC_ABSOLUTE=\"/usr/bin/gcc\" -DPRTE_GREEK_VERSION=\"\" 
-DPRTE_REPO_REV=\"v2.0.1-8-gaa57929\" "-DPRTE_RELEASE_DATE=\"Feb 10, 
2022\"" -DNDEBUG -O2 -flto=auto -ffat-lto-objects -fexceptions -g 
-grecord-gcc-switches -pipe 
-specs=/usr/lib/rpm/redhat/redhat-hardened-cc1