Re: [OMPI devel] RFC: add atomic compare-and-swap that returns old value
George: Have a failure with your patch applied on PPC64/Linux and gcc-4.4.6:

Making all in asm
make[2]: Entering directory `/home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/BLD/opal/asm'
  CC asm.lo
In file included from /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/openmpi-1.9a1r32369/opal/asm/asm.c:21:0:
/home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/openmpi-1.9a1r32369/opal/include/opal/sys/atomic.h:374:9: error: conflicting types for 'opal_atomic_cmpset_rel_64'
/home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/openmpi-1.9a1r32369/opal/include/opal/sys/powerpc/atomic.h:214:19: note: previous definition of 'opal_atomic_cmpset_rel_64' was here
/home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/openmpi-1.9a1r32369/opal/include/opal/sys/atomic.h:374:9: warning: 'opal_atomic_cmpset_rel_64' used but never defined [enabled by default]
make[2]: *** [asm.lo] Error 1

BTW: the patch applied cleanly to trunk except the portion changing opal/include/opal/sys/osx/atomic.h, which does not exist. -Paul On Thu, Jul 31, 2014 at 4:25 PM, George Bosilca wrote: > Awesome, thanks Paul. When the results are in we will fix whatever is > needed for these less common architectures. > > George. > > > > On Thu, Jul 31, 2014 at 7:24 PM, Paul Hargrove wrote: > >> >> >> On Thu, Jul 31, 2014 at 4:22 PM, Paul Hargrove >> wrote: >> >>> >>> On Thu, Jul 31, 2014 at 4:13 PM, George Bosilca >>> wrote: >>> Paul, I know you have a pretty diverse range of computers. Can you try to compile and run a "make check" with the following patch? >>> >>> >>> I will see what I can do for ARMv7, MIPS, PPC and IA64 (or whatever >>> subset of those is still supported). >>> The ARM and MIPS system are emulators and take forever to build OMPI. >>> However, I am not even sure how soon I'll get to start this testing. >>> >> >> >> Add SPARC (v8plus and v9) to that list. >> >> >> >> -- >> Paul H. 
Hargrove phhargr...@lbl.gov >> Future Technologies Group >> Computer and Data Sciences Department Tel: +1-510-495-2352 >> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900 >> >> ___ >> devel mailing list >> de...@open-mpi.org >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel >> Link to this post: >> http://www.open-mpi.org/community/lists/devel/2014/07/15411.php >> > > > ___ > devel mailing list > de...@open-mpi.org > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel > Link to this post: > http://www.open-mpi.org/community/lists/devel/2014/07/15412.php > -- Paul H. Hargrove phhargr...@lbl.gov Future Technologies Group Computer and Data Sciences Department Tel: +1-510-495-2352 Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error
Paul, the IBM test suite from the non-public ompi-tests repository has several tests for usempif08. Cheers, Gilles On 2014/08/01 11:04, Paul Hargrove wrote: > Second related issue: > > Can/should examples/hello_usempif08.f90 be extended to use more of the > module such that it would have illustrated the bug found with Tetsuya's > example code? I don't know about MTT, but my scripts for testing a > release candidate includes running "make" in the example subdir. > >
Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error
Nevermind my suggestion to revise examples/hello_usempif08.f90 I've just determined that it is already sufficient to reproduce the problem. (So now I need to see what's wrong in my testing scripts). -Paul On Thu, Jul 31, 2014 at 7:04 PM, Paul Hargrovewrote: > Second related issue: > > Can/should examples/hello_usempif08.f90 be extended to use more of the > module such that it would have illustrated the bug found with Tetsuya's > example code? I don't know about MTT, but my scripts for testing a > release candidate includes running "make" in the example subdir. > > -Paul > > > On Thu, Jul 31, 2014 at 6:17 PM, Jeff Squyres (jsquyres) < > jsquy...@cisco.com> wrote: > >> Many thanks guys, this thread was most helpful in finding the fix. >> >> Paul H. nailed 80% of it on the head in the post where he identified the >> Makefile.am change. That Makefile.am change was due to three things: >> >> 1. Fixing a real bug (elsewhere in that commit) >> 2. My misunderstanding of how module files work in Fortran >> 3. The fact that gfortran, Absoft, and ifort *don't* require you to link >> in the .o files generated by modules, but apparently pgfortran *does* >> >> Blarg. >> >> That led to the duplicate symbol issue which Paul also encountered when >> he tried to fix the original problem, so I fixed that, too (which was a >> direct consequence of the first fix). >> >> Should be fixed in the trunk now; we tested with pgfortran on Craig >> Rasmussen's cluster (many thanks, Craig!). >> >> CMR is https://svn.open-mpi.org/trac/ompi/ticket/4519. 
>> >> >> >> >> On Jul 31, 2014, at 7:27 AM, Paul Hargrove wrote: >> >> > Gilles, >> > >> > >> > Just as you speculate, PGI is creating a _-suffixed reference to the >> module name: >> > >> > $ pgf90 -c test.f90 >> > $ nm -u test.o | grep f08 >> > U mpi_f08_sizeof_ >> > U mpi_f08_sizeof_mpi_sizeof_real_s_4_ >> > >> > >> > >> > You suggested the following work-around in a previous email: >> > >> > $ INST/bin/mpifort ../test.f >> ./BLD/ompi/mpi/fortran/use-mpi-f08/.libs/libforce_usempif08_internal_modules_to_be_built.a >> > >> > That works fine. That doesn't surprise me, because I had already >> identified that file as having been removed from libmpi_usempif08.so >> between 1.8.1 and 1.8.2rc2. It includes the symbol for the module names >> plus trailing '_'. >> > >> > -Paul >> > >> > >> > On Thu, Jul 31, 2014 at 1:07 AM, Gilles Gouaillardet < >> gilles.gouaillar...@iferc.org> wrote: >> > Paul, >> > >> > in .../ompi/mpi/fortran/use-mpi-f08, can you create the following dumb >> > test program, >> > compile and run nm | grep f08 on the object : >> > >> > $ cat foo.f90 >> > program foo >> > use mpi_f08_sizeof >> > >> > implicit none >> > >> > real :: x >> > integer :: size, ierror >> > >> > call MPI_Sizeof_real_s_4(x, size, ierror) >> > >> > stop >> > end program >> > >> > >> > with intel compiler : >> > $ ifort -c foo.f90 >> > $ nm foo.o | grep f08 >> > U mpi_f08_sizeof_mp_mpi_sizeof_real_s_4_ >> > >> > i am wondering whether PGI compiler adds an additional undefined >> > reference to mpi_f08_sizeof_ ... >> > >> > Cheers, >> > >> > Gilles >> > >> > ___ >> > devel mailing list >> > de...@open-mpi.org >> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel >> > Link to this post: >> http://www.open-mpi.org/community/lists/devel/2014/07/15390.php >> > >> > >> > >> > -- >> > Paul H. 
Hargrove phhargr...@lbl.gov >> > Future Technologies Group >> > Computer and Data Sciences Department Tel: +1-510-495-2352 >> > Lawrence Berkeley National Laboratory Fax: +1-510-486-6900 >> > ___ >> > devel mailing list >> > de...@open-mpi.org >> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel >> > Link to this post: >> http://www.open-mpi.org/community/lists/devel/2014/07/15391.php >> >> >> -- >> Jeff Squyres >> jsquy...@cisco.com >> For corporate legal information go to: >> http://www.cisco.com/web/about/doing_business/legal/cri/ >> >> ___ >> devel mailing list >> de...@open-mpi.org >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel >> Link to this post: >> http://www.open-mpi.org/community/lists/devel/2014/07/15415.php >> > > > > -- > Paul H. Hargrove phhargr...@lbl.gov > Future Technologies Group > Computer and Data Sciences Department Tel: +1-510-495-2352 > Lawrence Berkeley National Laboratory Fax: +1-510-486-6900 > -- Paul H. Hargrove phhargr...@lbl.gov Future Technologies Group Computer and Data Sciences Department Tel: +1-510-495-2352 Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error
Related question: If I am understanding PGI's list of fixed-TPRs (bugs) then it looks like one (certainly not the only) difference between 13.x and 14.1 is a fix to a problem with PROCEDURE and zero-argument subroutines. As it happens, the configure probe for PROCEDURE is a zero-argument subroutine, but the "real" usage in OMPI is *not* zero-argument. This opens the possibility (not certainty) that PROCEDURE may work as required in PGI-13.x, in which case only a "more accurate" configure test would be required to restore F08 support for PGI-13 (present in 1.8.1 and lacking in 1.8.2rc2). So, the most important question first: Does anybody care about PGI-13 (cannot use PGI-14 for some reason other than cost of license)? -Paul On Thu, Jul 31, 2014 at 6:17 PM, Jeff Squyres (jsquyres) wrote: > Many thanks guys, this thread was most helpful in finding the fix. > > Paul H. nailed 80% of it on the head in the post where he identified the > Makefile.am change. That Makefile.am change was due to three things: > > 1. Fixing a real bug (elsewhere in that commit) > 2. My misunderstanding of how module files work in Fortran > 3. The fact that gfortran, Absoft, and ifort *don't* require you to link > in the .o files generated by modules, but apparently pgfortran *does* > > Blarg. > > That led to the duplicate symbol issue which Paul also encountered when he > tried to fix the original problem, so I fixed that, too (which was a direct > consequence of the first fix). > > Should be fixed in the trunk now; we tested with pgfortran on Craig > Rasmussen's cluster (many thanks, Craig!). > > CMR is https://svn.open-mpi.org/trac/ompi/ticket/4519. 
> > > > > On Jul 31, 2014, at 7:27 AM, Paul Hargrove wrote: > > > Gilles, > > > > > > Just as you speculate, PGI is creating a _-suffixed reference to the > module name: > > > > $ pgf90 -c test.f90 > > $ nm -u test.o | grep f08 > > U mpi_f08_sizeof_ > > U mpi_f08_sizeof_mpi_sizeof_real_s_4_ > > > > > > > > You suggested the following work-around in a previous email: > > > > $ INST/bin/mpifort ../test.f > ./BLD/ompi/mpi/fortran/use-mpi-f08/.libs/libforce_usempif08_internal_modules_to_be_built.a > > > > That works fine. That doesn't surprise me, because I had already > identified that file as having been removed from libmpi_usempif08.so > between 1.8.1 and 1.8.2rc2. It includes the symbol for the module names > plus trailing '_'. > > > > -Paul > > > > > > On Thu, Jul 31, 2014 at 1:07 AM, Gilles Gouaillardet < > gilles.gouaillar...@iferc.org> wrote: > > Paul, > > > > in .../ompi/mpi/fortran/use-mpi-f08, can you create the following dumb > > test program, > > compile and run nm | grep f08 on the object : > > > > $ cat foo.f90 > > program foo > > use mpi_f08_sizeof > > > > implicit none > > > > real :: x > > integer :: size, ierror > > > > call MPI_Sizeof_real_s_4(x, size, ierror) > > > > stop > > end program > > > > > > with intel compiler : > > $ ifort -c foo.f90 > > $ nm foo.o | grep f08 > > U mpi_f08_sizeof_mp_mpi_sizeof_real_s_4_ > > > > i am wondering whether PGI compiler adds an additional undefined > > reference to mpi_f08_sizeof_ ... > > > > Cheers, > > > > Gilles > > > > ___ > > devel mailing list > > de...@open-mpi.org > > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel > > Link to this post: > http://www.open-mpi.org/community/lists/devel/2014/07/15390.php > > > > > > > > -- > > Paul H. 
Hargrove phhargr...@lbl.gov > > Future Technologies Group > > Computer and Data Sciences Department Tel: +1-510-495-2352 > > Lawrence Berkeley National Laboratory Fax: +1-510-486-6900 > > ___ > > devel mailing list > > de...@open-mpi.org > > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel > > Link to this post: > http://www.open-mpi.org/community/lists/devel/2014/07/15391.php > > > -- > Jeff Squyres > jsquy...@cisco.com > For corporate legal information go to: > http://www.cisco.com/web/about/doing_business/legal/cri/ > > ___ > devel mailing list > de...@open-mpi.org > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel > Link to this post: > http://www.open-mpi.org/community/lists/devel/2014/07/15415.php > -- Paul H. Hargrove phhargr...@lbl.gov Future Technologies Group Computer and Data Sciences Department Tel: +1-510-495-2352 Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error
Many thanks guys, this thread was most helpful in finding the fix. Paul H. nailed 80% of it on the head in the post where he identified the Makefile.am change. That Makefile.am change was due to three things: 1. Fixing a real bug (elsewhere in that commit) 2. My misunderstanding of how module files work in Fortran 3. The fact that gfortran, Absoft, and ifort *don't* require you to link in the .o files generated by modules, but apparently pgfortran *does* Blarg. That led to the duplicate symbol issue which Paul also encountered when he tried to fix the original problem, so I fixed that, too (which was a direct consequence of the first fix). Should be fixed in the trunk now; we tested with pgfortran on Craig Rasmussen's cluster (many thanks, Craig!). CMR is https://svn.open-mpi.org/trac/ompi/ticket/4519. On Jul 31, 2014, at 7:27 AM, Paul Hargrovewrote: > Gilles, > > > Just as you speculate, PGI is creating a _-suffixed reference to the module > name: > > $ pgf90 -c test.f90 > $ nm -u test.o | grep f08 > U mpi_f08_sizeof_ > U mpi_f08_sizeof_mpi_sizeof_real_s_4_ > > > > You suggested the following work-around in a previous email: > > $ INST/bin/mpifort ../test.f > ./BLD/ompi/mpi/fortran/use-mpi-f08/.libs/libforce_usempif08_internal_modules_to_be_built.a > > That works fine. That doesn't surprise me, because I had already identified > that file as having been removed from libmpi_usempif08.so between 1.8.1 and > 1.8.2rc2. It includes the symbol for the module names plus trailing '_'. 
> > -Paul > > > On Thu, Jul 31, 2014 at 1:07 AM, Gilles Gouaillardet > wrote: > Paul, > > in .../ompi/mpi/fortran/use-mpi-f08, can you create the following dumb > test program, > compile and run nm | grep f08 on the object : > > $ cat foo.f90 > program foo > use mpi_f08_sizeof > > implicit none > > real :: x > integer :: size, ierror > > call MPI_Sizeof_real_s_4(x, size, ierror) > > stop > end program > > > with intel compiler : > $ ifort -c foo.f90 > $ nm foo.o | grep f08 > U mpi_f08_sizeof_mp_mpi_sizeof_real_s_4_ > > i am wondering whether PGI compiler adds an additional undefined > reference to mpi_f08_sizeof_ ... > > Cheers, > > Gilles > > ___ > devel mailing list > de...@open-mpi.org > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel > Link to this post: > http://www.open-mpi.org/community/lists/devel/2014/07/15390.php > > > > -- > Paul H. Hargrove phhargr...@lbl.gov > Future Technologies Group > Computer and Data Sciences Department Tel: +1-510-495-2352 > Lawrence Berkeley National Laboratory Fax: +1-510-486-6900 > ___ > devel mailing list > de...@open-mpi.org > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel > Link to this post: > http://www.open-mpi.org/community/lists/devel/2014/07/15391.php -- Jeff Squyres jsquy...@cisco.com For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
[hwloc-devel] Create success (hwloc git dev-170-gabee241)
Creating nightly hwloc snapshot git tarball was a success. Snapshot: hwloc dev-170-gabee241 Start time: Thu Jul 31 21:01:01 EDT 2014 End time: Thu Jul 31 21:02:31 EDT 2014 Your friendly daemon, Cyrador
Re: [OMPI devel] Further questions about BTL OPAL move...
On Jul 31, 2014, at 3:41 PM, George Bosilca wrote: > > On Jul 31, 2014, at 18:26 , Jeff Squyres (jsquyres) > wrote: > >> George -- >> >> Got 2 questions for ya: >> >> 1. I see some orte_* specific symbols/functions in ompi_mpi_init.c. Was >> that intentional? Shouldn’t that stuff be in the RTE framework, or some >> such? > > Good catch. Fixed in r32384. > >> 2. In tracking down some stuff relating to process names, it looks like >> names are now being set by ompi/proc/proc.c (i.e., it makes a call to >> opal_proc_local_set(...)). And this happens after the RTE is initialized. >> Is that right? Seems a little weird to me — shouldn't the RTE be the one >> that sets the process names? > > In my view the RTE should stay outside any local setting of the process. The > RTE role is to move the info around, not to force it on everybody else. When > multiple layers will use the BTL (and thus the OPAL level proc), we will have > to figure out who will be responsible for setting the data into the OPAL > level proc. Meanwhile, OMPI is the only one using this proc. Not exactly. The dstore and pmi frameworks depend on it, and that the name matches the one in the RTE. So ORTE is going to have to set that proc object, and I imagine STCI will too. > > George. > >> >> Thanks! >> >> -- >> Jeff Squyres >> jsquy...@cisco.com >> For corporate legal information go to: >> http://www.cisco.com/web/about/doing_business/legal/cri/ >> >> ___ >> devel mailing list >> de...@open-mpi.org >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel >> Link to this post: >> http://www.open-mpi.org/community/lists/devel/2014/07/15407.php > > ___ > devel mailing list > de...@open-mpi.org > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel > Link to this post: > http://www.open-mpi.org/community/lists/devel/2014/07/15408.php
Re: [OMPI devel] RFC: add atomic compare-and-swap that returns old value
Awesome, thanks Paul. When the results are in we will fix whatever is needed for these less common architectures. George. On Thu, Jul 31, 2014 at 7:24 PM, Paul Hargrove wrote: > > > On Thu, Jul 31, 2014 at 4:22 PM, Paul Hargrove wrote: > >> >> On Thu, Jul 31, 2014 at 4:13 PM, George Bosilca >> wrote: >> >>> Paul, I know you have a pretty diverse range of computers. Can you try to >>> compile and run a “make check” with the following patch? >> >> >> I will see what I can do for ARMv7, MIPS, PPC and IA64 (or whatever >> subset of those is still supported). >> The ARM and MIPS system are emulators and take forever to build OMPI. >> However, I am not even sure how soon I'll get to start this testing. >> > > > Add SPARC (v8plus and v9) to that list. > > > > -- > Paul H. Hargrove phhargr...@lbl.gov > Future Technologies Group > Computer and Data Sciences Department Tel: +1-510-495-2352 > Lawrence Berkeley National Laboratory Fax: +1-510-486-6900 > > ___ > devel mailing list > de...@open-mpi.org > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel > Link to this post: > http://www.open-mpi.org/community/lists/devel/2014/07/15411.php >
Re: [OMPI devel] RFC: add atomic compare-and-swap that returns old value
On Thu, Jul 31, 2014 at 4:22 PM, Paul Hargrove wrote: > > On Thu, Jul 31, 2014 at 4:13 PM, George Bosilca > wrote: > >> Paul, I know you have a pretty diverse range of computers. Can you try to >> compile and run a "make check" with the following patch? > > > I will see what I can do for ARMv7, MIPS, PPC and IA64 (or whatever subset > of those is still supported). > The ARM and MIPS system are emulators and take forever to build OMPI. > However, I am not even sure how soon I'll get to start this testing. > Add SPARC (v8plus and v9) to that list. -- Paul H. Hargrove phhargr...@lbl.gov Future Technologies Group Computer and Data Sciences Department Tel: +1-510-495-2352 Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
Re: [OMPI devel] RFC: add atomic compare-and-swap that returns old value
On Thu, Jul 31, 2014 at 4:13 PM, George Bosilca wrote: > Paul, I know you have a pretty diverse range of computers. Can you try to > compile and run a "make check" with the following patch? I will see what I can do for ARMv7, MIPS, PPC and IA64 (or whatever subset of those is still supported). The ARM and MIPS system are emulators and take forever to build OMPI. However, I am not even sure how soon I'll get to start this testing. -Paul -- Paul H. Hargrove phhargr...@lbl.gov Future Technologies Group Computer and Data Sciences Department Tel: +1-510-495-2352 Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
Re: [OMPI devel] RFC: add atomic compare-and-swap that returns old value
All, Here is the patch that changes the meaning of the atomics to make them always return the previous value (similar to sync_fetch_and_<*>). I tested this with the following atomics: OS X, gcc style intrinsics and AMD64. I did not change the base assembly files used when GCC style assembly operations are not supported. If someone feels like fixing them, feel free. Paul, I know you have a pretty diverse range of computers. Can you try to compile and run a “make check” with the following patch? George. atomics.patch Description: Binary data On Jul 30, 2014, at 15:21 , Nathan Hjelm wrote: > > That is what I would prefer. I was trying to not disturb things too > much :). Please bring the changes over! > > -Nathan > > On Wed, Jul 30, 2014 at 03:18:44PM -0400, George Bosilca wrote: >> Why do you want to add new versions? This will lead to having two, almost >> identical, sets of atomics that are conceptually equivalent but different >> in terms of code. And we will have to maintain both! >> I did a similar change in a fork of OPAL in another project but instead of >> adding another flavor of atomics, I completely replaced the available ones >> with a set returning the old value. I can bring the code over. >> George. >> >> On Tue, Jul 29, 2014 at 5:29 PM, Paul Hargrove wrote: >> >> On Tue, Jul 29, 2014 at 2:10 PM, Nathan Hjelm wrote: >> >> Is there a reason why the >> current implementations of opal atomics (add, cmpset) do not return >> the >> old value? >> >> Because some CPUs don't implement such an atomic instruction? >> >> On any CPU one *can* certainly synthesize the desired operation with an >> added read before the compare-and-swap to return a value that was >> present at some time before a failed cmpset. That is almost certainly >> sufficient for your purposes. However, the added load makes it >> (marginally) more expensive on some CPUs that only have the native >> equivalent of gcc's __sync_bool_compare_and_swap(). >> >> -Paul >> -- >> Paul H. 
Hargrove phhargr...@lbl.gov >> Future Technologies Group >> Computer and Data Sciences Department Tel: +1-510-495-2352 >> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900 >> ___ >> devel mailing list >> de...@open-mpi.org >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel >> Link to this post: >> http://www.open-mpi.org/community/lists/devel/2014/07/15328.php > >> ___ >> devel mailing list >> de...@open-mpi.org >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel >> Link to this post: >> http://www.open-mpi.org/community/lists/devel/2014/07/15369.php > > ___ > devel mailing list > de...@open-mpi.org > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel > Link to this post: > http://www.open-mpi.org/community/lists/devel/2014/07/15370.php
[OMPI devel] Further questions about BTL OPAL move...
George -- Got 2 questions for ya: 1. I see some orte_* specific symbols/functions in ompi_mpi_init.c. Was that intentional? Shouldn't that stuff be in the RTE framework, or some such? 2. In tracking down some stuff relating to process names, it looks like names are now being set by ompi/proc/proc.c (i.e., it makes a call to opal_proc_local_set(...)). And this happens after the RTE is initialized. Is that right? Seems a little weird to me -- shouldn't the RTE be the one that sets the process names? Thanks! -- Jeff Squyres jsquy...@cisco.com For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI devel] mca_PROJECT_FRAMEWORK_COMPONENT_symbol vs. mca_FRAMEWORK_COMPONENT_symbol
Yeah, I forgot that pure ANSI C doesn't really have namespaces, other than to fully qualify modules and variables. Bummer. Makes writing large, maintainable middleware more difficult. -Original Message- From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Kenneth A. Lloyd Sent: Thursday, July 31, 2014 6:04 AM To: 'Open MPI Developers' Subject: Re: [OMPI devel] mca_PROJECT_FRAMEWORK_COMPONENT_symbol vs. mca_FRAMEWORK_COMPONENT_symbol Doesn't namespacing obviate the need for this convoluted identifier scheme? See, for example, UML package import and include behaviors. -Original Message- From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Dave Goodell (dgoodell) Sent: Wednesday, July 30, 2014 3:35 PM To: Open MPI Developers Subject: [OMPI devel] mca_PROJECT_FRAMEWORK_COMPONENT_symbol vs. mca_FRAMEWORK_COMPONENT_symbol Jeff and I were talking about some namespacing issues that have come up in the recent BTL move from OMPI to OPAL. AFAIK, the current system for namespacing external symbols is to name them "mca_FRAMEWORK_COMPONENT_symbol" (e.g., "mca_btl_tcp_add_procs" in the tcp BTL). Similarly, the DSO for the component is named "mca_FRAMEWORK_COMPONENT.so" (e.g., "mca_btl_tcp.so"). Jeff asserted that the eventual goal is to move to a system where all MCA frameworks/components are also prefixed by the project name. So the above examples become "mca_ompi_btl_tcp_add_procs" and "mca_ompi_btl_tcp.so". Does anyone actually care about pursuing this goal? I ask because if nobody wants to pursue the goal of adding project names to namespaces then I already have an easy solution to most of our namespacing problems. OTOH, if someone does wish to pursue that goal, then I have a namespace-related RFC that I would like to propose (in a subsequent email). 
-Dave ___ devel mailing list de...@open-mpi.org Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel Link to this post: http://www.open-mpi.org/community/lists/devel/2014/07/15371.php ___ devel mailing list de...@open-mpi.org Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel Link to this post: http://www.open-mpi.org/community/lists/devel/2014/07/15392.php
Re: [OMPI devel] RFC: job size info in OPAL
Fair enough - yeah, that is an issue I've been avoiding :-) On Jul 31, 2014, at 9:14 AM, Nathan Hjelmwrote: > > This approach will work now but we need to start thinking about how we > want to support multiple simultaneous btl users. Does each user call > add_procs with a single module (or set of modules) or does each user > call btl_component_init and get their own module? If we do the latter > then it might make sense to add a max_procs argument to the > btl_component_init. Keep in mind we need to change the > btl_component_init interface anyway because the threading arguments no > longer make sense in their current form. > > -Nathan > > On Thu, Jul 31, 2014 at 09:04:09AM -0700, Ralph Castain wrote: >> Like I said, why don't we just do the following: >> >>> I'd like to suggest an alternative solution. A BTL can exploit whatever >>> data it wants, but should first test if the data is available. If the data >>> is *required*, then the BTL gracefully disqualifies itself. If the data is >>> *desirable* for optimization, then the BTL writer (if they choose) can >>> include an alternate path that doesn't do the optimization if the data >>> isn't available. >> >> Seems like this should resolve the disagreement in a way that meets >> everyone's need. It basically is an attribute approach, but not requiring >> modification of the BTL interface. >> >> >> On Jul 31, 2014, at 8:26 AM, Pritchard Jr., Howard wrote: >> >>> Hi George, >>> >>> The ompi_process_info.num_procs thing that seems to have been an object >>> of some contention yesterday. >>> >>> The ugni use of this is cloned off of the way I designed the mpich netmod. >>> Leveraging off size of the job was an easy way to scale the mailbox size. 
>>> >>> If I'd been asked to have the netmod work in a context like it appears we >>> may want to be eventually using BTLs - not just within ompi but for other >>> things, I'd have worked with Darius (if still in mpich world) on changing >>> the netmod initialization >>> method to allow for an optional attributes struct to be passed into the >>> init >>> method to give hints about how many connections may need to be established, >>> etc. >>> >>> For the GNI BTL - the way its currently designed - if you are only expecting >>> to use it for a limited number of connections, then you want to initialize >>> for big mailboxes (IBer's can think many large buffers posted as RX WQEs). >>> But for very large jobs, with possibly highly connected communication >>> pattern, >>> you want very small mailboxes. >>> >>> Howard >>> >>> >>> -Original Message- >>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of George Bosilca >>> Sent: Thursday, July 31, 2014 9:09 AM >>> To: Open MPI Developers >>> Subject: Re: [OMPI devel] RFC: job size info in OPAL >>> >>> What is your definition of "global job size"? >>> >>> George. >>> >>> On Jul 31, 2014, at 11:06 , Pritchard Jr., Howard wrote: >>> Hi Folks, I think given the way we want to use the btl's in lower levels like opal, it is pretty disgusting for a btl to need to figure out on its own something like a "global job size". That's not its business. Can't we add some attributes to the component's initialization method that provides hints for how to allocate resources it needs to provide its functionality? I'll see if there's something clever that can be done in ugni for now. I can always add in a hack to probe the apps placement info file and scale the smsg blocks by number of nodes rather than number of ranks. 
Howard -Original Message- From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Nathan Hjelm Sent: Thursday, July 31, 2014 8:58 AM To: Open MPI Developers Subject: Re: [OMPI devel] RFC: job size info in OPAL +2^1000 This information is absolutely necessary at this point. If someone has a better solution they can provide it as an alternative RFC. Until then this is how it should be done... Otherwise we loose uGNI support on the trunk. Because we ARE NOT going to remove the mailbox size optimization. -Nathan On Wed, Jul 30, 2014 at 10:00:18PM +, Jeff Squyres (jsquyres) wrote: > WHAT: Should we make the job size (i.e., initial number of procs) > available in OPAL? > > WHY: At least 2 BTLs are using this info (*more below) > > WHERE: usnic and ugni > > TIMEOUT: there's already been some inflammatory emails about this; > let's discuss next Tuesday on the teleconf: Tue, 5 Aug 2014 > > MORE DETAIL: > > This is an open question. We *have* the information at the time that the > BTLs are initialized: do we allow that information to go down to OPAL? > > Ralph added this info down in OPAL in r32355, but
Re: [OMPI devel] RFC: job size info in OPAL
This approach will work now but we need to start thinking about how we want to support multiple simultaneous btl users. Does each user call add_procs with a single module (or set of modules) or does each user call btl_component_init and get their own module? If we do the latter then it might make sense to add a max_procs argument to the btl_component_init. Keep in mind we need to change the btl_component_init interface anyway because the threading arguments no longer make sense in their current form. -Nathan On Thu, Jul 31, 2014 at 09:04:09AM -0700, Ralph Castain wrote: > Like I said, why don't we just do the following: > > > I'd like to suggest an alternative solution. A BTL can exploit whatever > > data it wants, but should first test if the data is available. If the data > > is *required*, then the BTL gracefully disqualifies itself. If the data is > > *desirable* for optimization, then the BTL writer (if they choose) can > > include an alternate path that doesn't do the optimization if the data > > isn't available. > > Seems like this should resolve the disagreement in a way that meets > everyone's need. It basically is an attribute approach, but not requiring > modification of the BTL interface. > > > On Jul 31, 2014, at 8:26 AM, Pritchard Jr., Howardwrote: > > > Hi George, > > > > The ompi_process_info.num_procs thing that seems to have been an object > > of some contention yesterday. > > > > The ugni use of this is cloned off of the way I designed the mpich netmod. > > Leveraging off size of the job was an easy way to scale the mailbox size. 
> > > > If I'd been asked to have the netmod work in a context like it appears we > > may want to be eventually using BTLs - not just within ompi but for other > > things, I'd have worked with Darius (if still in mpich world) on changing > > the netmod initialization > > method to allow for an optional attributes struct to be passed into the > > init > > method to give hints about how many connections may need to be established, > > etc. > > > > For the GNI BTL - the way its currently designed - if you are only expecting > > to use it for a limited number of connections, then you want to initialize > > for big mailboxes (IBer's can think many large buffers posted as RX WQEs). > > But for very large jobs, with possibly highly connected communication > > pattern, > > you want very small mailboxes. > > > > Howard > > > > > > -Original Message- > > From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of George Bosilca > > Sent: Thursday, July 31, 2014 9:09 AM > > To: Open MPI Developers > > Subject: Re: [OMPI devel] RFC: job size info in OPAL > > > > What is your definition of "global job size"? > > > > George. > > > > On Jul 31, 2014, at 11:06 , Pritchard Jr., Howard wrote: > > > >> Hi Folks, > >> > >> I think given the way we want to use the btl's in lower levels like > >> opal, it is pretty disgusting for a btl to need to figure out on its > >> own something like a "global job size". That's not its business. > >> Can't we add some attributes to the component's initialization method > >> that provides hints for how to allocate resources it needs to provide its > >> functionality? > >> > >> I'll see if there's something clever that can be done in ugni for now. > >> I can always add in a hack to probe the apps placement info file and > >> scale the smsg blocks by number of nodes rather than number of ranks. 
> >> > >> Howard > >> > >> > >> -Original Message- > >> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Nathan > >> Hjelm > >> Sent: Thursday, July 31, 2014 8:58 AM > >> To: Open MPI Developers > >> Subject: Re: [OMPI devel] RFC: job size info in OPAL > >> > >> > >> +2^1000 > >> > >> This information is absolutely necessary at this point. If someone has a > >> better solution they can provide it as an alternative RFC. Until then this > >> is how it should be done... Otherwise we loose uGNI support on the trunk. > >> Because we ARE NOT going to remove the mailbox size optimization. > >> > >> -Nathan > >> > >> On Wed, Jul 30, 2014 at 10:00:18PM +, Jeff Squyres (jsquyres) wrote: > >>> WHAT: Should we make the job size (i.e., initial number of procs) > >>> available in OPAL? > >>> > >>> WHY: At least 2 BTLs are using this info (*more below) > >>> > >>> WHERE: usnic and ugni > >>> > >>> TIMEOUT: there's already been some inflammatory emails about this; > >>> let's discuss next Tuesday on the teleconf: Tue, 5 Aug 2014 > >>> > >>> MORE DETAIL: > >>> > >>> This is an open question. We *have* the information at the time that the > >>> BTLs are initialized: do we allow that information to go down to OPAL? > >>> > >>> Ralph added this info down in OPAL in r32355, but George reverted it in > >>> r32361. > >>> > >>> Points for: YES, WE SHOULD > >>> +++ 2 BTLs were using it (usinc, ugni) Other RTE job-related info are > >>> +++
Re: [OMPI devel] RFC: job size info in OPAL
Like I said, why don't we just do the following: > I'd like to suggest an alternative solution. A BTL can exploit whatever data > it wants, but should first test if the data is available. If the data is > *required*, then the BTL gracefully disqualifies itself. If the data is > *desirable* for optimization, then the BTL writer (if they choose) can > include an alternate path that doesn't do the optimization if the data isn't > available. Seems like this should resolve the disagreement in a way that meets everyone's need. It basically is an attribute approach, but not requiring modification of the BTL interface. On Jul 31, 2014, at 8:26 AM, Pritchard Jr., Howardwrote: > Hi George, > > The ompi_process_info.num_procs thing that seems to have been an object > of some contention yesterday. > > The ugni use of this is cloned off of the way I designed the mpich netmod. > Leveraging off size of the job was an easy way to scale the mailbox size. > > If I'd been asked to have the netmod work in a context like it appears we > may want to be eventually using BTLs - not just within ompi but for other > things, I'd have worked with Darius (if still in mpich world) on changing the > netmod initialization > method to allow for an optional attributes struct to be passed into the init > method to give hints about how many connections may need to be established, > etc. > > For the GNI BTL - the way its currently designed - if you are only expecting > to use it for a limited number of connections, then you want to initialize > for big mailboxes (IBer's can think many large buffers posted as RX WQEs). > But for very large jobs, with possibly highly connected communication pattern, > you want very small mailboxes. 
> > Howard > > > -Original Message- > From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of George Bosilca > Sent: Thursday, July 31, 2014 9:09 AM > To: Open MPI Developers > Subject: Re: [OMPI devel] RFC: job size info in OPAL > > What is your definition of "global job size"? > > George. > > On Jul 31, 2014, at 11:06 , Pritchard Jr., Howard wrote: > >> Hi Folks, >> >> I think given the way we want to use the btl's in lower levels like >> opal, it is pretty disgusting for a btl to need to figure out on its >> own something like a "global job size". That's not its business. >> Can't we add some attributes to the component's initialization method >> that provides hints for how to allocate resources it needs to provide its >> functionality? >> >> I'll see if there's something clever that can be done in ugni for now. >> I can always add in a hack to probe the apps placement info file and >> scale the smsg blocks by number of nodes rather than number of ranks. >> >> Howard >> >> >> -Original Message- >> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Nathan >> Hjelm >> Sent: Thursday, July 31, 2014 8:58 AM >> To: Open MPI Developers >> Subject: Re: [OMPI devel] RFC: job size info in OPAL >> >> >> +2^1000 >> >> This information is absolutely necessary at this point. If someone has a >> better solution they can provide it as an alternative RFC. Until then this >> is how it should be done... Otherwise we loose uGNI support on the trunk. >> Because we ARE NOT going to remove the mailbox size optimization. >> >> -Nathan >> >> On Wed, Jul 30, 2014 at 10:00:18PM +, Jeff Squyres (jsquyres) wrote: >>> WHAT: Should we make the job size (i.e., initial number of procs) available >>> in OPAL? 
>>> >>> WHY: At least 2 BTLs are using this info (*more below) >>> >>> WHERE: usnic and ugni >>> >>> TIMEOUT: there's already been some inflammatory emails about this; >>> let's discuss next Tuesday on the teleconf: Tue, 5 Aug 2014 >>> >>> MORE DETAIL: >>> >>> This is an open question. We *have* the information at the time that the >>> BTLs are initialized: do we allow that information to go down to OPAL? >>> >>> Ralph added this info down in OPAL in r32355, but George reverted it in >>> r32361. >>> >>> Points for: YES, WE SHOULD >>> +++ 2 BTLs were using it (usinc, ugni) Other RTE job-related info are >>> +++ already in OPAL (num local ranks, local rank) >>> >>> Points for: NO, WE SHOULD NOT >>> --- What exactly is this number (e.g., num currently-connected procs?), and >>> when is it updated? >>> --- We need to precisely delineate what belongs in OPAL vs. >>> above-OPAL >>> >>> FWIW: here's how ompi_process_info.num_procs was used before the BTL move >>> down to OPAL: >>> >>> - usnic: for a minor latency optimization / sizing of a shared >>> receive buffer queue length, and for the initial size of a peer >>> lookup hash >>> - ugni: to determine the size of the per-peer buffers used for >>> send/recv communication >>> >>> -- >>> Jeff Squyres >>> jsquy...@cisco.com >>> For corporate legal information go to: >>> http://www.cisco.com/web/about/doing_business/legal/cri/ >>> >>>
Re: [OMPI devel] RFC: job size info in OPAL
I do not like the fact that add_procs is called with every proc in the MPI_COMM_WORLD. That needs to change, so, I will not rely on the number of procs being added being the same as the world or universe size. -Nathan On Thu, Jul 31, 2014 at 09:22:00AM -0600, George Bosilca wrote: >I definitively think you misunderstood this scope of this RFC. The >information that is so important to you to configure the mailbox size is >available to you when you need it. This information is made available by >the PML through the call to add_procs, which comes with all the procs in >the MPI_COMM_WORLD. So, ugni doesn't need anything more than it is >available today. [This is of course under the assumption that someone >clean the BTL and remove the usage of MPI_COMM_WORLD.] > >The real scope of this RFC is to move this information before that in >order to allow the BTLs to have access to some possible number of >processes between the call to btl_open and the call to btl_all_proc (in >other words during btl_init). > > George. > >PS: here is the patch that fixes all issues in ugni. > >On Jul 31, 2014, at 10:58 , Nathan Hjelmwrote: > >> >> +2^1000 >> >> This information is absolutely necessary at this point. If someone has a >> better solution they can provide it as an alternative RFC. Until then >> this is how it should be done... Otherwise we loose uGNI support on the >> trunk. Because we ARE NOT going to remove the mailbox size optimization. >> >> -Nathan >> >> On Wed, Jul 30, 2014 at 10:00:18PM +, Jeff Squyres (jsquyres) wrote: >>> WHAT: Should we make the job size (i.e., initial number of procs) >available in OPAL? >>> >>> WHY: At least 2 BTLs are using this info (*more below) >>> >>> WHERE: usnic and ugni >>> >>> TIMEOUT: there's already been some inflammatory emails about this; >let's discuss next Tuesday on the teleconf: Tue, 5 Aug 2014 >>> >>> MORE DETAIL: >>> >>> This is an open question. 
We *have* the information at the time that >the BTLs are initialized: do we allow that information to go down to OPAL? >>> >>> Ralph added this info down in OPAL in r32355, but George reverted it in >r32361. >>> >>> Points for: YES, WE SHOULD >>> +++ 2 BTLs were using it (usinc, ugni) >>> +++ Other RTE job-related info are already in OPAL (num local ranks, >local rank) >>> >>> Points for: NO, WE SHOULD NOT >>> --- What exactly is this number (e.g., num currently-connected procs?), >and when is it updated? >>> --- We need to precisely delineate what belongs in OPAL vs. above-OPAL >>> >>> FWIW: here's how ompi_process_info.num_procs was used before the BTL >move down to OPAL: >>> >>> - usnic: for a minor latency optimization / sizing of a shared receive >buffer queue length, and for the initial size of a peer lookup hash >>> - ugni: to determine the size of the per-peer buffers used for >send/recv communication >>> >>> -- >>> Jeff Squyres >>> jsquy...@cisco.com >>> For corporate legal information go to: >http://www.cisco.com/web/about/doing_business/legal/cri/ >>> >>> ___ >>> devel mailing list >>> de...@open-mpi.org >>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel >>> Link to this post: >http://www.open-mpi.org/community/lists/devel/2014/07/15373.php >> ___ >> devel mailing list >> de...@open-mpi.org >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel >> Link to this post: >http://www.open-mpi.org/community/lists/devel/2014/07/15394.php > >___ >devel mailing list >de...@open-mpi.org >Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel >Link to this post: >http://www.open-mpi.org/community/lists/devel/2014/07/15399.php pgpo6WjkLZPnT.pgp Description: PGP signature
Re: [OMPI devel] RFC: job size info in OPAL
Hi George, The ompi_process_info.num_procs thing that seems to have been an object of some contention yesterday. The ugni use of this is cloned off of the way I designed the mpich netmod. Leveraging off size of the job was an easy way to scale the mailbox size. If I'd been asked to have the netmod work in a context like it appears we may want to be eventually using BTLs - not just within ompi but for other things, I'd have worked with Darius (if still in mpich world) on changing the netmod initialization method to allow for an optional attributes struct to be passed into the init method to give hints about how many connections may need to be established, etc. For the GNI BTL - the way its currently designed - if you are only expecting to use it for a limited number of connections, then you want to initialize for big mailboxes (IBer's can think many large buffers posted as RX WQEs). But for very large jobs, with possibly highly connected communication pattern, you want very small mailboxes. Howard -Original Message- From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of George Bosilca Sent: Thursday, July 31, 2014 9:09 AM To: Open MPI Developers Subject: Re: [OMPI devel] RFC: job size info in OPAL What is your definition of "global job size"? George. On Jul 31, 2014, at 11:06 , Pritchard Jr., Howardwrote: > Hi Folks, > > I think given the way we want to use the btl's in lower levels like > opal, it is pretty disgusting for a btl to need to figure out on its > own something like a "global job size". That's not its business. > Can't we add some attributes to the component's initialization method > that provides hints for how to allocate resources it needs to provide its > functionality? > > I'll see if there's something clever that can be done in ugni for now. > I can always add in a hack to probe the apps placement info file and > scale the smsg blocks by number of nodes rather than number of ranks. 
> > Howard > > > -Original Message- > From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Nathan > Hjelm > Sent: Thursday, July 31, 2014 8:58 AM > To: Open MPI Developers > Subject: Re: [OMPI devel] RFC: job size info in OPAL > > > +2^1000 > > This information is absolutely necessary at this point. If someone has a > better solution they can provide it as an alternative RFC. Until then this is > how it should be done... Otherwise we loose uGNI support on the trunk. > Because we ARE NOT going to remove the mailbox size optimization. > > -Nathan > > On Wed, Jul 30, 2014 at 10:00:18PM +, Jeff Squyres (jsquyres) wrote: >> WHAT: Should we make the job size (i.e., initial number of procs) available >> in OPAL? >> >> WHY: At least 2 BTLs are using this info (*more below) >> >> WHERE: usnic and ugni >> >> TIMEOUT: there's already been some inflammatory emails about this; >> let's discuss next Tuesday on the teleconf: Tue, 5 Aug 2014 >> >> MORE DETAIL: >> >> This is an open question. We *have* the information at the time that the >> BTLs are initialized: do we allow that information to go down to OPAL? >> >> Ralph added this info down in OPAL in r32355, but George reverted it in >> r32361. >> >> Points for: YES, WE SHOULD >> +++ 2 BTLs were using it (usinc, ugni) Other RTE job-related info are >> +++ already in OPAL (num local ranks, local rank) >> >> Points for: NO, WE SHOULD NOT >> --- What exactly is this number (e.g., num currently-connected procs?), and >> when is it updated? >> --- We need to precisely delineate what belongs in OPAL vs. 
>> above-OPAL >> >> FWIW: here's how ompi_process_info.num_procs was used before the BTL move >> down to OPAL: >> >> - usnic: for a minor latency optimization / sizing of a shared >> receive buffer queue length, and for the initial size of a peer >> lookup hash >> - ugni: to determine the size of the per-peer buffers used for >> send/recv communication >> >> -- >> Jeff Squyres >> jsquy...@cisco.com >> For corporate legal information go to: >> http://www.cisco.com/web/about/doing_business/legal/cri/ >> >> ___ >> devel mailing list >> de...@open-mpi.org >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel >> Link to this post: >> http://www.open-mpi.org/community/lists/devel/2014/07/15373.php > ___ > devel mailing list > de...@open-mpi.org > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel > Link to this post: > http://www.open-mpi.org/community/lists/devel/2014/07/15395.php ___ devel mailing list de...@open-mpi.org Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel Link to this post: http://www.open-mpi.org/community/lists/devel/2014/07/15396.php
Re: [OMPI devel] RFC: job size info in OPAL
I definitely think you misunderstood the scope of this RFC. The information that is so important to you for configuring the mailbox size is available when you need it. It is made available by the PML through the call to add_procs, which comes with all the procs in MPI_COMM_WORLD. So ugni doesn't need anything more than what is available today. [This is of course under the assumption that someone cleans the BTL and removes the usage of MPI_COMM_WORLD.] The real scope of this RFC is to move this information earlier: to allow the BTLs access to a possible number of processes between the call to btl_open and the call to btl_all_proc (in other words, during btl_init). George. PS: here is the patch that fixes all issues in ugni. On Jul 31, 2014, at 10:58 , Nathan Hjelm wrote: > > +2^1000 > > This information is absolutely necessary at this point. If someone has a > better solution they can provide it as an alternative RFC. Until then > this is how it should be done... Otherwise we loose uGNI support on the > trunk. Because we ARE NOT going to remove the mailbox size optimization. > > -Nathan > > On Wed, Jul 30, 2014 at 10:00:18PM +, Jeff Squyres (jsquyres) wrote: >> WHAT: Should we make the job size (i.e., initial number of procs) available >> in OPAL? >> >> WHY: At least 2 BTLs are using this info (*more below) >> >> WHERE: usnic and ugni >> >> TIMEOUT: there's already been some inflammatory emails about this; let's >> discuss next Tuesday on the teleconf: Tue, 5 Aug 2014 >> >> MORE DETAIL: >> >> This is an open question. We *have* the information at the time that the >> BTLs are initialized: do we allow that information to go down to OPAL? >> >> Ralph added this info down in OPAL in r32355, but George reverted it in >> r32361. 
>> >> Points for: YES, WE SHOULD >> +++ 2 BTLs were using it (usinc, ugni) >> +++ Other RTE job-related info are already in OPAL (num local ranks, local >> rank) >> >> Points for: NO, WE SHOULD NOT >> --- What exactly is this number (e.g., num currently-connected procs?), and >> when is it updated? >> --- We need to precisely delineate what belongs in OPAL vs. above-OPAL >> >> FWIW: here's how ompi_process_info.num_procs was used before the BTL move >> down to OPAL: >> >> - usnic: for a minor latency optimization / sizing of a shared receive >> buffer queue length, and for the initial size of a peer lookup hash >> - ugni: to determine the size of the per-peer buffers used for send/recv >> communication >> >> -- >> Jeff Squyres >> jsquy...@cisco.com >> For corporate legal information go to: >> http://www.cisco.com/web/about/doing_business/legal/cri/ >> >> ___ >> devel mailing list >> de...@open-mpi.org >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel >> Link to this post: >> http://www.open-mpi.org/community/lists/devel/2014/07/15373.php > ___ > devel mailing list > de...@open-mpi.org > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel > Link to this post: > http://www.open-mpi.org/community/lists/devel/2014/07/15394.php
Re: [OMPI devel] RFC: job size info in OPAL
The maximum number of peer processes that may be added over the course of the job will suffice. So either the world or universe size. This is a reasonable piece of information to expect the upper layers to provide to the communication layer. And the impact of providing this information is no less intrusive than providing information like the number of local ranks. -Nathan On Thu, Jul 31, 2014 at 11:09:24AM -0400, George Bosilca wrote: > What is your definition of “global job size”? > > George. > > On Jul 31, 2014, at 11:06 , Pritchard Jr., Howardwrote: > > > Hi Folks, > > > > I think given the way we want to use the btl's in lower levels like opal, > > it is pretty disgusting for a btl to need to figure out on its own something > > like a "global job size". That's not its business. Can't we add some > > attributes > > to the component's initialization method that provides hints for how to > > allocate resources it needs to provide its functionality? > > > > I'll see if there's something clever that can be done in ugni for now. > > I can always add in a hack to probe the apps placement info file and > > scale the smsg blocks by number of nodes rather than number of ranks. > > > > Howard > > > > > > -Original Message- > > From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Nathan Hjelm > > Sent: Thursday, July 31, 2014 8:58 AM > > To: Open MPI Developers > > Subject: Re: [OMPI devel] RFC: job size info in OPAL > > > > > > +2^1000 > > > > This information is absolutely necessary at this point. If someone has a > > better solution they can provide it as an alternative RFC. Until then this > > is how it should be done... Otherwise we loose uGNI support on the trunk. > > Because we ARE NOT going to remove the mailbox size optimization. > > > > -Nathan > > > > On Wed, Jul 30, 2014 at 10:00:18PM +, Jeff Squyres (jsquyres) wrote: > >> WHAT: Should we make the job size (i.e., initial number of procs) > >> available in OPAL? 
> >> > >> WHY: At least 2 BTLs are using this info (*more below) > >> > >> WHERE: usnic and ugni > >> > >> TIMEOUT: there's already been some inflammatory emails about this; > >> let's discuss next Tuesday on the teleconf: Tue, 5 Aug 2014 > >> > >> MORE DETAIL: > >> > >> This is an open question. We *have* the information at the time that the > >> BTLs are initialized: do we allow that information to go down to OPAL? > >> > >> Ralph added this info down in OPAL in r32355, but George reverted it in > >> r32361. > >> > >> Points for: YES, WE SHOULD > >> +++ 2 BTLs were using it (usinc, ugni) Other RTE job-related info are > >> +++ already in OPAL (num local ranks, local rank) > >> > >> Points for: NO, WE SHOULD NOT > >> --- What exactly is this number (e.g., num currently-connected procs?), > >> and when is it updated? > >> --- We need to precisely delineate what belongs in OPAL vs. above-OPAL > >> > >> FWIW: here's how ompi_process_info.num_procs was used before the BTL move > >> down to OPAL: > >> > >> - usnic: for a minor latency optimization / sizing of a shared receive > >> buffer queue length, and for the initial size of a peer lookup hash > >> - ugni: to determine the size of the per-peer buffers used for > >> send/recv communication > >> > >> -- > >> Jeff Squyres > >> jsquy...@cisco.com > >> For corporate legal information go to: > >> http://www.cisco.com/web/about/doing_business/legal/cri/ > >> > >> ___ > >> devel mailing list > >> de...@open-mpi.org > >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel > >> Link to this post: > >> http://www.open-mpi.org/community/lists/devel/2014/07/15373.php > > ___ > > devel mailing list > > de...@open-mpi.org > > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel > > Link to this post: > > http://www.open-mpi.org/community/lists/devel/2014/07/15395.php > > ___ > devel mailing list > de...@open-mpi.org > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel > Link to this post: > 
http://www.open-mpi.org/community/lists/devel/2014/07/15396.php
Re: [OMPI devel] RFC: job size info in OPAL
I'd like to suggest an alternative solution. A BTL can exploit whatever data it wants, but should first test if the data is available. If the data is *required*, then the BTL gracefully disqualifies itself. If the data is *desirable* for optimization, then the BTL writer (if they choose) can include an alternate path that doesn't do the optimization if the data isn't available. This would seem to meet everyone's needs, yes? On Jul 31, 2014, at 8:09 AM, George Bosilcawrote: > What is your definition of “global job size”? > > George. > > On Jul 31, 2014, at 11:06 , Pritchard Jr., Howard wrote: > >> Hi Folks, >> >> I think given the way we want to use the btl's in lower levels like opal, >> it is pretty disgusting for a btl to need to figure out on its own something >> like a "global job size". That's not its business. Can't we add some >> attributes >> to the component's initialization method that provides hints for how to >> allocate resources it needs to provide its functionality? >> >> I'll see if there's something clever that can be done in ugni for now. >> I can always add in a hack to probe the apps placement info file and >> scale the smsg blocks by number of nodes rather than number of ranks. >> >> Howard >> >> >> -Original Message- >> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Nathan Hjelm >> Sent: Thursday, July 31, 2014 8:58 AM >> To: Open MPI Developers >> Subject: Re: [OMPI devel] RFC: job size info in OPAL >> >> >> +2^1000 >> >> This information is absolutely necessary at this point. If someone has a >> better solution they can provide it as an alternative RFC. Until then this >> is how it should be done... Otherwise we loose uGNI support on the trunk. >> Because we ARE NOT going to remove the mailbox size optimization. >> >> -Nathan >> >> On Wed, Jul 30, 2014 at 10:00:18PM +, Jeff Squyres (jsquyres) wrote: >>> WHAT: Should we make the job size (i.e., initial number of procs) available >>> in OPAL? 
>>> >>> WHY: At least 2 BTLs are using this info (*more below) >>> >>> WHERE: usnic and ugni >>> >>> TIMEOUT: there's already been some inflammatory emails about this; >>> let's discuss next Tuesday on the teleconf: Tue, 5 Aug 2014 >>> >>> MORE DETAIL: >>> >>> This is an open question. We *have* the information at the time that the >>> BTLs are initialized: do we allow that information to go down to OPAL? >>> >>> Ralph added this info down in OPAL in r32355, but George reverted it in >>> r32361. >>> >>> Points for: YES, WE SHOULD >>> +++ 2 BTLs were using it (usinc, ugni) Other RTE job-related info are >>> +++ already in OPAL (num local ranks, local rank) >>> >>> Points for: NO, WE SHOULD NOT >>> --- What exactly is this number (e.g., num currently-connected procs?), and >>> when is it updated? >>> --- We need to precisely delineate what belongs in OPAL vs. above-OPAL >>> >>> FWIW: here's how ompi_process_info.num_procs was used before the BTL move >>> down to OPAL: >>> >>> - usnic: for a minor latency optimization / sizing of a shared receive >>> buffer queue length, and for the initial size of a peer lookup hash >>> - ugni: to determine the size of the per-peer buffers used for >>> send/recv communication >>> >>> -- >>> Jeff Squyres >>> jsquy...@cisco.com >>> For corporate legal information go to: >>> http://www.cisco.com/web/about/doing_business/legal/cri/ >>> >>> ___ >>> devel mailing list >>> de...@open-mpi.org >>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel >>> Link to this post: >>> http://www.open-mpi.org/community/lists/devel/2014/07/15373.php >> ___ >> devel mailing list >> de...@open-mpi.org >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel >> Link to this post: >> http://www.open-mpi.org/community/lists/devel/2014/07/15395.php > > ___ > devel mailing list > de...@open-mpi.org > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel > Link to this post: > 
http://www.open-mpi.org/community/lists/devel/2014/07/15396.php
Re: [OMPI devel] RFC: job size info in OPAL
What is your definition of “global job size”? George. On Jul 31, 2014, at 11:06 , Pritchard Jr., Howardwrote: > Hi Folks, > > I think given the way we want to use the btl's in lower levels like opal, > it is pretty disgusting for a btl to need to figure out on its own something > like a "global job size". That's not its business. Can't we add some > attributes > to the component's initialization method that provides hints for how to > allocate resources it needs to provide its functionality? > > I'll see if there's something clever that can be done in ugni for now. > I can always add in a hack to probe the apps placement info file and > scale the smsg blocks by number of nodes rather than number of ranks. > > Howard > > > -Original Message- > From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Nathan Hjelm > Sent: Thursday, July 31, 2014 8:58 AM > To: Open MPI Developers > Subject: Re: [OMPI devel] RFC: job size info in OPAL > > > +2^1000 > > This information is absolutely necessary at this point. If someone has a > better solution they can provide it as an alternative RFC. Until then this is > how it should be done... Otherwise we loose uGNI support on the trunk. > Because we ARE NOT going to remove the mailbox size optimization. > > -Nathan > > On Wed, Jul 30, 2014 at 10:00:18PM +, Jeff Squyres (jsquyres) wrote: >> WHAT: Should we make the job size (i.e., initial number of procs) available >> in OPAL? >> >> WHY: At least 2 BTLs are using this info (*more below) >> >> WHERE: usnic and ugni >> >> TIMEOUT: there's already been some inflammatory emails about this; >> let's discuss next Tuesday on the teleconf: Tue, 5 Aug 2014 >> >> MORE DETAIL: >> >> This is an open question. We *have* the information at the time that the >> BTLs are initialized: do we allow that information to go down to OPAL? >> >> Ralph added this info down in OPAL in r32355, but George reverted it in >> r32361. 
>> >> Points for: YES, WE SHOULD >> +++ 2 BTLs were using it (usinc, ugni) Other RTE job-related info are >> +++ already in OPAL (num local ranks, local rank) >> >> Points for: NO, WE SHOULD NOT >> --- What exactly is this number (e.g., num currently-connected procs?), and >> when is it updated? >> --- We need to precisely delineate what belongs in OPAL vs. above-OPAL >> >> FWIW: here's how ompi_process_info.num_procs was used before the BTL move >> down to OPAL: >> >> - usnic: for a minor latency optimization / sizing of a shared receive >> buffer queue length, and for the initial size of a peer lookup hash >> - ugni: to determine the size of the per-peer buffers used for >> send/recv communication >> >> -- >> Jeff Squyres >> jsquy...@cisco.com >> For corporate legal information go to: >> http://www.cisco.com/web/about/doing_business/legal/cri/ >> >> ___ >> devel mailing list >> de...@open-mpi.org >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel >> Link to this post: >> http://www.open-mpi.org/community/lists/devel/2014/07/15373.php > ___ > devel mailing list > de...@open-mpi.org > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel > Link to this post: > http://www.open-mpi.org/community/lists/devel/2014/07/15395.php
Re: [OMPI devel] RFC: job size info in OPAL
Hi Folks, I think given the way we want to use the btl's in lower levels like opal, it is pretty disgusting for a btl to need to figure out on its own something like a "global job size". That's not its business. Can't we add some attributes to the component's initialization method that provides hints for how to allocate resources it needs to provide its functionality? I'll see if there's something clever that can be done in ugni for now. I can always add in a hack to probe the apps placement info file and scale the smsg blocks by number of nodes rather than number of ranks. Howard -Original Message- From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Nathan Hjelm Sent: Thursday, July 31, 2014 8:58 AM To: Open MPI Developers Subject: Re: [OMPI devel] RFC: job size info in OPAL +2^1000 This information is absolutely necessary at this point. If someone has a better solution they can provide it as an alternative RFC. Until then this is how it should be done... Otherwise we loose uGNI support on the trunk. Because we ARE NOT going to remove the mailbox size optimization. -Nathan On Wed, Jul 30, 2014 at 10:00:18PM +, Jeff Squyres (jsquyres) wrote: > WHAT: Should we make the job size (i.e., initial number of procs) available > in OPAL? > > WHY: At least 2 BTLs are using this info (*more below) > > WHERE: usnic and ugni > > TIMEOUT: there's already been some inflammatory emails about this; > let's discuss next Tuesday on the teleconf: Tue, 5 Aug 2014 > > MORE DETAIL: > > This is an open question. We *have* the information at the time that the > BTLs are initialized: do we allow that information to go down to OPAL? > > Ralph added this info down in OPAL in r32355, but George reverted it in > r32361. 
> Points for: YES, WE SHOULD
> +++ 2 BTLs were using it (usnic, ugni)
> +++ Other RTE job-related info is already in OPAL (num local ranks, local rank)
>
> Points for: NO, WE SHOULD NOT
> --- What exactly is this number (e.g., num currently-connected procs?), and when is it updated?
> --- We need to precisely delineate what belongs in OPAL vs. above-OPAL
>
> FWIW: here's how ompi_process_info.num_procs was used before the BTL move down to OPAL:
>
> - usnic: for a minor latency optimization / sizing of a shared receive buffer queue length, and for the initial size of a peer lookup hash
> - ugni: to determine the size of the per-peer buffers used for send/recv communication
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/07/15373.php
Re: [OMPI devel] RFC: job size info in OPAL
+2^1000

This information is absolutely necessary at this point. If someone has a better solution they can provide it as an alternative RFC. Until then this is how it should be done... Otherwise we lose uGNI support on the trunk. Because we ARE NOT going to remove the mailbox size optimization.

-Nathan

On Wed, Jul 30, 2014 at 10:00:18PM +, Jeff Squyres (jsquyres) wrote:
> WHAT: Should we make the job size (i.e., initial number of procs) available in OPAL?
>
> WHY: At least 2 BTLs are using this info (*more below)
>
> WHERE: usnic and ugni
>
> TIMEOUT: there's already been some inflammatory emails about this; let's discuss next Tuesday on the teleconf: Tue, 5 Aug 2014
>
> MORE DETAIL:
>
> This is an open question. We *have* the information at the time that the BTLs are initialized: do we allow that information to go down to OPAL?
>
> Ralph added this info down in OPAL in r32355, but George reverted it in r32361.
>
> Points for: YES, WE SHOULD
> +++ 2 BTLs were using it (usnic, ugni)
> +++ Other RTE job-related info is already in OPAL (num local ranks, local rank)
>
> Points for: NO, WE SHOULD NOT
> --- What exactly is this number (e.g., num currently-connected procs?), and when is it updated?
> --- We need to precisely delineate what belongs in OPAL vs. above-OPAL
>
> FWIW: here's how ompi_process_info.num_procs was used before the BTL move down to OPAL:
>
> - usnic: for a minor latency optimization / sizing of a shared receive buffer queue length, and for the initial size of a peer lookup hash
> - ugni: to determine the size of the per-peer buffers used for send/recv communication
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/07/15373.php
[OMPI devel] RFC: Change default behavior of calling ibv_fork_init
WHAT: Change default behavior in openib to not call ibv_fork_init() even if available.

WHY: There are some strange interactions with ummunotify that cause errors. In addition, see the additional points below.

WHEN: After next weekly meeting, August 5, 2014

DETAILS: This change will just be a couple of lines. Current default behavior is to call ibv_fork_init() if support exists. New default behavior is to call it only if asked for. Essentially, the default setting of btl_openib_want_fork_support will change from -1 (use it if available) to 0 (do not use unless asked for).

Here are details from an earlier post last year:
http://www.open-mpi.org/community/lists/devel/2013/12/13395.php

Subject: [OMPI devel] RFC: Calling ibv_fork_init() in the openib BTL
From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
List-Post: devel@lists.open-mpi.org
Date: 2013-12-06 10:15:02

To those who care about the openib BTL...

SHORT VERSION
-------------
Do you really want to call ibv_fork_init() in the openib BTL by default?

MORE DETAIL
-----------
Rolf V. pointed out to me yesterday that we're calling ibv_fork_init() in the openib BTL. He asked if we did the same in the usnic BTL. We don't, and here's why:

1. It adds a slight performance penalty for ibv_reg_mr/ibv_dereg_mr.

2. The only thing ibv_fork_init() protects against is the child sending from memory that it thinks should already be registered:

    MPI_Init(...)
    if (0 == fork()) {
        ibv_post_send(some_previously_pinned_buffer, ...);
        // ^^ this can't work because the buffer is *not* pinned in the child
        // (for lack of a longer explanation here)
    }

3. ibv_fork_init() is not intended to protect against a child invoking an MPI function (if they do that, they get what they deserve!).

Note that #2 can't happen, because MPI doesn't expose its protection domains, queue pairs, or registrations (or any of its verbs constructs) at all.
Hence, all ibv_fork_init() does is a) impose a performance penalty, and b) make memory physically unavailable in a child process, such that:

    ibv_fork_init();
    a = malloc(...);
    a[0] = 17;
    ibv_reg_mr(a, ...);
    if (0 == fork()) {
        printf("this is a[0]: %d\n", a[0]);
        // ^^ This will segv
    }

But the registered memory may actually be useful in the child.

So I just thought I'd pass this along, and ask the openib-caring people of the world if you really still want to be calling ibv_fork_init() by default in the openib BTL.

--
Jeff Squyres

---
This email message is for the sole use of the intended recipient(s) and may contain confidential information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message.
---
Re: [OMPI devel] mca_PROJECT_FRAMEWORK_COMPONENT_symbol vs. mca_FRAMEWORK_COMPONENT_symbol
Doesn't namespacing obviate the need for this convoluted identifier scheme? See, for example, UML package import and include behaviors.

-----Original Message-----
From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Dave Goodell (dgoodell)
Sent: Wednesday, July 30, 2014 3:35 PM
To: Open MPI Developers
Subject: [OMPI devel] mca_PROJECT_FRAMEWORK_COMPONENT_symbol vs. mca_FRAMEWORK_COMPONENT_symbol

Jeff and I were talking about some namespacing issues that have come up in the recent BTL move from OMPI to OPAL.

AFAIK, the current system for namespacing external symbols is to name them "mca_FRAMEWORK_COMPONENT_symbol" (e.g., "mca_btl_tcp_add_procs" in the tcp BTL). Similarly, the DSO for the component is named "mca_FRAMEWORK_COMPONENT.so" (e.g., "mca_btl_tcp.so").

Jeff asserted that the eventual goal is to move to a system where all MCA frameworks/components are also prefixed by the project name. So the above examples become "mca_ompi_btl_tcp_add_procs" and "mca_ompi_btl_tcp.so".

Does anyone actually care about pursuing this goal? I ask because if nobody wants to pursue the goal of adding project names to namespaces then I already have an easy solution to most of our namespacing problems. OTOH, if someone does wish to pursue that goal, then I have a namespace-related RFC that I would like to propose (in a subsequent email).

-Dave

___
devel mailing list
de...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post: http://www.open-mpi.org/community/lists/devel/2014/07/15371.php
Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error
Gilles,

Just as you speculate, PGI is creating a _-suffixed reference to the module name:

$ pgf90 -c test.f90
$ nm -u test.o | grep f08
         U mpi_f08_sizeof_
         U mpi_f08_sizeof_mpi_sizeof_real_s_4_

You suggested the following work-around in a previous email:

$ INST/bin/mpifort ../test.f ./BLD/ompi/mpi/fortran/use-mpi-f08/.libs/libforce_usempif08_internal_modules_to_be_built.a

That works fine. That doesn't surprise me, because I had already identified that file as having been removed from libmpi_usempif08.so between 1.8.1 and 1.8.2rc2. It includes the symbols for the module names plus the trailing '_'.

-Paul

On Thu, Jul 31, 2014 at 1:07 AM, Gilles Gouaillardet <gilles.gouaillar...@iferc.org> wrote:
> Paul,
>
> in .../ompi/mpi/fortran/use-mpi-f08, can you create the following dumb test program, compile it, and run nm | grep f08 on the object:
>
> $ cat foo.f90
> program foo
>   use mpi_f08_sizeof
>
>   implicit none
>
>   real :: x
>   integer :: size, ierror
>
>   call MPI_Sizeof_real_s_4(x, size, ierror)
>
>   stop
> end program
>
> With the Intel compiler:
> $ ifort -c foo.f90
> $ nm foo.o | grep f08
>          U mpi_f08_sizeof_mp_mpi_sizeof_real_s_4_
>
> I am wondering whether the PGI compiler adds an additional undefined reference to mpi_f08_sizeof_ ...
>
> Cheers,
>
> Gilles
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/07/15390.php

--
Paul H. Hargrove phhargr...@lbl.gov
Future Technologies Group
Computer and Data Sciences Department Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error
Paul,

in .../ompi/mpi/fortran/use-mpi-f08, can you create the following dumb test program, compile it, and run nm | grep f08 on the object:

$ cat foo.f90
program foo
  use mpi_f08_sizeof

  implicit none

  real :: x
  integer :: size, ierror

  call MPI_Sizeof_real_s_4(x, size, ierror)

  stop
end program

With the Intel compiler:
$ ifort -c foo.f90
$ nm foo.o | grep f08
         U mpi_f08_sizeof_mp_mpi_sizeof_real_s_4_

I am wondering whether the PGI compiler adds an additional undefined reference to mpi_f08_sizeof_ ...

Cheers,

Gilles
Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error
Paul and all,

For what it's worth, with openmpi 1.8.2rc2 and the Intel Fortran compiler version 14.0.3.174:

$ nm libmpi_usempif08.so | grep -i sizeof

there is no such undefined symbol (mpi_f08_sizeof_).

As a temporary workaround, did you try to force the linker to use libforce_usempif08_internal_modules_to_be_built.a?
/* this library does not get installed (at least with the Intel compilers), but it is in the compilation tree */

Cheers,

Gilles

On 2014/07/31 12:53, Paul Hargrove wrote:
> In 1.8.2rc2:
> $ nm openmpi-1.8.2rc2-linux-x86_64-pgi-14.4/INST/lib/libmpi_usempif08.so | grep ' mpi_f08_sizeof_'
>          U mpi_f08_sizeof_
Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error
Hi Paul,

Thank you for your investigation. I'm sure it's very close to fixing the problem, although I myself can't do that. So I must owe you something... Please try Awamori, which is Okinawa's sake and very good on such a hot day.

Tetsuya

> On Wed, Jul 30, 2014 at 8:53 PM, Paul Hargrove wrote:
> [...]
> I have a clear answer to *what* is different (below) and am next looking into the why/how now.
> It seems that 1.8.1 has included all dependencies into libmpi_usempif08 while 1.8.2rc2 does not.
> [...]
>
> The difference appears to stem from the following difference in ompi/mpi/fortran/use-mpi-f08/Makefile.am:
>
> 1.8.1:
> libmpi_usempif08_la_LIBADD = \
>         $(module_sentinel_file) \
>         $(OMPI_MPIEXT_USEMPIF08_LIBS) \
>         $(top_builddir)/ompi/libmpi.la
>
> 1.8.2rc2:
> libmpi_usempif08_la_LIBADD = \
>         $(OMPI_MPIEXT_USEMPIF08_LIBS) \
>         $(top_builddir)/ompi/libmpi.la
> libmpi_usempif08_la_DEPENDENCIES = $(module_sentinel_file)
>
> Where in both cases one has:
>
> module_sentinel_file = \
>         libforce_usempif08_internal_modules_to_be_built.la
>
> which contains all of the symbols which my previous testing found had "disappeared" from libmpi_usempif08.so between 1.8.1 and 1.8.2rc2.
>
> I don't have recent enough autotools to attempt the change to Makefile.am, but instead restored the removed item to libmpi_usempif08_la_LIBADD directly in Makefile.in. However, rather than fixing the problem, that resulted in multiple definitions of a bunch of _eq and _ne functions (e.g. mpi_f08_types_ompi_request_op_ne_). So, I am uncertain how to proceed.
>
> svn blame points at a "bulk" CMR of many Fortran-related changes, including one related to the eq/ne operators. So, I am turning over this investigation to Jeff and/or Ralph to figure out what actually is required to fix this without loss of whatever benefits were in that CMR. I am still available to test the proposed fixes. Happy hunting...
>
> Somebody owes me a virtual beer (or nihonshu) ;-)
> -Paul
>
> --
> Paul H. Hargrove phhargr...@lbl.gov
> Future Technologies Group
> Computer and Data Sciences Department Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/07/15387.php
Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error
On Wed, Jul 30, 2014 at 8:53 PM, Paul Hargrove wrote:
[...]
> I have a clear answer to *what* is different (below) and am next looking into the why/how now.
> It seems that 1.8.1 has included all dependencies into libmpi_usempif08 while 1.8.2rc2 does not.
> [...]

The difference appears to stem from the following difference in ompi/mpi/fortran/use-mpi-f08/Makefile.am:

1.8.1:
libmpi_usempif08_la_LIBADD = \
        $(module_sentinel_file) \
        $(OMPI_MPIEXT_USEMPIF08_LIBS) \
        $(top_builddir)/ompi/libmpi.la

1.8.2rc2:
libmpi_usempif08_la_LIBADD = \
        $(OMPI_MPIEXT_USEMPIF08_LIBS) \
        $(top_builddir)/ompi/libmpi.la
libmpi_usempif08_la_DEPENDENCIES = $(module_sentinel_file)

Where in both cases one has:

module_sentinel_file = \
        libforce_usempif08_internal_modules_to_be_built.la

which contains all of the symbols which my previous testing found had "disappeared" from libmpi_usempif08.so between 1.8.1 and 1.8.2rc2.

I don't have recent enough autotools to attempt the change to Makefile.am, but instead restored the removed item to libmpi_usempif08_la_LIBADD directly in Makefile.in. However, rather than fixing the problem, that resulted in multiple definitions of a bunch of _eq and _ne functions (e.g. mpi_f08_types_ompi_request_op_ne_). So, I am uncertain how to proceed.

svn blame points at a "bulk" CMR of many Fortran-related changes, including one related to the eq/ne operators. So, I am turning over this investigation to Jeff and/or Ralph to figure out what actually is required to fix this without loss of whatever benefits were in that CMR. I am still available to test the proposed fixes. Happy hunting...

Somebody owes me a virtual beer (or nihonshu) ;-)

-Paul

--
Paul H. Hargrove phhargr...@lbl.gov
Future Technologies Group
Computer and Data Sciences Department Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
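[Editor's note] The behavior Paul traced has a straightforward automake explanation: a library listed in `*_LIBADD` has its objects linked into the resulting library, while a library listed only in `*_DEPENDENCIES` merely forces build ordering, so symbols it alone defines stay undefined. A minimal sketch of the distinction, with hypothetical library names (not the actual Open MPI Makefile.am):

```makefile
lib_LTLIBRARIES = libexample.la
libexample_la_SOURCES = example.c

# LIBADD: libhelper.la's objects are linked INTO libexample.so,
# so symbols defined only in libhelper end up defined in libexample.
libexample_la_LIBADD = libhelper.la

# DEPENDENCIES: libhelper.la is built BEFORE libexample.la, but its
# objects are NOT linked in -- any symbol only it defines is left
# undefined, which matches the mpi_f08_sizeof_ symptom reported above.
#libexample_la_DEPENDENCIES = libhelper.la
```

This is consistent with Paul's observation that restoring `$(module_sentinel_file)` to LIBADD fixes the link but re-introduces duplicate `_eq`/`_ne` definitions when those objects are also pulled in elsewhere.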
Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error
On Wed, Jul 30, 2014 at 6:20 PM, Paul Hargrove wrote:
>
> On Wed, Jul 30, 2014 at 6:15 PM, wrote:
> [...]
>> Strange thing is that openmpi-1.8 with PGI14.7 works fine.
>> What's the difference with openmpi-1.8 and openmpi-1.8.2rc2?
> [...]
>
> Tetsuya,
>
> Now that I can reproduce the problem you have reported, I am building 1.8.1 with PGI14.4.
> Then I may be able to answer the question about what is different.
>
> -Paul

I have a clear answer to *what* is different (below) and am next looking into the why/how now. It seems that 1.8.1 has included all dependencies into libmpi_usempif08 while 1.8.2rc2 does not.

My reflex is to blame libtool, but config/lt* are unchanged between the two versions. I am rebuilding now with "V=1" passed to make so I can see how the libs were built. I'd appreciate guidance if Jeff or anybody else has suggestions as to an alternative approach to investigate this. When completed, I will be (more than) happy to turn over the verbose make output for somebody else to examine.

-Paul

In 1.8.1:
$ nm openmpi-1.8.1-linux-x86_64-pgi-14.4/INST/lib/libmpi_usempif08.so | grep ' mpi_f08_sizeof_'
0004a9a0 T mpi_f08_sizeof_
0004ad70 T mpi_f08_sizeof_mpi_sizeof_complex_a_16_
0004acf0 T mpi_f08_sizeof_mpi_sizeof_complex_a_8_
0004ad30 T mpi_f08_sizeof_mpi_sizeof_complex_s_16_
0004acb0 T mpi_f08_sizeof_mpi_sizeof_complex_s_8_
0004a9f0 T mpi_f08_sizeof_mpi_sizeof_integer_a_1_
0004aa70 T mpi_f08_sizeof_mpi_sizeof_integer_a_2_
0004aaf0 T mpi_f08_sizeof_mpi_sizeof_integer_a_4_
0004ab70 T mpi_f08_sizeof_mpi_sizeof_integer_a_8_
0004a9b0 T mpi_f08_sizeof_mpi_sizeof_integer_s_1_
0004aa30 T mpi_f08_sizeof_mpi_sizeof_integer_s_2_
0004aab0 T mpi_f08_sizeof_mpi_sizeof_integer_s_4_
0004ab30 T mpi_f08_sizeof_mpi_sizeof_integer_s_8_
0004abf0 T mpi_f08_sizeof_mpi_sizeof_real_a_4_
0004ac70 T mpi_f08_sizeof_mpi_sizeof_real_a_8_
0004abb0 T mpi_f08_sizeof_mpi_sizeof_real_s_4_
0004ac30 T mpi_f08_sizeof_mpi_sizeof_real_s_8_

In 1.8.2rc2:
$ nm openmpi-1.8.2rc2-linux-x86_64-pgi-14.4/INST/lib/libmpi_usempif08.so | grep ' mpi_f08_sizeof_'
         U mpi_f08_sizeof_

Similar differences exist corresponding to the other three modules that give undefined references in Tetsuya's simple test code.

--
Paul H. Hargrove phhargr...@lbl.gov
Future Technologies Group
Computer and Data Sciences Department Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900