Re: [OMPI devel] RFC: add atomic compare-and-swap that returns old value

2014-07-31 Thread Paul Hargrove
George:

I have a failure with your patch applied on PPC64/Linux with gcc-4.4.6:

Making all in asm
make[2]: Entering directory
`/home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/BLD/opal/asm'
  CC   asm.lo
In file included from
/home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/openmpi-1.9a1r32369/opal/asm/asm.c:21:0:
/home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/openmpi-1.9a1r32369/opal/include/opal/sys/atomic.h:374:9:
error: conflicting types for 'opal_atomic_cmpset_rel_64'
/home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/openmpi-1.9a1r32369/opal/include/opal/sys/powerpc/atomic.h:214:19:
note: previous definition of 'opal_atomic_cmpset_rel_64' was here
/home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/openmpi-1.9a1r32369/opal/include/opal/sys/atomic.h:374:9:
warning: 'opal_atomic_cmpset_rel_64' used but never defined [enabled by
default]
make[2]: *** [asm.lo] Error 1


BTW: the patch applied cleanly to trunk except the portion
changing opal/include/opal/sys/osx/atomic.h, which does not exist.

-Paul
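
For reference, this class of GCC diagnostic arises when a prototype and a
definition of the same function disagree; here the clash is between the
generic prototype in opal/include/opal/sys/atomic.h and the PowerPC inline
definition. A minimal illustration of the error class (not the OMPI code):

/* conflict.c -- minimal illustration, not the OMPI code */
long cmpset_rel_64(volatile long *addr, long oldval, long newval);

/* The return type disagrees with the prototype above, so gcc reports:
 * error: conflicting types for 'cmpset_rel_64'                       */
int cmpset_rel_64(volatile long *addr, long oldval, long newval)
{
    return (*addr == oldval) ? ((*addr = newval), 1) : 0;
}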


On Thu, Jul 31, 2014 at 4:25 PM, George Bosilca  wrote:

> Awesome, thanks Paul. When the results are in, we will fix whatever is
> needed for these less common architectures.
>
>   George.
>
>
>
> On Thu, Jul 31, 2014 at 7:24 PM, Paul Hargrove  wrote:
>
>>
>>
>> On Thu, Jul 31, 2014 at 4:22 PM, Paul Hargrove 
>> wrote:
>>
>>>
>>> On Thu, Jul 31, 2014 at 4:13 PM, George Bosilca 
>>> wrote:
>>>
 Paul, I know you have a pretty diverse range of computers. Can you try to
 compile and run a "make check" with the following patch?
>>>
>>>
>>> I will see what I can do for ARMv7, MIPS, PPC and IA64 (or whatever
>>> subset of those is still supported).
>>> The ARM and MIPS systems are emulators and take forever to build OMPI.
>>> However, I am not even sure how soon I'll get to start this testing.
>>>
>>
>>
>> Add SPARC (v8plus and v9) to that list.
>>
>>
>>
>> --
>> Paul H. Hargrove  phhargr...@lbl.gov
>>  Future Technologies Group
>> Computer and Data Sciences Department Tel: +1-510-495-2352
>> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>>



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
Computer and Data Sciences Department Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900


Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-31 Thread Gilles Gouaillardet
Paul,

The IBM test suite from the non-public ompi-tests repository has several
tests for usempif08.

Cheers,

Gilles

On 2014/08/01 11:04, Paul Hargrove wrote:
> Second related issue:
>
> Can/should examples/hello_usempif08.f90 be extended to use more of the
> module such that it would have illustrated the bug found with Tetsuya's
> example code?   I don't know about MTT, but my scripts for testing a
> release candidate include running "make" in the example subdir.
>
>



Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-31 Thread Paul Hargrove
Never mind my suggestion to revise examples/hello_usempif08.f90;
I've just determined that it is already sufficient to reproduce the problem.
(So now I need to see what's wrong in my testing scripts.)

-Paul


On Thu, Jul 31, 2014 at 7:04 PM, Paul Hargrove  wrote:

> Second related issue:
>
> Can/should examples/hello_usempif08.f90 be extended to use more of the
> module such that it would have illustrated the bug found with Tetsuya's
> example code?   I don't know about MTT, but my scripts for testing a
> release candidate include running "make" in the example subdir.
>
> -Paul
>
>
> On Thu, Jul 31, 2014 at 6:17 PM, Jeff Squyres (jsquyres) <
> jsquy...@cisco.com> wrote:
>
>> Many thanks guys, this thread was most helpful in finding the fix.
>>
>> Paul H. nailed 80% of it on the head in the post where he identified the
>> Makefile.am change.  That Makefile.am change was due to three things:
>>
>> 1. Fixing a real bug (elsewhere in that commit)
>> 2. My misunderstanding of how module files work in Fortran
>> 3. The fact that gfortran, Absoft, and ifort *don't* require you to link
>> in the .o files generated by modules, but apparently pgfortran *does*
>>
>> Blarg.
>>
>> That led to the duplicate symbol issue which Paul also encountered when
>> he tried to fix the original problem, so I fixed that, too (which was a
>> direct consequence of the first fix).
>>
>> Should be fixed in the trunk now; we tested with pgfortran on Craig
>> Rasmussen's cluster (many thanks, Craig!).
>>
>> CMR is https://svn.open-mpi.org/trac/ompi/ticket/4519.
>>
>>
>>
>>
>> On Jul 31, 2014, at 7:27 AM, Paul Hargrove  wrote:
>>
>> > Gilles,
>> >
>> >
>> > Just as you speculate, PGI is creating a _-suffixed reference to the
>> module name:
>> >
>> > $ pgf90 -c test.f90
>> > $ nm -u test.o | grep f08
>> >  U mpi_f08_sizeof_
>> >  U mpi_f08_sizeof_mpi_sizeof_real_s_4_
>> >
>> >
>> >
>> > You suggested the following work-around in a previous email:
>> >
>> > $ INST/bin/mpifort  ../test.f
>> ./BLD/ompi/mpi/fortran/use-mpi-f08/.libs/libforce_usempif08_internal_modules_to_be_built.a
>> >
>> > That works fine.  That doesn't surprise me, because I had already
>> identified that file as having been removed from libmpi_usempif08.so
>> between 1.8.1 and 1.8.2rc2.  It includes the symbol for the module names
>> plus trailing '_'.
>> >
>> > -Paul
>> >
>> >
>> > On Thu, Jul 31, 2014 at 1:07 AM, Gilles Gouaillardet <
>> gilles.gouaillar...@iferc.org> wrote:
>> > Paul,
>> >
>> > in .../ompi/mpi/fortran/use-mpi-f08, can you create the following dumb
>> > test program,
>> > compile and run nm | grep f08 on the object :
>> >
>> > $ cat foo.f90
>> > program foo
>> > use mpi_f08_sizeof
>> >
>> > implicit none
>> >
>> > real :: x
>> > integer :: size, ierror
>> >
>> > call MPI_Sizeof_real_s_4(x, size, ierror)
>> >
>> > stop
>> > end program
>> >
>> >
>> > with intel compiler :
>> > $ ifort -c foo.f90
>> > $ nm foo.o | grep f08
>> >  U mpi_f08_sizeof_mp_mpi_sizeof_real_s_4_
>> >
>> > i am wondering whether PGI compiler adds an additional undefined
>> > reference to mpi_f08_sizeof_ ...
>> >
>> > Cheers,
>> >
>> > Gilles
>> >
>> >
>> >
>> >
>> > --
>> > Paul H. Hargrove  phhargr...@lbl.gov
>> > Future Technologies Group
>> > Computer and Data Sciences Department Tel: +1-510-495-2352
>> > Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>>
>>
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to:
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>
>>
>
>
>
> --
> Paul H. Hargrove  phhargr...@lbl.gov
> Future Technologies Group
> Computer and Data Sciences Department Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
Computer and Data Sciences Department Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900


Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-31 Thread Paul Hargrove
Related question:

If I am understanding PGI's list of fixed-TPRs (bugs) then it looks like
one (certainly not the only) difference between 13.x and 14.1 is a fix to a
problem with PROCEDURE and zero-argument subroutines.  As it happens, the
configure probe for PROCEDURE is a zero-argument subroutine, but the
"real" usage in OMPI is *not* zero-argument.  This opens the possibility
(not certainty) that PROCEDURE may work as required in PGI-13.x, in which
case only a "more accurate" configure test would be required to restore F08
support for PGI-13 (present in 1.8.1 and lacking in 1.8.2rc2).

So, the most important question first:
Does anybody care about PGI-13 (i.e., cannot use PGI-14 for some reason other
than the cost of a license)?

-Paul


On Thu, Jul 31, 2014 at 6:17 PM, Jeff Squyres (jsquyres)  wrote:

> Many thanks guys, this thread was most helpful in finding the fix.
>
> Paul H. nailed 80% of it on the head in the post where he identified the
> Makefile.am change.  That Makefile.am change was due to three things:
>
> 1. Fixing a real bug (elsewhere in that commit)
> 2. My misunderstanding of how module files work in Fortran
> 3. The fact that gfortran, Absoft, and ifort *don't* require you to link
> in the .o files generated by modules, but apparently pgfortran *does*
>
> Blarg.
>
> That led to the duplicate symbol issue which Paul also encountered when he
> tried to fix the original problem, so I fixed that, too (which was a direct
> consequence of the first fix).
>
> Should be fixed in the trunk now; we tested with pgfortran on Craig
> Rasmussen's cluster (many thanks, Craig!).
>
> CMR is https://svn.open-mpi.org/trac/ompi/ticket/4519.
>
>
>
>
> On Jul 31, 2014, at 7:27 AM, Paul Hargrove  wrote:
>
> > Gilles,
> >
> >
> > Just as you speculate, PGI is creating a _-suffixed reference to the
> module name:
> >
> > $ pgf90 -c test.f90
> > $ nm -u test.o | grep f08
> >  U mpi_f08_sizeof_
> >  U mpi_f08_sizeof_mpi_sizeof_real_s_4_
> >
> >
> >
> > You suggested the following work-around in a previous email:
> >
> > $ INST/bin/mpifort  ../test.f
> ./BLD/ompi/mpi/fortran/use-mpi-f08/.libs/libforce_usempif08_internal_modules_to_be_built.a
> >
> > That works fine.  That doesn't surprise me, because I had already
> identified that file as having been removed from libmpi_usempif08.so
> between 1.8.1 and 1.8.2rc2.  It includes the symbol for the module names
> plus trailing '_'.
> >
> > -Paul
> >
> >
> > On Thu, Jul 31, 2014 at 1:07 AM, Gilles Gouaillardet <
> gilles.gouaillar...@iferc.org> wrote:
> > Paul,
> >
> > in .../ompi/mpi/fortran/use-mpi-f08, can you create the following dumb
> > test program,
> > compile and run nm | grep f08 on the object :
> >
> > $ cat foo.f90
> > program foo
> > use mpi_f08_sizeof
> >
> > implicit none
> >
> > real :: x
> > integer :: size, ierror
> >
> > call MPI_Sizeof_real_s_4(x, size, ierror)
> >
> > stop
> > end program
> >
> >
> > with intel compiler :
> > $ ifort -c foo.f90
> > $ nm foo.o | grep f08
> >  U mpi_f08_sizeof_mp_mpi_sizeof_real_s_4_
> >
> > i am wondering whether PGI compiler adds an additional undefined
> > reference to mpi_f08_sizeof_ ...
> >
> > Cheers,
> >
> > Gilles
> >
> >
> >
> >
> > --
> > Paul H. Hargrove  phhargr...@lbl.gov
> > Future Technologies Group
> > Computer and Data Sciences Department Tel: +1-510-495-2352
> > Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
Computer and Data Sciences Department Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900


Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-31 Thread Jeff Squyres (jsquyres)
Many thanks guys, this thread was most helpful in finding the fix.

Paul H. nailed 80% of it on the head in the post where he identified the 
Makefile.am change.  That Makefile.am change was due to three things:

1. Fixing a real bug (elsewhere in that commit)
2. My misunderstanding of how module files work in Fortran
3. The fact that gfortran, Absoft, and ifort *don't* require you to link in the 
.o files generated by modules, but apparently pgfortran *does*

Blarg.

That led to the duplicate symbol issue which Paul also encountered when he 
tried to fix the original problem, so I fixed that, too (which was a direct 
consequence of the first fix).

Should be fixed in the trunk now; we tested with pgfortran on Craig Rasmussen's 
cluster (many thanks, Craig!).

CMR is https://svn.open-mpi.org/trac/ompi/ticket/4519.




On Jul 31, 2014, at 7:27 AM, Paul Hargrove  wrote:

> Gilles,
> 
> 
> Just as you speculate, PGI is creating a _-suffixed reference to the module 
> name:
> 
> $ pgf90 -c test.f90
> $ nm -u test.o | grep f08
>  U mpi_f08_sizeof_
>  U mpi_f08_sizeof_mpi_sizeof_real_s_4_
> 
> 
> 
> You suggested the following work-around in a previous email:
> 
> $ INST/bin/mpifort  ../test.f 
> ./BLD/ompi/mpi/fortran/use-mpi-f08/.libs/libforce_usempif08_internal_modules_to_be_built.a
> 
> That works fine.  That doesn't surprise me, because I had already identified 
> that file as having been removed from libmpi_usempif08.so between 1.8.1 and 
> 1.8.2rc2.  It includes the symbol for the module names plus trailing '_'.
> 
> -Paul
> 
> 
> On Thu, Jul 31, 2014 at 1:07 AM, Gilles Gouaillardet 
>  wrote:
> Paul,
> 
> in .../ompi/mpi/fortran/use-mpi-f08, can you create the following dumb
> test program,
> compile and run nm | grep f08 on the object :
> 
> $ cat foo.f90
> program foo
> use mpi_f08_sizeof
> 
> implicit none
> 
> real :: x
> integer :: size, ierror
> 
> call MPI_Sizeof_real_s_4(x, size, ierror)
> 
> stop
> end program
> 
> 
> with intel compiler :
> $ ifort -c foo.f90
> $ nm foo.o | grep f08
>  U mpi_f08_sizeof_mp_mpi_sizeof_real_s_4_
> 
> i am wondering whether PGI compiler adds an additional undefined
> reference to mpi_f08_sizeof_ ...
> 
> Cheers,
> 
> Gilles
> 
> 
> 
> 
> -- 
> Paul H. Hargrove  phhargr...@lbl.gov
> Future Technologies Group
> Computer and Data Sciences Department Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



[hwloc-devel] Create success (hwloc git dev-170-gabee241)

2014-07-31 Thread MPI Team
Creating nightly hwloc snapshot git tarball was a success.

Snapshot:   hwloc dev-170-gabee241
Start time: Thu Jul 31 21:01:01 EDT 2014
End time:   Thu Jul 31 21:02:31 EDT 2014

Your friendly daemon,
Cyrador


Re: [OMPI devel] Further questions about BTL OPAL move...

2014-07-31 Thread Ralph Castain

On Jul 31, 2014, at 3:41 PM, George Bosilca  wrote:

> 
> On Jul 31, 2014, at 18:26 , Jeff Squyres (jsquyres)  
> wrote:
> 
>> George --
>> 
>> Got 2 questions for ya:
>> 
>> 1. I see some orte_* specific symbols/functions in ompi_mpi_init.c.  Was 
>> that intentional?  Shouldn’t that stuff be in the RTE framework, or some 
>> such?
> 
> Good catch. Fixed in r32384.
> 
>> 2. In tracking down some stuff relating to process names, it looks like 
>> names are now being set by ompi/proc/proc.c (i.e., it makes a call to 
>> opal_proc_local_set(...)).  And this happens after the RTE is initialized.  
>> Is that right?  Seems a little weird to me — shouldn't the RTE be the one 
>> that sets the process names?
> 
> In my view the RTE should stay outside any local setting of the process. The 
> RTE's role is to move the info around, not to force it on everybody else. When 
> multiple layers use the BTL (and thus the OPAL-level proc), we will have 
> to figure out who will be responsible for setting the data into the OPAL-level 
> proc. Meanwhile, OMPI is the only one using this proc.


Um... not exactly. The dstore and pmi frameworks depend on it, and on the 
name matching the one in the RTE. So ORTE is going to have to set that proc 
object, and I imagine STCI will too.


> 
>  George.
> 
>> 
>> Thanks!
>> 
>> -- 
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to: 
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>> 



Re: [OMPI devel] RFC: add atomic compare-and-swap that returns old value

2014-07-31 Thread George Bosilca
Awesome, thanks Paul. When the results are in, we will fix whatever is
needed for these less common architectures.

  George.



On Thu, Jul 31, 2014 at 7:24 PM, Paul Hargrove  wrote:

>
>
> On Thu, Jul 31, 2014 at 4:22 PM, Paul Hargrove  wrote:
>
>>
>> On Thu, Jul 31, 2014 at 4:13 PM, George Bosilca 
>> wrote:
>>
>>> Paul, I know you have a pretty diverse range of computers. Can you try to
>>> compile and run a “make check” with the following patch?
>>
>>
>> I will see what I can do for ARMv7, MIPS, PPC and IA64 (or whatever
>> subset of those is still supported).
>> The ARM and MIPS systems are emulators and take forever to build OMPI.
>> However, I am not even sure how soon I'll get to start this testing.
>>
>
>
> Add SPARC (v8plus and v9) to that list.
>
>
>
> --
> Paul H. Hargrove  phhargr...@lbl.gov
> Future Technologies Group
> Computer and Data Sciences Department Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>
>


Re: [OMPI devel] RFC: add atomic compare-and-swap that returns old value

2014-07-31 Thread Paul Hargrove
On Thu, Jul 31, 2014 at 4:22 PM, Paul Hargrove  wrote:

>
> On Thu, Jul 31, 2014 at 4:13 PM, George Bosilca 
> wrote:
>
>> Paul, I know you have a pretty diverse range of computers. Can you try to
>> compile and run a "make check" with the following patch?
>
>
> I will see what I can do for ARMv7, MIPS, PPC and IA64 (or whatever subset
> of those is still supported).
> The ARM and MIPS systems are emulators and take forever to build OMPI.
> However, I am not even sure how soon I'll get to start this testing.
>


Add SPARC (v8plus and v9) to that list.



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
Computer and Data Sciences Department Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900


Re: [OMPI devel] RFC: add atomic compare-and-swap that returns old value

2014-07-31 Thread Paul Hargrove
On Thu, Jul 31, 2014 at 4:13 PM, George Bosilca  wrote:

> Paul, I know you have a pretty diverse range of computers. Can you try to
> compile and run a "make check" with the following patch?


I will see what I can do for ARMv7, MIPS, PPC and IA64 (or whatever subset
of those is still supported).
The ARM and MIPS systems are emulators and take forever to build OMPI.
However, I am not even sure how soon I'll get to start this testing.

-Paul



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
Computer and Data Sciences Department Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900


Re: [OMPI devel] RFC: add atomic compare-and-swap that returns old value

2014-07-31 Thread George Bosilca
All,

Here is the patch that changes the meaning of the atomics to make them always 
return the previous value (similar to __sync_fetch_and_<*>). I tested this with 
the following atomics: OS X, gcc style intrinsics and AMD64.

I did not change the base assembly files used when GCC style assembly 
operations are not supported. If someone feels like fixing them, feel free.

Paul, I know you have a pretty diverse range of computers. Can you try to compile 
and run a “make check” with the following patch?

  George.



atomics.patch
Description: Binary data
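
For readers following the thread, a sketch of the semantic change being
proposed (my illustration, not the patch itself). The old-style cmpset
returns only a success flag; the new style returns the previously stored
value, as GCC's __sync builtins do:

#include <stdint.h>
#include <stdbool.h>

/* Old style: boolean result only (did the swap happen?). */
static inline bool cmpset_bool_32(volatile int32_t *addr,
                                  int32_t oldval, int32_t newval)
{
    return __sync_bool_compare_and_swap(addr, oldval, newval);
}

/* New style, per the patch's stated intent: return the previous value,
 * so the caller can detect failure (result != oldval) and retry with
 * the returned value instead of issuing an extra load. */
static inline int32_t cmpset_val_32(volatile int32_t *addr,
                                    int32_t oldval, int32_t newval)
{
    return __sync_val_compare_and_swap(addr, oldval, newval);
}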


On Jul 30, 2014, at 15:21 , Nathan Hjelm  wrote:

> 
> That is what I would prefer. I was trying to not disturb things too
> much :). Please bring the changes over!
> 
> -Nathan
> 
> On Wed, Jul 30, 2014 at 03:18:44PM -0400, George Bosilca wrote:
>>   Why do you want to add new versions? This will lead to having two, almost
>>   identical, sets of atomics that are conceptually equivalent but different
>>   in terms of code. And we will have to maintain both!
>>   I did a similar change in a fork of OPAL in another project but instead of
>>   adding another flavor of atomics, I completely replaced the available ones
>>   with a set returning the old value. I can bring the code over.
>> George.
>> 
>>   On Tue, Jul 29, 2014 at 5:29 PM, Paul Hargrove  wrote:
>> 
>> On Tue, Jul 29, 2014 at 2:10 PM, Nathan Hjelm  wrote:
>> 
>>   Is there a reason why the
>>   current implementations of opal atomics (add, cmpset) do not return
>>   the
>>   old value?
>> 
>> Because some CPUs don't implement such an atomic instruction?
>> 
>> On any CPU one *can* certainly synthesize the desired operation with an
>> added read before the compare-and-swap to return a value that was
>> present at some time before a failed cmpset.  That is almost certainly
>> sufficient for your purposes.  However, the added load makes it
>> (marginally) more expensive on some CPUs that only have the native
>> equivalent of gcc's __sync_bool_compare_and_swap().
>> 
>> -Paul
>> --
>> Paul H. Hargrove  phhargr...@lbl.gov
>> Future Technologies Group
>> Computer and Data Sciences Department Tel: +1-510-495-2352
>> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
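
A minimal sketch of the synthesis Paul describes in the quoted text above
(a hypothetical helper, not OMPI code), for a CPU whose native primitive
only reports success or failure:

#include <stdint.h>

/* Emulate a value-returning CAS on top of a boolean-only CAS. */
static inline int64_t cmpset_val_from_bool_64(volatile int64_t *addr,
                                              int64_t oldval, int64_t newval)
{
    int64_t prev = *addr;            /* the added read Paul mentions */
    if (__sync_bool_compare_and_swap(addr, oldval, newval)) {
        return oldval;               /* success: old value was oldval */
    }
    /* Failure: prev was present at some time before the failed cmpset.
     * It may coincidentally equal oldval, which is why this is "almost
     * certainly sufficient" rather than strictly equivalent to a native
     * value-returning CAS. */
    return prev;
}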



[OMPI devel] Further questions about BTL OPAL move...

2014-07-31 Thread Jeff Squyres (jsquyres)
George --

Got 2 questions for ya:

1. I see some orte_* specific symbols/functions in ompi_mpi_init.c.  Was that 
intentional?  Shouldn't that stuff be in the RTE framework, or some such?

2. In tracking down some stuff relating to process names, it looks like names 
are now being set by ompi/proc/proc.c (i.e., it makes a call to 
opal_proc_local_set(...)).  And this happens after the RTE is initialized.  Is 
that right?  Seems a little weird to me -- shouldn't the RTE be the one that 
sets the process names?

Thanks!

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] mca_PROJECT_FRAMEWORK_COMPONENT_symbol vs. mca_FRAMEWORK_COMPONENT_symbol

2014-07-31 Thread Kenneth A. Lloyd
Yeah, I forgot that pure ANSI C doesn't really have namespaces, other than
fully qualifying module and variable names. Bummer.

Makes writing large, maintainable middleware more difficult.
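
Since C encodes namespaces only in symbol prefixes, the convention under
discussion looks like this (the symbol names come from Dave's message
below; the parameter lists are invented for illustration):

#include <stddef.h>

/* Today: mca_FRAMEWORK_COMPONENT_symbol */
int mca_btl_tcp_add_procs(void *btl, size_t nprocs, void **procs);

/* Proposed: mca_PROJECT_FRAMEWORK_COMPONENT_symbol, so frameworks with
 * the same name in different projects cannot collide. */
int mca_ompi_btl_tcp_add_procs(void *btl, size_t nprocs, void **procs);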

-Original Message-
From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Kenneth A.
Lloyd
Sent: Thursday, July 31, 2014 6:04 AM
To: 'Open MPI Developers'
Subject: Re: [OMPI devel] mca_PROJECT_FRAMEWORK_COMPONENT_symbol vs.
mca_FRAMEWORK_COMPONENT_symbol

Doesn't namespacing obviate the need for this convoluted identifier scheme?
See, for example, UML package import and include behaviors.

-Original Message-
From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Dave Goodell
(dgoodell)
Sent: Wednesday, July 30, 2014 3:35 PM
To: Open MPI Developers
Subject: [OMPI devel] mca_PROJECT_FRAMEWORK_COMPONENT_symbol vs.
mca_FRAMEWORK_COMPONENT_symbol

Jeff and I were talking about some namespacing issues that have come up in
the recent BTL move from OMPI to OPAL.  AFAIK, the current system for
namespacing external symbols is to name them
"mca_FRAMEWORK_COMPONENT_symbol" (e.g., "mca_btl_tcp_add_procs" in the tcp
BTL).  Similarly, the DSO for the component is named
"mca_FRAMEWORK_COMPONENT.so" (e.g., "mca_btl_tcp.so").

Jeff asserted that the eventual goal is to move to a system where all MCA
frameworks/components are also prefixed by the project name.  So the above
examples become "mca_ompi_btl_tcp_add_procs" and "mca_ompi_btl_tcp.so".
Does anyone actually care about pursuing this goal?

I ask because if nobody wants to pursue the goal of adding project names to
namespaces then I already have an easy solution to most of our namespacing
problems.  OTOH, if someone does wish to pursue that goal, then I have a
namespace-related RFC that I would like to propose (in a subsequent email).

-Dave




Re: [OMPI devel] RFC: job size info in OPAL

2014-07-31 Thread Ralph Castain
Fair enough - yeah, that is an issue I've been avoiding :-)

On Jul 31, 2014, at 9:14 AM, Nathan Hjelm  wrote:

> 
> This approach will work now but we need to start thinking about how we
> want to support multiple simultaneous btl users. Does each user call
> add_procs with a single module (or set of modules) or does each user
> call btl_component_init and get their own module? If we do the latter
> then it might make sense to add a max_procs argument to the
> btl_component_init. Keep in mind we need to change the
> btl_component_init interface anyway because the threading arguments no
> longer make sense in their current form.
> 
> -Nathan
> 
> On Thu, Jul 31, 2014 at 09:04:09AM -0700, Ralph Castain wrote:
>> Like I said, why don't we just do the following:
>> 
>>> I'd like to suggest an alternative solution. A BTL can exploit whatever 
>>> data it wants, but should first test if the data is available. If the data 
>>> is *required*, then the BTL gracefully disqualifies itself. If the data is 
>>> *desirable* for optimization, then the BTL writer (if they choose) can 
>>> include an alternate path that doesn't do the optimization if the data 
>>> isn't available.
>> 
>> Seems like this should resolve the disagreement in a way that meets 
>> everyone's need. It basically is an attribute approach, but not requiring 
>> modification of the BTL interface.
>> 
>> 
>> On Jul 31, 2014, at 8:26 AM, Pritchard Jr., Howard  wrote:
>> 
>>> Hi  George,
>>> 
>>> The ompi_process_info.num_procs thing that seems to have been an object
>>> of some contention yesterday.
>>> 
>>> The ugni use of this is cloned off of the way I designed the mpich  netmod.
>>> Leveraging off the size of the job was an easy way to scale the mailbox size.
>>> 
>>> If I'd been asked to have the netmod work in a context like it appears we
>>> may want to be eventually using BTLs - not just within ompi but for other
>>> things, I'd have worked with Darius (if still in mpich world) on changing 
>>> the netmod initialization
>>> method to allow for an optional attributes struct to be passed into the 
>>> init 
>>> method to give hints about how many connections may need to be established,
>>> etc.  
>>> 
>>> For the GNI BTL - the way it's currently designed - if you are only expecting
>>> to use it for a limited number of connections, then you want to initialize
>>> for big mailboxes (IB'ers can think of this as many large buffers posted as RX WQEs).
>>> But for very large jobs, with possibly highly connected communication 
>>> pattern,
>>> you want very small mailboxes.
>>> 
>>> Howard
>>> 
>>> 
>>> -Original Message-
>>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of George Bosilca
>>> Sent: Thursday, July 31, 2014 9:09 AM
>>> To: Open MPI Developers
>>> Subject: Re: [OMPI devel] RFC: job size info in OPAL
>>> 
>>> What is your definition of "global job size"?
>>> 
>>> George.
>>> 
>>> On Jul 31, 2014, at 11:06 , Pritchard Jr., Howard  wrote:
>>> 
 Hi Folks,
 
 I think given the way we want to use the btl's in lower levels like 
 opal, it is pretty disgusting for a btl to need to figure out on its 
 own something like a "global job size".  That's not its business.  
 Can't we add some attributes to the component's initialization method 
 that provides hints for how to allocate resources it needs to provide its 
 functionality?
 
 I'll see if there's something clever that can be done in ugni for now.
 I can always add in a hack to probe the app's placement info file and 
 scale the smsg blocks by number of nodes rather than number of ranks.
 
 Howard
 
 
 -Original Message-
 From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Nathan 
 Hjelm
 Sent: Thursday, July 31, 2014 8:58 AM
 To: Open MPI Developers
 Subject: Re: [OMPI devel] RFC: job size info in OPAL
 
 
 +2^1000
 
 This information is absolutely necessary at this point. If someone has a 
 better solution they can provide it as an alternative RFC. Until then this 
 is how it should be done... Otherwise we lose uGNI support on the trunk. 
 Because we ARE NOT going to remove the mailbox size optimization.
 
 -Nathan
 
 On Wed, Jul 30, 2014 at 10:00:18PM +, Jeff Squyres (jsquyres) wrote:
> WHAT: Should we make the job size (i.e., initial number of procs) 
> available in OPAL?
> 
> WHY: At least 2 BTLs are using this info (*more below)
> 
> WHERE: usnic and ugni
> 
> TIMEOUT: there's already been some inflammatory emails about this; 
> let's discuss next Tuesday on the teleconf: Tue, 5 Aug 2014
> 
> MORE DETAIL:
> 
> This is an open question.  We *have* the information at the time that the 
> BTLs are initialized: do we allow that information to go down to OPAL?
> 
> Ralph added this info down in OPAL in r32355, but George reverted it in r32361.

Re: [OMPI devel] RFC: job size info in OPAL

2014-07-31 Thread Nathan Hjelm

This approach will work now but we need to start thinking about how we
want to support multiple simultaneous btl users. Does each user call
add_procs with a single module (or set of modules) or does each user
call btl_component_init and get their own module? If we do the latter
then it might make sense to add a max_procs argument to the
btl_component_init. Keep in mind we need to change the
btl_component_init interface anyway because the threading arguments no
longer make sense in their current form.

-Nathan
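
A hypothetical sketch of the proposal above (parameters simplified; this
is not the actual OPAL BTL interface):

#include <stdbool.h>
#include <stddef.h>

struct mca_btl_base_module_t;   /* opaque in this sketch */

/* Component init extended with a sizing hint: each BTL user states how
 * many peers it may ever add, so per-peer resources can be sized
 * without consulting a job-wide value. */
typedef struct mca_btl_base_module_t **
(*btl_component_init_fn_t)(int *num_btl_modules,
                           bool enable_progress_threads,
                           bool enable_mpi_threads,
                           size_t max_procs /* proposed hint */);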

On Thu, Jul 31, 2014 at 09:04:09AM -0700, Ralph Castain wrote:
> Like I said, why don't we just do the following:
> 
> > I'd like to suggest an alternative solution. A BTL can exploit whatever 
> > data it wants, but should first test if the data is available. If the data 
> > is *required*, then the BTL gracefully disqualifies itself. If the data is 
> > *desirable* for optimization, then the BTL writer (if they choose) can 
> > include an alternate path that doesn't do the optimization if the data 
> > isn't available.
> 
> Seems like this should resolve the disagreement in a way that meets 
> everyone's need. It basically is an attribute approach, but not requiring 
> modification of the BTL interface.
> 
> 
> On Jul 31, 2014, at 8:26 AM, Pritchard Jr., Howard  wrote:
> 
> > Hi  George,
> > 
> > The ompi_process_info.num_procs thing that seems to have been an object
> > of some contention yesterday.
> > 
> > The ugni use of this is cloned off of the way I designed the mpich  netmod.
> > Leveraging off the size of the job was an easy way to scale the mailbox size.
> > 
> > If I'd been asked to have the netmod work in a context like it appears we
> > may want to be eventually using BTLs - not just within ompi but for other
> > things, I'd have worked with Darius (if still in mpich world) on changing 
> > the netmod initialization
> > method to allow for an optional attributes struct to be passed into the 
> > init 
> > method to give hints about how many connections may need to be established,
> > etc.  
> > 
> > For the GNI BTL - the way it's currently designed - if you are only expecting
> > to use it for a limited number of connections, then you want to initialize
> > for big mailboxes (IB'ers can think of this as many large buffers posted as RX WQEs).
> > But for very large jobs, with possibly highly connected communication 
> > pattern,
> > you want very small mailboxes.
> > 
> > Howard
> > 
> > 
> > -Original Message-
> > From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of George Bosilca
> > Sent: Thursday, July 31, 2014 9:09 AM
> > To: Open MPI Developers
> > Subject: Re: [OMPI devel] RFC: job size info in OPAL
> > 
> > What is your definition of "global job size"?
> > 
> >  George.
> > 
> > On Jul 31, 2014, at 11:06 , Pritchard Jr., Howard  wrote:
> > 
> >> Hi Folks,
> >> 
> >> I think given the way we want to use the btl's in lower levels like 
> >> opal, it is pretty disgusting for a btl to need to figure out on its 
> >> own something like a "global job size".  That's not its business.  
> >> Can't we add some attributes to the component's initialization method 
> >> that provides hints for how to allocate resources it needs to provide its 
> >> functionality?
> >> 
> >> I'll see if there's something clever that can be done in ugni for now.
> >> I can always add in a hack to probe the app's placement info file and 
> >> scale the smsg blocks by number of nodes rather than number of ranks.
> >> 
> >> Howard
> >> 
> >> 
> >> -Original Message-
> >> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Nathan 
> >> Hjelm
> >> Sent: Thursday, July 31, 2014 8:58 AM
> >> To: Open MPI Developers
> >> Subject: Re: [OMPI devel] RFC: job size info in OPAL
> >> 
> >> 
> >> +2^1000
> >> 
> >> This information is absolutely necessary at this point. If someone has a 
> >> better solution they can provide it as an alternative RFC. Until then this 
> >> is how it should be done... Otherwise we lose uGNI support on the trunk. 
> >> Because we ARE NOT going to remove the mailbox size optimization.
> >> 
> >> -Nathan
> >> 
> >> On Wed, Jul 30, 2014 at 10:00:18PM +, Jeff Squyres (jsquyres) wrote:
> >>> WHAT: Should we make the job size (i.e., initial number of procs) 
> >>> available in OPAL?
> >>> 
> >>> WHY: At least 2 BTLs are using this info (*more below)
> >>> 
> >>> WHERE: usnic and ugni
> >>> 
> >>> TIMEOUT: there's already been some inflammatory emails about this; 
> >>> let's discuss next Tuesday on the teleconf: Tue, 5 Aug 2014
> >>> 
> >>> MORE DETAIL:
> >>> 
> >>> This is an open question.  We *have* the information at the time that the 
> >>> BTLs are initialized: do we allow that information to go down to OPAL?
> >>> 
> >>> Ralph added this info down in OPAL in r32355, but George reverted it in 
> >>> r32361.
> >>> 
> >>> Points for: YES, WE SHOULD
> >> +++ 2 BTLs were using it (usnic, ugni)
> >> +++ Other RTE job-related info are already in OPAL (num local ranks, local rank)

Re: [OMPI devel] RFC: job size info in OPAL

2014-07-31 Thread Ralph Castain
Like I said, why don't we just do the following:

> I'd like to suggest an alternative solution. A BTL can exploit whatever data 
> it wants, but should first test if the data is available. If the data is 
> *required*, then the BTL gracefully disqualifies itself. If the data is 
> *desirable* for optimization, then the BTL writer (if they choose) can 
> include an alternate path that doesn't do the optimization if the data isn't 
> available.

Seems like this should resolve the disagreement in a way that meets everyone's 
need. It basically is an attribute approach, but not requiring modification of 
the BTL interface.


On Jul 31, 2014, at 8:26 AM, Pritchard Jr., Howard  wrote:

> Hi  George,
> 
> The ompi_process_info.num_procs thing that seems to have been an object
> of some contention yesterday.
> 
> The ugni use of this is cloned off of the way I designed the mpich  netmod.
> Leveraging off the size of the job was an easy way to scale the mailbox size.
> 
> If I'd been asked to have the netmod work in a context like it appears we
> may want to be eventually using BTLs - not just within ompi but for other
> things, I'd have worked with Darius (if still in mpich world) on changing the 
> netmod initialization
> method to allow for an optional attributes struct to be passed into the init 
> method to give hints about how many connections may need to be established,
> etc.  
> 
> For the GNI BTL - the way it's currently designed - if you are only expecting
> to use it for a limited number of connections, then you want to initialize
> for big mailboxes (IB'ers can think of this as many large buffers posted as RX WQEs).
> But for very large jobs, with possibly highly connected communication pattern,
> you want very small mailboxes.
> 
> Howard
> 
> 
> -Original Message-
> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of George Bosilca
> Sent: Thursday, July 31, 2014 9:09 AM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] RFC: job size info in OPAL
> 
> What is your definition of "global job size"?
> 
>  George.
> 
> On Jul 31, 2014, at 11:06 , Pritchard Jr., Howard  wrote:
> 
>> Hi Folks,
>> 
>> I think given the way we want to use the btl's in lower levels like 
>> opal, it is pretty disgusting for a btl to need to figure out on its 
>> own something like a "global job size".  That's not its business.  
>> Can't we add some attributes to the component's initialization method 
>> that provides hints for how to allocate resources it needs to provide its 
>> functionality?
>> 
>> I'll see if there's something clever that can be done in ugni for now.
>> I can always add in a hack to probe the app's placement info file and 
>> scale the smsg blocks by number of nodes rather than number of ranks.
>> 
>> Howard
>> 
>> 
>> -Original Message-
>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Nathan 
>> Hjelm
>> Sent: Thursday, July 31, 2014 8:58 AM
>> To: Open MPI Developers
>> Subject: Re: [OMPI devel] RFC: job size info in OPAL
>> 
>> 
>> +2^1000
>> 
>> This information is absolutely necessary at this point. If someone has a 
>> better solution they can provide it as an alternative RFC. Until then this 
>> is how it should be done... Otherwise we lose uGNI support on the trunk. 
>> Because we ARE NOT going to remove the mailbox size optimization.
>> 
>> -Nathan
>> 
>> On Wed, Jul 30, 2014 at 10:00:18PM +, Jeff Squyres (jsquyres) wrote:
>>> WHAT: Should we make the job size (i.e., initial number of procs) available 
>>> in OPAL?
>>> 
>>> WHY: At least 2 BTLs are using this info (*more below)
>>> 
>>> WHERE: usnic and ugni
>>> 
>>> TIMEOUT: there's already been some inflammatory emails about this; 
>>> let's discuss next Tuesday on the teleconf: Tue, 5 Aug 2014
>>> 
>>> MORE DETAIL:
>>> 
>>> This is an open question.  We *have* the information at the time that the 
>>> BTLs are initialized: do we allow that information to go down to OPAL?
>>> 
>>> Ralph added this info down in OPAL in r32355, but George reverted it in 
>>> r32361.
>>> 
>>> Points for: YES, WE SHOULD
>>> +++ 2 BTLs were using it (usnic, ugni)
>>> +++ Other RTE job-related info are already in OPAL (num local ranks, local rank)
>>> 
>>> Points for: NO, WE SHOULD NOT
>>> --- What exactly is this number (e.g., num currently-connected procs?), and 
>>> when is it updated?
>>> --- We need to precisely delineate what belongs in OPAL vs. 
>>> above-OPAL
>>> 
>>> FWIW: here's how ompi_process_info.num_procs was used before the BTL move 
>>> down to OPAL:
>>> 
>>> - usnic: for a minor latency optimization / sizing of a shared 
>>> receive buffer queue length, and for the initial size of a peer 
>>> lookup hash
>>> - ugni: to determine the size of the per-peer buffers used for 
>>> send/recv communication
>>> 
>>> --
>>> Jeff Squyres
>>> jsquy...@cisco.com
>>> For corporate legal information go to: 
>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>> 
>>> 

Re: [OMPI devel] RFC: job size info in OPAL

2014-07-31 Thread Nathan Hjelm
I do not like the fact that add_procs is called with every proc in the
MPI_COMM_WORLD. That needs to change, so, I will not rely on the number
of procs being added being the same as the world or universe size.

-Nathan

On Thu, Jul 31, 2014 at 09:22:00AM -0600, George Bosilca wrote:
>    I definitely think you misunderstood the scope of this RFC. The
>information that is so important to you to configure the mailbox size is
>available to you when you need it. This information is made available by
>the PML through the call to add_procs, which comes with all the procs in
>    the MPI_COMM_WORLD. So, ugni doesn't need anything more than what is
>    available today. [This is of course under the assumption that someone
>    cleans the BTL and removes the usage of MPI_COMM_WORLD.]
> 
>The real scope of this RFC is to move this information before that in
>order to allow the BTLs to have access to some possible number of
>processes between the call to btl_open and the call to btl_add_procs (in
>other words during btl_init).
> 
>  George.
> 
>PS: here is the patch that fixes all issues in ugni.
> 
>On Jul 31, 2014, at 10:58 , Nathan Hjelm  wrote:
> 
>>
>> +2^1000
>>
>> This information is absolutely necessary at this point. If someone has a
>> better solution they can provide it as an alternative RFC. Until then
>> this is how it should be done... Otherwise we lose uGNI support on the
>> trunk. Because we ARE NOT going to remove the mailbox size optimization.
>>
>> -Nathan
>>
>> On Wed, Jul 30, 2014 at 10:00:18PM +, Jeff Squyres (jsquyres) wrote:
>>> WHAT: Should we make the job size (i.e., initial number of procs)
>available in OPAL?
>>>
>>> WHY: At least 2 BTLs are using this info (*more below)
>>>
>>> WHERE: usnic and ugni
>>>
>>> TIMEOUT: there's already been some inflammatory emails about this;
>let's discuss next Tuesday on the teleconf: Tue, 5 Aug 2014
>>>
>>> MORE DETAIL:
>>>
>>> This is an open question.  We *have* the information at the time that
>the BTLs are initialized: do we allow that information to go down to OPAL?
>>>
>>> Ralph added this info down in OPAL in r32355, but George reverted it in
>r32361.
>>>
>>> Points for: YES, WE SHOULD
>>> +++ 2 BTLs were using it (usnic, ugni)
>>> +++ Other RTE job-related info are already in OPAL (num local ranks,
>local rank)
>>>
>>> Points for: NO, WE SHOULD NOT
>>> --- What exactly is this number (e.g., num currently-connected procs?),
>and when is it updated?
>>> --- We need to precisely delineate what belongs in OPAL vs. above-OPAL
>>>
>>> FWIW: here's how ompi_process_info.num_procs was used before the BTL
>move down to OPAL:
>>>
>>> - usnic: for a minor latency optimization / sizing of a shared receive
>buffer queue length, and for the initial size of a peer lookup hash
>>> - ugni: to determine the size of the per-peer buffers used for
>send/recv communication
>>>
>>> --
>>> Jeff Squyres
>>> jsquy...@cisco.com
>>> For corporate legal information go to:
>http://www.cisco.com/web/about/doing_business/legal/cri/
>>>






Re: [OMPI devel] RFC: job size info in OPAL

2014-07-31 Thread Pritchard Jr., Howard
Hi  George,

The ompi_process_info.num_procs thing that seems to have been an object
of some contention yesterday.

The ugni use of this is cloned off of the way I designed the mpich  netmod.
Leveraging off the size of the job was an easy way to scale the mailbox size.

If I'd been asked to have the netmod work in a context like it appears we
may want to be eventually using BTLs - not just within ompi but for other
things, I'd have worked with Darius (if still in mpich world) on changing the 
netmod initialization
method to allow for an optional attributes struct to be passed into the init 
method to give hints about how many connections may need to be established,
etc.  
 
For the GNI BTL - the way it's currently designed - if you are only expecting
to use it for a limited number of connections, then you want to initialize
for big mailboxes (IB'ers can think of this as many large buffers posted as RX WQEs).
But for very large jobs, with possibly highly connected communication pattern,
you want very small mailboxes.

Howard
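
To make the trade-off concrete, an illustrative-only scaling function in
the spirit of the above (the thresholds and sizes are invented, not the
GNI BTL's actual values):

#include <stddef.h>

/* Big mailboxes for small jobs; very small mailboxes for huge,
 * possibly highly connected jobs. */
static size_t smsg_mbox_bytes(size_t job_size)
{
    if (job_size <= 512) {
        return 8192;
    }
    if (job_size <= 16384) {
        return 1024;
    }
    return 128;
}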


-Original Message-
From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of George Bosilca
Sent: Thursday, July 31, 2014 9:09 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] RFC: job size info in OPAL

What is your definition of "global job size"?

  George.

On Jul 31, 2014, at 11:06 , Pritchard Jr., Howard  wrote:

> Hi Folks,
> 
> I think given the way we want to use the btl's in lower levels like 
> opal, it is pretty disgusting for a btl to need to figure out on its 
> own something like a "global job size".  That's not its business.  
> Can't we add some attributes to the component's initialization method 
> that provides hints for how to allocate resources it needs to provide its 
> functionality?
> 
> I'll see if there's something clever that can be done in ugni for now.
> I can always add in a hack to probe the app's placement info file and
> scale the smsg blocks by number of nodes rather than number of ranks.
> 
> Howard
> 
> 
> -Original Message-
> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Nathan 
> Hjelm
> Sent: Thursday, July 31, 2014 8:58 AM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] RFC: job size info in OPAL
> 
> 
> +2^1000
> 
> This information is absolutely necessary at this point. If someone has a 
> better solution they can provide it as an alternative RFC. Until then this is 
> how it should be done... Otherwise we lose uGNI support on the trunk. 
> Because we ARE NOT going to remove the mailbox size optimization.
> 
> -Nathan
> 
> On Wed, Jul 30, 2014 at 10:00:18PM +, Jeff Squyres (jsquyres) wrote:
>> WHAT: Should we make the job size (i.e., initial number of procs) available 
>> in OPAL?
>> 
>> WHY: At least 2 BTLs are using this info (*more below)
>> 
>> WHERE: usnic and ugni
>> 
>> TIMEOUT: there's already been some inflammatory emails about this; 
>> let's discuss next Tuesday on the teleconf: Tue, 5 Aug 2014
>> 
>> MORE DETAIL:
>> 
>> This is an open question.  We *have* the information at the time that the 
>> BTLs are initialized: do we allow that information to go down to OPAL?
>> 
>> Ralph added this info down in OPAL in r32355, but George reverted it in 
>> r32361.
>> 
>> Points for: YES, WE SHOULD
>> +++ 2 BTLs were using it (usnic, ugni)
>> +++ Other RTE job-related info are already in OPAL (num local ranks, local rank)
>> 
>> Points for: NO, WE SHOULD NOT
>> --- What exactly is this number (e.g., num currently-connected procs?), and 
>> when is it updated?
>> --- We need to precisely delineate what belongs in OPAL vs. 
>> above-OPAL
>> 
>> FWIW: here's how ompi_process_info.num_procs was used before the BTL move 
>> down to OPAL:
>> 
>> - usnic: for a minor latency optimization / sizing of a shared 
>> receive buffer queue length, and for the initial size of a peer 
>> lookup hash
>> - ugni: to determine the size of the per-peer buffers used for 
>> send/recv communication
>> 
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to: 
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>> 


Re: [OMPI devel] RFC: job size info in OPAL

2014-07-31 Thread George Bosilca
I definitely think you misunderstood the scope of this RFC. The information 
that is so important to you to configure the mailbox size is available to you 
when you need it. This information is made available by the PML through the 
call to add_procs, which comes with all the procs in the MPI_COMM_WORLD. So, 
ugni doesn’t need anything more than what is available today. [This is of course 
under the assumption that someone cleans the BTL and removes the usage of 
MPI_COMM_WORLD.]

The real scope of this RFC is to move this information before that in order to 
allow the BTLs to have access to some possible number of processes between the 
call to btl_open and the call to btl_add_procs (in other words, during btl_init).

  George.

PS: here is the patch that fixes all issues in ugni.



ugni.patch
Description: Binary data
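
As a reminder of the mechanism George refers to, the peer count already
arrives as an argument when the PML calls add_procs. A simplified
pseudo-signature (types collapsed, not verbatim from the tree):

#include <stddef.h>

struct mca_btl_base_module_t;
struct opal_proc_t;

/* nprocs is the count a BTL could use to size its mailboxes. */
int mca_btl_ugni_add_procs(struct mca_btl_base_module_t *btl,
                           size_t nprocs,
                           struct opal_proc_t **procs,
                           void **peer_data);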

On Jul 31, 2014, at 10:58 , Nathan Hjelm  wrote:

> 
> +2^1000
> 
> This information is absolutely necessary at this point. If someone has a
> better solution they can provide it as an alternative RFC. Until then
> this is how it should be done... Otherwise we lose uGNI support on the
> trunk. Because we ARE NOT going to remove the mailbox size optimization.
> 
> -Nathan
> 
> On Wed, Jul 30, 2014 at 10:00:18PM +, Jeff Squyres (jsquyres) wrote:
>> WHAT: Should we make the job size (i.e., initial number of procs) available 
>> in OPAL?
>> 
>> WHY: At least 2 BTLs are using this info (*more below)
>> 
>> WHERE: usnic and ugni
>> 
>> TIMEOUT: there's already been some inflammatory emails about this; let's 
>> discuss next Tuesday on the teleconf: Tue, 5 Aug 2014
>> 
>> MORE DETAIL:
>> 
>> This is an open question.  We *have* the information at the time that the 
>> BTLs are initialized: do we allow that information to go down to OPAL?
>> 
>> Ralph added this info down in OPAL in r32355, but George reverted it in 
>> r32361.
>> 
>> Points for: YES, WE SHOULD
>> +++ 2 BTLs were using it (usnic, ugni)
>> +++ Other RTE job-related info are already in OPAL (num local ranks, local 
>> rank)
>> 
>> Points for: NO, WE SHOULD NOT
>> --- What exactly is this number (e.g., num currently-connected procs?), and 
>> when is it updated?
>> --- We need to precisely delineate what belongs in OPAL vs. above-OPAL
>> 
>> FWIW: here's how ompi_process_info.num_procs was used before the BTL move 
>> down to OPAL:
>> 
>> - usnic: for a minor latency optimization / sizing of a shared receive 
>> buffer queue length, and for the initial size of a peer lookup hash
>> - ugni: to determine the size of the per-peer buffers used for send/recv 
>> communication
>> 
>> -- 
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to: 
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>> 



Re: [OMPI devel] RFC: job size info in OPAL

2014-07-31 Thread Nathan Hjelm

The maximum number of peer processes that may be added over the course
of the job will suffice. So either the world or universe size. This is a
reasonable piece of information to expect the upper layers to provide to
the communication layer.

And the impact of providing this information is no less intrusive than
providing information like the number of local ranks.

-Nathan

On Thu, Jul 31, 2014 at 11:09:24AM -0400, George Bosilca wrote:
> What is your definition of “global job size”?
> 
>   George.
> 
> On Jul 31, 2014, at 11:06 , Pritchard Jr., Howard  wrote:
> 
> > Hi Folks,
> > 
> > I think given the way we want to use the btl's in lower levels like opal,
> > it is pretty disgusting for a btl to need to figure out on its own something
> > like a "global job size".  That's not its business.  Can't we add some 
> > attributes
> > to the component's initialization method that provides hints for how to
> > allocate resources it needs to provide its functionality?
> > 
> > I'll see if there's something clever that can be done in ugni for now.
> > I can always add in a hack to probe the app's placement info file and
> > scale the smsg blocks by number of nodes rather than number of ranks.
> > 
> > Howard
> > 
> > 
> > -Original Message-
> > From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Nathan Hjelm
> > Sent: Thursday, July 31, 2014 8:58 AM
> > To: Open MPI Developers
> > Subject: Re: [OMPI devel] RFC: job size info in OPAL
> > 
> > 
> > +2^1000
> > 
> > This information is absolutely necessary at this point. If someone has a 
> > better solution they can provide it as an alternative RFC. Until then this 
> > is how it should be done... Otherwise we lose uGNI support on the trunk. 
> > Because we ARE NOT going to remove the mailbox size optimization.
> > 
> > -Nathan
> > 
> > On Wed, Jul 30, 2014 at 10:00:18PM +, Jeff Squyres (jsquyres) wrote:
> >> WHAT: Should we make the job size (i.e., initial number of procs) 
> >> available in OPAL?
> >> 
> >> WHY: At least 2 BTLs are using this info (*more below)
> >> 
> >> WHERE: usnic and ugni
> >> 
> >> TIMEOUT: there's already been some inflammatory emails about this; 
> >> let's discuss next Tuesday on the teleconf: Tue, 5 Aug 2014
> >> 
> >> MORE DETAIL:
> >> 
> >> This is an open question.  We *have* the information at the time that the 
> >> BTLs are initialized: do we allow that information to go down to OPAL?
> >> 
> >> Ralph added this info down in OPAL in r32355, but George reverted it in 
> >> r32361.
> >> 
> >> Points for: YES, WE SHOULD
> >> +++ 2 BTLs were using it (usnic, ugni)
> >> +++ Other RTE job-related info are already in OPAL (num local ranks, local rank)
> >> 
> >> Points for: NO, WE SHOULD NOT
> >> --- What exactly is this number (e.g., num currently-connected procs?), 
> >> and when is it updated?
> >> --- We need to precisely delineate what belongs in OPAL vs. above-OPAL
> >> 
> >> FWIW: here's how ompi_process_info.num_procs was used before the BTL move 
> >> down to OPAL:
> >> 
> >> - usnic: for a minor latency optimization / sizing of a shared receive 
> >> buffer queue length, and for the initial size of a peer lookup hash
> >> - ugni: to determine the size of the per-peer buffers used for 
> >> send/recv communication
> >> 
> >> --
> >> Jeff Squyres
> >> jsquy...@cisco.com
> >> For corporate legal information go to: 
> >> http://www.cisco.com/web/about/doing_business/legal/cri/
> >> 
> >> ___
> >> devel mailing list
> >> de...@open-mpi.org
> >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >> Link to this post: 
> >> http://www.open-mpi.org/community/lists/devel/2014/07/15373.php
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post: 
> > http://www.open-mpi.org/community/lists/devel/2014/07/15395.php
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/07/15396.php




Re: [OMPI devel] RFC: job size info in OPAL

2014-07-31 Thread Ralph Castain
I'd like to suggest an alternative solution. A BTL can exploit whatever data it 
wants, but should first test if the data is available. If the data is 
*required*, then the BTL gracefully disqualifies itself. If the data is 
*desirable* for optimization, then the BTL writer (if they choose) can include 
an alternate path that doesn't do the optimization if the data isn't available.

This would seem to meet everyone's needs, yes?
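
A minimal sketch of that pattern, with invented names
(opal_job_size_available() is not a real OPAL call; it just stands in
for whatever query we settle on):

#include <stdbool.h>

/* Stand-in for an RTE query that may or may not have the datum. */
static bool opal_job_size_available(int *nprocs)
{
    *nprocs = 0;
    return false;              /* e.g., the RTE did not provide it */
}

/* Returns true if the component should stay selected. */
static bool example_btl_component_init(bool size_required)
{
    int nprocs;
    if (!opal_job_size_available(&nprocs)) {
        if (size_required) {
            return false;      /* *required* datum missing: gracefully disqualify */
        }
        nprocs = 256;          /* *desirable* only: take the unoptimized path */
    }
    /* ... size per-peer resources from nprocs ... */
    return nprocs > 0;
}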


On Jul 31, 2014, at 8:09 AM, George Bosilca  wrote:

> What is your definition of “global job size”?
> 
>  George.
> 
> On Jul 31, 2014, at 11:06 , Pritchard Jr., Howard  wrote:
> 
>> Hi Folks,
>> 
>> I think, given the way we want to use the BTLs in lower levels like OPAL,
>> it is pretty disgusting for a BTL to need to figure out on its own
>> something like a "global job size".  That's not its business.  Can't we
>> add some attributes to the component's initialization method that provide
>> hints for how to allocate the resources it needs to provide its
>> functionality?
>> 
>> I'll see if there's something clever that can be done in ugni for now.
>> I can always add in a hack to probe the app's placement info file and
>> scale the smsg blocks by number of nodes rather than number of ranks.
>> 
>> Howard
>> 
>> 
>> -Original Message-
>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Nathan Hjelm
>> Sent: Thursday, July 31, 2014 8:58 AM
>> To: Open MPI Developers
>> Subject: Re: [OMPI devel] RFC: job size info in OPAL
>> 
>> 
>> +2^1000
>> 
>> This information is absolutely necessary at this point. If someone has a 
>> better solution they can provide it as an alternative RFC. Until then this 
>> is how it should be done... Otherwise we lose uGNI support on the trunk. 
>> Because we ARE NOT going to remove the mailbox size optimization.
>> 
>> -Nathan
>> 
>> On Wed, Jul 30, 2014 at 10:00:18PM +, Jeff Squyres (jsquyres) wrote:
>>> WHAT: Should we make the job size (i.e., initial number of procs) available 
>>> in OPAL?
>>> 
>>> WHY: At least 2 BTLs are using this info (*more below)
>>> 
>>> WHERE: usnic and ugni
>>> 
>>> TIMEOUT: there have already been some inflammatory emails about this; 
>>> let's discuss next Tuesday on the teleconf: Tue, 5 Aug 2014
>>> 
>>> MORE DETAIL:
>>> 
>>> This is an open question.  We *have* the information at the time that the 
>>> BTLs are initialized: do we allow that information to go down to OPAL?
>>> 
>>> Ralph added this info down in OPAL in r32355, but George reverted it in 
>>> r32361.
>>> 
>>> Points for: YES, WE SHOULD
>>> +++ 2 BTLs were using it (usnic, ugni)
>>> +++ Other RTE job-related info are already in OPAL (num local ranks, local rank)
>>> 
>>> Points for: NO, WE SHOULD NOT
>>> --- What exactly is this number (e.g., num currently-connected procs?), and 
>>> when is it updated?
>>> --- We need to precisely delineate what belongs in OPAL vs. above-OPAL
>>> 
>>> FWIW: here's how ompi_process_info.num_procs was used before the BTL move 
>>> down to OPAL:
>>> 
>>> - usnic: for a minor latency optimization / sizing of a shared receive 
>>> buffer queue length, and for the initial size of a peer lookup hash
>>> - ugni: to determine the size of the per-peer buffers used for 
>>> send/recv communication
>>> 
>>> --
>>> Jeff Squyres
>>> jsquy...@cisco.com
>>> For corporate legal information go to: 
>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>> 
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post: 
>>> http://www.open-mpi.org/community/lists/devel/2014/07/15373.php
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/07/15395.php
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/07/15396.php



Re: [OMPI devel] RFC: job size info in OPAL

2014-07-31 Thread George Bosilca
What is your definition of “global job size”?

  George.

On Jul 31, 2014, at 11:06 , Pritchard Jr., Howard  wrote:

> Hi Folks,
> 
> I think, given the way we want to use the BTLs in lower levels like OPAL,
> it is pretty disgusting for a BTL to need to figure out on its own
> something like a "global job size".  That's not its business.  Can't we
> add some attributes to the component's initialization method that provide
> hints for how to allocate the resources it needs to provide its
> functionality?
> 
> I'll see if there's something clever that can be done in ugni for now.
> I can always add in a hack to probe the app's placement info file and
> scale the smsg blocks by number of nodes rather than number of ranks.
> 
> Howard
> 
> 
> -Original Message-
> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Nathan Hjelm
> Sent: Thursday, July 31, 2014 8:58 AM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] RFC: job size info in OPAL
> 
> 
> +2^1000
> 
> This information is absolutely necessary at this point. If someone has a 
> better solution they can provide it as an alternative RFC. Until then this is 
> how it should be done... Otherwise we lose uGNI support on the trunk. 
> Because we ARE NOT going to remove the mailbox size optimization.
> 
> -Nathan
> 
> On Wed, Jul 30, 2014 at 10:00:18PM +, Jeff Squyres (jsquyres) wrote:
>> WHAT: Should we make the job size (i.e., initial number of procs) available 
>> in OPAL?
>> 
>> WHY: At least 2 BTLs are using this info (*more below)
>> 
>> WHERE: usnic and ugni
>> 
>> TIMEOUT: there have already been some inflammatory emails about this; 
>> let's discuss next Tuesday on the teleconf: Tue, 5 Aug 2014
>> 
>> MORE DETAIL:
>> 
>> This is an open question.  We *have* the information at the time that the 
>> BTLs are initialized: do we allow that information to go down to OPAL?
>> 
>> Ralph added this info down in OPAL in r32355, but George reverted it in 
>> r32361.
>> 
>> Points for: YES, WE SHOULD
>> +++ 2 BTLs were using it (usnic, ugni)
>> +++ Other RTE job-related info are already in OPAL (num local ranks, local rank)
>> 
>> Points for: NO, WE SHOULD NOT
>> --- What exactly is this number (e.g., num currently-connected procs?), and 
>> when is it updated?
>> --- We need to precisely delineate what belongs in OPAL vs. above-OPAL
>> 
>> FWIW: here's how ompi_process_info.num_procs was used before the BTL move 
>> down to OPAL:
>> 
>> - usnic: for a minor latency optimization / sizing of a shared receive 
>> buffer queue length, and for the initial size of a peer lookup hash
>> - ugni: to determine the size of the per-peer buffers used for 
>> send/recv communication
>> 
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to: 
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/07/15373.php
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/07/15395.php



Re: [OMPI devel] RFC: job size info in OPAL

2014-07-31 Thread Pritchard Jr., Howard
Hi Folks,

I think, given the way we want to use the BTLs in lower levels like OPAL,
it is pretty disgusting for a BTL to need to figure out on its own
something like a "global job size".  That's not its business.  Can't we
add some attributes to the component's initialization method that provide
hints for how to allocate the resources it needs to provide its
functionality?

I'll see if there's something clever that can be done in ugni for now.
I can always add in a hack to probe the app's placement info file and
scale the smsg blocks by number of nodes rather than number of ranks.

Howard


-Original Message-
From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Nathan Hjelm
Sent: Thursday, July 31, 2014 8:58 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] RFC: job size info in OPAL


+2^1000

This information is absolutely necessary at this point. If someone has a better 
solution they can provide it as an alternative RFC. Until then this is how it 
should be done... Otherwise we lose uGNI support on the trunk. Because we ARE 
NOT going to remove the mailbox size optimization.

-Nathan

On Wed, Jul 30, 2014 at 10:00:18PM +, Jeff Squyres (jsquyres) wrote:
> WHAT: Should we make the job size (i.e., initial number of procs) available 
> in OPAL?
> 
> WHY: At least 2 BTLs are using this info (*more below)
> 
> WHERE: usnic and ugni
> 
> TIMEOUT: there have already been some inflammatory emails about this; 
> let's discuss next Tuesday on the teleconf: Tue, 5 Aug 2014
> 
> MORE DETAIL:
> 
> This is an open question.  We *have* the information at the time that the 
> BTLs are initialized: do we allow that information to go down to OPAL?
> 
> Ralph added this info down in OPAL in r32355, but George reverted it in 
> r32361.
> 
> Points for: YES, WE SHOULD
> +++ 2 BTLs were using it (usnic, ugni)
> +++ Other RTE job-related info are already in OPAL (num local ranks, local rank)
> 
> Points for: NO, WE SHOULD NOT
> --- What exactly is this number (e.g., num currently-connected procs?), and 
> when is it updated?
> --- We need to precisely delineate what belongs in OPAL vs. above-OPAL
> 
> FWIW: here's how ompi_process_info.num_procs was used before the BTL move 
> down to OPAL:
> 
> - usnic: for a minor latency optimization / sizing of a shared receive 
> buffer queue length, and for the initial size of a peer lookup hash
> - ugni: to determine the size of the per-peer buffers used for 
> send/recv communication
> 
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/07/15373.php


Re: [OMPI devel] RFC: job size info in OPAL

2014-07-31 Thread Nathan Hjelm

+2^1000

This information is absolutely necessary at this point. If someone has a
better solution they can provide it as an alternative RFC. Until then
this is how it should be done... Otherwise we lose uGNI support on the
trunk. Because we ARE NOT going to remove the mailbox size optimization.

-Nathan

On Wed, Jul 30, 2014 at 10:00:18PM +, Jeff Squyres (jsquyres) wrote:
> WHAT: Should we make the job size (i.e., initial number of procs) available 
> in OPAL?
> 
> WHY: At least 2 BTLs are using this info (*more below)
> 
> WHERE: usnic and ugni
> 
> TIMEOUT: there have already been some inflammatory emails about this; let's 
> discuss next Tuesday on the teleconf: Tue, 5 Aug 2014
> 
> MORE DETAIL:
> 
> This is an open question.  We *have* the information at the time that the 
> BTLs are initialized: do we allow that information to go down to OPAL?
> 
> Ralph added this info down in OPAL in r32355, but George reverted it in 
> r32361.
> 
> Points for: YES, WE SHOULD
> +++ 2 BTLs were using it (usnic, ugni)
> +++ Other RTE job-related info are already in OPAL (num local ranks, local 
> rank)
> 
> Points for: NO, WE SHOULD NOT
> --- What exactly is this number (e.g., num currently-connected procs?), and 
> when is it updated?
> --- We need to precisely delineate what belongs in OPAL vs. above-OPAL
> 
> FWIW: here's how ompi_process_info.num_procs was used before the BTL move 
> down to OPAL:
> 
> - usnic: for a minor latency optimization / sizing of a shared receive buffer 
> queue length, and for the initial size of a peer lookup hash
> - ugni: to determine the size of the per-peer buffers used for send/recv 
> communication
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/07/15373.php




[OMPI devel] RFC: Change default behavior of calling ibv_fork_init

2014-07-31 Thread Rolf vandeVaart
WHAT: Change the default behavior in openib so that ibv_fork_init() is not
called even if it is available.
WHY: There are some strange interactions with ummunotify that cause errors.
In addition, see the points below.
WHEN: After the next weekly meeting, August 5, 2014
DETAILS: This change will be just a couple of lines.  The current default
behavior is to call ibv_fork_init() if support exists.  The new default
behavior is to call it only if asked for.
Essentially, the default setting of btl_openib_want_fork_support will change
from -1 (use it if available) to 0 (do not use it unless asked for).
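
For reference, a sketch of the tri-state logic this default controls.
The variable name is from this mail; the surrounding code is
illustrative, not the actual openib source:

#include <infiniband/verbs.h>

/* -1 = call ibv_fork_init() if support exists (old default)
 *  0 = call it only if explicitly asked for (proposed new default)
 *  1 = the user explicitly asked for fork support */
static int btl_openib_want_fork_support = 0;   /* was -1 */

static void maybe_enable_fork_support(int have_fork_support)
{
    if (btl_openib_want_fork_support > 0 ||
        (btl_openib_want_fork_support < 0 && have_fork_support)) {
        (void) ibv_fork_init();
    }
}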


Here are details from an earlier post last year:
http://www.open-mpi.org/community/lists/devel/2013/12/13395.php
Subject: [OMPI devel] RFC: Calling ibv_fork_init() in the openib BTL
From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
Date: 2013-12-06 10:15:02
To those who care about the openib BTL...
SHORT VERSION
-
Do you really want to call ibv_fork_init() in the openib BTL by default?
MORE DETAIL
---
Rolf V. pointed out to me yesterday that we're calling ibv_fork_init() in the 
openib BTL. He asked if we did the same in the usnic BTL. We don't, and here's 
why:
1. it adds a slight performance penalty for ibv_reg_mr/ibv_dereg_mr
2. the only thing ibv_fork_init() protects against is the child sending from 
memory that it thinks should already be registered:
-
MPI_Init(...);
if (0 == fork()) {
    ibv_post_send(some_previously_pinned_buffer, ...);
    // ^^ this can't work because the buffer is *not* pinned in the child
    // (for lack of a longer explanation here)
}
-
3. ibv_fork_init() is not intended to protect against a child invoking an MPI 
function (if they do that, they get what they deserve!).
Note that #2 can't happen, because MPI doesn't expose its protection domains, 
queue pairs, or registrations (or any of its verbs constructs) at all.
Hence, all ibv_fork_init() does is a) impose a performance penalty, and b) make 
memory physically unavailable in a child process, such that:

-
ibv_fork_init();
a = malloc(...);
a[0] = 17;
ibv_reg_mr(a, ...);
if (0 == fork()) {
    printf("this is a[0]: %d\n", a[0]);
    // ^^ This will segv
}
-
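
Why the second example segvs: assuming the usual libibverbs
implementation, fork protection marks registered pages with
madvise(MADV_DONTFORK), so those pages are simply absent from the
child's address space.  A sketch that reproduces the effect without
verbs at all (Linux-specific):

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>

int main(void)
{
    long pg = sysconf(_SC_PAGESIZE);
    char *a = NULL;
    if (posix_memalign((void **) &a, pg, pg) != 0) return 1;
    a[0] = 17;
    /* What registration does internally once fork support is on: */
    madvise(a, pg, MADV_DONTFORK);
    if (0 == fork()) {
        printf("this is a[0]: %d\n", a[0]);   /* child faults here */
    }
    return 0;
}
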
But the registered memory may actually be useful in the child.
So I just thought I'd pass this along, and ask the openib-caring people of the 
world if you really still want to be calling ibv_fork_init() by default in the 
openib BTL.
--
Jeff Squyres




Re: [OMPI devel] mca_PROJECT_FRAMEWORK_COMPONENT_symbol vs. mca_FRAMEWORK_COMPONENT_symbol

2014-07-31 Thread Kenneth A. Lloyd
Doesn't namespacing obviate the need for this convoluted identifier scheme?
See, for example, UML package import and include behaviors.
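
For concreteness, the two schemes under discussion would look like this
(both symbol names are taken from the original mail; the parameter lists
are placeholders, not the real BTL interface):

/* today: mca_FRAMEWORK_COMPONENT_symbol */
int mca_btl_tcp_add_procs(void);

/* proposed: mca_PROJECT_FRAMEWORK_COMPONENT_symbol */
int mca_ompi_btl_tcp_add_procs(void);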

-Original Message-
From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Dave Goodell
(dgoodell)
Sent: Wednesday, July 30, 2014 3:35 PM
To: Open MPI Developers
Subject: [OMPI devel] mca_PROJECT_FRAMEWORK_COMPONENT_symbol vs.
mca_FRAMEWORK_COMPONENT_symbol

Jeff and I were talking about some namespacing issues that have come up in
the recent BTL move from OMPI to OPAL.  AFAIK, the current system for
namespacing external symbols is to name them
"mca_FRAMEWORK_COMPONENT_symbol" (e.g., "mca_btl_tcp_add_procs" in the tcp
BTL).  Similarly, the DSO for the component is named
"mca_FRAMEWORK_COMPONENT.so" (e.g., "mca_btl_tcp.so").

Jeff asserted that the eventual goal is to move to a system where all MCA
frameworks/components are also prefixed by the project name.  So the above
examples become "mca_ompi_btl_tcp_add_procs" and "mca_ompi_btl_tcp.so".
Does anyone actually care about pursuing this goal?

I ask because if nobody wants to pursue the goal of adding project names to
namespaces then I already have an easy solution to most of our namespacing
problems.  OTOH, if someone does wish to pursue that goal, then I have a
namespace-related RFC that I would like to propose (in a subsequent email).

-Dave

___
devel mailing list
de...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post:
http://www.open-mpi.org/community/lists/devel/2014/07/15371.php


-
No virus found in this message.
Checked by AVG - www.avg.com
Version: 2014.0.4716 / Virus Database: 3986/7949 - Release Date: 07/30/14



Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-31 Thread Paul Hargrove
Gilles,


Just as you speculated, PGI is creating a _-suffixed reference to the module
name:

$ pgf90 -c test.f90
$ nm -u test.o | grep f08
 U mpi_f08_sizeof_
 U mpi_f08_sizeof_mpi_sizeof_real_s_4_



You suggested the following work-around in a previous email:

$ INST/bin/mpifort  ../test.f
./BLD/ompi/mpi/fortran/use-mpi-f08/.libs/libforce_usempif08_internal_modules_to_be_built.a

That works fine.  That doesn't surprise me, because I had already
identified that file as having been removed from libmpi_usempif08.so
between 1.8.1 and 1.8.2rc2.  It includes the symbols for the module names
plus the trailing '_'.

-Paul


On Thu, Jul 31, 2014 at 1:07 AM, Gilles Gouaillardet <
gilles.gouaillar...@iferc.org> wrote:

> Paul,
>
> in .../ompi/mpi/fortran/use-mpi-f08, can you create the following dumb
> test program, compile it, and run nm | grep f08 on the object:
>
> $ cat foo.f90
> program foo
> use mpi_f08_sizeof
>
> implicit none
>
> real :: x
> integer :: size, ierror
>
> call MPI_Sizeof_real_s_4(x, size, ierror)
>
> stop
> end program
>
>
> with intel compiler :
> $ ifort -c foo.f90
> $ nm foo.o | grep f08
>  U mpi_f08_sizeof_mp_mpi_sizeof_real_s_4_
>
> I am wondering whether the PGI compiler adds an additional undefined
> reference to mpi_f08_sizeof_ ...
>
> Cheers,
>
> Gilles
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/07/15390.php
>



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
Computer and Data Sciences Department Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900


Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-31 Thread Gilles Gouaillardet
Paul,

in .../ompi/mpi/fortran/use-mpi-f08, can you create the following dumb
test program, compile it, and run nm | grep f08 on the object:

$ cat foo.f90
program foo
use mpi_f08_sizeof

implicit none

real :: x
integer :: size, ierror

call MPI_Sizeof_real_s_4(x, size, ierror)

stop
end program


with intel compiler :
$ ifort -c foo.f90
$ nm foo.o | grep f08
 U mpi_f08_sizeof_mp_mpi_sizeof_real_s_4_

I am wondering whether the PGI compiler adds an additional undefined
reference to mpi_f08_sizeof_ ...

Cheers,

Gilles



Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-31 Thread Gilles Gouaillardet
Paul and all,

For what it's worth, with Open MPI 1.8.2rc2 and the Intel Fortran
compiler version 14.0.3.174:

$ nm libmpi_usempif08.so | grep -i sizeof

there is no such undefined symbol (mpi_f08_sizeof_).

As a temporary workaround, did you try to force the linker to use
libforce_usempif08_internal_modules_to_be_built.a?

(This library does not get installed, at least with the Intel compilers,
but it is in the compilation tree.)

Cheers,

Gilles


On 2014/07/31 12:53, Paul Hargrove wrote:
> In 1.8.2rc2:
> $ nm openmpi-1.8.2rc2-linux-x86_64-pgi-14.4/INST/lib/libmpi_usempif08.so |
> grep ' mpi_f08_sizeof_'
>  U mpi_f08_sizeof_



Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-31 Thread tmishima


Hi Paul,

Thank you for your investigation. I'm sure it's very
close to fixing the problem, although I can't do that
myself. So I owe you something...

Please try Awamori, which is Okinawa's sake and very
good on such a hot day.

Tetsuya

> On Wed, Jul 30, 2014 at 8:53 PM, Paul Hargrove wrote:
> [...]
> I have a clear answer to *what* is different (below) and am next looking
> into the why/how now.
> It seems that 1.8.1 has included all dependencies into libmpi_usempif08
> while 1.8.2rc2 does not.
> [...]
>
> The difference appears to stem from the following difference in
> ompi/mpi/fortran/use-mpi-f08/Makefile.am:
>
> 1.8.1:
> libmpi_usempif08_la_LIBADD = \
>         $(module_sentinel_file) \
>         $(OMPI_MPIEXT_USEMPIF08_LIBS) \
>         $(top_builddir)/ompi/libmpi.la
>
> 1.8.2rc2:
> libmpi_usempif08_la_LIBADD = \
>         $(OMPI_MPIEXT_USEMPIF08_LIBS) \
>         $(top_builddir)/ompi/libmpi.la
> libmpi_usempif08_la_DEPENDENCIES = $(module_sentinel_file)
>
> Where in both cases one has:
>
> module_sentinel_file = \
>         libforce_usempif08_internal_modules_to_be_built.la
>
> which contains all of the symbols that my previous testing found had
> "disappeared" from libmpi_usempif08.so between 1.8.1 and 1.8.2rc2.
>
> I don't have recent enough autotools to attempt changing the Makefile.am,
> so instead I restored the item removed from libmpi_usempif08_la_LIBADD
> directly in Makefile.in.  However, rather than fixing the problem, that
> resulted in multiple definitions of a bunch of _eq and _ne functions
> (e.g. mpi_f08_types_ompi_request_op_ne_).  So, I am uncertain how to
> proceed.
>
> Using svn blame points at a "bulk" CMR of many Fortran-related changes,
> including one related to the eq/ne operators.  So, I am turning this
> investigation over to Jeff and/or Ralph to figure out what actually is
> required to fix this without loss of whatever benefits were in that CMR.
> I am still available to test the proposed fixes.  Happy hunting...
>
> Somebody owes me a virtual beer (or nihonshu) ;-)
> -Paul
>
> --
> Paul H. Hargrove                          phhargr...@lbl.gov
> Future Technologies Group
> Computer and Data Sciences Department     Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/07/15387.php



Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-31 Thread Paul Hargrove
On Wed, Jul 30, 2014 at 8:53 PM, Paul Hargrove  wrote:
[...]

> I have a clear answer to *what* is different (below) and am next looking
> into the why/how now.
> It seems that 1.8.1 has included all dependencies into libmpi_usempif08
> while 1.8.2rc2 does not.
>
 [...]

The difference appears to stem from the following difference in
ompi/mpi/fortran/use-mpi-f08/Makefile.am:

1.8.1:
libmpi_usempif08_la_LIBADD = \
$(module_sentinel_file) \
$(OMPI_MPIEXT_USEMPIF08_LIBS) \
$(top_builddir)/ompi/libmpi.la

1.8.2rc2:
libmpi_usempif08_la_LIBADD = \
$(OMPI_MPIEXT_USEMPIF08_LIBS) \
$(top_builddir)/ompi/libmpi.la
libmpi_usempif08_la_DEPENDENCIES = $(module_sentinel_file)

Where in both cases one has:

module_sentinel_file = \
libforce_usempif08_internal_modules_to_be_built.la

which contains all of the symbols that my previous testing found had
"disappeared" from libmpi_usempif08.so between 1.8.1 and 1.8.2rc2.  (That
is consistent with libtool semantics: a convenience library listed in
LIBADD has its objects linked into the final library, while one listed
only in DEPENDENCIES merely gets built before it.)

I don't have recent enough autotools to attempt changing the Makefile.am,
so instead I restored the item removed from libmpi_usempif08_la_LIBADD
directly in Makefile.in.  However, rather than fixing the problem, that
resulted in multiple definitions of a bunch of _eq and _ne functions
(e.g. mpi_f08_types_ompi_request_op_ne_).  So, I am uncertain how to
proceed.

Using svn blame points at a "bulk" CMR of many Fortran-related changes,
including one related to the eq/ne operators.  So, I am turning this
investigation over to Jeff and/or Ralph to figure out what actually is
required to fix this without loss of whatever benefits were in that CMR.
I am still available to test the proposed fixes.  Happy hunting...

Somebody owes me a virtual beer (or nihonshu) ;-)
-Paul


-- 
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
Computer and Data Sciences Department Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900


Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-31 Thread Paul Hargrove
On Wed, Jul 30, 2014 at 6:20 PM, Paul Hargrove  wrote:

>
> On Wed, Jul 30, 2014 at 6:15 PM,  wrote:
> [...]
>
>> Strange thing is that openmpi-1.8 with PGI14.7 works fine.
>> What's the difference with openmpi-1.8 and openmpi-1.8.2rc2?
>>
> [...]
>
> Tetsuya,
>
> Now that I can reproduce the problem you have reported, I am building
> 1.8.1 with PGI14.4.
> Then I may be able to answer the question about what is different.
>
> -Paul
>


I have a clear answer to *what* is different (below) and am next looking
into the why/how now.
It seems that 1.8.1 has included all dependencies into libmpi_usempif08
while 1.8.2rc2 does not.
My reflex is to blame libtool, but config/lt* are unchanged between the two
versions.

I am rebuilding now with "V=1" passed to make so I can see how the libs
were built.
I'd appreciate guidance if Jeff or anybody else has suggestions as to an
alternative approach to investigate this.
When completed, I will be (more than) happy to turn over the verbose make
output for somebody else to examine.

-Paul

In 1.8.1:
$ nm openmpi-1.8.1-linux-x86_64-pgi-14.4/INST/lib/libmpi_usempif08.so |
grep ' mpi_f08_sizeof_'
0004a9a0 T mpi_f08_sizeof_
0004ad70 T mpi_f08_sizeof_mpi_sizeof_complex_a_16_
0004acf0 T mpi_f08_sizeof_mpi_sizeof_complex_a_8_
0004ad30 T mpi_f08_sizeof_mpi_sizeof_complex_s_16_
0004acb0 T mpi_f08_sizeof_mpi_sizeof_complex_s_8_
0004a9f0 T mpi_f08_sizeof_mpi_sizeof_integer_a_1_
0004aa70 T mpi_f08_sizeof_mpi_sizeof_integer_a_2_
0004aaf0 T mpi_f08_sizeof_mpi_sizeof_integer_a_4_
0004ab70 T mpi_f08_sizeof_mpi_sizeof_integer_a_8_
0004a9b0 T mpi_f08_sizeof_mpi_sizeof_integer_s_1_
0004aa30 T mpi_f08_sizeof_mpi_sizeof_integer_s_2_
0004aab0 T mpi_f08_sizeof_mpi_sizeof_integer_s_4_
0004ab30 T mpi_f08_sizeof_mpi_sizeof_integer_s_8_
0004abf0 T mpi_f08_sizeof_mpi_sizeof_real_a_4_
0004ac70 T mpi_f08_sizeof_mpi_sizeof_real_a_8_
0004abb0 T mpi_f08_sizeof_mpi_sizeof_real_s_4_
0004ac30 T mpi_f08_sizeof_mpi_sizeof_real_s_8_

In 1.8.2rc2:
$ nm openmpi-1.8.2rc2-linux-x86_64-pgi-14.4/INST/lib/libmpi_usempif08.so |
grep ' mpi_f08_sizeof_'
 U mpi_f08_sizeof_


Similar differences exist for the other three modules that give
undefined references in Tetsuya's simple test code.


-- 
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
Computer and Data Sciences Department Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900