Re: [OMPI devel] configure question

2010-02-17 Thread Ralf Wildenhues
Hello Greg,

* Greg Watson wrote on Tue, Feb 16, 2010 at 07:03:30PM CET:
> When I run configure under Snow Leopard (this is OMPI 1.3.4), I get the 
> following:
> 
> checking if C and Fortran 77 are link compatible... no
> **
> It appears that your Fortran 77 compiler is unable to link against
> object files created by your C compiler.  This typically indicates
> one of a few possibilities:
> 
>   - A conflict between CFLAGS and FFLAGS
>   - A problem with your compiler installation(s)
>   - Different default build options between compilers (e.g., C
> building for 32 bit and Fortran building for 64 bit)
>   - Incompatible compilers
> 
> Such problems can usually be solved by picking compatible compilers
> and/or CFLAGS and FFLAGS.  More information (including exactly what
> command was given to the compilers and what error resulted when the
> commands were executed) is available in the config.log file in this
> directory.
> **
> configure: error: C and Fortran 77 compilers are not link compatible.  Can 
> not continue.
> 
> Anyone know off the top of their head what these options would be, or even if 
> it is possible?

Well, did you take a look at the corresponding bits in the config.log
file?  Can you post them?
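
Something along these lines usually pulls the relevant chunk out of
config.log (the grep pattern is just a guess at the test name; adjust it to
whatever configure printed on your system):

  grep -B 2 -A 20 'link compatible' config.log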

Thanks,
Ralf


[OMPI devel] v1.4 broken

2010-02-17 Thread Joshua Hursey
I just noticed that the nightly tarball of v1.4 failed to build in the OpenIB 
BTL last night. The error was:

-
btl_openib_component.c: In function 'init_one_device':
btl_openib_component.c:2089: error: 'mca_btl_openib_component_t' has no member 
named 'default_recv_qps'
-

It looks like CMR #2251 is the problem.

-- Josh


Re: [OMPI devel] v1.4 broken

2010-02-17 Thread Pavel Shamis (Pasha)

I'm checking this issue.

Pasha

Joshua Hursey wrote:

I just noticed that the nightly tarball of v1.4 failed to build in the OpenIB 
BTL last night. The error was:

-
btl_openib_component.c: In function 'init_one_device':
btl_openib_component.c:2089: error: 'mca_btl_openib_component_t' has no member 
named 'default_recv_qps'
-

It looks like CMR #2251 is the problem.

-- Josh




Re: [OMPI devel] 1.5.0 could be soon

2010-02-17 Thread Peter Kjellstrom
On Tuesday 16 February 2010, Jeff Squyres wrote:
> We've only got 2 "critical" 1.5.0 bugs left, and I think that those will
> both be closed out pretty soon.
>
> https://svn.open-mpi.org/trac/ompi/report/15
>
> Rainer and I both feel that a RC for 1.5.0 could be pretty soon.
>
> Does anyone have any heartburn with this?  Does anyone have any things they
> still need to get in v1.5.0?

I noticed that 1.5a1r22627 still has a very suboptimal default selection of 
(at least) alltoall algorithms. This has been mentioned several times since 
the first major discussion [1], but nothing seems to have improved.

A short recap of the situation: by default, ompi switches from bruck to 
basic-linear at a message size of ~100 bytes, and this is bad. The first set 
of figures below is with vanilla ompi; the second set is with a dynamic 
rules file [2] that forces bruck for all message sizes. For details on the 
system, see [3].

The problem is equally visible on tcp and on openib. A concrete result is 
that OpenMPI on IB is way slower than other MPIs on 1G eth for the affected 
message sizes (100-3000 bytes).

[cap@n115 mpi]$ mpirun --host $(hostlist --expand -s',' 
$SLURM_JOB_NODELIST) --bind-to-core  ./alltoall.ompi15a1r22627 
profile.ompibadness
running in profile-from-file mode
bw for 400 x    1 B :   2.0 Mbytes/s   time was:  24.9 ms
bw for 400 x   25 B :  52.8 Mbytes/s   time was:  23.9 ms
bw for 400 x   50 B :  82.2 Mbytes/s   time was:  30.7 ms
bw for 400 x   75 B :  90.4 Mbytes/s   time was:  41.8 ms
bw for 400 x  100 B : 109.2 Mbytes/s   time was:  46.1 ms
bw for 400 x  200 B :   4.8 Mbytes/s   time was:   2.1 s
bw for 400 x  300 B :   7.0 Mbytes/s   time was:   2.2 s
bw for 400 x  400 B :   9.8 Mbytes/s   time was:   2.1 s
bw for 400 x  500 B :  12.3 Mbytes/s   time was:   2.0 s
bw for 400 x  750 B :  18.5 Mbytes/s   time was:   2.0 s
bw for 400 x 1000 B :  24.6 Mbytes/s   time was:   2.0 s
bw for 400 x 1250 B :  29.9 Mbytes/s   time was:   2.1 s
bw for 400 x 1500 B :  35.1 Mbytes/s   time was:   2.2 s
bw for 400 x 2000 B :  45.5 Mbytes/s   time was:   2.2 s
bw for 400 x 2500 B :  51.0 Mbytes/s   time was:   2.5 s
bw for 400 x 3000 B : 113.6 Mbytes/s   time was:   1.3 s
bw for 400 x 3500 B : 123.3 Mbytes/s   time was:   1.4 s
bw for 400 x 4000 B : 135.7 Mbytes/s   time was:   1.5 s
total time was: 25.8 s
[cap@n115 mpi]$ mpirun --host $(hostlist --expand -s',' 
$SLURM_JOB_NODELIST) --bind-to-core -mca coll_tuned_use_dynamic_rules 1 -mca 
coll_tuned_dynamic_rules_filename ./dyn_rules ./alltoall.ompi15a1r22627 
profile.ompibadness
running in profile-from-file mode
bw for 400 x    1 B :   2.1 Mbytes/s   time was:  24.3 ms
bw for 400 x   25 B :  55.1 Mbytes/s   time was:  22.9 ms
bw for 400 x   50 B :  82.6 Mbytes/s   time was:  30.5 ms
bw for 400 x   75 B :  89.4 Mbytes/s   time was:  42.3 ms
bw for 400 x  100 B : 109.9 Mbytes/s   time was:  45.9 ms
bw for 400 x  200 B : 115.1 Mbytes/s   time was:  87.6 ms
bw for 400 x  300 B : 117.8 Mbytes/s   time was: 128.3 ms
bw for 400 x  400 B : 105.4 Mbytes/s   time was: 191.2 ms
bw for 400 x  500 B : 113.4 Mbytes/s   time was: 222.1 ms
bw for 400 x  750 B : 119.3 Mbytes/s   time was: 316.9 ms
bw for 400 x 1000 B : 120.9 Mbytes/s   time was: 416.9 ms
bw for 400 x 1250 B : 121.0 Mbytes/s   time was: 520.6 ms
bw for 400 x 1500 B : 120.3 Mbytes/s   time was: 628.2 ms
bw for 400 x 2000 B : 118.0 Mbytes/s   time was: 854.1 ms
bw for 400 x 2500 B :  96.5 Mbytes/s   time was:   1.3 s
bw for 400 x 3000 B : 107.4 Mbytes/s   time was:   1.4 s
bw for 400 x 3500 B : 109.1 Mbytes/s   time was:   1.6 s
bw for 400 x 4000 B : 109.2 Mbytes/s   time was:   1.8 s
total time was: 9.7 s

[1] [OMPI users] scaling problem with openmpi
From: Roman Martonak 
  To: us...@open-mpi.org
  Date: 2009-05-16 00.20

[2]:
 1 # num of collectives
 3 # ID = 3 Alltoall collective (ID in coll_tuned.h)
 1 # number of com sizes
 32 # comm size 8
 1 # number of msg sizes
 0 3 0 0 # for message size 0, bruck 1, topo 0, 0 segmentation
 # end of first collective

[3]:
 OpenMPI: Built with intel-11.1.074; the only configure options used were:
  --enable-orterun-prefix-by-default
  --prefix
 OS: CentOS-5.4 x86_64
 HW: Dual E5520 nodes with IB (ConnectX)
 Size of job: 8 nodes (that is 64 cores/ranks)

/Peter




Re: [OMPI devel] configure question

2010-02-17 Thread Greg Watson
The problem seems to be that on Snow Leopard, gfortran defaults to 32-bit 
binaries while gcc defaults to 64-bit. If I set FFLAGS=-m64 then configure 
finishes. Of course, I have no idea if a Fortran MPI program will actually 
*work*, but at least OMPI builds. That's all that matters, isn't it? :-)
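
For the record, the invocation that got past configure here was along these
lines (the prefix is just an example; the f90 bindings presumably need
FCFLAGS=-m64 as well, but I haven't checked):

  ./configure FFLAGS=-m64 --prefix=/opt/openmpi-1.3.4
  make all install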

Greg

On Feb 17, 2010, at 2:01 AM, Ralf Wildenhues wrote:

> Hello Greg,
> 
> * Greg Watson wrote on Tue, Feb 16, 2010 at 07:03:30PM CET:
>> When I run configure under Snow Leopard (this is OMPI 1.3.4), I get the 
>> following:
>> 
>> checking if C and Fortran 77 are link compatible... no
>> **
>> It appears that your Fortran 77 compiler is unable to link against
>> object files created by your C compiler.  This typically indicates
>> one of a few possibilities:
>> 
>>  - A conflict between CFLAGS and FFLAGS
>>  - A problem with your compiler installation(s)
>>  - Different default build options between compilers (e.g., C
>>building for 32 bit and Fortran building for 64 bit)
>>  - Incompatible compilers
>> 
>> Such problems can usually be solved by picking compatible compilers
>> and/or CFLAGS and FFLAGS.  More information (including exactly what
>> command was given to the compilers and what error resulted when the
>> commands were executed) is available in the config.log file in this
>> directory.
>> **
>> configure: error: C and Fortran 77 compilers are not link compatible.  Can 
>> not continue.
>> 
>> Anyone know off the top of their head what these options would be, or even if 
>> it is possible?
> 
> Well, did you take a look at the corresponding bits in the config.log
> file?  Can you post them?
> 
> Thanks,
> Ralf




[OMPI devel] RFC: ABI break between 1.4 and 1.5 / .so versioning

2010-02-17 Thread Jeff Squyres
WHAT: Break ABI between 1.4 and 1.5 series.

WHY: To settle the ABI and .so versioning issues once and for all.

WHERE: Open MPI's .so versions and the opal_wrapper compiler.

WHEN: For 1.5[.0].  This is only meaningful if we do it for the *entire* v1.5 
series.

TIMEOUT: Next Tuesday teleconf, 23 Feb 2010

===

BACKGROUND / REQUIRED READING:
--

 * Ticket 2092: https://svn.open-mpi.org/trac/ompi/ticket/2092
 * Libtool .so versioning rules: 
https://svn.open-mpi.org/trac/ompi/wiki/ReleaseProcedures

Libtool .so version numbers are expressed as c:r:a.  libmpi is currently 
versioned "correctly", meaning that we advance the c:r:a triple as necessary 
for each release.  libopen-pal and libopen-rte, however, are currently fixed at 
0:0:0, which is Wrong.  The reasons why they are fixed at 0:0:0 are expressed 
in #2092.
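
For anyone who doesn't juggle these triples daily, here is roughly how a
c:r:a value maps to file names on Linux (illustrative only; see the wiki page
above for the real rules):

  # libtool --mode=link ... -version-info <c>:<r>:<a>
  #   SONAME:         libfoo.so.(c - a)
  #   installed file: libfoo.so.(c - a).(a).(r)
  # e.g., 1:0:0  ->  libmpi.so.1.0.0 with SONAME libmpi.so.1
  #       0:0:0  ->  libopen-pal.so.0.0.0 with SONAME libopen-pal.so.0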

SHORT VERSION OF THIS PROPOSAL:
---

 * For v1.5.0, set c:r:a of libmpi to 1:0:0.
 * Starting with v1.5.0, set c:r:a for libopen-rte and libopen-pal properly.
 * This means a break in ABI between v1.4.x and v1.5.x, but the ABI will remain 
constant for all of 1.5.x/1.6.x.
 * The wrapper compilers will need to be updated to recognize the difference 
between static and dynamic linking.

LONGER VERSION / MORE DETAILS AND RATIONALE:


The fix for these issues involves several dominos falling in order.  You need 
to read this whole proposal to understand the full scope, sorry.  :-\

1. We need to fix the wrapper compilers to recognize the difference between 
shared library linking and static linking.  Right now, the MPI wrappers always 
do this:

-lmpi -lopen-rte -lopen-pal

2. Listing all three libraries is only necessary when linking statically.  When 
linking dynamically, only the top-level library should be listed (e.g., -lmpi 
for MPI applications).  The implicit linker dependencies of libmpi.so will 
automatically pull in libopen-rte.so.  Likewise, the implicit dependencies of 
libopen-rte.so will automatically pull in libopen-pal.so.  More specifically, 
when linking dynamically, MPI a.out applications will only explicitly depend on 
libmpi.so (not libopen-rte.so and not libopen-pal.so).

3. Hence, the wrappers need to learn the difference between static and dynamic 
linking: when linking dynamically, only list "-lmpi".  When linking statically, 
list all 3 libraries.  This allows minimization of explicit library 
dependencies in dynamic linking, and is arguably the Right way to do it.

--> More below about how to make the wrappers understand the difference between 
static/shared linking.

4. When MPI applications only depend on libmpi, we can properly version 
libopen-rte.so and libopen-pal.so.  Hence, for v1.5.0, we will have non-0:0:0 
.so versions for these two libraries.

5. Since MPI application a.out's created by the v1.4 series will have explicit 
dependencies on all 3 libraries, they will be ABI incompatible with Open MPI 
v1.5's ORTE and OPAL libraries (as opposed to MPI applications created with 
updated wrappers in v1.5, which will only depend on libmpi when linking 
dynamically).

6. The question then remains: what should libmpi.so's c:r:a values be set to 
in v1.5.0?  I say it should be 1:0:0.  Here's why:
  * Recall that we have added some new MPI-2.2 functions in v1.5.  Hence, 
libmpi.so's "c" needs to increase to 1 and "r" needs to be set to 0.  The 
question is what to do with the "a" value.
  * By extension of #5, we should also make libmpi.so be ABI incompatible 
between v1.4.x and v1.5.x (to prevent some needless confusion -- rather than 
have libmpi be ABI compatible and libopen-rte and libopen-pal *not* be ABI 
compatible, I think it would be better to make *all 3* be ABI incompatible).  
This means setting the libmpi.so "a" value to 0 (as opposed to setting it to 1).

Crystal clear?  I thought so.  :-)

--

Here's my proposal on how to change the wrapper compilers to understand the 
difference between static and dynamic linking:

*** FIRST: give the wrapper the ability to link one library or all libraries
- wrapper data text files grow a new option: libs_private (a la pkg-config(1) 
files)
- wrapper data text files list only the top-level -l flag in libs, and 
everything else in libs_private.  For example, for mpicc:
  libs=-lmpi
  libs_private=-lopen-rte -lopen-pal

*** NEXT: give the wrappers the ability to switch between just ${libs} or 
${libs}+${libs_private}.  Pseudocode:
- wrapper always adds ${libs} to the argv
- wrapper examines each argv[x]:
  --ompi:shared) found_in_argv=1 ;;
  --ompi:static) add ${libs_private} ; found_in_argv=1 ;;
- if (!found_in_argv) 
  - if default set via configure, add ${libs_private} (SEE BELOW)

*** LAST: give sysadmin ability to set wrapper behavior defaults
- if --disable-shared is set in OMPI's configure, wrappers default to adding 
both ${libs} and ${libs_private}
- new configure option: --enable-wrapper-

Re: [OMPI devel] RFC: ABI break between 1.4 and 1.5 / .so versioning

2010-02-17 Thread Barrett, Brian W
On Feb 17, 2010, at 11:23 AM, Jeff Squyres wrote:
> Here's my proposal on how to change the wrapper compilers to understand the 
> difference between static and dynamic linking:
> 
> *** FIRST: give the wrapper the ability to link one library or all libraries
> - wrapper data text files grow a new option: libs_private (a la pkg-config(1) 
> files)
> - wrapper data text files list only the top-level -l flag in libs, and 
> everything else in libs_private.  For example, for mpicc:
>  libs=-lmpi
>  libs_private=-lopen-rte -lopen-pal
> 
> *** NEXT: give the wrappers the ability to switch between just ${libs} or 
> ${libs}+${libs_private}.  Pseudocode:
> - wrapper always adds ${libs} to the argv
> - wrapper examines each argv[x]:
>  --ompi:shared) found_in_argv=1 ;;
>  --ompi:static) add ${libs_private} ; found_in_argv=1 ;;
> - if (!found_in_argv) 
>  - if default set via configure, add ${libs_private} (SEE BELOW)

This is horrible!  Users want to be able to specify -Bstatic or -static or 
whatever and have the right things happen.  I have a better idea - since 
there's basically no set of users that use OMPI's libopal alongside some other 
libopal (and indeed, that's near impossible due to the horrible API exposed by 
opal - data type sizes changing based on configure arguments, for example), 
why don't we give up, just have libmpi.{so,a}, and completely avoid this whole 
rat hole of a problem?

There's simply no way your solution is workable for most users.  They'll just 
end up wondering why when they do -Bstatic (or whatever the option is on their 
compiler) they get missing symbol link errors.

Brian



Re: [OMPI devel] RFC: ABI break between 1.4 and 1.5 / .so versioning

2010-02-17 Thread Jeff Squyres
Brian and I talked via a higher-bandwidth medium to figure this out.

The issue is that if the user has to specify -static to their linker, they 
*also* have to specify --ompi:static, or Bad Things will happen.  Or, if they 
don't specify -static but *only* specify --ompi:static, Bad Things will happen. 
 In short: it seems like adding yet another wrapper-compiler-specific flag to 
the MPI ecosystem will cause confusion, fear, and possibly the death of some 
cats.

The alternate proposal is to have one-big-honkin' libmpi that slurps in all of 
libopen-rte and libopen-pal.  We'll still install libopen-rte and libopen-pal 
because the tools (like orterun and friends) will need them.  But MPI apps will 
only -lmpi, regardless of whether they are static or shared.  There will never 
be a need to -lopen-rte and -lopen-pal for MPI apps.

Analogous things will happen for ORTE: libopen-rte will slurp in libopen-pal.  
And ORTE apps will only -lopen-rte.  Birds will sing.  Children will play.  The 
world will be content.
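
Concretely, the wrapper's link line (as shown by "mpicc --showme:link") would
shrink roughly like this -- output is illustrative, not copied from a build:

  mpicc --showme:link
  # today (v1.4.x), roughly:   ... -lmpi -lopen-rte -lopen-pal ...
  # under this proposal:       ... -lmpi ...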

--> NOTE: The ABI break will still occur between 1.4 and 1.5 because we'll be 
.so versioning libopen-pal and libopen-rte.  The only issue Brian was concerned 
about was the modification of the wrapper compilers.

If we do this, is there anyone who will still want the old 3-library behavior?  
Specifically:

a) the libraries are not slurped into each other, and
b) the MPI wrapper compilers still list all 3 libraries / ORTE wrapper 
compilers still list 2 libs

If so, we can add a --with-wrappers-linking-all-libs configure switch (or 
something with a better name) to support the old behavior, but I'd argue that 
it should not be the default.



On Feb 17, 2010, at 1:31 PM, Barrett, Brian W wrote:

> On Feb 17, 2010, at 11:23 AM, Jeff Squyres wrote:
> > Here's my proposal on how to change the wrapper compilers to understand the 
> > difference between static and dynamic linking:
> >
> > *** FIRST: give the wrapper the ability to link one library or all libraries
> > - wrapper data text files grow a new option: libs_private (a la 
> > pkg-config(1) files)
> > - wrapper data text files list only the top-level -l flag in libs, and 
> > everything else in libs_private.  For example, for mpicc:
> >  libs=-lmpi
> >  libs_private=-lopen-rte -lopen-pal
> >
> > *** NEXT: give the wrappers the ability to switch between just ${libs} or 
> > ${libs}+${libs_private}.  Pseudocode:
> > - wrapper always adds ${libs} to the argv
> > - wrapper examines each argv[x]:
> >  --ompi:shared) found_in_argv=1 ;;
> >  --ompi:static) add ${libs_private} ; found_in_argv=1 ;;
> > - if (!found_in_argv)
> >  - if default set via configure, add ${libs_private} (SEE BELOW)
> 
> This is horrible!  Users want to be able to specify -Bstatic or -static or 
> whatever and have the right things happen.  I have a better idea - since 
> there's basically no set of users that use OMPI's libopal alongside some 
> other libopal (and indeed, that's near impossible due to the horrible API 
> exposed by opal - data type sizes changing based on configure arguments, for 
> example), why don't we give up, just have libmpi.{so,a}, and completely 
> avoid this whole rat hole of a problem?
> 
> There's simply no way your solution is workable for most users.  They'll just 
> end up wondering why when they do -Bstatic (or whatever the option is on 
> their compiler) they get missing symbol link errors.
> 
> Brian
> 


-- 
Jeff Squyres
jsquy...@cisco.com

For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI devel] RFC: ABI break between 1.4 and 1.5 / .so versioning

2010-02-17 Thread Ralf Wildenhues
Hello Jeff,

* Jeff Squyres wrote on Wed, Feb 17, 2010 at 08:19:25PM CET:
> The issue is that if the user has to specify -static to their linker,
> they *also* have to specify --ompi:static, or Bad Things will happen.
> Or, if they don't specify -static but *only* specify --ompi:static,
> Bad Things will happen.  In short: it seems like adding yet another
> wrapper-compiler-specific flag to the MPI ecosystem will cause
> confusion, fear, and possibly the death of some cats.

Would you be OK with omitting -lopen-pal and -lopen-rte only on capable Linux
systems?  With new-enough binutils, you should be able to use
-Wl,--as-needed ... -Wl,--no-as-needed around these two libs.
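
I.e., have the wrappers emit something along the lines of (untested sketch):

  gcc hello.o -lmpi -Wl,--as-needed -lopen-rte -lopen-pal -Wl,--no-as-needed -o hello

so that the two internal libraries only end up as DT_NEEDED entries when the
linker actually resolves symbols from them.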

I'm not entirely sure I understand your argumentation for why libmpi
from 1.5.x has to be binary incompatible, but I haven't fully thought
through this yet.

Cheers,
Ralf


Re: [OMPI devel] PATCH: remove trailing colon at the end of thegenerated LD_LIBRARY_PATH

2010-02-17 Thread Jeff Squyres
Looks good to me!

Please commit and file CMRs for v1.4 and v1.5 (assuming this patch applies 
cleanly to both branches).
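
For anyone skimming: the ${LD_LIBRARY_PATH:+:} expansion in the patch below
only emits the ":" when the variable is already set and non-empty.  A plain
shell illustration, with /usr/lib64 standing in for %{_libdir}:

  libdir=/usr/lib64
  unset LD_LIBRARY_PATH
  echo "${libdir}${LD_LIBRARY_PATH:+:}${LD_LIBRARY_PATH}"
  # -> /usr/lib64               (no trailing colon)
  LD_LIBRARY_PATH=/opt/foo/lib
  echo "${libdir}${LD_LIBRARY_PATH:+:}${LD_LIBRARY_PATH}"
  # -> /usr/lib64:/opt/foo/lib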


On Feb 16, 2010, at 6:46 AM, Nadia Derbey wrote:

> Hi,
> 
> The mpivars.sh generated in openmpi.spec might in some cases lead to an
> LD_LIBRARY_PATH that contains a trailing ":". This happens if
> LD_LIBRARY_PATH is originally unset.
> This means that the current directory is included in the search path for the
> loader, which might not be the desired result.
> 
> The following patch proposal fixes this potential issue by adding the
> ":" only if LD_LIBRARY_PATH is already set.
> 
> Regards,
> Nadia
> 
> 
> diff -r 6609b6ba7637 contrib/dist/linux/openmpi.spec
> --- a/contrib/dist/linux/openmpi.spec   Mon Feb 15 22:14:59 2010 +
> +++ b/contrib/dist/linux/openmpi.spec   Tue Feb 16 12:44:41 2010 +0100
> @@ -505,7 +505,7 @@ fi
> 
>  # LD_LIBRARY_PATH
>  if test -z "\`echo \$LD_LIBRARY_PATH | grep %{_libdir}\`"; then
> -LD_LIBRARY_PATH=%{_libdir}:\${LD_LIBRARY_PATH}
> +LD_LIBRARY_PATH=%{_libdir}\${LD_LIBRARY_PATH:+:}\${LD_LIBRARY_PATH}
>  export LD_LIBRARY_PATH
>  fi
> 
> 


-- 
Jeff Squyres
jsquy...@cisco.com

For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI devel] RFC: ABI break between 1.4 and 1.5 / .so versioning

2010-02-17 Thread Jeff Squyres
On Feb 17, 2010, at 3:05 PM, Ralf Wildenhues wrote:

> > The issue is that if the user has to specify -static to their linker,
> > they *also* have to specify --ompi:static, or Bad Things will happen.
> > Or, if they don't specify -static but *only* specify --ompi:static,
> > Bad Things will happen.  In short: it seems like adding yet another
> > wrapper-compiler-specific flag to the MPI ecosystem will cause
> > confusion, fear, and possibly the death of some cats.
> 
> Would you be OK with omitting -lopen-pal and -lopen-rte only on capable Linux
> systems?  With new-enough binutils, you should be able to use
> -Wl,--as-needed ... -Wl,--no-as-needed around these two libs.

Mmmm.  Good point.  But I don't think it helps us on Solaris or OS X, does it?  
(maybe it does on OS X?)  Or do all linkers have some kind of option like this? 
 (this *might* be a way out, but I would probably need to be convinced :-) )

> I'm not entirely sure I understand your argumentation for why libmpi
> from 1.5.x has to be binary incompatible, but I haven't fully thought
> through this yet.

The context for this issue is so long that much was left out of my mail.  
Here's this particular issue in a nutshell:

- Open MPI v1.4.1 has libmpi at 0:1:0 and libopen-rte and libopen-pal both at 
0:0:0.
- Open MPI v1.4.1 links MPI apps against -lmpi -lopen-rte -lopen-pal.
- If we start .so versioning properly in v1.5, it's likely that libopen-rte and 
libopen-pal will both be 1:0:0.
  --> Note that these are both internal libraries; there are no symbols in 
these libraries that are used in the MPI applications.
- Open MPI v1.5 libmpi *could* be 1:0:1.
- Hence, an a.out created for OMPI v1.4.1 would work fine with v1.5 libmpi.
- But that a.out would not work with v1.5 libopen-rte and libopen-pal.

The problem is that our internal APIs change not infrequently, and potentially 
in incompatible ways.  This shouldn't (doesn't) matter to MPI applications, but 
because we "-lmpi -lopen-rte -lopen-pal" even for shared library linking, the 
linker thinks that it *does* matter because we've established an explicit 
dependency from a.out to all 3 libraries.
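
You can see that explicit dependency directly in the DT_NEEDED entries of an
a.out built against v1.4 (output below is illustrative):

  readelf -d a.out | grep NEEDED
  #  NEEDED   libmpi.so.0
  #  NEEDED   libopen-rte.so.0
  #  NEEDED   libopen-pal.so.0
  #  (plus libc and friends)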

My initial idea was to add special flags to the wrapper compilers that the user 
would use to indicate whether it should be "-lmpi" (shared link) or "-lmpi 
-lopen-rte -lopen-pal" (static link).  Brian hates this.  :-)

Brian's idea is to make libmpi.la slurp up libopen-rte.la as a convenience 
library.  Similarly, have libopen-rte.la slurp up libopen-pal.la as a 
convenience library.  Hence, only -lmpi is needed regardless of whether you're 
linking statically or dynamically.

Regardless of which way we go, if we start .so versioning libopen-rte and 
libopen-pal in v1.5, ABI will break between v1.4 and v1.5.  We *do* need to fix 
the .so versioning issues of libopen-rte and libopen-pal; if we don't do it for 
v1.5.0, our next opportunity will be to do it in v1.7 (which is quite a long 
time off), because I refuse to do a change of this size in the middle of a 
release series.  All we'll have done is put off the pain until later.

Hopefully, that made sense.  :-)

-- 
Jeff Squyres
jsquy...@cisco.com

For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




[OMPI devel] Limitations on pending nonblocking operations

2010-02-17 Thread Christian Csar
I'm trying to figure out what the limit is on the number of pending
nonblocking operations, as it does not seem to be specified anywhere. I
apologize if this is better suited to the user list, but this seemed like
information more likely to be available on the dev list.

As part of a toy assignment involving multiplying triangular square
matrices, one solution being compared sends each row and column
individually. On matrices of 100 and 1000 rows the program works
fine. However, with 5000 rows it works correctly with 8 processes
spread across 4 or 2 nodes, but not on a single node; similarly, with 4
processes it works on 2 nodes but not on one, and with 2 processes on 1
node it fails. The failure appears to be that some number (at least
2500) of receives never complete, causing an MPI_Waitany to never
return. No errors are produced by the MPI_Isends, the MPI_Irecvs, or
the MPI_Waitany.

As it works on multiple nodes, but not one node, it seems reasonable to
believe that the problem lies with there being too many nonblocking
operations in progress, as there are a total of around 18000 pending
operations at once if all the processes are run on one node.

The standard says the following, but I can't seem to find a definition
of what Open MPI considers pathological, and information on where to
find this would be appreciated. I've attached the results of ompi_info
--all if it is of any use.

"If the call causes some system resource to be exhausted, then it will
fail and return an error code. Quality implementations of MPI should
ensure that this happens only in ``pathological'' cases. That is, an MPI
implementation should be able to support a large number of pending
nonblocking operations."

Sincerely,
    Christian Csar


ompi_info.gz
Description: GNU Zip compressed data