Re: [OMPI users] Fortran vs C reductions
Gilles Gouaillardet writes:

>> implementation. Must I compile in support for being called with
>> MPI_DOUBLE_COMPLEX?
>
> does that really matter ?

Possibly. For example, if the library needed to define some static data, its setup might involve communicating values before being called with that particular type. That setup phase would fail if the Fortran type is invalid.

> i assume your library and the user code are built with the same OpenMPI.
> if there is no Fortran support, then you are compiling code that cannot
> be invoked (e.g. dead code),
> and though that is not the most elegant thing to do, that does not sound
> like a showstopper to me.
>
> so yes, compile in support for being called with Fortran predefined
> datatypes,
> worst case scenario is you generate broken dead code.

No, the worst case is that the library crashes at run time, e.g., during setup of some sort. I don't have a specific library with this behavior, but I can fill in the details to justify such a thing.

Anyway, my suggestion is to either make this a compile-time error, so that a configure script can test for validity, or make it possible to query at run time whether the type/object is valid. The latter would have the advantage that you could rebuild MPI to add Fortran support and dependent projects would not need to be rebuilt, because they would see the same environment. I think that would involve new function(s).
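To make the run-time-query suggestion concrete, it might look something like the sketch below. MPIX_Type_valid does not exist in MPI or Open MPI; the name and signature are purely hypothetical, shown only to illustrate the proposal (as written, this would not link against any current MPI).

#include <mpi.h>

/* Hypothetical query -- no such function exists today.  It would report
 * whether a predefined handle is backed by a working implementation in
 * this particular build. */
int MPIX_Type_valid(MPI_Datatype type, int *flag);

void setup_reduction_support(void)
{
  int have_dblcplx = 0;
  MPIX_Type_valid(MPI_DOUBLE_COMPLEX, &have_dblcplx);
  if (have_dblcplx) {
    /* register the complex kernels */
  } else {
    /* skip them; callers passing Fortran types get a clean error */
  }
}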
Re: [OMPI users] Fortran vs C reductions
Dave Love writes:

> Jed Brown writes:
>
>> Isn't that entirely dependent on the Fortran compiler? There is no
>> universal requirement that there be a relationship between Fortran
>> INTEGER and C int, for example.
>
> In case it's not generally obvious: the compiler _and its options_.
> You can typically change the width of real and double precision, as with
> gfortran -fdefault-real-8, and similarly for integer. (It seems unKIND
> if the MPI standard specifically enshrines double precision, but
> anyhow...)

Indeed. (Though such options are an abomination.)

>> Feature tests are far more reliable, accurate, and lower maintenance
>> than platform/version tests. When a package defines macros/symbols that
>> fail at run-time, it makes feature tests much more expensive. Even more
>> so when cross-compiling, where run-time tests require batch submission.
>
> Right, but isn't the existence of the compiler wrapper the appropriate
> test for Fortran support, and don't you really need it to run
> Fortran-related feature tests?

The wrapper might not exist. It doesn't on many prominent platforms today.

> I have an "integer*8" build of OMPI, for instance. It's a pain
> generally when build systems for MPI stuff avoid compiler wrappers,
> and I'd hope that using them could make possibly-unfortunate standards
> requirements like this moot. Would there be a problem with that in
> this case?

In the example of my other reply, my library would not need to call MPI from Fortran, but it needs to know whether to compile support for decoding Fortran datatypes.

My personal opinion is that compiler wrappers are gross (they don't compose), but systems like CMake that insist on circumventing compiler wrappers in a yet more error-prone way are worse.
Re: [OMPI users] Fortran vs C reductions
Gilles Gouaillardet writes:

> Jed,
>
> my 0.02US$
>
> we recently had a kind of similar discussion about MPI_DATATYPE_NULL, and
> we concluded
> ompi should do its best to implement the MPI standard, and not what some of
> us think the standard should be.

Did anyone suggest violating the standard?

> in your configure script, you can simply try to compile a simple fortran
> MPI hello world.
> if it fails, then you can assume fortran bindings are not available, and
> not use fortran types in your application.

With which compiler? Remember that we're talking about the C macros -- the user of those might not have any Fortran in their code.

Suppose, for example, I have a C library that implements a custom reduction. I'll need to check the datatype to dispatch to a concrete implementation. Must I compile in support for being called with MPI_DOUBLE_COMPLEX?
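A minimal sketch of the situation described above: a C-only user-defined reduction whose kernel is selected by inspecting the datatype. The MaxAbs name and the two supported types are illustrative, not from any real library.

#include <mpi.h>
#include <math.h>
#include <complex.h>

/* User-defined reduction: a concrete kernel is chosen from *dtype.  If
 * callers may pass MPI_DOUBLE_COMPLEX, that branch must be compiled in
 * even though the library itself contains no Fortran. */
static void MaxAbs(void *in, void *inout, int *len, MPI_Datatype *dtype)
{
  int i;
  if (*dtype == MPI_DOUBLE) {
    const double *x = in;
    double *y = inout;
    for (i = 0; i < *len; i++) if (fabs(x[i]) > fabs(y[i])) y[i] = x[i];
  } else if (*dtype == MPI_DOUBLE_COMPLEX) { /* Fortran predefined type */
    const double complex *x = in;
    double complex *y = inout;
    for (i = 0; i < *len; i++) if (cabs(x[i]) > cabs(y[i])) y[i] = x[i];
  }
}

/* Usage: MPI_Op op; MPI_Op_create(MaxAbs, 1, &op); */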
Re: [OMPI users] Fortran vs C reductions
George Bosilca writes:

> Now we can argue if DOUBLE PRECISION in Fortran is a double in C. As these
> languages are interoperable, and there is no explicit conversion function,
> it is safe to assume this is the case. Thus, it seems to me absolutely
> legal to provide the MPI-required support for DOUBLE PRECISION despite the
> fact that Fortran support is not enabled.

Isn't that entirely dependent on the Fortran compiler? There is no universal requirement that there be a relationship between Fortran INTEGER and C int, for example.

> Now taking a closer look at the op, I see nothing in the standard that would
> require providing the op if the corresponding language is not supported.
> While it could be nice (as a convenience for the users and also because
> there is no technical reason not to) to enable the loc op on non-native
> datatypes, this is not mandatory. Thus, the current behavior exposed by
> Open MPI is acceptable from the standard perspective.

I believe the question is not whether it's standard-compliant to define the types when they are not supported (the OP's usage doesn't sound valid anyway because they are using the Fortran MPI datatypes to refer to C types). Rather, the question is: if those types are non-functional, can/should they be removed from the header? That, for example, would allow a configure script to test whether those datatypes exist.

Feature tests are far more reliable, accurate, and lower maintenance than platform/version tests. When a package defines macros/symbols that fail at run time, it makes feature tests much more expensive. Even more so when cross-compiling, where run-time tests require batch submission. The fact is that if a package makes it impractical to test features, the end-user experience reflects poorly on that package and all of its dependencies (through which user support passes). It's the sort of thing that drives users and developers away from the platform.

Since I don't think you can make the Fortran types reliable without access to a Fortran compiler, my suggestion would be to remove the symbols when Fortran is not available.
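If the symbols were removed when unsupported, a configure script could use an ordinary compile-and-link test like the following sketch. The program only needs to build, not run, which keeps the test cross-compile friendly.

/* conftest.c: compiles and links only if MPI_DOUBLE_COMPLEX exists in
 * this MPI installation. */
#include <mpi.h>

int main(void)
{
  MPI_Datatype t = MPI_DOUBLE_COMPLEX;
  return t == MPI_DATATYPE_NULL;  /* reference it so it must resolve */
}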
Re: [OMPI users] Open MPI MPI-OpenMP Hybrid Binding Question
Dave Love writes:

> PETSc can't be using MPI-3 because I'm in the process of fixing rpm
> packaging for the current version and building it with ompi 1.6.

It would be folly for PETSc to ship with a hard dependency on MPI-3. You wouldn't be able to package it with ompi-1.6, for example. But that doesn't mean PETSc's configure can't test for MPI-3 functionality and use it when available. Indeed, it does (though for different capability than mentioned in this thread).

> (Exascale is only of interest if/when there are spin-offs useful for
> university-scale systems.)

I was hoping for a running example. The relevant example for the technique mentioned in this thread is in src/ksp/ksp/examples/tests/benchmarkscatters of the 'master' versus 'barry/utilize-hwloc' branches. It's completely experimental at this time.
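As a sketch of the optional-use pattern (PETSc's actual detection happens at configure time; this preprocessor guard is just the simplest illustration):

#include <mpi.h>

/* Guard MPI-3 functionality so the same source still builds and runs
 * against an MPI-2 implementation such as Open MPI 1.6. */
void Sync(MPI_Comm comm)
{
#if defined(MPI_VERSION) && MPI_VERSION >= 3
  MPI_Request req;
  MPI_Ibarrier(comm, &req);          /* MPI-3 path */
  MPI_Wait(&req, MPI_STATUS_IGNORE);
#else
  MPI_Barrier(comm);                 /* MPI-2 fallback */
#endif
}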
Re: [OMPI users] [petsc-maint] Deadlock in OpenMPI 1.8.3 and PETSc 3.4.5
"Jeff Squyres (jsquyres)" writes: > This is, unfortunately, an undefined area of the MPI specification. I > do believe that our previous behavior was *correct* -- it just > deadlocks with PETSC because PETSC is relying on undefined behavior. Jeff, can you clarify where in the standard this is left undefined? Is one to assume that callbacks can never call into MPI unless explicitly allowed? Note that empirically, this usage has worked with all implementations since 1998, except this version of Open MPI. If the callback is to be considered invalid, how would you recommend implementing two-way linked communicators? > For those who care, Microsoft/Cisco proposed a new attribute system to > the Forum a while ago that removes all these kinds of ambiguities (see > http://meetings.mpi-forum.org/secretary/2013/09/slides/jsquyres-attributes-revamp.pdf). > However, we didn't get a huge amount of interest, and therefore lost > our window of availability opportunity to be able to advance the > proposal. I'd be more than happy to talk anyone through the proposal > if they have interest/cycles in taking it over and advancing it with > the Forum. > > Two additional points from the PDF listed above: > > - on slide 21, it was decided to no allow the recursive behavior (i.e., you > can ignore the "This is under debate" bullet. > - the "destroy" callback was not judged to be useful; you can ignore slides > 22 and 23. signature.asc Description: PGP signature
Re: [OMPI users] [FEniCS] Question about MPI barriers
Martin Sandve Alnæs writes:

> Thanks, but ibarrier doesn't seem to be in the stable version of openmpi:
> http://www.open-mpi.org/doc/v1.8/
> Otherwise mpi_ibarrier+mpi_test+homemade time/sleep loop would do the trick.

MPI_Ibarrier is there (since 1.7), just missing a man page.
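The "ibarrier + test + sleep" idea mentioned above might look like this sketch (the function name and the 1 ms back-off are arbitrary choices):

#include <mpi.h>
#include <unistd.h>

/* Wait on a barrier without spinning: start a nonblocking barrier and
 * poll it, sleeping between polls. */
void LazyBarrier(MPI_Comm comm)
{
  MPI_Request req;
  int done = 0;
  MPI_Ibarrier(comm, &req);
  while (!done) {
    MPI_Test(&req, &done, MPI_STATUS_IGNORE);
    if (!done) usleep(1000);   /* arbitrary 1 ms back-off */
  }
}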
Re: [OMPI users] latest stable and win7/msvc2013
Damien writes:

> Visual Studio can link libs compiled with Intel.

The headers also need to fall within the language subset implemented by MSVC, but this is easier to ensure, and the Windows ecosystem seems to be happy with binary distribution.
Re: [OMPI users] latest stable and win7/msvc2013
Ralph Castain writes:

> Yeah, but I'm cheap and get the Intel compilers for free :-)

Fine for you, but not for the people trying to integrate your library in a stack developed using MSVC.
Re: [OMPI users] latest stable and win7/msvc2013
Rob Latham writes:

> hey, (almost all of) c99 support is in place in visual studio 2013
> http://blogs.msdn.com/b/vcblog/archive/2013/07/19/c99-library-support-in-visual-studio-2013.aspx

This talks about the standard library, but not whether the C frontend has acquired these features. Are they attempting to support C99 or merely providing some features to C++ users?

BTW, did Microsoft ever fix the misimplementation of variadic macros? http://stackoverflow.com/questions/5134523/msvc-doesnt-expand-va-args-correctly
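The linked issue is the well-known __VA_ARGS__ re-expansion bug. A small program that exposes the commonly reported symptom (a sketch; COUNT is just an illustrative argument-counting macro):

#include <stdio.h>

#define COUNT_IMPL(_1, _2, _3, N, ...) N
#define COUNT(...) COUNT_IMPL(__VA_ARGS__, 3, 2, 1)

int main(void)
{
  /* A conforming C99 preprocessor prints 3.  Older MSVC passes
   * __VA_ARGS__ on as a single argument, so _1 binds to "1, 2, 3"
   * and this prints 1 instead. */
  printf("%d\n", COUNT(1, 2, 3));
  return 0;
}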
Re: [OMPI users] latest stable and win7/msvc2013
Damien writes:

> Is this something that could be funded by Microsoft, and is it time to
> approach them perhaps? MS MPI is based on MPICH, and if mainline MPICH
> isn't supporting Windows anymore, then there won't be a whole lot of
> development in an increasingly older Windows build. With the Open-MPI
> roadmap, there's a lot happening. Would it be a better business model
> for MS to piggy-back off of Open-MPI ongoing innovation, and put their
> resources into maintaining a Windows build of Open-MPI instead?

Maybe Fab can comment on Microsoft's intentions regarding MPI and C99/C11 (just dreaming now).

> On 2014-07-17 11:42 AM, Jed Brown wrote:
>> Rob Latham writes:
>>> Well, I (and dgoodell and jsquyers and probably a few others of you) can
>>> say from observing disc...@mpich.org traffic that we get one message
>>> about Windows support every month -- probably more often.
>> Seems to average at least once a week. We also see regular petsc
>> support emails wondering why --download-{mpich,openmpi} does not work on
>> Windows. (These options are pretty much only used by beginners for whom
>> PETSc is their first encounter with MPI.)
Re: [OMPI users] latest stable and win7/msvc2013
Rob Latham writes:

> Well, I (and dgoodell and jsquyers and probably a few others of you) can
> say from observing disc...@mpich.org traffic that we get one message
> about Windows support every month -- probably more often.

Seems to average at least once a week. We also see regular petsc support emails wondering why --download-{mpich,openmpi} does not work on Windows. (These options are pretty much only used by beginners for whom PETSc is their first encounter with MPI.)
[OMPI users] CXX=no in config.status, breaks mpic++ wrapper
With ompi-git from Monday (7e023a4ebf1aeaa530f79027d00c1bdc16b215fd), configure is putting "compiler=no" in ompi/tools/wrappers/mpic++-wrapper-data.txt:

# There can be multiple blocks of configuration data, chosen by
# compiler flags (using the compiler_args key to chose which block
# should be activated. This can be useful for multilib builds. See the
# multilib page at:
# https://svn.open-mpi.org/trac/ompi/wiki/compilerwrapper3264
# for more information.
project=Open MPI
project_short=OMPI
version=1.9a1
language=C++
compiler_env=CXX
compiler_flags_env=CXXFLAGS
compiler=no
preprocessor_flags=
compiler_flags_prefix=
compiler_flags=-pthread
linker_flags= -Wl,-rpath -Wl,@{libdir} -Wl,--enable-new-dtags
# Note that per https://svn.open-mpi.org/trac/ompi/ticket/3422, we
# intentionally only link in the MPI libraries (ORTE, OPAL, etc. are
# pulled in implicitly) because we intend MPI applications to only use
# the MPI API.
libs= -lmpi
libs_static= -lmpi -lopen-rte -lopen-pal -lm -lnuma -lpciaccess -ldl
dyn_lib_file=libmpi.so
static_lib_file=libmpi.a
required_file=
includedir=${includedir}
libdir=${libdir}

This breaks the wrapper:

$ /path/to/mpic++
--
The Open MPI wrapper compiler was unable to find the specified compiler
no
in your PATH.

Note that this compiler was either specified at configure time or in
one of several possible environment variables.
--

Attaching logs because it's not obvious to me what is going wrong. Automake-1.14.1 and autoconf-2.69.
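As a possible interim workaround (untested against this particular build): Open MPI's wrappers document an OMPI_CXX environment variable that overrides the compiler recorded in the wrapper data file, which should at least make the wrapper usable until the configure problem is fixed:

$ OMPI_CXX=g++ /path/to/mpic++ -show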
Re: [OMPI users] MPI stats argument in Fortran mpi module
"Jeff Squyres (jsquyres)" writes: >> Totally superficial, just passing "status(1)" instead of "status" or >> "status(1:MPI_STATUS_SIZE)". > > That's a different type (INTEGER scalar vs. INTEGER array). So the > compiler complaining about that is actually correct. Yes, exactly. > Under the covers, Fortran will (most likely) pass both by reference, > so they'll both actually (most likely) *work* if you build with an MPI > that doesn't provide an interface for MPI_Recv, but passing status(1) > is actually incorrect Fortran. Prior to slice notation, this would be the only way to build an array of statuses. I.e., receives go into status(1:MPI_STATUS_SIZE), status(1+MPI_STATUS_SIZE:2*MPI_STATUS_SIZE), etc. Due to pass-by-reference semantics, I think this will always work, despite not type-checking with explicit interfaces. I don't know what the language standard says about backward-compatibility of such constructs, but presumably we need to know the dialect to understand whether it's acceptable. (I actually don't know if the Fortran 77 standard defines the behavior when passing status(1), status(1+MPI_STATUS_SIZE), etc., or whether it works only as a consequence of the only reasonable implementation. > I think you're saying that you agree with my above statements about > the different types, and you're just detailing how you got to asking > about WTF we were providing an MPI_Recv interface in the first place. > Kumbaya. :-) Indeed. pgpRjssQGNBtq.pgp Description: PGP signature
Re: [OMPI users] Regression: Fortran derived types with newer MPI module
"Jeff Squyres (jsquyres)" writes: > As I mentioned Craig and I debated long and hard to change that > default, but, in summary, we apparently missed this clause on p610. > I'll change it back. Okay, thanks. > I'll be happy when gfortran 4.9 is released that supports ignore TKR > and you'll get proper interfaces. :-) Better for everyone. >> I don't call MPI from Fortran, but someone on a Fortran project that I >> watch mentioned that the compiler would complain about such and such a >> use (actually relating to types for MPI_Status in MPI_Recv rather than >> buffer types). > > Can you provide more details here? Choice buffer issues aside, I'm > failing to think of a scenario where you should get a compile mismatch > for the MPI status dummy argument in MPI_Recv... Totally superficial, just passing "status(1)" instead of "status" or "status(1:MPI_STATUS_SIZE)". I extrapolated: how can they provide an explicit interface to MPI_Recv in "use mpi", given portability constraints/existing language standards? pgpCIfMJ5CYnP.pgp Description: PGP signature
Re: [OMPI users] Regression: Fortran derived types with newer MPI module
"Jeff Squyres (jsquyres)" writes: > Yes, I can explain what's going on here. The short version is that a > change was made with the intent to provide maximum Fortran code > safety, but with a possible backwards compatibility issue. If this > change is causing real problems, we can probably change this, but I'd > like a little feedback from the Fortran MPI dev community first. On page 610, I see text disallowing the explicit interfaces in ompi/mpi/fortran/use-mpi-tkr: In S2 and S3: Without such extensions, routines with choice buffers should be provided with an implicit interface, instead of overloading with a different MPI function for each possible buffer type (as mentioned in Section 17.1.11 on page 625). Such overloading would also imply restrictions for passing Fortran derived types as choice buffer, see also Section 17.1.15 on page 629. Why did OMPI decide that this (presumably non-normative) text in the standard was not worth following? (Rejecting something in the standard indicates stronger convictions than would be independently weighing the benefits of each approach.) > c) The design of the MPI-2 "mpi" module has multiple flaws that are > identified in the MPI-3 text (but were not recognized back in MPI-2.x > days). Here's one: until F2008+addendums, there was no Fortran > equivalent of "void *". Hence, the mpi module has to overload > MPI_Send() and have a prototype *for every possible type and > dimension*. And this is not possible, thus the text saying not to do it. I don't call MPI from Fortran, but someone on a Fortran project that I watch mentioned that the compiler would complain about such and such a use (actually relating to types for MPI_Status in MPI_Recv rather than buffer types). My immediate response was "they can't do that because without nonstandard or post-F08 extensions (or exposing the user to c_loc), the type system cannot express those functions and thus you cannot have explicit interfaces". But then I looked at latest OMPI and indeed, it was enumerating types, thus my email. > Here's another fatal flaw: it's not possible for an MPI implementation > to provide MPI_Send() prototypes for user-defined Fortran datatypes. > Hence, the example you cite is a pipe dream for the "mpi" module > because there's no way to specify a (void*)-like argument for the > choice buffer. F2003 has c_loc, which is a sufficient stop-gap until TS 29113 is widely available. I have long-advocated that the best way to write extensible libraries for Fortran2003 callers (even if the library is implemented entirely in Fortran) involves some use of c_loc (e.g., for context arguments). This annoys the Fortran programmers and they usually write perl scripts to generate interfaces that enumerate the types they need and give up on extensibility. ;-) It's nice to know that after 60 years (when Fortran 201x is released, including TS 29113), there will be a Fortran standard with an analogue of void*. > Craig Rasmussen and I debated long and hard about whether to change > the default from "small" to "medium" or not. We finally ended up > doing it with the following rationale: > > - very few codes use the "mpi" module FWIW, I've noticed a few projects transition to it in the last few years. > - but those who do should have the maximum amount of compile-time protection > > ...but we always knew that someone may come complaining some day. And that > day has now come. > > So my question to you / the Fortran MPI dev community is: what do you want > (for gfortran)? 
> > Do you want us to go back to the "small" size by default, or do you > want more compile-time protection by default? (with the obvious > caveat that you can't use user-defined Fortran datatypes as choice > buffers; you might be able to use something like c_loc, but I haven't > thought deeply about this and don't know offhand if that works) I can't answer this as a Fortran developer, but I know that a lot of projects want some modicum of portability and in practice, it takes almost 10 years to flush the old compilers out of production environments. Either the upgrade problem will need to be fixed [1] so that nearly all existing machines have new compilers or Fortran projects will be wrestling with this for a long time yet. Most Fortran packages I know use homogeneous arrays, which also means that they don't call MPI_Type_create_struct or similar functions. If those functions are going to be provided by the module, I think they should be able to use them (e.g., examples in the Standard should work) and the Standard's advice about implicit interfaces should be followed. [1] Also, there are still production machines without MPI-2.0 and I get email if I make a mistake in providing MPI-1 fallback paths. pgp4Mn5eAmbuu.pgp Description: PGP signature
[OMPI users] Regression: Fortran derived types with newer MPI module
The attached code is from the example on page 629-630 (17.1.15 Fortran Derived Types) of MPI-3. This compiles cleanly with MPICH and with OMPI 1.6.5, but not with the latest OMPI. Arrays higher than rank 4 would have a similar problem since they are not enumerated. Did someone decide that a necessarily-incomplete enumeration of types was "good enough" and that other users should use some other workaround?

$ ~/usr/ompi/bin/mpifort -c struct.f90
struct.f90:40.55:
call MPI_SEND(foo, 1, newtype, dest, tag, comm, ierr)
1
Error: There is no specific subroutine for the generic 'mpi_send' at (1)
struct.f90:43.48:
call MPI_GET_ADDRESS(fooarr(1), disp(1), ierr)
1
Error: There is no specific subroutine for the generic 'mpi_get_address' at (1)
struct.f90:44.48:
call MPI_GET_ADDRESS(fooarr(2), disp(2), ierr)
1
Error: There is no specific subroutine for the generic 'mpi_get_address' at (1)
struct.f90:50.61:
call MPI_SEND(fooarr, 5, newarrtype, dest, tag, comm, ierr)
1
Error: There is no specific subroutine for the generic 'mpi_send' at (1)

$ ~/usr/ompi/bin/ompi_info
Package: Open MPI jed@batura Distribution
Open MPI: 1.9a1
Open MPI repo revision: r29531M
Open MPI release date: Oct 26, 2013
Open RTE: 1.9a1
Open RTE repo revision: r29531M
Open RTE release date: Oct 26, 2013
OPAL: 1.9a1
OPAL repo revision: r29531M
OPAL release date: Oct 26, 2013
MPI API: 2.2
Ident string: 1.9a1
Prefix: /home/jed/usr/ompi
Configured architecture: x86_64-unknown-linux-gnu
Configure host: batura
Configured by: jed
Configured on: Mon Jan 6 19:38:01 CST 2014
Configure host: batura
Built by: jed
Built on: Mon Jan 6 19:49:41 CST 2014
Built host: batura
C bindings: yes
C++ bindings: no
Fort mpif.h: yes (all)
Fort use mpi: yes (limited: overloading)
Fort use mpi size: deprecated-ompi-info-value
Fort use mpi_f08: no
Fort mpi_f08 compliance: The mpi_f08 module was not built
Fort mpi_f08 subarrays: no
Java bindings: no
Wrapper compiler rpath: runpath
C compiler: gcc
C compiler absolute: /usr/bin/gcc
C compiler family name: GNU
C compiler version: 4.8.2
C++ compiler: g++
C++ compiler absolute: /usr/bin/g++
Fort compiler: /usr/bin/gfortran
Fort compiler abs:
Fort ignore TKR: no
Fort 08 assumed shape: no
Fort optional args: no
Fort BIND(C): no
Fort PRIVATE: no
Fort ABSTRACT: no
Fort ASYNCHRONOUS: no
Fort PROCEDURE: no
Fort f08 using wrappers: yes
C profiling: yes
C++ profiling: no
Fort mpif.h profiling: yes
Fort use mpi profiling: yes
Fort use mpi_f08 prof: no
C++ exceptions: no
Thread support: posix (MPI_THREAD_MULTIPLE: no, OPAL support: yes, OMPI progress: no, ORTE progress: yes, Event lib: yes)
Sparse Groups: no
Internal debug support: yes
MPI interface warnings: yes
MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
libltdl support: yes
Heterogeneous support: no
mpirun default --prefix: no
MPI I/O support: yes
MPI_WTIME support: gettimeofday
Symbol vis. support: yes
Host topology support: yes
MPI extensions: FT
Checkpoint support: no (checkpoint thread: no)
C/R Enabled Debugging: no
VampirTrace support: yes
MPI_MAX_PROCESSOR_NAME: 256
MPI_MAX_ERROR_STRING: 256
MPI_MAX_OBJECT_NAME: 64
MPI_MAX_INFO_KEY: 36
MPI_MAX_INFO_VAL: 256
MPI_MAX_PORT_NAME: 1024
MPI_MAX_DATAREP_STRING: 128
MCA backtrace: execinfo (MCA v2.0, API v2.0, Component v1.9)
MCA compress: bzip (MCA v2.0, API v2.0, Component v1.9)
MCA compress: gzip (MCA v2.0, API v2.0, Component v1.9)
MCA crs: none (MCA v2.0, API v2.0, Component v1.9)
MCA db: hash (MCA v2.0, API v1.0, Component v1.9)
MCA db: print (MCA v2.0, API v1.0, Component v1.9)
MCA event: libevent2021 (MCA v2.0, API v2.0, Component v1.9)
MCA hwloc: external (MCA v2.0, API v2.0, Component v1.9)
MCA if: linux_ipv6 (MCA v2.0, API v2.0, Component v1.9)
MCA if: posix_ipv4 (MCA v2.0, API v2.0, Component v1.9)
MCA installdirs: env (MCA v2.0, API v2.0, Component v1.9)
MCA installdirs: config (MCA v2.0, API v2.0, Component v1.9)
MCA memchecker: valgrind (MCA v2.0, API v2.0, Component v1.9)
Re: [OMPI users] MPI process hangs if OpenMPI is compiled with --enable-thread-multiple
Ralph Castain writes:

> Given that we have no idea what Homebrew uses, I don't know how we
> could clarify/respond.

Pierre provided a link to MacPorts saying that all of the following options were needed to properly enable threads.

  --enable-event-thread-support
  --enable-opal-multi-threads
  --enable-orte-progress-threads
  --enable-mpi-thread-multiple

If that is indeed the case, and if passing some subset of these options results in deadlock, it's not exactly user-friendly. Maybe --enable-mpi-thread-multiple is enough, in which case MacPorts is doing something needlessly complicated and Pierre's link was a red herring?
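Whatever configure flags were used, an application can at least detect at run time whether full thread support actually made it into the build (a minimal sketch):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  int provided;
  MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
  if (provided < MPI_THREAD_MULTIPLE)
    fprintf(stderr, "MPI_THREAD_MULTIPLE unavailable (provided=%d); "
                    "refusing to run multi-threaded\n", provided);
  MPI_Finalize();
  return 0;
}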
Re: [OMPI users] MPI process hangs if OpenMPI is compiled with --enable-thread-multiple
Dominique Orban writes:

> My question originates from a hang similar to the one I described in
> my first message in the PETSc tests. They still hang after I corrected
> the OpenMPI compile flags. I'm in touch with the PETSc folks as well
> about this.

Do you have an updated stack trace?
Re: [OMPI users] MPI process hangs if OpenMPI is compiled with --enable-thread-multiple
Pierre Jolivet writes:

> It looks like you are compiling Open MPI with Homebrew. The flags they use in
> the formula when --enable-mpi-thread-multiple is set are wrong.
> c.f. a similar problem with MacPorts
> https://lists.macosforge.org/pipermail/macports-tickets/2013-June/138145.html.

If these "wrong" configure flags cause deadlock, wouldn't you consider it to be an Open MPI bug? In decreasing order of preference, I would say:

1. simple configure flags work to enable the feature
2. configure errors out due to inconsistent flags
3. configure succeeds, but the feature is not actually enabled (so no deadlock, though this is arguably already a bug)
[OMPI users] "C++ compiler absolute"
I built from trunk a couple days ago and notice that mpicxx has an erroneous path:

$ ~/usr/ompi/bin/mpicxx -show
no -I/homes/jedbrown/usr/ompi/include -pthread -Wl,-rpath -Wl,/homes/jedbrown/usr/ompi/lib -Wl,--enable-new-dtags -L/homes/jedbrown/usr/ompi/lib -lmpi

The C compiler is fine:

$ ~/usr/ompi/bin/mpicc -show
/soft/apps/packages/gcc-4.8.0/bin/gcc -I/homes/jedbrown/usr/ompi/include -pthread -Wl,-rpath -Wl,/homes/jedbrown/usr/ompi/lib -Wl,--enable-new-dtags -L/homes/jedbrown/usr/ompi/lib -lmpi

I configured with:

$ ../configure --prefix=/homes/jedbrown/usr/ompi CC=/soft/apps/packages/gcc-4.8.0/bin/gcc CXX=/soft/apps/packages/gcc-4.8.0/bin/g++ FC=/soft/apps/packages/gcc-4.8.0/bin/gfortran

These compilers all exist and the build/install went cleanly. So where does this come from?

C++ compiler absolute: none

$ ~/usr/ompi/bin/ompi_info
Package: Open MPI jedbrown@cg Distribution
Open MPI: 1.9a1
Open MPI repo revision: r28134M
Open MPI release date: Feb 28, 2013
Open RTE: 1.9a1
Open RTE repo revision: r28134M
Open RTE release date: Feb 28, 2013
OPAL: 1.9a1
OPAL repo revision: r28134M
OPAL release date: Feb 28, 2013
MPI API: 2.1
Ident string: 1.9a1
Prefix: /homes/jedbrown/usr/ompi
Configured architecture: x86_64-unknown-linux-gnu
Configure host: cg
Configured by: jedbrown
Configured on: Thu May 23 21:07:25 CDT 2013
Configure host: cg
Built by: jedbrown
Built on: Thu May 23 21:19:23 CDT 2013
Built host: cg
C bindings: yes
C++ bindings: no
Fort mpif.h: yes (all)
Fort use mpi: yes (limited: overloading)
Fort use mpi size: deprecated-ompi-info-value
Fort use mpi_f08: no
Fort mpi_f08 compliance: The mpi_f08 module was not built
Fort mpi_f08 subarrays: no
Java bindings: no
Wrapper compiler rpath: runpath
C compiler: /soft/apps/packages/gcc-4.8.0/bin/gcc
C compiler absolute:
C compiler family name: GNU
C compiler version: 4.8.0
C++ compiler: /soft/apps/packages/gcc-4.8.0/bin/g++
C++ compiler absolute: none
Fort compiler: /soft/apps/packages/gcc-4.8.0/bin/gfortran
Fort compiler abs:
Fort ignore TKR: no
Fort 08 assumed shape: no
Fort optional args: no
Fort BIND(C): no
Fort PRIVATE: no
Fort ABSTRACT: no
Fort ASYNCHRONOUS: no
Fort PROCEDURE: no
Fort f08 using wrappers: yes
C profiling: yes
C++ profiling: no
Fort mpif.h profiling: yes
Fort use mpi profiling: yes
Fort use mpi_f08 prof: no
C++ exceptions: no
Thread support: posix (MPI_THREAD_MULTIPLE: no, OPAL support: no, OMPI progress: no, ORTE progress: no, Event lib: yes)
Sparse Groups: no
Internal debug support: no
MPI interface warnings: yes
MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
libltdl support: yes
Heterogeneous support: no
mpirun default --prefix: no
MPI I/O support: yes
MPI_WTIME support: gettimeofday
Symbol vis. support: yes
Host topology support: yes
MPI extensions: FT
Checkpoint support: no (checkpoint thread: no)
C/R Enabled Debugging: no
VampirTrace support: yes
MPI_MAX_PROCESSOR_NAME: 256
MPI_MAX_ERROR_STRING: 256
MPI_MAX_OBJECT_NAME: 64
MPI_MAX_INFO_KEY: 36
MPI_MAX_INFO_VAL: 256
MPI_MAX_PORT_NAME: 1024
MPI_MAX_DATAREP_STRING: 128
MCA backtrace: execinfo (MCA v2.0, API v2.0, Component v1.9)
MCA compress: bzip (MCA v2.0, API v2.0, Component v1.9)
MCA compress: gzip (MCA v2.0, API v2.0, Component v1.9)
MCA crs: none (MCA v2.0, API v2.0, Component v1.9)
MCA db: hash (MCA v2.0, API v1.0, Component v1.9)
MCA db: print (MCA v2.0, API v1.0, Component v1.9)
MCA event: libevent2019 (MCA v2.0, API v2.0, Component v1.9)
MCA hwloc: hwloc152 (MCA v2.0, API v2.0, Component v1.9)
MCA if: linux_ipv6 (MCA v2.0, API v2.0, Component v1.9)
MCA if: posix_ipv4 (MCA v2.0, API v2.0, Component v1.9)
MCA installdirs: env (MCA v2.0, API v2.0, Component v1.9)
MCA installdirs: config (MCA v2.0, API v2.0, Component v1.9)
MCA memory: linux (MCA v2.0, API v2.0, Component v1.9)
MCA pstat: linux (MCA v2.0, API v2.0, Component v1.9)
MCA shmem: mmap (MCA v2.0, API v2.0, Component v1.9)
MCA shmem: posix (MCA v2.
Re: [OMPI users] One-sided bugs
I've resolved the problem in a satisfactory way by circumventing one-sided entirely. I.e., this issue is finally closed: https://bitbucket.org/petsc/petsc-dev/issue/9/implement-petscsf-without-one-sided

Users can proceed anyway using the run-time option -acknowledge_ompi_onesided_bug, which will also be a convenient way to test an eventual fix (beyond the reduced test cases that have been sitting in your bug tracker for several years). (This is only relevant with -sf_type window; the default no longer uses one-sided.)

I would still like to encourage Open MPI to deliver an error message in this known broken case instead of silently stomping all over the user's memory.

On Tue, Sep 11, 2012 at 2:23 PM, Jed Brown wrote:

> *Bump*
>
> There doesn't seem to have been any progress on this. Can you at least
> have an error message saying that Open MPI one-sided does not work with
> datatypes instead of silently causing wanton corruption and deadlock?
>
> On Thu, Dec 22, 2011 at 4:17 PM, Jed Brown wrote:
>
>> [Forgot the attachment.]
>>
>> On Thu, Dec 22, 2011 at 15:16, Jed Brown wrote:
>>
>>> I wrote a new communication layer that we are evaluating for use in mesh
>>> management and PDE solvers, but it is based on MPI-2 one-sided operations
>>> (and will eventually benefit from some of the MPI-3 one-sided proposals,
>>> especially MPI_Fetch_and_op() and dynamic windows). All the basic
>>> functionality works well with MPICH2, but I have run into some Open MPI
>>> bugs regarding one-sided operations with composite data types. This email
>>> provides a reduced test case for two such bugs. I see that there are also
>>> some existing serious-looking bug reports regarding one-sided operations,
>>> but they are getting pretty old now and haven't seen action in a while.
>>>
>>> https://svn.open-mpi.org/trac/ompi/ticket/2656
>>> https://svn.open-mpi.org/trac/ompi/ticket/1905
>>>
>>> Is there a plan for resolving these in the near future?
>>>
>>> Is anyone using Open MPI for serious work with one-sided operations?
>>>
>>> Bugs I am reporting:
>>>
>>> *1.* If an MPI_Win is used with an MPI_Datatype, even if the MPI_Win
>>> operation has completed, I get an invalid free when MPI_Type_free() is
>>> called before MPI_Win_free(). Since MPI_Type_free() is only supposed to
>>> mark the datatype for deletion, the implementation should properly manage
>>> reference counting. If you run the attached code with
>>>
>>> $ mpiexec -n 2 ./a.out 1
>>>
>>> (which only does part of the comm described for the second bug, below),
>>> you can see the invalid free on rank 1 with stack still in MPI_Win_fence()
>>>
>>> (gdb) bt
>>> #0 0x77288905 in raise () from /lib/libc.so.6
>>> #1 0x77289d7b in abort () from /lib/libc.so.6
>>> #2 0x772c147e in __libc_message () from /lib/libc.so.6
>>> #3 0x772c7396 in malloc_printerr () from /lib/libc.so.6
>>> #4 0x772cb26c in free () from /lib/libc.so.6
>>> #5 0x77a5aaa8 in ompi_datatype_release_args (pData=0x845010) at ompi_datatype_args.c:414
>>> #6 0x77a5b0ea in __ompi_datatype_release (datatype=0x845010) at ompi_datatype_create.c:47
>>> #7 0x7218e772 in opal_obj_run_destructors (object=0x845010) at ../../../../opal/class/opal_object.h:448
>>> #8 ompi_osc_rdma_replyreq_free (replyreq=0x680a80) at osc_rdma_replyreq.h:136
>>> #9 ompi_osc_rdma_replyreq_send_cb (btl=0x73680ce0, endpoint=, descriptor=0x837b00, status=) at osc_rdma_data_move.c:691
>>> #10 0x7347f38f in mca_btl_sm_component_progress () at btl_sm_component.c:645
>>> #11 0x77b1f80a in opal_progress () at runtime/opal_progress.c:207
>>> #12 0x721977c5 in opal_condition_wait (m=, c=0x842ee0) at ../../../../opal/threads/condition.h:99
>>> #13 ompi_osc_rdma_module_fence (assert=0, win=0x842270) at osc_rdma_sync.c:207
>>> #14 0x77a89db5 in PMPI_Win_fence (assert=0, win=0x842270) at pwin_fence.c:60
>>> #15 0x004010d8 in main (argc=2, argv=0x7fffd508) at win.c:60
>>>
>>> meanwhile, rank 0 has already freed the datatype and is waiting in
>>> MPI_Win_free().
>>> (gdb) bt
>>> #0 0x77312107 in sched_yield () from /lib/libc.so.6
>>> #1 0x77b1f82b in opal_progress () at runtime/opal_progress.c:220
>>> #2 0x000
Re: [OMPI users] Setting RPATH for Open MPI libraries
Jeff, we are averaging a half dozen support threads per week on PETSc lists/email caused by the lack of RPATH in Open MPI for non-standard install locations. Can you either make the necessary environment modification more visible for novice users or implement the RPATH option?

On Wed, Sep 12, 2012 at 1:52 PM, Jed Brown wrote:

> On Wed, Sep 12, 2012 at 10:20 AM, Jeff Squyres wrote:
>
>> We have a long-standing philosophy that OMPI should add the bare minimum
>> number of preprocessor/compiler/linker flags to its wrapper compilers, and
>> let the user/administrator customize from there.
>
> In general, I agree with that philosophy.
>
>> That being said, a looong time ago, I started a patch to add a
>> --with-rpath option to configure, but never finished it. :-\ I can try to
>> get it back on my to-do list.
>
> That would be perfect.
>
>> For the moment, you might want to try the configure
>> --enable-mpirun-prefix-by-default option, too.
>
> The downside is that we tend not to bother with the mpirun for configure
> and it's a little annoying to "mpirun ldd" when hunting for other problems
> (e.g. a missing shared lib unrelated to Open MPI).
Re: [OMPI users] Setting RPATH for Open MPI libraries
On Wed, Sep 12, 2012 at 10:20 AM, Jeff Squyres wrote:

> We have a long-standing philosophy that OMPI should add the bare minimum
> number of preprocessor/compiler/linker flags to its wrapper compilers, and
> let the user/administrator customize from there.

In general, I agree with that philosophy.

> That being said, a looong time ago, I started a patch to add a
> --with-rpath option to configure, but never finished it. :-\ I can try to
> get it back on my to-do list.

That would be perfect.

> For the moment, you might want to try the configure
> --enable-mpirun-prefix-by-default option, too.

The downside is that we tend not to bother with the mpirun for configure and it's a little annoying to "mpirun ldd" when hunting for other problems (e.g. a missing shared lib unrelated to Open MPI).
Re: [OMPI users] One-sided bugs
*Bump*

There doesn't seem to have been any progress on this. Can you at least have an error message saying that Open MPI one-sided does not work with datatypes instead of silently causing wanton corruption and deadlock?

On Thu, Dec 22, 2011 at 4:17 PM, Jed Brown wrote:

> [Forgot the attachment.]
>
> On Thu, Dec 22, 2011 at 15:16, Jed Brown wrote:
>
>> I wrote a new communication layer that we are evaluating for use in mesh
>> management and PDE solvers, but it is based on MPI-2 one-sided operations
>> (and will eventually benefit from some of the MPI-3 one-sided proposals,
>> especially MPI_Fetch_and_op() and dynamic windows). All the basic
>> functionality works well with MPICH2, but I have run into some Open MPI
>> bugs regarding one-sided operations with composite data types. This email
>> provides a reduced test case for two such bugs. I see that there are also
>> some existing serious-looking bug reports regarding one-sided operations,
>> but they are getting pretty old now and haven't seen action in a while.
>>
>> https://svn.open-mpi.org/trac/ompi/ticket/2656
>> https://svn.open-mpi.org/trac/ompi/ticket/1905
>>
>> Is there a plan for resolving these in the near future?
>>
>> Is anyone using Open MPI for serious work with one-sided operations?
>>
>> Bugs I am reporting:
>>
>> *1.* If an MPI_Win is used with an MPI_Datatype, even if the MPI_Win
>> operation has completed, I get an invalid free when MPI_Type_free() is
>> called before MPI_Win_free(). Since MPI_Type_free() is only supposed to
>> mark the datatype for deletion, the implementation should properly manage
>> reference counting. If you run the attached code with
>>
>> $ mpiexec -n 2 ./a.out 1
>>
>> (which only does part of the comm described for the second bug, below),
>> you can see the invalid free on rank 1 with stack still in MPI_Win_fence()
>>
>> (gdb) bt
>> #0 0x77288905 in raise () from /lib/libc.so.6
>> #1 0x77289d7b in abort () from /lib/libc.so.6
>> #2 0x772c147e in __libc_message () from /lib/libc.so.6
>> #3 0x772c7396 in malloc_printerr () from /lib/libc.so.6
>> #4 0x772cb26c in free () from /lib/libc.so.6
>> #5 0x77a5aaa8 in ompi_datatype_release_args (pData=0x845010) at ompi_datatype_args.c:414
>> #6 0x77a5b0ea in __ompi_datatype_release (datatype=0x845010) at ompi_datatype_create.c:47
>> #7 0x7218e772 in opal_obj_run_destructors (object=0x845010) at ../../../../opal/class/opal_object.h:448
>> #8 ompi_osc_rdma_replyreq_free (replyreq=0x680a80) at osc_rdma_replyreq.h:136
>> #9 ompi_osc_rdma_replyreq_send_cb (btl=0x73680ce0, endpoint=, descriptor=0x837b00, status=) at osc_rdma_data_move.c:691
>> #10 0x7347f38f in mca_btl_sm_component_progress () at btl_sm_component.c:645
>> #11 0x77b1f80a in opal_progress () at runtime/opal_progress.c:207
>> #12 0x721977c5 in opal_condition_wait (m=, c=0x842ee0) at ../../../../opal/threads/condition.h:99
>> #13 ompi_osc_rdma_module_fence (assert=0, win=0x842270) at osc_rdma_sync.c:207
>> #14 0x77a89db5 in PMPI_Win_fence (assert=0, win=0x842270) at pwin_fence.c:60
>> #15 0x004010d8 in main (argc=2, argv=0x7fffd508) at win.c:60
>>
>> meanwhile, rank 0 has already freed the datatype and is waiting in
>> MPI_Win_free().
>>
>> (gdb) bt
>> #0 0x77312107 in sched_yield () from /lib/libc.so.6
>> #1 0x77b1f82b in opal_progress () at runtime/opal_progress.c:220
>> #2 0x77a53fe4 in opal_condition_wait (m=, c=) at ../opal/threads/condition.h:99
>> #3 ompi_request_default_wait_all (count=2, requests=0x7fffd210, statuses=0x7fffd1e0) at request/req_wait.c:263
>> #4 0x725b8d71 in ompi_coll_tuned_sendrecv_actual (sendbuf=0x0, scount=0, sdatatype=0x77dba840, dest=1, stag=-16, recvbuf=, rcount=0, rdatatype=0x77dba840, source=1, rtag=-16, comm=0x8431a0, status=0x0) at coll_tuned_util.c:54
>> #5 0x725c2de2 in ompi_coll_tuned_barrier_intra_two_procs (comm=, module=) at coll_tuned_barrier.c:256
>> #6 0x725b92ab in ompi_coll_tuned_barrier_intra_dec_fixed (comm=0x8431a0, module=0x844980) at coll_tuned_decision_fixed.c:190
>> #7 0x72186248 in ompi_osc_rdma_module_free (win=0x842170) at osc_rdma.c:46
>> #8 0x77a58a44 in ompi_win_free (win=0x842170) at win/win.c:150
>> #9 0x77a8a0dd in PMPI_Win_free (win=0x7fffd408) at pwin_free.c:
Re: [OMPI users] Setting RPATH for Open MPI libraries
On Tue, Sep 11, 2012 at 2:29 PM, Reuti wrote:

> With "user" you mean someone compiling Open MPI?

Yes
Re: [OMPI users] Setting RPATH for Open MPI libraries
I want to avoid the user having to figure that out. MPICH2 sets RPATH by default when installed to nonstandard locations, and I think that is not a bad choice. Usually applications are compiled differently when they want to switch between debug and optimized builds (or other reasons for selecting a different library using LD_LIBRARY_PATH).

On Sep 8, 2012 2:48 PM, "Reuti" wrote:

> Hi,
>
> Am 08.09.2012 um 14:46 schrieb Jed Brown:
>
>> Is there a way to configure Open MPI to use RPATH without needing to
>> manually specify --with-wrapper-ldflags=-Wl,-rpath,${prefix}/lib (and
>> similar for non-GNU-compatible compilers)?
>
> What do you want to achieve in detail - just shorten the `./configure`
> command line? You could also add it after Open MPI's compilation in the
> text file:
>
> ${prefix}/share/openmpi/mpicc-wrapper-data.txt
>
> -- Reuti
[OMPI users] Setting RPATH for Open MPI libraries
Is there a way to configure Open MPI to use RPATH without needing to manually specify --with-wrapper-ldflags=-Wl,-rpath,${prefix}/lib (and similar for non-GNU-compatible compilers)?
Re: [OMPI users] [EXTERNAL] Re: mpicc link shouldn't add -ldl and -lhwloc
On Thu, May 31, 2012 at 6:20 AM, Jeff Squyres wrote:

> On May 29, 2012, at 11:42 AM, Jed Brown wrote:
>
>> The pkg-config approach is to use pkg-config --static if you want to
>> link that library statically.
>
> Do the OMPI pkg-config files not do this properly?

Looks right to me. I think the complaint was that there was no way to specify the equivalent using wrapper compilers. I don't like the wrapper compiler model (certainly not for languages with a common ABI like C), but pkg-config doesn't have a good way to manage multiple configurations.

>> So the problem is almost exclusively one of binary compatibility. If an
>> app or library is only linked to the interface libs, underlying system
>> libraries can be upgraded to a different soname without needing to relink
>> the applications. For example, libhwloc could be upgraded to a different
>> ABI, Open MPI rebuilt, and then the user application and intermediate
>> MPI-based libraries would not need to be rebuilt. This is great for
>> distributions and convenient if you work on projects with lots of
>> dependencies.
>>
>> It's not such an issue for HPC applications because we tend to recompile
>> a lot and don't need binary compatibility for many of the most common use
>> cases. There is also the linker option -Wl,--as-needed that usually does
>> what is desired.
>
> Mmmm. Ok. Brian and I are going to be in the same physical location next
> week; I'll chat with him about this.
Re: [OMPI users] [EXTERNAL] Re: mpicc link shouldn't add -ldl and -lhwloc
On Tue, May 29, 2012 at 9:05 AM, Jeff Squyres wrote:

> We've tossed around ideas such as having the wrappers always assume
> dynamic linking (e.g., only include a minimum of libraries), and then add
> another wrapper option like --wrapper:static (or whatever) to know when to
> add in all the dependent libraries. Or possibly even look for some popular
> linker options like --static, or some such (which we've tried to avoid,
> because that can turn into a slippery slope), but such switches aren't
> always necessary for MPI-only-static (vs. completely-100%-static) linking.
> It gets even fuzzier when both libmpi.so and libmpi.a are present. Which
> way should we assume?
>
> Another problem is backwards compatibility -- users who are currently
> statically linking will assume the old behavior (of not needing to specify
> anything additional).
>
>> Now I'm not saying that Open MPI should commit to pkg-config instead of
>> wrapper compilers, but the concept of linking differently for static versus
>> shared libraries is something that should be observed.
>
> Fair enough. But we've never been able to come up with a rational way to
> do it (note that pkg-config has its own problems -- OMPI provides
> pkg-config files in addition to wrapper compilers, but they don't fix
> everything, either).
>
> We have users who both --enable-static and --enable-shared (meaning: both
> libmpi.so and libmpi.a are present). And therefore we've come down on the
> conservative side of adding in whatever is necessary for static linking.

The pkg-config approach is to use pkg-config --static if you want to link that library statically.

>> (Over-linking is an ongoing problem with HPC-oriented packages. We are
>> probably all guilty of it, but tools like pkg-config don't handle multiple
>> configurations well and I don't know of a similar system that manages both
>> static/shared and multi-configuration well.)
>
> I suppose, but it does depend on how you define "problem". The linker
> will ignore any unused libraries -- so it's a problem like lint is a
> problem. It's annoying, but it doesn't do any harm.
>
> ...or are there cases where it actually does something harmful?

So the problem is almost exclusively one of binary compatibility. If an app or library is only linked to the interface libs, underlying system libraries can be upgraded to a different soname without needing to relink the applications. For example, libhwloc could be upgraded to a different ABI, Open MPI rebuilt, and then the user application and intermediate MPI-based libraries would not need to be rebuilt. This is great for distributions and convenient if you work on projects with lots of dependencies.

It's not such an issue for HPC applications because we tend to recompile a lot and don't need binary compatibility for many of the most common use cases. There is also the linker option -Wl,--as-needed that usually does what is desired.
Re: [OMPI users] [EXTERNAL] Re: mpicc link shouldn't add -ldl and -lhwloc
On Wed, May 23, 2012 at 8:29 AM, Barrett, Brian W wrote:

>> I should add the caveat that they are needed when linking statically, but
>> not when using shared libraries.
>
> And therein lies the problem. We have a number of users who build Open
> MPI statically and even some who build both static and shared libraries in
> the same build. We've never been able to figure out a reasonable way to
> guess if we need to add -lhwloc or -ldl, so we add them. It's better to
> list them and have some redundant dependencies (since you have to have the
> library anyways) than to not list them and have odd link errors.

So pkg-config has the --static option for exactly this reason. Let's look at Cairo as an example.

$ cat /usr/lib/pkgconfig/cairo.pc
prefix=/usr
exec_prefix=${prefix}
libdir=${exec_prefix}/lib
includedir=${prefix}/include

Name: cairo
Description: Multi-platform 2D graphics library
Version: 1.12.2
Requires.private: gobject-2.0 glib-2.0 pixman-1 >= 0.22.0 fontconfig >= 2.2.95 freetype2 >= 9.7.3 libpng xcb-shm xcb >= 1.6 xcb-render >= 1.6 xrender >= 0.6 x11
Libs: -L${libdir} -lcairo
Libs.private: -lz -lz
Cflags: -I${includedir}/cairo

$ pkg-config cairo --libs
-lcairo

$ pkg-config cairo --libs --static
-pthread -lcairo -lgobject-2.0 -lffi -lpixman-1 -lfontconfig -lexpat -lfreetype -lbz2 -lpng15 -lz -lm -lxcb-shm -lxcb-render -lXrender -lglib-2.0 -lrt -lpcre -lX11 -lpthread -lxcb -lXau -lXdmcp

$ ldd /usr/lib/libcairo.so
linux-vdso.so.1 => (0x7fff741ff000)
libpthread.so.0 => /lib/libpthread.so.0 (0x7f135eac7000)
libpixman-1.so.0 => /usr/lib/libpixman-1.so.0 (0x7f135e83f000)
libfontconfig.so.1 => /usr/lib/libfontconfig.so.1 (0x7f135e608000)
libfreetype.so.6 => /usr/lib/libfreetype.so.6 (0x7f135e369000)
libpng15.so.15 => /usr/lib/libpng15.so.15 (0x7f135e13c000)
libxcb-shm.so.0 => /usr/lib/libxcb-shm.so.0 (0x7f135df39000)
libxcb-render.so.0 => /usr/lib/libxcb-render.so.0 (0x7f135dd3)
libxcb.so.1 => /usr/lib/libxcb.so.1 (0x7f135db12000)
libXrender.so.1 => /usr/lib/libXrender.so.1 (0x7f135d906000)
libX11.so.6 => /usr/lib/libX11.so.6 (0x7f135d5cc000)
libz.so.1 => /usr/lib/libz.so.1 (0x7f135d3b6000)
librt.so.1 => /lib/librt.so.1 (0x7f135d1ad000)
libm.so.6 => /lib/libm.so.6 (0x7f135ceb8000)
libc.so.6 => /lib/libc.so.6 (0x7f135cb17000)
/lib/ld-linux-x86-64.so.2 (0x7f135f012000)
libbz2.so.1.0 => /usr/lib/libbz2.so.1.0 (0x7f135c906000)
libexpat.so.1 => /usr/lib/libexpat.so.1 (0x7f135c6dc000)
libXau.so.6 => /usr/lib/libXau.so.6 (0x7f135c4d8000)
libXdmcp.so.6 => /usr/lib/libXdmcp.so.6 (0x7f135c2d1000)
libdl.so.2 => /lib/libdl.so.2 (0x7f135c0cd000)

Now I'm not saying that Open MPI should commit to pkg-config instead of wrapper compilers, but the concept of linking differently for static versus shared libraries is something that should be observed.

(Over-linking is an ongoing problem with HPC-oriented packages. We are probably all guilty of it, but tools like pkg-config don't handle multiple configurations well and I don't know of a similar system that manages both static/shared and multi-configuration well.)
Re: [OMPI users] AlltoallV (with some zero send count values)
On Tue, Mar 6, 2012 at 15:43, Timothy Stitt wrote:

> Can anyone tell me whether it is legal to pass zero values for some of the
> send count elements in an MPI_AlltoallV() call? I want to perform an
> all-to-all operation but for performance reasons do not want to send data
> to various ranks who don't need to receive any useful values. If it is
> legal, can I assume the implementation is smart enough to not send messages
> when the send count is 0?
>
> FYI: my tests show that AlltoallV operations with various send count
> values set to 0...hangs.

This is allowed by the standard, but be warned that it is likely to perform poorly compared to what could be done with point-to-point or one-sided operations if most links are empty.
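For reference, a sparse exchange with zero send counts looks like the following sketch (the SparseExchange name and the displacement construction are illustrative; legal per the standard, but as noted above it may perform poorly if the implementation does not skip empty links):

#include <mpi.h>
#include <stdlib.h>

/* All-to-all of doubles where most sendcounts/recvcounts are zero.
 * Displacements are packed contiguously from the counts. */
void SparseExchange(MPI_Comm comm, const double *sendbuf, double *recvbuf,
                    const int *sendcounts, const int *recvcounts)
{
  int size, i;
  MPI_Comm_size(comm, &size);
  int *sdispls = malloc(size * sizeof(int));
  int *rdispls = malloc(size * sizeof(int));
  sdispls[0] = rdispls[0] = 0;
  for (i = 1; i < size; i++) {
    sdispls[i] = sdispls[i-1] + sendcounts[i-1];
    rdispls[i] = rdispls[i-1] + recvcounts[i-1];
  }
  MPI_Alltoallv((void *)sendbuf, (int *)sendcounts, sdispls, MPI_DOUBLE,
                recvbuf, (int *)recvcounts, rdispls, MPI_DOUBLE, comm);
  free(sdispls);
  free(rdispls);
}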
Re: [OMPI users] parallelising ADI
On Tue, Mar 6, 2012 at 16:23, Tim Prince wrote:

> On 03/06/2012 03:59 PM, Kharche, Sanjay wrote:
>
>> Hi
>>
>> I am working on a 3D ADI solver for the heat equation. I have implemented
>> it as serial. Would anybody be able to indicate the best and more
>> straightforward way to parallelise it. Apologies if this is going to the
>> wrong forum.
>
> If it's to be implemented in parallelizable fashion (not SSOR style
> where each line uses updates from the previous line), it should be feasible
> to divide the outer loop into an appropriate number of blocks, or decompose
> the physical domain and perform ADI on individual blocks, then update and
> repeat.

True ADI has inherently high communication cost: a lot of data movement is needed to make the _fundamentally sequential_ tridiagonal solves local enough that latency doesn't kill you when those solves are kept distributed. This also applies (albeit to a lesser degree) in serial, due to the way memory works.

If you only do non-overlapping subdomain solves, you must use a Krylov method just to ensure convergence. You can add overlap, but the Krylov method is still necessary for any practical convergence rate. The method will also require an iteration count proportional to the number of subdomains across the global domain times the square root of the number of elements across a subdomain. The constants may not be small, and this asymptotic result is independent of what the subdomain solver is. You need a coarse level to improve this scaling.

Sanjay, as Matt and I recommended when you asked the same question on the PETSc list this morning, unless this is a homework assignment, you should solve your problem with multigrid instead of ADI. We pointed you to simple example code that scales well to many thousands of processes.
Re: [OMPI users] One-sided bugs
[Forgot the attachment.]

On Thu, Dec 22, 2011 at 15:16, Jed Brown wrote:

> I wrote a new communication layer that we are evaluating for use in mesh
> management and PDE solvers, but it is based on MPI-2 one-sided operations
> (and will eventually benefit from some of the MPI-3 one-sided proposals,
> especially MPI_Fetch_and_op() and dynamic windows). All the basic
> functionality works well with MPICH2, but I have run into some Open MPI
> bugs regarding one-sided operations with composite data types. This email
> provides a reduced test case for two such bugs. I see that there are also
> some existing serious-looking bug reports regarding one-sided operations,
> but they are getting pretty old now and haven't seen action in a while.
>
> https://svn.open-mpi.org/trac/ompi/ticket/2656
> https://svn.open-mpi.org/trac/ompi/ticket/1905
>
> Is there a plan for resolving these in the near future?
>
> Is anyone using Open MPI for serious work with one-sided operations?
>
> Bugs I am reporting:
>
> *1.* If an MPI_Win is used with an MPI_Datatype, even if the MPI_Win
> operation has completed, I get an invalid free when MPI_Type_free() is
> called before MPI_Win_free(). Since MPI_Type_free() is only supposed to
> mark the datatype for deletion, the implementation should properly manage
> reference counting. If you run the attached code with
>
> $ mpiexec -n 2 ./a.out 1
>
> (which only does part of the comm described for the second bug, below),
> you can see the invalid free on rank 1 with stack still in MPI_Win_fence()
>
> (gdb) bt
> #0 0x77288905 in raise () from /lib/libc.so.6
> #1 0x77289d7b in abort () from /lib/libc.so.6
> #2 0x772c147e in __libc_message () from /lib/libc.so.6
> #3 0x772c7396 in malloc_printerr () from /lib/libc.so.6
> #4 0x772cb26c in free () from /lib/libc.so.6
> #5 0x77a5aaa8 in ompi_datatype_release_args (pData=0x845010) at ompi_datatype_args.c:414
> #6 0x77a5b0ea in __ompi_datatype_release (datatype=0x845010) at ompi_datatype_create.c:47
> #7 0x7218e772 in opal_obj_run_destructors (object=0x845010) at ../../../../opal/class/opal_object.h:448
> #8 ompi_osc_rdma_replyreq_free (replyreq=0x680a80) at osc_rdma_replyreq.h:136
> #9 ompi_osc_rdma_replyreq_send_cb (btl=0x73680ce0, endpoint=, descriptor=0x837b00, status=) at osc_rdma_data_move.c:691
> #10 0x7347f38f in mca_btl_sm_component_progress () at btl_sm_component.c:645
> #11 0x77b1f80a in opal_progress () at runtime/opal_progress.c:207
> #12 0x721977c5 in opal_condition_wait (m=, c=0x842ee0) at ../../../../opal/threads/condition.h:99
> #13 ompi_osc_rdma_module_fence (assert=0, win=0x842270) at osc_rdma_sync.c:207
> #14 0x77a89db5 in PMPI_Win_fence (assert=0, win=0x842270) at pwin_fence.c:60
> #15 0x004010d8 in main (argc=2, argv=0x7fffd508) at win.c:60
>
> meanwhile, rank 0 has already freed the datatype and is waiting in
> MPI_Win_free().
>
> (gdb) bt
> #0 0x77312107 in sched_yield () from /lib/libc.so.6
> #1 0x77b1f82b in opal_progress () at runtime/opal_progress.c:220
> #2 0x77a53fe4 in opal_condition_wait (m=, c=) at ../opal/threads/condition.h:99
> #3 ompi_request_default_wait_all (count=2, requests=0x7fffd210, statuses=0x7fffd1e0) at request/req_wait.c:263
> #4 0x725b8d71 in ompi_coll_tuned_sendrecv_actual (sendbuf=0x0, scount=0, sdatatype=0x77dba840, dest=1, stag=-16, recvbuf=, rcount=0, rdatatype=0x77dba840, source=1, rtag=-16, comm=0x8431a0, status=0x0) at coll_tuned_util.c:54
> #5 0x725c2de2 in ompi_coll_tuned_barrier_intra_two_procs (comm=, module=) at coll_tuned_barrier.c:256
> #6 0x725b92ab in ompi_coll_tuned_barrier_intra_dec_fixed (comm=0x8431a0, module=0x844980) at coll_tuned_decision_fixed.c:190
> #7 0x72186248 in ompi_osc_rdma_module_free (win=0x842170) at osc_rdma.c:46
> #8 0x77a58a44 in ompi_win_free (win=0x842170) at win/win.c:150
> #9 0x77a8a0dd in PMPI_Win_free (win=0x7fffd408) at pwin_free.c:56
> #10 0x00401195 in main (argc=2, argv=0x7fffd508) at win.c:69
>
> *2.* This appears to be more fundamental and perhaps much harder to fix.
> The attached code sets up the following graph
>
> rank 0:
> 0 -> (1,0)
> 1 -> nothing
> 2 -> (1,1)
>
> rank 1:
> 0 -> (0,0)
> 1 -> (0,2)
> 2 -> (0,1)
>
> We pull over this graph using two calls to MPI_Get(), each with composite
> data types defining what to pull into the first two slots, and what to put
> into the third slot. It is Valgrind-clean with MPICH2, and produces the fol
[OMPI users] One-sided bugs
I wrote a new communication layer that we are evaluating for use in mesh management and PDE solvers, but it is based on MPI-2 one-sided operations (and will eventually benefit from some of the MPI-3 one-sided proposals, especially MPI_Fetch_and_op() and dynamic windows). All the basic functionality works well with MPICH2, but I have run into some Open MPI bugs regarding one-sided operations with composite data types. This email provides a reduced test case for two such bugs. I see that there are also some existing serious-looking bug reports regarding one-sided operations, but they are getting pretty old now and haven't seen action in a while.

https://svn.open-mpi.org/trac/ompi/ticket/2656
https://svn.open-mpi.org/trac/ompi/ticket/1905

Is there a plan for resolving these in the near future?

Is anyone using Open MPI for serious work with one-sided operations?

Bugs I am reporting:

*1.* If an MPI_Win is used with an MPI_Datatype, even if the MPI_Win operation has completed, I get an invalid free when MPI_Type_free() is called before MPI_Win_free(). Since MPI_Type_free() is only supposed to mark the datatype for deletion, the implementation should properly manage reference counting. If you run the attached code with

$ mpiexec -n 2 ./a.out 1

(which only does part of the comm described for the second bug, below), you can see the invalid free on rank 1 with stack still in MPI_Win_fence()

(gdb) bt
#0 0x77288905 in raise () from /lib/libc.so.6
#1 0x77289d7b in abort () from /lib/libc.so.6
#2 0x772c147e in __libc_message () from /lib/libc.so.6
#3 0x772c7396 in malloc_printerr () from /lib/libc.so.6
#4 0x772cb26c in free () from /lib/libc.so.6
#5 0x77a5aaa8 in ompi_datatype_release_args (pData=0x845010) at ompi_datatype_args.c:414
#6 0x77a5b0ea in __ompi_datatype_release (datatype=0x845010) at ompi_datatype_create.c:47
#7 0x7218e772 in opal_obj_run_destructors (object=0x845010) at ../../../../opal/class/opal_object.h:448
#8 ompi_osc_rdma_replyreq_free (replyreq=0x680a80) at osc_rdma_replyreq.h:136
#9 ompi_osc_rdma_replyreq_send_cb (btl=0x73680ce0, endpoint=<value optimized out>, descriptor=0x837b00, status=<value optimized out>) at osc_rdma_data_move.c:691
#10 0x7347f38f in mca_btl_sm_component_progress () at btl_sm_component.c:645
#11 0x77b1f80a in opal_progress () at runtime/opal_progress.c:207
#12 0x721977c5 in opal_condition_wait (m=<value optimized out>, c=0x842ee0) at ../../../../opal/threads/condition.h:99
#13 ompi_osc_rdma_module_fence (assert=0, win=0x842270) at osc_rdma_sync.c:207
#14 0x77a89db5 in PMPI_Win_fence (assert=0, win=0x842270) at pwin_fence.c:60
#15 0x004010d8 in main (argc=2, argv=0x7fffd508) at win.c:60

meanwhile, rank 0 has already freed the datatype and is waiting in MPI_Win_free().
(gdb) bt
#0 0x77312107 in sched_yield () from /lib/libc.so.6
#1 0x77b1f82b in opal_progress () at runtime/opal_progress.c:220
#2 0x77a53fe4 in opal_condition_wait (m=<value optimized out>, c=<value optimized out>) at ../opal/threads/condition.h:99
#3 ompi_request_default_wait_all (count=2, requests=0x7fffd210, statuses=0x7fffd1e0) at request/req_wait.c:263
#4 0x725b8d71 in ompi_coll_tuned_sendrecv_actual (sendbuf=0x0, scount=0, sdatatype=0x77dba840, dest=1, stag=-16, recvbuf=<value optimized out>, rcount=0, rdatatype=0x77dba840, source=1, rtag=-16, comm=0x8431a0, status=0x0) at coll_tuned_util.c:54
#5 0x725c2de2 in ompi_coll_tuned_barrier_intra_two_procs (comm=<value optimized out>, module=<value optimized out>) at coll_tuned_barrier.c:256
#6 0x725b92ab in ompi_coll_tuned_barrier_intra_dec_fixed (comm=0x8431a0, module=0x844980) at coll_tuned_decision_fixed.c:190
#7 0x72186248 in ompi_osc_rdma_module_free (win=0x842170) at osc_rdma.c:46
#8 0x77a58a44 in ompi_win_free (win=0x842170) at win/win.c:150
#9 0x77a8a0dd in PMPI_Win_free (win=0x7fffd408) at pwin_free.c:56
#10 0x00401195 in main (argc=2, argv=0x7fffd508) at win.c:69

*2.* This appears to be more fundamental and perhaps much harder to fix. The attached code sets up the following graph

rank 0:
0 -> (1,0)
1 -> nothing
2 -> (1,1)

rank 1:
0 -> (0,0)
1 -> (0,2)
2 -> (0,1)

We pull over this graph using two calls to MPI_Get(), each with composite data types defining what to pull into the first two slots, and what to put into the third slot. It is Valgrind-clean with MPICH2, and produces the following:

$ mpiexec.hydra -n 2 ./a.out 2
[0] provided [100,101,102] got [200, -2,201]
[1] provided [200,201,202] got [100,102,101]

With Open MPI, I see

a.out: malloc.c:3096: sYSMALLOc: Assertion `(old_top == (((mbinptr) (((char *) &((av)->bins[((1) - 1) * 2])) - __builtin_offsetof (struct malloc_chunk, fd && old_size == 0) || ((unsigned long) (old_size) >= (unsigned long)__builtin_offsetof (struct malloc_chunk, fd_nextsize))+((2 * (sizeof(size_t))) - 1)) & ~((2 * (sizeof(size_t))) - 1))) && ((old_top)->size & 0x1) && ((unsigned long)old_end & pagemask) == 0)' failed.

on both ranks, wi
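[The attachment is not preserved in this archive. The following is a minimal sketch of the bug-1 pattern described above (a composite datatype used by a one-sided operation, with MPI_Type_free() called before MPI_Win_free()); the buffer sizes and the indexed type are illustrative, not the original test case.]

#include <mpi.h>

int main(int argc, char **argv)
{
  int rank, blens[2] = {1, 1}, displs[2] = {0, 2};
  double local[3] = {0}, remote[3] = {0};
  MPI_Datatype type;
  MPI_Win win;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Win_create(local, 3*sizeof(double), sizeof(double), MPI_INFO_NULL, MPI_COMM_WORLD, &win);

  /* composite (noncontiguous) target datatype: entries 0 and 2 */
  MPI_Type_indexed(2, blens, displs, MPI_DOUBLE, &type);
  MPI_Type_commit(&type);

  MPI_Win_fence(0, win);
  MPI_Get(remote, 2, MPI_DOUBLE, (rank + 1) % 2, 0, 1, type, win);
  MPI_Type_free(&type);  /* legal: only marks the type for deletion */
  MPI_Win_fence(0, win); /* the reported invalid free shows up around here */
  MPI_Win_free(&win);
  MPI_Finalize();
  return 0;
}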
Re: [OMPI users] SpMV Benchmarks
On Thu, May 5, 2011 at 23:15, Paul Monday (Parallel Scientific) <paul.mon...@parsci.com> wrote:
> Hi, I'm hoping someone can help me locate a SpMV benchmark that runs w/
> Open MPI so I can benchmark how my systems are interacting with the network
> as I add nodes / cores to the pool of systems. I can find SpMV benchmarks
> for single processor / OpenMP all over, but these networked ones are proving
> harder to come by. I located Lis (http://www.ssisc.org/lis/) but it seems
> more of a solver than a benchmarking program.

I would suggest using PETSc. It is a solver library rather than a contrived benchmark suite, but the examples give you access to many different matrices and you can use many different formats without changing the code. If you run with -log_summary, you will get a useful table showing the performance of different operations (time/balance/communication/reductions/flops/etc). Also note that SpMV is usually not an end in its own right; usually it is part of a preconditioned Krylov iteration, so the performance of all the pieces matters.

If you are concerned with absolute performance, then you should consider using petsc-dev since it tends to have better memory performance due to software prefetch. This is important for good reuse of high-level caches since otherwise the matrix entries flush out the useful stuff. It usually makes a 20-30% improvement, a bit more for some symmetric and triangular kernels. Many of the sparse matrix kernels did not have software prefetch as of the 3.1 release.

Remember: "The easiest way to make software scalable is to make it sequentially inefficient." (Gropp, 1999)
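[For concreteness, a hypothetical session with one of the PETSc KSP tutorial examples; the example, mesh sizes, and option names may differ between PETSc versions:]

$ cd src/ksp/ksp/examples/tutorials && make ex2
$ mpiexec -n 8 ./ex2 -m 500 -n 500 -ksp_type cg -log_summary

The MatMult row of the -log_summary table is the SpMV kernel; the KSPSolve row shows how it composes with the rest of the preconditioned iteration, which is usually what you actually care about.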
Re: [OMPI users] memcpy overlap in ompi_ddt_copy_content_same_ddt and glibc 2.12
On Thu, Nov 11, 2010 at 12:36, Number Cruncher wrote:
> However as commented here:
> https://bugzilla.redhat.com/show_bug.cgi?id=638477#c86 the valgrind memcpy
> implementation is overlap-safe.

Yes, of course. That's how the bug in Open MPI was originally detected. But you can't do production runs with valgrind.

> Are you using an Intel Nehalem-class CPU?

No, Core 2 Duo and Opteron (but the Opterons still have older glibc). Reverse memcpy must only be turned on for Nehalem.

Jed
Re: [OMPI users] memcpy overlap in ompi_ddt_copy_content_same_ddt and glibc 2.12
On Wed, Nov 10, 2010 at 22:08, e-mail number.cruncher <number.crunc...@ntlworld.com> wrote:
> In short, someone from Intel submitted a glibc patch that does faster
> memcpy's on e.g. Intel i7, respects the ISO C definition, but does
> things backwards.

However, the commit message and mailing list, as far as I can tell, do not explain how the implementations were benchmarked. Linus claims that his (entirely trivial) implementation matches or beats the new one. If indeed the performance gains claimed by Lu (2X to 4X) are real, then the old implementation must have been truly horrible (as stated by Agner Fog in http://sourceware.org/ml/libc-help/2008-08/msg7.html). I'd like to see the benchmark results demonstrating that the backward memcpy is really faster than forward.

> I think any software that ignores the ISO warning
> "If copying takes place between objects that overlap, the behavior is
> undefined" needs fixing.

Absolutely, it is incorrect and should be fixed.

Jed
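[For reference, a minimal illustration of the ISO C rule being discussed: overlapping copies must use memmove(), which is specified for overlap, rather than memcpy(), which is not.]

#include <stdio.h>
#include <string.h>

int main(void)
{
  char buf[16] = "abcdefgh";
  /* Overlapping regions: shift "abcdefgh" right by two chars.
     memcpy(buf+2, buf, 8) would be undefined behavior here; a
     backward-copying memcpy would silently corrupt the data. */
  memmove(buf + 2, buf, 8);
  buf[10] = '\0';
  printf("%s\n", buf); /* prints "ababcdefgh" */
  return 0;
}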
Re: [OMPI users] memcpy overlap in ompi_ddt_copy_content_same_ddt and glibc 2.12
On Wed, Nov 10, 2010 at 18:25, Jed Brown wrote: > Is the memcpy-back code ever executed when called as memcpy()? I can't > imagine why it would be, but it would make plenty of sense to use it inside > memmove when the destination is at a higher address than the source. Apparently the backward memcpy is actually used, but I still don't know why and neither does Linus: https://bugzilla.redhat.com/show_bug.cgi?id=638477#c46 Jed
Re: [OMPI users] memcpy overlap in ompi_ddt_copy_content_same_ddt and glibc 2.12
On Wed, Nov 10, 2010 at 18:11, Number Cruncher wrote: > Just some observations from a concerned user with a temperamental Open MPI > program (1.4.3): > > Fedora 14 (just released) includes glibc-2.12 which has optimized versions > of memcpy, including a copy backward. > > http://sourceware.org/git/?p=glibc.git;a=commitdiff;h=6fb8cbcb58a29fff73eb2101b34caa19a7f88eba Is the memcpy-back code ever executed when called as memcpy()? I can't imagine why it would be, but it would make plenty of sense to use it inside memmove when the destination is at a higher address than the source. Jed
Re: [OMPI users] Open MPI data transfer error
On Sat, Nov 6, 2010 at 18:00, Jack Bryan wrote:
> Thanks,
>
> About my MPI program bugs:
>
> I used GDB and got the error:
>
> Program received signal SIGSEGV, Segmentation fault.
> #0 0x003a31c62184 in fwrite () from /lib64/libc.so.6

Clearly fwrite was called with invalid parameters, but you don't give enough information for anyone to explain why. Compile your program with debugging symbols and print the whole stack trace, e.g. with "backtrace full". Also try valgrind.

> class CNSGA2
> {
> allocate mem for var;
> some deallocate statement;
> some pointers;
> evaluate(); // it is a function
> }

This isn't even close to valid code since you can't have statements in the suggested scope.

> main()
> {
> CNSGA2* nsga2a = new CNSGA2(true); // true or false are only for different constructors
> CNSGA2* nsga2b = new CNSGA2(false);
> if (myRank == 0) // scope1
> {
> initialize the objects of nsga2a or nsga2b;
> }
> broadcast some parameters, which are got from scope1.
>
> According to the parameters, define a datatype (myData) so that all workers
> use that to do recv and send.
>
> if (myRank == 0) // scope2
> {
> send out myData to workers by the datatype defined above;
> }
> if (myRank != 0)
> {
> newCNSGA2 myNsga2;
> recv data from master and work on the recved data;
> myNsga2.evaluate(recv data);
> send back results;
> }
>
> }

According to the above, rank 0 never receives the results the workers send back. You should paste valid code.

Jed
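[For example, with hypothetical program and file names; --track-origins is a standard valgrind flag:]

$ mpicc -g -O0 nsga2.c -o nsga2
$ mpiexec -n 2 valgrind --track-origins=yes ./nsga2

and, after a crash with core dumps enabled (ulimit -c unlimited):

$ gdb ./nsga2 core
(gdb) bt full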
[OMPI users] Open MPI 1.5 is not detecting oversubscription
Previous versions would set mpi_yield_when_idle automatically when oversubscribing a node. I assume this behavior was not intentionally changed, but the parameter is not being set in cases of oversubscription, with or without an explicit hostfile. Jed
Re: [OMPI users] Open MPI program cannot complete
On Mon, Oct 25, 2010 at 19:35, Jack Bryan wrote: > I have to use #PBS to submit any jobs in my cluster. > I cannot use command line to hang a job on my cluster. > You don't need a cluster to run MPI jobs, can you run the job on whatever you development machine is? Does it hang there? PBS interactive jobs are started with qsub -I. > > Where should I put the (gdb --batch -ex 'bt full' -ex 'info reg' -pid > ZOMBIE_PID) in the script ? > On the line after "mpirun ...", assuming that control returns to there after the hang. You didn't answer whether that was the case. > And how to get ZOMBIE_PID from the script ? > Simplest is "pgrep $COMMAND", or use ps. Jed
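[A sketch of what that could look like in the PBS script; the application name and rank count are hypothetical:]

mpirun -np 16 ./my_app
# only reached once mpirun returns, e.g. after rank 0 exits while other ranks hang
for pid in $(pgrep my_app); do
  gdb --batch -ex 'bt full' -ex 'info reg' -pid $pid
done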
Re: [OMPI users] Open MPI program cannot complete
On Mon, Oct 25, 2010 at 19:07, Jack Bryan wrote: > I need to use #PBS parallel job script to submit a job on MPI cluster. > Is it not possible to reproduce locally? Most clusters have a way to submit an interactive job (which would let you start this thing and then inspect individual processes). Ashley's Padb suggestion will certainly be better in a non-interactive environment. > Where should I put the (gdb --batch -ex 'bt full' -ex 'info reg' -pid > ZOMBIE_PID) in the script ? > Is control returning to your script after rank 0 has exited? In that case, you can just put this on the next line. > How to get the ZOMBIE_PID ? > "ps" from the command line, or getpid() from C code. Jed
Re: [OMPI users] Open MPI program cannot complete
On Mon, Oct 25, 2010 at 18:26, Jack Bryan wrote: > Thanks, the problem is still there. This really doesn't prove that there are no outstanding asynchronous requests, but perhaps you know that there are not, despite not being able to post a complete test case here. I suggest attaching a debugger and getting a stack trace from the zombies (gdb --batch -ex 'bt full' -ex 'info reg' -pid ZOMBIE_PID). Jed
Re: [OMPI users] Build failure with OMPI-1.5 (clang-2.8, gcc-4.5.1 with debug options)
On Fri, Oct 15, 2010 at 01:26, Jed Brown wrote:
> I'll report the bug

http://llvm.org/bugs/show_bug.cgi?id=8383
Re: [OMPI users] Build failure with OMPI-1.5 (clang-2.8, gcc-4.5.1 with debug options)
On Fri, Oct 15, 2010 at 00:38, Jeff Squyres (jsquyres) wrote:
> Huh. Can you make V=1 to build libmpi and use the same kind of options to
> build your sample library?

Make log here: http://59A2.org/files/openmpi-1.5-clang-make.log

After some digging, this looks like a clang bug. First, from the comments on http://llvm.org/bugs/show_bug.cgi?id=3679 there seems to be some resistance to the "#pragma weak g2 = g3" form, but since these things work with clang-2.8, that isn't the whole story. Indeed,

#pragma GCC visibility push(default)
#pragma weak fake = real
#pragma GCC visibility pop

does not expose the symbol "fake". This must be a bug, but an arguably better way to set up the aliasing is

int fake(int i) __attribute__((weak, alias("real")));

which does work. I'll report the bug, but maybe Open MPI could use a more complete test for visibility working with weak aliasing? Or just don't support clang-2.8.

Jed
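[A self-contained sketch of the attribute-based aliasing suggested above; the symbol names come from the configure test, not from Open MPI itself:]

/* weak_alias.c */
#include <stdio.h>

int real(int i) { return i; }
/* weak alias: references to fake() resolve to real() unless overridden */
int fake(int i) __attribute__((weak, alias("real")));

int main(void)
{
  printf("%d\n", fake(3));
  return 0;
}

Compiling with gcc or clang and running nm on the result should show fake as a weak (W) definition at the same address as real.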
Re: [OMPI users] Build failure with OMPI-1.5 (clang-2.8, gcc-4.5.1 with debug options)
On Thu, Oct 14, 2010 at 23:53, Jeff Squyres wrote:
> The configure test essentially looks like this -- could you try this
> manually and see what happens?
>
> cat > conftest_weak.h <<EOF
> int real(int i);
> int fake(int i);
> EOF
>
> cat > conftest_weak.c <<EOF
> #include "conftest_weak.h"
> #pragma weak fake = real
> int real(int i) { return i; }
> EOF
>
> cat > conftest.c <<EOF
> #include "conftest_weak.h"
> int main() { return fake(3); }
> EOF
>
> # Try the compile
> clang $CFLAGS -c conftest_weak.c
> clang $CFLAGS conftest.c conftest_weak.o -o conftest $LDFLAGS $LIBS
>
> The configure test rules that weak symbol support is there if both compiler
> invocations return an exit status of 0.

They exit 0 and

$ nm conftest | grep -E 'real|fake'
004004a0 W fake
004004a0 T real

so it looks like that is working fine. It also works fine when I stuff it into a shared library:

$ clang -c -fPIC conftest_weak.c
$ clang -shared -fPIC conftest.c conftest_weak.o -o conftest.so
$ nm conftest.so | grep -E 'real|fake'
05a0 W fake
05a0 T real

Jed
Re: [OMPI users] Build failure with OMPI-1.5 (clang-2.8, gcc-4.5.1 with debug options)
On Thu, Oct 14, 2010 at 23:31, Jeff Squyres wrote:
> Strange, because I see
> /home/jed/src/openmpi-1.5/bclang/ompi/contrib/vt/vt/../../../.libs/libmpi.so
> explicitly listed in the link line, which should contain MPI_Abort. Can you
> nm on that file and ensure that it is actually listed there?

$ nm -D /home/jed/src/openmpi-1.5/bclang/ompi/contrib/vt/vt/../../../.libs/libmpi.so | grep MPI_Abort
00074380 T PMPI_Abort

In contrast, with gcc:

$ nm -D /home/jed/src/openmpi-1.5/bgcc/ompi/contrib/vt/vt/../../../.libs/libmpi.so | grep MPI_Abort
000712d0 W MPI_Abort
000712d0 T PMPI_Abort

Weak symbol issue; I don't know how clang is different in this regard.

Jed
Re: [OMPI users] Build failure with OMPI-1.5 (clang-2.8, gcc-4.5.1 with debug options)
On Thu, Oct 14, 2010 at 22:36, Jeff Squyres wrote: > On Oct 11, 2010, at 4:50 PM, Jed Brown wrote: > > > Note that this is an out-of-source build. > > > > $ ../configure --enable-debug --enable-mem-debug > --prefix=/home/jed/usr/ompi-1.5-clang CC=clang CXX=clang++ > > $ make > > [...] > > CXXLD vtunify-mpi > > vtunify_mpi-vt_unify_mpi.o: In function `VTUnify_MPI_Abort': > > > /home/jed/src/openmpi-1.5/bclang/ompi/contrib/vt/vt/tools/vtunify/mpi/../../../../../../../../ompi/contrib/vt/vt/tools/vtunify/mpi/vt_unify_mpi.c:63: > undefined reference to `MPI_Abort' > > Well this is disappointing. :-\ > > Can you "make V=1" so that we can see the command line here that is > failing? > libtool: link: clang++ -DVT_MPI -g -finline-functions -pthread -o .libs/vtunify-mpi vtunify_mpi-vt_unify_mpi.o vtunify_mpi-vt_unify.o vtunify_mpi-vt_unify_defs.o vtunify_mpi-vt_unify_defs_hdlr.o vtunify_mpi-vt_unify_events.o vtunify_mpi-vt_unify_events_hdlr.o vtunify_mpi-vt_unify_markers.o vtunify_mpi-vt_unify_markers_hdlr.o vtunify_mpi-vt_unify_stats.o vtunify_mpi-vt_unify_stats_hdlr.o vtunify_mpi-vt_unify_tkfac.o ../../../util/.libs/libutil.a ../../../extlib/otf/otflib/.libs/libotf.so -lz -L/home/jed/src/openmpi-1.5/bclang/ompi/contrib/vt/vt/../../../.libs /home/jed/src/openmpi-1.5/bclang/ompi/contrib/vt/vt/../../../.libs/libmpi.so -ldl -lnsl -lutil -lm -pthread -Wl,-rpath -Wl,/home/jed/usr/ompi-1.5-clang/lib vtunify_mpi-vt_unify_mpi.o: In function `VTUnify_MPI_Abort': /home/jed/src/openmpi-1.5/bclang/ompi/contrib/vt/vt/tools/vtunify/mpi/../../../../../../../../ompi/contrib/vt/vt/tools/vtunify/mpi/vt_unify_mpi.c:63: undefined reference to `MPI_Abort' > FWIW, this looks like a problem that is self-contained in VampirTrace, so > you can likely get a working build with: > > ./configure --enable-contrib-no-build=vt ... > > > Leaving out the debugging flags gets me further (no compilation error, > just this link error): > > > > $ ../configure --prefix=/home/jed/usr/ompi-1.5-clang CC=clang CXX=clang++ > > $ make > > [...] > > CCLD libutil.la > > ar: > /home/jed/src/openmpi-1.5/bclang/ompi/contrib/vt/vt/util/.libs/libutil.a: No > such file or directory > > make[5]: *** [libutil.la] Error 9 > > That's a weird one -- it should be *creating* that library, so I'm not sure > why it would complain that the library doesn't exist...? This could be a > red herring, though -- perhaps some oddity in your tree and/or > filesystem...? 
(I've seen this kind of thing before such that a "make > distclean" fixed the issue, I think) > Sure enough, using a new build directory, I get the same error as above: libtool: link: clang++ -DVT_MPI -O3 -DNDEBUG -finline-functions -pthread -o .libs/vtunify-mpi vtunify_mpi-vt_unify_mpi.o vtunify_mpi-vt_unify.o vtunif y_mpi-vt_unify_defs.o vtunify_mpi-vt_unify_defs_hdlr.o vtunify_mpi-vt_unify_events.o vtunify_mpi-vt_unify_events_hdlr.o vtunify_mpi-vt_unify_markers.o vtunify_mpi-vt_unify_markers_hdlr.o vtunify_mpi-vt_unify_stats.o vtunify_mpi-vt_unify_stats_hdlr.o vtunify_mpi-vt_unify_tkfac.o ../../../util/.libs/ libutil.a ../../../extlib/otf/otflib/.libs/libotf.so -lz -L/home/jed/src/openmpi-1.5/bclang-nodbg/ompi/contrib/vt/vt/../../../.libs /home/jed/src/open mpi-1.5/bclang-nodbg/ompi/contrib/vt/vt/../../../.libs/libmpi.so -ldl -lnsl -lutil -lm -pthread -Wl,-rpath -Wl,/home/jed/usr/ompi-1.5-clang-nodbg/lib vtunify_mpi-vt_unify_mpi.o: In function `VTUnify_MPI_Abort': ../../../../../../../../ompi/contrib/vt/vt/tools/vtunify/mpi/vt_unify_mpi.c:(.text+0xa): undefined reference to `MPI_Abort' Grab config.log for this case here: http://59A2.org/files/openmpi-1.5-clang-config.log > > I also get this last failure with gcc-4.5.1, but only with the debug > flags: > > > > $ ../configure --enable-debug --enable-mem-debug > --prefix=/home/jed/usr/ompi-1.5-gcc CC=gcc CXX=g++ > > $ make > > [...] > > Making all in util > > CC libutil_la-installdirs.lo > > CCLD libutil.la > > ar: > /home/jed/src/openmpi-1.5/bgcc/ompi/contrib/vt/vt/util/.libs/libutil.a: No > such file or directory > > Same error. Weird. Can you "make V=1" here, too? This one completes with a clean build directory, reconfiguring from a non-debug build must have caused this issue the first time around. Jed
[OMPI users] Build failure with OMPI-1.5 (clang-2.8, gcc-4.5.1 with debug options)
Note that this is an out-of-source build.

$ ../configure --enable-debug --enable-mem-debug --prefix=/home/jed/usr/ompi-1.5-clang CC=clang CXX=clang++
$ make
[...]
CXXLD vtunify-mpi
vtunify_mpi-vt_unify_mpi.o: In function `VTUnify_MPI_Abort':
/home/jed/src/openmpi-1.5/bclang/ompi/contrib/vt/vt/tools/vtunify/mpi/../../../../../../../../ompi/contrib/vt/vt/tools/vtunify/mpi/vt_unify_mpi.c:63: undefined reference to `MPI_Abort'
vtunify_mpi-vt_unify_mpi.o: In function `VTUnify_MPI_Address':
/home/jed/src/openmpi-1.5/bclang/ompi/contrib/vt/vt/tools/vtunify/mpi/../../../../../../../../ompi/contrib/vt/vt/tools/vtunify/mpi/vt_unify_mpi.c:74: undefined reference to `MPI_Address'
vtunify_mpi-vt_unify_mpi.o: In function `VTUnify_MPI_Barrier':
/home/jed/src/openmpi-1.5/bclang/ompi/contrib/vt/vt/tools/vtunify/mpi/../../../../../../../../ompi/contrib/vt/vt/tools/vtunify/mpi/vt_unify_mpi.c:86: undefined reference to `MPI_Barrier'
vtunify_mpi-vt_unify_mpi.o: In function `VTUnify_MPI_Bcast':
/home/jed/src/openmpi-1.5/bclang/ompi/contrib/vt/vt/tools/vtunify/mpi/../../../../../../../../ompi/contrib/vt/vt/tools/vtunify/mpi/vt_unify_mpi.c:101: undefined reference to `MPI_Bcast'
vtunify_mpi-vt_unify_mpi.o: In function `VTUnify_MPI_Comm_size':
/home/jed/src/openmpi-1.5/bclang/ompi/contrib/vt/vt/tools/vtunify/mpi/../../../../../../../../ompi/contrib/vt/vt/tools/vtunify/mpi/vt_unify_mpi.c:115: undefined reference to `MPI_Comm_size'
vtunify_mpi-vt_unify_mpi.o: In function `VTUnify_MPI_Comm_rank':
/home/jed/src/openmpi-1.5/bclang/ompi/contrib/vt/vt/tools/vtunify/mpi/../../../../../../../../ompi/contrib/vt/vt/tools/vtunify/mpi/vt_unify_mpi.c:127: undefined reference to `MPI_Comm_rank'
vtunify_mpi-vt_unify_mpi.o: In function `VTUnify_MPI_Finalize':
/home/jed/src/openmpi-1.5/bclang/ompi/contrib/vt/vt/tools/vtunify/mpi/../../../../../../../../ompi/contrib/vt/vt/tools/vtunify/mpi/vt_unify_mpi.c:138: undefined reference to `MPI_Finalize'
vtunify_mpi-vt_unify_mpi.o: In function `VTUnify_MPI_Init':
/home/jed/src/openmpi-1.5/bclang/ompi/contrib/vt/vt/tools/vtunify/mpi/../../../../../../../../ompi/contrib/vt/vt/tools/vtunify/mpi/vt_unify_mpi.c:149: undefined reference to `MPI_Init'
vtunify_mpi-vt_unify_mpi.o: In function `VTUnify_MPI_Pack':
/home/jed/src/openmpi-1.5/bclang/ompi/contrib/vt/vt/tools/vtunify/mpi/../../../../../../../../ompi/contrib/vt/vt/tools/vtunify/mpi/vt_unify_mpi.c:165: undefined reference to `MPI_Pack'
vtunify_mpi-vt_unify_mpi.o: In function `VTUnify_MPI_Pack_size':
/home/jed/src/openmpi-1.5/bclang/ompi/contrib/vt/vt/tools/vtunify/mpi/../../../../../../../../ompi/contrib/vt/vt/tools/vtunify/mpi/vt_unify_mpi.c:180: undefined reference to `MPI_Pack_size'
vtunify_mpi-vt_unify_mpi.o: In function `VTUnify_MPI_Recv':
/home/jed/src/openmpi-1.5/bclang/ompi/contrib/vt/vt/tools/vtunify/mpi/../../../../../../../../ompi/contrib/vt/vt/tools/vtunify/mpi/vt_unify_mpi.c:197: undefined reference to `MPI_Recv'
vtunify_mpi-vt_unify_mpi.o: In function `VTUnify_MPI_Send':
/home/jed/src/openmpi-1.5/bclang/ompi/contrib/vt/vt/tools/vtunify/mpi/../../../../../../../../ompi/contrib/vt/vt/tools/vtunify/mpi/vt_unify_mpi.c:218: undefined reference to `MPI_Send'
vtunify_mpi-vt_unify_mpi.o: In function `VTUnify_MPI_Type_commit':
/home/jed/src/openmpi-1.5/bclang/ompi/contrib/vt/vt/tools/vtunify/mpi/../../../../../../../../ompi/contrib/vt/vt/tools/vtunify/mpi/vt_unify_mpi.c:230: undefined reference to `MPI_Type_commit'
vtunify_mpi-vt_unify_mpi.o: In function `VTUnify_MPI_Type_free':
/home/jed/src/openmpi-1.5/bclang/ompi/contrib/vt/vt/tools/vtunify/mpi/../../../../../../../../ompi/contrib/vt/vt/tools/vtunify/mpi/vt_unify_mpi.c:242: undefined reference to `MPI_Type_free'
vtunify_mpi-vt_unify_mpi.o: In function `VTUnify_MPI_Type_struct':
/home/jed/src/openmpi-1.5/bclang/ompi/contrib/vt/vt/tools/vtunify/mpi/../../../../../../../../ompi/contrib/vt/vt/tools/vtunify/mpi/vt_unify_mpi.c:270: undefined reference to `MPI_Type_struct'
vtunify_mpi-vt_unify_mpi.o: In function `VTUnify_MPI_Unpack':
/home/jed/src/openmpi-1.5/bclang/ompi/contrib/vt/vt/tools/vtunify/mpi/../../../../../../../../ompi/contrib/vt/vt/tools/vtunify/mpi/vt_unify_mpi.c:300: undefined reference to `MPI_Unpack'
collect2: ld returned 1 exit status
clang: error: linker (via gcc) command failed with exit code 1 (use -v to see invocation)
make[7]: *** [vtunify-mpi] Error 1

Leaving out the debugging flags gets me further (no compilation error, just this link error):

$ ../configure --prefix=/home/jed/usr/ompi-1.5-clang CC=clang CXX=clang++
$ make
[...]
CCLD libutil.la
ar: /home/jed/src/openmpi-1.5/bclang/ompi/contrib/vt/vt/util/.libs/libutil.a: No such file or directory
make[5]: *** [libutil.la] Error 9

I also get this last failure with gcc-4.5.1, but only with the debug flags:

$ ../configure --enable-debug --enable-mem-debug --prefix=/home/jed/usr/ompi-1.5-gcc CC=gcc CXX=g++
$ make
[...]
Making all in util
CC libutil_la-installdirs.lo
CCLD libutil.la
ar: /home/jed/src/openmpi-1.5/bgc
Re: [OMPI users] OpenMPI Run-Time "Freedom" Question
Or OMPI_CC=icc-xx.y mpicc ... Jed On Aug 12, 2010 5:18 PM, "Ralph Castain" wrote: On Aug 12, 2010, at 7:04 PM, Michael E. Thomadakis wrote: > On 08/12/10 18:59, Tim Prince wrote: >>... The "easy" way to accomplish this would be to: (a) build OMPI with whatever compiler you decide to use as a "baseline" (b) do -not- use the wrapper compiler to build the application. Instead, do "mpicc --showme" (or whatever language equivalent you want) to get the compile line, substitute your "new" compiler library for the "old" one, and then execute the resulting command manually. If you then set your LD_LIBRARY_PATH to the "new" libs, it might work - but no guarantees. Still, you could try it - and if it worked, you could always just explain that this is a case-by-case situation, and so it -could- break with other compiler combinations. Critical note: the app developers would have to validate the code with every combination! Otherwise, correct execution will be a complete crap-shoot - just because the app doesn't abnormally terminate does -not- mean it generated a correct result! > Thanks for the information on this. We indeed use Intel Compiler set 11.1.XXX + OMPI 1.4.1 and ... ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] Do MPI calls ever sleep?
On Wed, 21 Jul 2010 15:20:24 -0400, David Ronis wrote: > Hi Jed, > > Thanks for the reply and suggestion. I tried adding -mca > yield_when_idle 1 (and later mpi_yield_when_idle 1 which is what > ompi_info reports the variable as) but it seems to have had 0 effect. > My master goes into fftw planning routines for a minute or so (I see the > threads being created), but the overall usage of the slaves remains > close to 100% during this time. Just to be sure, I put the slaves into > a MPI_Barrier(MPI_COMM_WORLD) while they were waiting for the fftw > planner to finish. It also didn't help. They still spin (instead of using e.g. select()), but call sched_yield() so should only be actively spinning when nothing else is trying to run. Are you sure that the planner is always running in parallel? What OS and OMPI version are you using? Jed
Re: [OMPI users] Do MPI calls ever sleep?
On Wed, 21 Jul 2010 14:10:53 -0400, David Ronis wrote: > Is there another MPI routine that polls for data and then gives up its > time-slice? You're probably looking for the runtime option -mca yield_when_idle 1. This will slightly increase latency, but allows other threads to run without competing with the spinning MPI. Jed
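[For example, with a hypothetical application name; this is the same MCA parameter that ompi_info reports as mpi_yield_when_idle:]

$ mpirun -np 8 -mca mpi_yield_when_idle 1 ./my_app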
Re: [OMPI users] openmpi v1.5?
On Mon, 19 Jul 2010 15:24:32 -0400, Jeff Squyres wrote:
> I'm actually waiting for *1* more bug fix before we consider 1.5 "complete".

I see this going through, but would it be possible to change the size of the _count field in ompi_status_public_t now so that this bug can be fixed without ABI breakage?

https://svn.open-mpi.org/trac/ompi/ticket/2241

Note that the 1.4.3 milestone doesn't make sense since the bug can't be fixed there without an ABI change.

Jed
Re: [OMPI users] Ok, I've got OpenMPI set up, now what?!
On Mon, 19 Jul 2010 13:33:01 -0600, Damien Hocking wrote:
> It does. The big difference is that MUMPS is a 3-minute compile, and
> PETSc, erm, isn't. It's... longer...

FWIW, PETSc takes less than 3 minutes to build (after configuration) for me (I build it every day). Building MUMPS (with dependencies) is automatic with PETSc's --download-{blacs,scalapack,mumps}, but is involved to do by hand (all three require editing makefiles). I know people that have configured PETSc just to build code which calls MUMPS directly (without PETSc). :-)

Jed
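[For reference, a configure line of the sort being described; option spellings can change between PETSc releases:]

$ ./configure --with-debugging=0 --download-blacs --download-scalapack --download-mumps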
Re: [OMPI users] openmpi v1.5?
On Mon, 19 Jul 2010 15:16:59 -0400, Michael Di Domenico wrote: > Since I am a SVN neophyte can anyone tell me when openmpi 1.5 is > scheduled for release? https://svn.open-mpi.org/trac/ompi/milestone/Open%20MPI%201.5 > And whether the Slurm srun changes are going to make in? https://svn.open-mpi.org/trac/ompi/wiki/v1.5/planning Jed
Re: [OMPI users] Highly variable performance
On Thu, 15 Jul 2010 13:03:31 -0400, Jeff Squyres wrote:
> Given the oversubscription on the existing HT links, could contention
> account for the difference? (I have no idea how HT's contention
> management works) Meaning: if the stars line up in a given run, you
> could end up with very little/no contention and you get good
> bandwidth. But if there's a bit of jitter, you could end up with
> quite a bit of contention that ends up cascading into a bunch of
> additional delay.

What contention? Many sockets needing to access memory on another socket via HT links? Then yes, perhaps that could be a lot. As shown in the diagram, it's pretty non-uniform, and if, say, sockets 0, 1, and 3 all found memory on socket 0 (say socket 2 had local memory), then there are two ways for messages to get from 3 to 0 (via 1 or via 2). I don't know if there is hardware support to re-route to avoid contention, but if not, then socket 3 could be sharing the 1->0 HT link (which has max throughput of 8 GB/s, therefore 4 GB/s would be available per socket, provided it was still operating at peak). Note that this 4 GB/s is still less than splitting the 10.7 GB/s three ways.

> I fail to see how that could add up to 70-80 (or more) seconds of
> difference -- 13 secs vs. 90+ seconds (and more), though... 70-80
> seconds sounds like an IO delay -- perhaps paging due to the ramdisk
> or somesuch...?

That's a SWAG. This problem should have had significantly less resident memory than would cause paging, but these were very short jobs, so a relatively small amount of paging would cause a big performance hit. We have also seen up to a factor of 10 variability in longer jobs (e.g. 1 hour for a "fast" run) with larger working sets, but once the pages are faulted, this kernel (2.6.18 from RHEL5) won't migrate them around, so even if you eventually swap out all the ramdisk, pages faulted before and after will be mapped to all sorts of inconvenient places. But I don't have any systematic testing with a guaranteed clean ramdisk, and I'm not going to overanalyze the extra factors when there's an understood factor of 3 hanging in the way. I'll give an update if there is any news.

Jed
Re: [OMPI users] Highly variable performance
On Thu, 15 Jul 2010 09:36:18 -0400, Jeff Squyres wrote: > Per my other disclaimer, I'm trolling through my disastrous inbox and > finding some orphaned / never-answered emails. Sorry for the delay! No problem, I should have followed up on this with further explanation. > Just to be clear -- you're running 8 procs locally on an 8 core node, > right? These are actually 4-socket quad-core nodes, so there are 16 cores available, but we are only running on 8, -npersocket 2 -bind-to-socket. This was a greatly simplified case, but is still sufficient to show the variability. It tends to be somewhat worse if we use all cores of a node. (Cisco is an Intel partner -- I don't follow the AMD line > much) So this should all be local communication with no external > network involved, right? Yes, this was the greatly simplified case, contained entirely within a > > lsf.o240562 killed 8*a6200 > > lsf.o240563 9.2110e+01 8*a6200 > > lsf.o240564 1.5638e+01 8*a6237 > > lsf.o240565 1.3873e+01 8*a6228 > > Am I reading that right that it's 92 seconds vs. 13 seconds? Woof! Yes, an the "killed" means it wasn't done after 120 seconds. This factor of 10 is about the worst we see, but of course very surprising. > Nice and consistent, as you mentioned. And I assume your notation > here means that it's across 2 nodes. Yes, the Quadrics nodes are 2-socket dual core, so 8 procs needs two nodes. The rest of your observations are consistent with my understanding. We identified two other issues, neither of which accounts for a factor of 10, but which account for at least a factor of 3. 1. The administrators mounted a 16 GB ramdisk on /scratch, but did not ensure that it was wiped before the next task ran. So if you got a node after some job that left stinky feces there, you could effectively only have 16 GB (before the old stuff would be swapped out). More importantly, the physical pages backing the ramdisk may not be uniformly distributed across the sockets, and rather than preemptively swap out those old ramdisk pages, the kernel would find a page on some other socket (instead of locally, this could be confirmed, for example, by watching the numa_foreign and numa_miss counts with numastat). Then when you went to use that memory (typically in a bandwidth-limited application), it was easy to have 3 sockets all waiting on one bus, thus taking a factor of 3+ performance hit despite a resident set much less than 50% of the available memory. I have a rather complete analysis of this in case someone is interested. Note that this can affect programs with static or dynamic allocation (the kernel looks for local pages when you fault it, not when you allocate it), the only way I know of to circumvent the problem is to allocate memory with libnuma (e.g. numa_alloc_local) which will fail if local memory isn't available (instead of returning and subsequently faulting remote pages). 2. The memory bandwidth is 16-18% different between sockets, with sockets 0,3 being slow and sockets 1,2 having much faster available bandwidth. This is fully reproducible and acknowledged by Sun/Oracle, their response to an early inquiry: http://59A2.org/files/SunBladeX6440STREAM-20100616.pdf I am not completely happy with this explanation because the issue persists even with full software prefetch, packed SSE2, and non-temporal stores; as long as the working set does not fit within (per-socket) L3. Note that the software prefetch allows for several hundred cycles of latency, so the extra hop for snooping shouldn't be a problem. 
If the working set fits within L3, then all sockets are the same speed (and of course much faster due to improved bandwidth). Some disassembly here: http://gist.github.com/476942

The three with prefetch and movntpd run within 2% of each other; the other is much faster within cache and much slower when it breaks out of cache (obviously). The performance numbers are higher than with the reference implementation (quoted in Sun/Oracle's response), but (run with taskset on each of the four sockets):

Triad: 5842.5814 0.0329 0.0329 0.0330
Triad: 6843.4206 0.0281 0.0281 0.0282
Triad: 6827.6390 0.0282 0.0281 0.0283
Triad: 5862.0601 0.0329 0.0328 0.0331

This is almost exclusively due to the prefetching; the packed arithmetic is almost completely inconsequential when waiting on memory bandwidth.

Jed
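[A sketch of the libnuma workaround mentioned above (numa_alloc_local instead of first-touch allocation); link with -lnuma; the allocation size is illustrative:]

#include <numa.h>
#include <stdio.h>

int main(void)
{
  if (numa_available() < 0) {
    fprintf(stderr, "no NUMA support on this system\n");
    return 1;
  }
  size_t bytes = 1 << 20;
  /* Allocate pages on the local node, instead of letting the kernel
     silently fault them onto whichever node happens to have free memory. */
  double *p = numa_alloc_local(bytes);
  if (!p) {
    fprintf(stderr, "no local memory available\n");
    return 1;
  }
  /* ... use p for the bandwidth-limited kernel ... */
  numa_free(p, bytes);
  return 0;
}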
Re: [OMPI users] EXTERNAL: Re: MPI_GET beyond 2 GB displacement
On Thu, 8 Jul 2010 09:53:11 -0400, Jeff Squyres wrote:
>> Do you "use mpi" or the F77 interface?
>
> It shouldn't matter; both the Fortran module and mpif.h interfaces are the
> same.

Yes, but only the F90 version can do type checking; the function prototypes are not present in mpif.h. The truncation is an internal issue, unrelated to user code having compatible types (since I can reproduce the issue from C). I'm just confused by how passing an int64 when an int32 is expected could work from Fortran on a big-endian system (it would likely work from C when the argument is passed by value in a register); using the F90 module would enable the compiler to do type checking and should highlight any type mismatches.

Jed
Re: [OMPI users] EXTERNAL: Re: MPI_GET beyond 2 GB displacement
On Wed, 07 Jul 2010 17:34:44 -0600, "Price, Brian M (N-KCI)" wrote: > Jed, > > The IBM P5 I'm working on is big endian. Sorry, that didn't register. The displ argument is MPI_Aint which is 8 bytes (at least on LP64, probably also on LLP64), so your use of kind=8 for that is certainly correct. The count argument is a plain int, I don't see how your code could be correct if you pass in an 8-byte int there when it expects a 4-byte int (since the upper 4 bytes would be used on a big-endian system). > The test program I'm using is written in Fortran 90 (as stated in my > question). Do you "use mpi" or the F77 interface? > I imagine this is indeed a library issue, but I still don't understand what > I've done wrong here. I can reproduce this in C on x86-64, even with displ much smaller than 2^31 (e.g. by setting displ_unit=4). Apparently Open MPI multiplies displ*displ_unit and stuffs the result in an int (somewhere in the implementation), MPICH2 works correctly for me with large displacements. https://svn.open-mpi.org/trac/ompi/ticket/2472 Jed
Re: [OMPI users] EXTERNAL: Re: MPI_GET beyond 2 GB displacement
On Wed, 07 Jul 2010 15:51:41 -0600, "Price, Brian M (N-KCI)" wrote: > Jeff, > > I understand what you've said about 32-bit signed INTs, but in my program, > the displacement variable that I use for the MPI_GET call is a 64-bit INT > (KIND = 8). The MPI Fortran bindings expect a standard int, your program is only working because your system is little endian so the first 4 bytes are the low bytes (correct for numbers less than 2^31), it would be completely broken on a big endian system. This is a library issue, you can't fix it by using different sized ints in your program and you would see compiler errors due to the type mismatch if you were using Fortran 90 (which is capable of some type checking). Jed
Re: [OMPI users] Highly variable performance
Following up on this, I have partial resolution. The primary culprit appears to be stale files in a ramdisk non-uniformly distributed across the sockets, thus interacting poorly with NUMA. The slow runs invariably have high numa_miss and numa_foreign counts. I still have trouble making it explain up to a factor of 10 degradation, but it certainly explains a factor of 3. Jed
Re: [OMPI users] Address not mapped segmentation fault with 1.4.2 ...
Just a guess, but you could try the updated patch here https://svn.open-mpi.org/trac/ompi/ticket/2431 Jed
[OMPI users] Highly variable performance
I'm investigating some very large performance variation and have reduced the issue to a very simple MPI_Allreduce benchmark. The variability does not occur for serial jobs, but it does occur within single nodes. I'm not at all convinced that this is an Open MPI-specific issue (in fact the same variance is observed with MVAPICH2, which is an available, but not "recommended", implementation on that cluster) but perhaps someone here can suggest steps to track down the issue.

The nodes of interest are 4-socket Opteron 8380 (quad core, 2.5 GHz), connected with QDR InfiniBand. The benchmark loops over

MPI_Allgather(localdata,nlocal,MPI_DOUBLE,globaldata,nlocal,MPI_DOUBLE,MPI_COMM_WORLD);

with nlocal=1 (80 KiB messages) 1 times, so it normally runs in a few seconds. Open MPI 1.4.1 was compiled with gcc-4.3.3, and this code was built with mpicc -O2. All submissions were 8 processes; timing and host results are presented below in chronological order. The jobs were run with 2-minute time limits (to get through the queue easily); jobs are marked "killed" if they went over this amount of time. Jobs were usually submitted in batches of 4. The scheduler is LSF-7.0. The HOST field indicates the node that was actually used; a6* nodes are of the type described above, a2* nodes are much older (2-socket Opteron 2220 (dual core, 2.8 GHz)) and use a Quadrics network; the timings are very reliable on these older nodes.

When the issue first came up, I was inclined to blame memory bandwidth issues with other jobs, but the variance is still visible when our job runs on exactly a full node, is present regardless of affinity settings, and events that don't require communication are well-balanced in both small and large runs. I then suspected possible contention between transport layers; ompi_info gives

MCA btl: parameter "btl" (current value: "self,sm,openib,tcp", data source: environment)

so the timings below show many variations of restricting these values. Unfortunately, the variance is large for all combinations, but I find it notable that -mca btl self,openib is reliably much slower than self,tcp. Note that some nodes are used in multiple runs, yet there is no strict relationship where some nodes are "fast"; for instance, a6200 is very slow (6x and more) in the first set, then normal in the subsequent test. Nevertheless, when the same node appears in temporally nearby tests, there seems to be a correlation (though there is certainly not enough data here to establish that with confidence).

As a final observation, I think the performance in all cases is unreasonably low since the same test on a (unrelated to the cluster) 2-socket Opteron 2356 (quad core, 2.3 GHz) always takes between 9.75 and 10.0 seconds, 30% faster than the fastest observations on the cluster nodes with faster cores and memory.
# JOB         TIME (s)     HOST
ompirun
lsf.o240562   killed       8*a6200
lsf.o240563   9.2110e+01   8*a6200
lsf.o240564   1.5638e+01   8*a6237
lsf.o240565   1.3873e+01   8*a6228
ompirun -mca btl self,sm
lsf.o240574   1.6916e+01   8*a6237
lsf.o240575   1.7456e+01   8*a6200
lsf.o240576   1.4183e+01   8*a6161
lsf.o240577   1.3254e+01   8*a6203
lsf.o240578   1.8848e+01   8*a6274
prun (quadrics)
lsf.o240602   1.6168e+01   4*a2108+4*a2109
lsf.o240603   1.6746e+01   4*a2110+4*a2111
lsf.o240604   1.6371e+01   4*a2108+4*a2109
lsf.o240606   1.6867e+01   4*a2110+4*a2111
ompirun -mca btl self,openib
lsf.o240776   3.1463e+01   8*a6203
lsf.o240777   3.0418e+01   8*a6264
lsf.o240778   3.1394e+01   8*a6203
lsf.o240779   3.5111e+01   8*a6274
ompirun -mca self,sm,openib
lsf.o240851   1.3848e+01   8*a6244
lsf.o240852   1.7362e+01   8*a6237
lsf.o240854   1.3266e+01   8*a6204
lsf.o240855   1.3423e+01   8*a6276
ompirun
lsf.o240858   1.4415e+01   8*a6244
lsf.o240859   1.5092e+01   8*a6237
lsf.o240860   1.3940e+01   8*a6204
lsf.o240861   1.5521e+01   8*a6276
lsf.o240903   1.3273e+01   8*a6234
lsf.o240904   1.6700e+01   8*a6206
lsf.o240905   1.4636e+01   8*a6269
lsf.o240906   1.5056e+01   8*a6234
ompirun -mca self,tcp
lsf.o240948   1.8504e+01   8*a6234
lsf.o240949   1.9317e+01   8*a6207
lsf.o240950   1.8964e+01   8*a6234
lsf.o240951   2.0764e+01   8*a6207
ompirun -mca btl self,sm,openib
lsf.o240998   1.3265e+01   8*a6269
lsf.o240999   1.2884e+01   8*a6269
lsf.o241000   1.3092e+01   8*a6234
lsf.o241001   1.3044e+01   8*a6269
ompirun -mca btl self,openib
lsf.o241013   3.1572e+01   8*a6229
lsf.o241014   3.0552e+01   8*a6234
lsf.o241015   3.1813e+01   8*a6229
lsf.o241016   3.2514e+01   8*a6252
ompirun -mca btl self,sm
lsf.o241044   1.3417e+01   8*a6234
lsf.o241045   killed       8*a6232
lsf.o241046   1.4626e+01   8*a6269
lsf.o241047   1.5060e+01   8*a6253
lsf.o241166   1.3179e+01   8*a6228
lsf.o241167   2.7759e+01   8*a6232
lsf.o241168   1.4224e+01   8*a6234
lsf.o241169   1.4825e+01   8*a6228
lsf.o241446   1.4896e+01   8*a6204
lsf.o241447   1.4960e+01   8*a6228
lsf.o241448   1.7622e+01   8*a6222
lsf.o241449   1.5112e+01   8*a6204
ompirun -mca btl self,tcp
lsf.o241556   1.9135e+01   8*a6204
lsf.o241557   2.4365e+01   8*a6261
lsf.o241558
Re: [OMPI users] Solving SVD Using Lanczos Method Implementation
On Mon, 26 Apr 2010 22:30:15 +0700, long thai wrote: > Hi all. > > I'm trying to develop MPI program to solve SVD using Lanczos algorithms. > However, I have no idea how to do that. Somebody suggested to take a look at > http://www.netlib.org/scalapack/ but I cannot understand exactly what to > look. Morever, I know that *las2* is the popular library to solve SVD but > don't know how to use it in parallel computing. I recommend SLEPc http://www.grycap.upv.es/slepc/ There are plenty of examples and a variety of algorithms for scalable computation of SVDs. Jed
Re: [OMPI users] How to "guess" the incoming data type ?
On Sun, 25 Apr 2010 20:38:54 -0700, Eugene Loh wrote:
> Could you encode it into the tag?

This sounds dangerous.

> Or, append a data type to the front of each message?

This is the idea; unfortunately, this still requires multiple messages for collectives (because you can't probe for a suitable buffer size, and no dynamic language will be happy with "the buffer can be anything as long as its type is in this list and the total number of bytes is N"). This file is a pretty easy-to-read reference for a friendly MPI in a dynamic language:

http://mpi4py.googlecode.com/svn/trunk/src/MPI/pickled.pxi

Note that mpi4py also exposes the low-level functionality. Numpy arrays can be sent without pickling:

http://mpi4py.googlecode.com/svn/trunk/src/MPI/asbuffer.pxi

Something that could be done to prevent packing in some cases is to define an MPI datatype for the send and receive types, but this will usually require an extra message because the receiver has to wire up an empty object that is ready to receive the incoming message.

Jed
Re: [OMPI users] How to debug Open MPI programs with gdb
On Thu, 22 Apr 2010 13:11:49 +0200, "Немања Илић (Nemanja Ilic)" wrote:
> On the contrary when I debug with "mpirun -np 4 xterm -e gdb
> my_mpi_application" the four debugger windows are started with a
> separate thread each, just as it should be. Since I will be using the
> debugger on a remote computer I can only run gdb in console mode. Can
> anyone help me with this?

An alternative to opening xterms (e.g. if that host isn't running an X server, you can't get X11 forwarding to work, or you just don't want xterms) is to use GNU "screen". It's basically the same command line, but it will open a screen terminal for each thread. When debugging multiple threads with xterms or screens, I recommend gdb's

-ex 'break somewhere' -ex run --args ./app -args -for -your application

to save you from entering commands into each terminal separately.

Jed
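[For example, with a hypothetical program name, arguments, and breakpoint:]

$ mpirun -np 4 xterm -e gdb -ex 'break main' -ex run --args ./my_app -my -args

Each xterm (or screen) then stops at the breakpoint without any commands being typed by hand.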
Re: [OMPI users] 3D domain decomposition with MPI
On Fri, 12 Mar 2010 15:06:33 -0500, Gus Correa wrote:
> Hi Cole, Jed
>
> I don't have much direct experience with PETSc.

Disclaimer: I've been using PETSc for several years and also work on the library itself.

> I mostly troubleshooted other people's PETSc programs,
> and observed their performance.
> What I noticed is:
> 1) PETSc's learning curve is as steep if not steeper than MPI,

and I think this depends strongly on what you want to do. Since the library is built on top of MPI, it's sort of trivially true, since it's beneficial for the user to be familiar with collective semantics, and perhaps other MPI functionality, depending on the level of control that they seek. That said, many PETSc users never call MPI directly.

> 2) PETSc codes seem to be slower (or have more overhead)
> than codes written directly in MPI.
> Jed seems to have a different perception of PETSc, though,
> and is more enthusiastic about it.
>
> Admittedly, I don't have any direct comparison
> (i.e. the same exact code implemented via PETSc and via MPI),
> to support what I said above.

If you do find such a comparison, we'd like to see it. We expose a limited number of interfaces that are known to perform/scale poorly, because users who are not concerned about scalability ask for them so often. These should be clearly marked; we'll fix the docs if this is not the case. Note that PETSc's neighbor updates use persistent nonblocking calls by default, but you can select alltoallw, one-sided, ready-send, synchronous sends, and a couple other options, with and without packing (choice at runtime). If you know of a faster way, we'd like to see it. Note that a default build is in debugging mode, which activates lots of integrity checks, checks for memory corruption, etc., and is usually 2 or 3 times slower than a production build (--with-debugging=0).

> OTOH, if you have a clean and good serial code already developed,
> I think it won't be a big deal to parallelize it directly
> with MPI, assuming that the core algorithm (your Gauss-Seidel solver)
> fits the remaining code in a well structured way.

This depends a lot on the structure of the serial code. Bill Gropp had a great quote in the last rce-cast (starts at 38:30, in response to Brock Palen's question about what to think about when designing a parallel program):

  I think the first thing they should keep in mind is to see whether they can succeed without using MPI. After all, one of the things that we try to do with MPI is to encourage the development of libraries. All too often we see people who are reinventing "PETSc-light" instead of just pulling up the library and using it. MPI enabled an entire parallel ecosystem for scientific software and the first thing you should do is see if you've already had someone else do the job for you.

  I think after that, if you actually have to write the code, then you have to confront the top-down versus bottom-up. And the next mistake that people make is they write the individual node code and then try to figure out how to glue it together to all of the other nodes. And we really feel that for many applications, what you want to do is to start by viewing your application as a global application, have global data structures, figure out how you decompose it, and then the code to coordinate the communication between them will be pretty obvious. And you can tell the difference between how an application was built, from whether it was top-down or bottom-up.

  [...]
  You want to think about how you decompose your data structures, how you think about them globally. Rather than saying, I need to think about everything in parallel, so I'll have all these little patches, and I'll compute on them, and then figure out how to stitch them together. If you were building a house, you'd start with a set of blueprints that give you a picture of what the whole house looks like. You wouldn't start with a bunch of tiles and say, "Well I'll put this tile down on the ground, and then I'll find a tile to go next to it." But all too many people try to build their parallel programs by creating the smallest possible tiles and then trying to have the structure of their code emerge from the chaos of all these little pieces. You have to have an organizing principle if you're going to survive making your code parallel.

Jed
Re: [OMPI users] 3D domain decomposition with MPI
On Thu, 11 Mar 2010 12:44:01 -0500, "Cole, Derek E" wrote:
> I am replying to this via the daily-digest message I got. Sorry it
> wasn't sooner... I didn't realize I was getting replies until I got
> the digest. Does anyone know how to change it so I get the emails as
> you all send them?

Log in at the bottom and edit options: http://www.open-mpi.org/mailman/listinfo.cgi/users

> I am doing a Red-black Gauss-Seidel algorithm.

Note that red-black Gauss-Seidel is a terrible algorithm on cache-based hardware; it only makes sense on vector hardware. The reason for this is that the whole point is to approximate a dense action (the inverse of a matrix), but the red-black ordering causes this action to be purely local. A sequential ordering, on the other hand, is like a dense lower-triangular operation, which tends to be much closer to a real inverse. In parallel, you do these sequential sweeps on each process, and communicate when you're done. The memory performance will be twice as good, and the algorithm will converge in fewer iterations.

> The ghost points are what I was trying to figure for moving this into
> the 3rd dimension. Thanks for adding some concrete-ness to my idea of
> exactly how much overhead is involved. The test domains I am computing
> on are on the order of 100*50*50 or so. This is why I am trying to
> limit the overhead of the MPI communication. I am in the process of
> finding out exactly how big the domains may become, so that I can
> adjust the algorithm accordingly.

The difficulty is for small subdomains. If you have large subdomains, then parallel overhead will almost always be small.

> If I understand what you mean by pencils versus books, I don't think
> that will work for these types of calculations will it? Maybe a better
> explanation of what you mean by a pencil versus a book. If I was going
> to solve a sub-matrix of the XY plane for all Z-values, what is that
> considered?

That would be a "book" or "slab". I still recommend using PETSc rather than reproducing standard code to call MPI directly for this; it will handle all the decomposition and updates, and has the advantage that you'll be able to use much better algorithms than Gauss-Seidel.

Jed
Re: [OMPI users] 3D domain decomposition with MPI
On Wed, 10 Mar 2010 22:25:43 -0500, Gus Correa wrote: > Ocean dynamics equations, at least in the codes I've seen, > normally use "pencil" decomposition, and are probably harder to > handle using 3D "chunk" decomposition (due to the asymmetry imposed by > gravity). There is also a lot to be said for the strength of coupling. Ocean codes do "tridiagonal solves" in columns, and these would no longer be trivially cheap (in fact the structure of the code would need to change) if the partition also broke up the vertical. Since the domain is so anisotropic and the coupling (at least aside from the barotropic mode) is so much stronger in the vertical than the horizontal, it is good to decompose with columns always kept together. In a fully implicit code, these column solves would quit being mandatory, but the availability of a "line smoother" for multigrid still favors this type of decomposition. Also note that in domain decomposition algorithms (like additive Schwarz, and balancing Neumann-Neumann), the asymptotics scale with the maximum number of subdomains required to cross the domain, and/or with the number of elements along the longest edge of a subdomain. This tends to favor partitioning in 3D, unless the physics/domain is sufficiently anisotropic to overcome this preference. Also depending on Derek's application, he may want to use a library like PETSc to handle the decomposition and updates. Certainly this is true if the application may ever need solvers; in my opinion, it is also true unless this is a toy problem being used to learn MPI. If you really want to write this stuff yourself, it's still worth looking at the discussion in PETSc user's manual. Jed
Re: [OMPI users] MPi Abort verbosity
On Wed, 24 Feb 2010 14:21:02 +0100, Gabriele Fatigati wrote: > Yes, of course, > > but i would like to know if there is any way to do that with openmpi See the error handler docs, e.g. MPI_Comm_set_errhandler. Jed
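[A minimal sketch of the run-time approach: with MPI_ERRORS_RETURN installed, errors come back as return codes that the caller can report before deciding whether to abort. The deliberately invalid destination rank is only there to trigger an error.]

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  char msg[MPI_MAX_ERROR_STRING];
  int len, err, dummy = 0;

  MPI_Init(&argc, &argv);
  /* Replace the default MPI_ERRORS_ARE_FATAL handler */
  MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
  err = MPI_Send(&dummy, 1, MPI_INT, 12345678, 0, MPI_COMM_WORLD);
  if (err != MPI_SUCCESS) {
    MPI_Error_string(err, msg, &len);
    fprintf(stderr, "MPI error: %s\n", msg);
    /* decide here whether to recover or call MPI_Abort() */
  }
  MPI_Finalize();
  return 0;
}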
Re: [OMPI users] Similar question about MPI_Create_type
On Mon, 08 Feb 2010 14:42:15 -0500, Prentice Bisbal wrote:
> I'll give that a try, too. IMHO, MPI_Pack/Unpack looks easier and less
> error prone, but Pacheco advocates using derived types over
> MPI_Pack/Unpack.

I would recommend using derived types for big structures, or perhaps for long-lived medium-sized structures. If your structure is static (i.e. doesn't contain pointers), then derived types definitely make sense and allow you to use that type in collectives.

> In my situation, rank 0 is reading in a file containing all the coords.
> So even if other ranks don't have the data, I still need to create the
> structure on all the nodes, even if I don't populate it with data?

You're populating it by receiving data. MPI can't allocate the space for you, so you have to set it up.

> To clarify: I thought adding a similar structure, b_point in rank 1
> would be adequate to receive the data from rank 0

You have allocated memory by the time you call MPI_Recv, but you were passing an undefined value to MPI_Address, and you certainly can't base derived_type on a_point and use it to receive into b_point. It would be fine to receive into a_point on rank 1, but whatever you do, derived_type has to be created correctly on each process.

Jed
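[For the variable-size case, a sketch of the MPI_BOTTOM approach mentioned earlier in this thread: the type is built per-process from absolute addresses, so the receiver must allocate coords before building its own type. Here p is a point, ncoords/dest/tag/comm are placeholders, and MPI_Get_address is the non-deprecated spelling of MPI_Address.]

MPI_Aint disp[2];
int blens[2] = {1, ncoords};
MPI_Datatype types[2] = {MPI_INT, MPI_INT}, ptype;

MPI_Get_address(&p.index, &disp[0]);
MPI_Get_address(p.coords, &disp[1]);   /* p.coords must already be allocated */
MPI_Type_create_struct(2, blens, disp, types, &ptype);
MPI_Type_commit(&ptype);
/* absolute addresses, so the buffer argument is MPI_BOTTOM */
MPI_Send(MPI_BOTTOM, 1, ptype, dest, tag, comm);
MPI_Type_free(&ptype);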
Re: [OMPI users] Similar question about MPI_Create_type
On Mon, 08 Feb 2010 13:54:10 -0500, Prentice Bisbal wrote: > but I don't have that book handy The standard has lots of examples. http://www.mpi-forum.org/docs/docs.html You can do this, but for small structures, you're better off just packing buffers. For large structures containing variable-size fields, I think it is clearer to use MPI_BOTTOM instead of offsets from an arbitrary (instance-dependent) address. [...] > if (rank == 0) { > a_point.index = 1; > a_point.coords = malloc(3 * sizeof(int)); > a_point.coords[0] = 3; > a_point.coords[1] = 6; > a_point.coords[2] = 9; > } > > block_lengths[0] = 1; > block_lengths[1] = 3; > > type_list[0] = MPI_INT; > type_list[1] = MPI_INT; > > displacements[0] = 0; > MPI_Address(&a_point.index, &start_address); > MPI_Address(a_point.coords, &address); ^^ Rank 1 has not allocated this yet. Jed
Re: [OMPI users] Difficulty with MPI_Unpack
On Sun, 07 Feb 2010 22:40:55 -0500, Prentice Bisbal wrote:
> Hello, everyone. I'm having trouble packing/unpacking this structure:
>
> typedef struct{
>   int index;
>   int* coords;
> }point;
>
> The size of the coords array is not known a priori, so it needs to be a
> dynamic array. I'm trying to send it from one node to another using
> MPI_Pack/MPI_Unpack as shown below. When I unpack it, I get this error
> when unpacking the coords array:
>
> [fatboy:07360] *** Process received signal ***
> [fatboy:07360] Signal: Segmentation fault (11)
> [fatboy:07360] Signal code: Address not mapped (1)
> [fatboy:07360] Failing at address: (nil)

Looks like b_point.coords = NULL.  Has this been allocated on rank 1?  You might need to use MPI_Get_count to decide how much to allocate.  Also, if you don't have a convenient upper bound on the size of the receive buffer, you can use MPI_Probe followed by MPI_Get_count to determine the size before calling MPI_Recv.

Jed
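P.S. The probe-then-allocate pattern, as a sketch (the helper name is made up):

  #include <mpi.h>
  #include <stdlib.h>

  /* Receive a packed message of unknown length: probe, size the buffer
     with MPI_Get_count, allocate, then receive. */
  char *recv_unknown_length(int src, int tag, MPI_Comm comm, int *nbytes)
  {
    MPI_Status status;
    char *buf;
    MPI_Probe(src, tag, comm, &status);          /* blocks until matched */
    MPI_Get_count(&status, MPI_PACKED, nbytes);  /* bytes in the message */
    buf = malloc(*nbytes);
    MPI_Recv(buf, *nbytes, MPI_PACKED, src, tag, comm, &status);
    return buf;
  }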
Re: [OMPI users] [mpich-discuss] problem with MPI_Get_count() for very long (but legal length) messages.
On Fri, 5 Feb 2010 14:28:40 -0600, Barry Smith wrote:
> To cheer you up, when I run with openMPI it runs forever sucking down
> 100% CPU trying to send the messages :-)

On my test box (x86 with 8GB memory), Open MPI (1.4.1) does complete after several seconds, but still prints the wrong count.  MPICH2 does not actually send the message, as you can see by running the attached code.

  # Open MPI 1.4.1, correct cols[0]
  [0] sending...
  [1] receiving...
  count -103432106, cols[0] 0

  # MPICH2 1.2.1, incorrect cols[0]
  [1] receiving...
  [0] sending...
  [1] count -103432106, cols[0] 1

How much memory does crush have (you need about 7GB to do this without swapping)?  In particular, most of the time it took Open MPI to send the message (with your source) was actually just spent faulting the send/recv buffers.  The attached code faults the buffers first, and the subsequent send/recv takes less than 2 seconds.  Actually, it's clear that MPICH2 never touches either buffer because it returns immediately regardless of whether they have been faulted first.

Jed

  #include <mpi.h>
  #include <stdio.h>
  #include <stdlib.h>

  int main(int argc,char **argv)
  {
    int ierr,i,size,rank;
    int cnt = 433438806;
    MPI_Status status;
    long long *cols;

    MPI_Init(&argc,&argv);
    ierr = MPI_Comm_size(MPI_COMM_WORLD,&size);
    ierr = MPI_Comm_rank(MPI_COMM_WORLD,&rank);
    if (size != 2) {
      fprintf(stderr,"[%d] usage: mpiexec -n 2 %s\n",rank,argv[0]);
      MPI_Abort(MPI_COMM_WORLD,1);
    }
    cols = malloc(cnt*sizeof(long long));
    /* (the tail of the attachment was truncated in the archive; the body
       below is reconstructed to match the output quoted above: fault the
       buffer, then send/recv and print the count) */
    for (i=0; i<cnt; i++) cols[i] = rank;   /* fault the buffer first */
    if (rank == 0) {
      printf("[%d] sending...\n",rank);
      ierr = MPI_Send(cols,cnt,MPI_LONG_LONG_INT,1,0,MPI_COMM_WORLD);
    } else {
      int count;
      printf("[%d] receiving...\n",rank);
      ierr = MPI_Recv(cols,cnt,MPI_LONG_LONG_INT,0,0,MPI_COMM_WORLD,&status);
      ierr = MPI_Get_count(&status,MPI_LONG_LONG_INT,&count);
      printf("[%d] count %d, cols[0] %lld\n",rank,count,cols[0]);
    }
    free(cols);
    ierr = MPI_Finalize();
    return 0;
  }
Re: [OMPI users] speed up this problem by MPI
On Fri, 29 Jan 2010 11:25:09 -0500, Richard Treumann wrote:
> Any support for automatic serialization of C++ objects would need to be in
> some sophisticated utility that is not part of MPI. There may be such
> utilities but I do not think anyone who has been involved in the discussion
> knows of one you can use. I certainly do not.

C++ really doesn't offer sufficient type introspection to implement something like this.  Boost.MPI offers serialization for a few types (e.g. some STL containers), but the general solution that you would like just doesn't exist (you'd have to write special code for every type you want to be able to operate on).

Python can do things like this: mpi4py can operate transparently on any (pickleable) object, and also offers complete bindings to the low-level MPI interface.  CL-MPI (Common Lisp) can also do these things, but it's much less mature than mpi4py.

Jed
Re: [OMPI users] ABI stabilization/versioning
On Tue, 26 Jan 2010 11:15:45 +0000, Dave Love wrote:
> > Versions were bumped to 0.0.1 for libmpi which has no
> > effect for dynamic linking.
>
> I've forgotten the rules on this, but the point is that it needs to
> affect dynamic linking to avoid running with earlier libraries
> (specifically picking up ones from 1.2, which is the most common
> problem).

Dave, I think you are correct that this has not actually been done.  In particular, I have 1.4.1 installed, but the soname is still libmpi.so.0.  It's irrelevant that the symbolic links are set up for libmpi.so.0.0.1; this minor versioning only affects which DSO gets used when the linker (not the loader) sees -lmpi.  And inspecting a binary built in Sep 2008 (must have been 1.2.7), ldd resolves it to my 1.4.1 copy without complaints.  However, the loader is intelligent and at least offers a warning when I try to run this ancient binary:

  ./a.out: Symbol `ompi_mpi_comm_null' has different size in shared object, consider re-linking

Chapter 3 of this paper was useful to me when learning about ABI versioning.

  http://people.redhat.com/drepper/dsohowto.pdf

Jed
Re: [OMPI users] ABI stabilization/versioning
On Mon, 25 Jan 2010 15:10:12 -0500, Jeff Squyres wrote:
> Indeed. Our wrapper compilers currently explicitly list all 3
> libraries (-lmpi -lopen-rte -lopen-pal) because we don't know if those
> libraries will be static or shared at link time.

I am suggesting that it is unavoidable for the person doing the linking to be explicit about whether they want static or dynamic libs when they invoke mpicc.  Consider the pkg-config model, where you might write

  gcc -static -o my-app main.o `pkg-config --libs --static openmpi fftw3`
  gcc -o my-app main.o `pkg-config --libs openmpi fftw3`

In MPI world,

  gcc -static -o my-app main.o `mpicc -showme:link-static` `pkg-config --libs --static fftw3`
  gcc -o my-app main.o `mpicc -showme:link` `pkg-config --libs fftw3`

seems tolerable.  The trick (as you point out) is to get the option processed when the wrapper is being invoked as the compiler instead of just for the -showme options.  Possible options are defining an OMPI_STATIC environment variable or inspecting argv for --link:static (or some such).  This is one of the many reasons why wrappers are a horrible solution, especially when they are expected to be used in nontrivial cases.  Ideally, the adopted plan could be done in some coordination with MPICH2 (which lacks a -showme:link analogue) so that it is not so hard to write portable build systems.

> > On the cited bug report, I just wanted to note that collapsing
> > libopen-rte and libopen-pal (even only in production builds) has the
> > undesirable effect that their ABI cannot change without incrementing
> > the soname of libmpi (i.e. user binaries are coupled just as tightly
> > to these libraries as when they were separate but linked explicitly,
> > so this offers no benefit at all).
>
> Indeed -- this is exactly the reason we ended up leaving libopen-* .so
> versions at 0:0:0.

But not versioning those libs isn't much of a solution either, since it becomes possible to get an ABI mismatch at runtime (consider someone who uses them independently, or if they are packaged separately as in a distribution, so that it becomes possible to update these out from underneath libmpi).

> There's an additional variable -- we had considered collapsing all 3
> libraries into libmpi for production builds,

My point was that this is no solution at all since you have to bump the soname any time you change libopen-*.  So even users who NEVER call into libopen-* have to relink any time something happens there, despite their interface not changing.  And that is exactly the situation if the wrappers continue to overlink AND libopen-* became versioned, so at least by keeping them separate, you give users the option of not overlinking (albeit manually) and the option of using libopen-* without libmpi.

> Yuck.

It's 2010 and we still don't have a standard way to represent link dependencies (pkg-config might be the closest thing, but it's bad if you have multiple versions of the same library, and the granularity is wrong, e.g. if you want to link some exotic lib statically and the common ones dynamically).

Jed
Re: [OMPI users] ABI stabilization/versioning
On Mon, 25 Jan 2010 09:09:47 -0500, Jeff Squyres wrote:
> The short version is that the possibility of static linking really
> fouls up the scheme, and we haven't figured out a good way around this
> yet. :-(

So pkg-config addresses this with its Libs.private field and an explicit command-line argument when you want static libs, e.g.

  $ pkg-config --libs libavcodec
  -lavcodec
  $ pkg-config --libs --static libavcodec
  -pthread -lavcodec -lz -lbz2 -lfaac -lfaad -lmp3lame -lopencore-amrnb -lopencore-amrwb -ltheoraenc -ltheoradec -lvorbisenc -lvorbis -logg -lx264 -lm -lxvidcore -ldl -lasound -lavutil

There is no way to simultaneously (a) prevent overlinking shared libs and (b) correctly link static libs without an explicit statement from the user about whether to link *your library* statically or dynamically.  Unfortunately, pkg-config doesn't work well with multiple builds of a package, and doesn't know how to link some libs statically and some dynamically.

On the cited bug report, I just wanted to note that collapsing libopen-rte and libopen-pal (even only in production builds) has the undesirable effect that their ABI cannot change without incrementing the soname of libmpi (i.e. user binaries are coupled just as tightly to these libraries as when they were separate but linked explicitly, so this offers no benefit at all).

Jed
Re: [OMPI users] More NetBSD fixes
On Thu, 14 Jan 2010 21:55:06 -0500, Jeff Squyres wrote:
> That being said, you could sign up on it and then set your membership to
> receive no mail...?

This is especially dangerous because the Open MPI lists munge the Reply-To header, which is a bad thing:

  http://www.unicom.com/pw/reply-to-harmful.html

But lots of mailers have poor default handling of mailing lists, so it's complicated.  With munging, a mailer's "reply-to-sender" function will send mail *only* to the list, and "reply-to-all" will send it to the list and any other recipients, but *not* the sender (unless the mailer does special detection of munged Reply-To headers).  This makes it rather difficult to participate in a discussion without receiving mail from the list, or even to reliably filter list traffic (you have to write filter rules that walk the References tree to find whether it is something that would be interesting to you, and then you get false positives from people who reply to an existing thread when they wanted to start a new one).

Jed
Re: [OMPI users] MPI debugger
On Sun, 10 Jan 2010 19:29:18 +0000, Ashley Pittman wrote:
> It'll show you parallel stack traces but won't let you single step for
> example.

Two lightweight options if you want stepping, breakpoints, watchpoints, etc.

* Use serial debuggers on some interesting processes, for example with

    mpiexec -n 1 xterm -e gdb --args ./trouble args : -n 2 ./trouble args : -n 1 xterm -e gdb --args ./trouble args

  to put an xterm on ranks 0 and 3 of a four-process job (there are lots of other ways to get here).

* MPICH2 has a poor-man's parallel debugger: mpiexec.mpd -gdb allows you to send the same gdb commands to each process and collate the output.

Jed
Re: [OMPI users] Wrappers should put include path *after* user args
On Fri, 4 Dec 2009 16:20:23 -0500, Jeff Squyres wrote:
> Oy -- more specifically, we should not be putting -I/usr/include on
> the command line *at all* (because it's special and already included
> by the compiler search paths; similar for /usr/lib and /usr/lib64).

If I remember correctly, the issue was that some versions of gfortran were not searching /usr/include for mpif.h.

> Can you send the contents of your
> $prefix/share/openmpi/mpif90-wrapper-data.txt?

Attached.

Jed

# There can be multiple blocks of configuration data, chosen by
# compiler flags (using the compiler_args key to chose which block
# should be activated.  This can be useful for multilib builds.  See the
# multilib page at:
#    https://svn.open-mpi.org/trac/ompi/wiki/compilerwrapper3264
# for more information.

project=Open MPI
project_short=OMPI
version=1.3.4
language=Fortran 90
compiler_env=FC
compiler_flags_env=FCFLAGS
compiler=/usr/bin/gfortran
module_option=-I
extra_includes=
preprocessor_flags=
compiler_flags=-pthread
linker_flags=
libs=-lmpi_f90 -lmpi_f77 -lmpi -lopen-rte -lopen-pal -ldl -Wl,--export-dynamic -lnsl -lutil -lm -ldl
required_file=
includedir=${includedir}
libdir=${libdir}
[OMPI users] Wrappers should put include path *after* user args
Open MPI is installed by the distro with headers in /usr/include

  $ mpif90 -showme:compile -I/some/special/path
  -I/usr/include -pthread -I/usr/lib/openmpi -I/some/special/path

Here's why it's a problem: HDF5 is also installed in /usr with modules at /usr/include/h5*.mod.  A new HDF5 cannot be compiled using the wrappers because it will always resolve the USE statements to /usr/include, which is binary-incompatible with the new version (at a minimum, they "fixed" the size of an argument to H5Lget_info_f between 1.8.3 and 1.8.4).  To build the library, the current choices are

  (a) get rid of the system copy before building
  (b) not use the mpif90 wrapper

I just checked that MPICH2 wrappers consistently put command-line args before the wrapper args.

Jed
Re: [OMPI users] Program deadlocks, on simple send/recv loop
On Thu, 3 Dec 2009 12:21:50 -0500, Jeff Squyres wrote:
> On Dec 3, 2009, at 10:56 AM, Brock Palen wrote:
>
> > The allocation statement is ok:
> > allocate(vec(vec_size,vec_per_proc*(size-1)))
> >
> > This allocates memory vec(32768, 2350)

It's easier to translate to C rather than trying to read Fortran directly.

  #define M 2350
  #define N 32768
  complex double vec[M*N];

> This means that in the first iteration, you're calling:
>
> irank = 1
> ivec = 1
> vec_ind = (47 - 1) * 50 + 1 = 2301
> call MPI_RECV(vec(1, 2301), 32768, ...)

  MPI_Recv(&vec[2300*N],N,...);

> And in the last iteration, you're calling:
>
> irank = 47
> ivec = 50
> vec_ind = (47 - 1) * 50 + 50 = 2350
> call MPI_RECV(vec(1, 2350), 32768, ...)

  MPI_Recv(&vec[2349*N],N,...);

> That doesn't seem right.

It should be one non-overlapping column (C row) at a time.  It will be contiguous in memory, but this isn't using that property.

Jed
Re: [OMPI users] segmentation fault: Address not mapped
On Mon, 23 Nov 2009 10:39:28 -0800, George Bosilca wrote:
> In the case of Open MPI we use pointers, which are different than int
> on most cases

I just want to comment that Open MPI's opaque (to the user) pointers are significantly better than int because they offer type safety.  That is, the compiler can distinguish between MPI_Comm, MPI_Group, MPI_Status, MPI_Op, etc., and warn you if you mix them up.  When they are all typedef'd to int, you get no such warnings, and instead just get runtime errors/crashes.

> (btw int is what MPICH is using I think).

It is.

Jed
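P.S. A small illustration; the erroneous call is left commented out.  With distinct pointer-based handle types it is a compile-time diagnostic, while with int typedefs it compiles silently and fails at run time.

  #include <mpi.h>

  int main(int argc, char **argv)
  {
    int size;
    MPI_Group group;
    MPI_Init(&argc, &argv);
    MPI_Comm_group(MPI_COMM_WORLD, &group);
    /* MPI_Comm_size(group, &size);  <- wrong handle type */
    MPI_Group_size(group, &size);    /* the call that was meant */
    MPI_Group_free(&group);
    MPI_Finalize();
    return 0;
  }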
Re: [OMPI users] memchecker overhead?
Jeff Squyres wrote:
> Verbs and Open MPI don't have these options on by default because a)
> you need to compile against Valgrind's header files to get them to
> work, and b) there's a tiny/small amount of overhead inserted by OMPI
> telling Valgrind "this memory region is ok", but we live in an
> intensely competitive HPC environment.

It's certainly competitive, but we spend most of our implementation time getting things correct rather than tuning.  The huge speed benefits come from algorithmic advances, and finding bugs quickly makes the implementation of new algorithms easier.  I'm not arguing that it should be on by default, but it's helpful to have an environment where the lower-level libs are valgrind-clean.  These days, I usually revert to MPICH when hunting something with valgrind, but use OMPI most other times.

> The option to enable this Valgrind Goodness in OMPI is --with-valgrind.
> I *think* the option may be the same for libibverbs, but I don't
> remember offhand.

I see plenty of warnings over the sm btl.  Several variations, including the excessive

  --enable-debug --enable-mem-debug --enable-mem-profile \
  --enable-memchecker --with-valgrind=/usr

were not sufficient.  (I think everything in this line except --with-valgrind increases the number of warnings, but it's nontrivial with plain --with-valgrind.)

Thanks,
Jed
Re: [OMPI users] memchecker overhead?
Samuel K. Gutierrez wrote:
> Hi Jed,
>
> I'm not sure if this will help, but it's worth a try.  Turn off OMPI's
> memory wrapper and see what happens.
>
> c-like shell
> setenv OMPI_MCA_memory_ptmalloc2_disable 1
>
> bash-like shell
> export OMPI_MCA_memory_ptmalloc2_disable=1
>
> Also add the following MCA parameter to your run command.
>
> --mca mpi_leave_pinned 0

Thanks for the tip, but these make very little difference.

Jed
Re: [OMPI users] memchecker overhead?
Jeff Squyres wrote:
> Using --enable-debug adds in a whole pile of developer-level run-time
> checking and whatnot.  You probably don't want that on production runs.

I have found that --enable-debug --enable-memchecker actually produces more valgrind noise than leaving them off.  Are there options to make Open MPI strict about initializing and freeing memory?  At one point I tried to write suppression files, but even with judicious globbing, I kept getting different warnings when run on a different program.  (All these codes were squeaky-clean under MPICH2.)

Jed
Re: [OMPI users] Question about OpenMPI performance vs. MVAPICH2
Brian Powell wrote:
> I ran a final test which I find very strange: I ran the same test case
> on 1 cpu.  The MVAPICH2 case was 23% faster!?!?  This makes little sense
> to me.  Both are using ifort as the mpif90 compiler using *identical*
> optimization flags, etc.  I don't understand how the results could be
> different.

Are you saying the output of mpicc/mpif90 -show has the same optimization flags?  MPICH2 usually puts its own optimization flags into the wrappers.

Jed
[OMPI users] MPI_Barrier called late within ompi_mpi_finalize when MPIIO fd not closed
This helped me track down a leaked file descriptor, but I think the order of events is not desirable.  If an MPIIO file descriptor is not closed before MPI_Finalize, I get the following.

  *** An error occurred in MPI_Barrier
  *** after MPI was finalized
  *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
  [brakk:1193] Abort after MPI_FINALIZE completed successfully; not able to guarantee that all other processes were killed!

  [Switching to Thread 0x7fa523b78710 (LWP 1193)]
  Breakpoint 2, 0x7fa51ed39a20 in exit () from /lib/libc.so.6
  (gdb) bt
  #0  0x7fa51ed39a20 in exit () from /lib/libc.so.6
  #1  0x7fa520ff6613 in ompi_mpi_abort () from /usr/lib/libmpi.so.0
  #2  0x7fa520fe59b7 in ompi_mpi_errors_are_fatal_comm_handler () from /usr/lib/libmpi.so.0
  #3  0x7fa52100acb2 in PMPI_Barrier () from /usr/lib/libmpi.so.0
  #4  0x7fa52106638a in mca_io_romio_dist_MPI_File_close () from /usr/lib/libmpi.so.0
  #5  0x7fa520feaa2e in file_destructor () from /usr/lib/libmpi.so.0
  #6  0x7fa520fea7c1 in ompi_file_finalize () from /usr/lib/libmpi.so.0
  #7  0x7fa520ff7496 in ompi_mpi_finalize () from /usr/lib/libmpi.so.0
  #8  0x7fa5233bc2d1 in PetscFinalize () at pinit.c:897
  #9  0x00402091 in main (argc=1, args=0x7fff70f1f498) at ex5.c:72

Open MPI 1.3.3, GCC-4.4.0
Linux brakk 2.6.30-ARCH #1 SMP PREEMPT Fri Jun 19 20:44:03 UTC 2009 x86_64 Intel(R) Core(TM)2 Duo CPU T9300 @ 2.50GHz GenuineIntel GNU/Linux

Jed
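P.S. The application-side workaround is just to close every MPI_File before finalizing; a sketch:

  #include <mpi.h>

  int main(int argc, char **argv)
  {
    MPI_File fh;
    MPI_Init(&argc, &argv);
    MPI_File_open(MPI_COMM_WORLD, "out.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    /* ... writes ... */
    MPI_File_close(&fh);  /* must precede MPI_Finalize, or the destructor's
                             internal MPI_Barrier fires after finalization,
                             as in the trace above */
    MPI_Finalize();
    return 0;
  }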
Re: [OMPI users] Bogus memcpy or bogus valgrind record
Jeff Squyres wrote:
> But I'm able to replicate your error (but shouldn't the 2nd buffer be
> the 1st + size (not 2)?) -- let me dig into it a bit... we definitely
> shouldn't be getting invalid writes in the convertor, etc.

As Eugene pointed out earlier, it is fine.

  dataloctab = malloc (2 * (procglbnbr + 1) * sizeof (int));
  dataglbtab = dataloctab + 2;

dataloctab is the 2-element send buffer, dataglbtab is the receive buffer of length 2*procglbnbr.

Jed
Re: [OMPI users] Open MPI programs with autoconf/automake?
On Mon 2008-11-10 12:35, Raymond Wan wrote:
> One thing I was wondering about was whether it is possible, through the
> use of #define's, to create code that is both multi-processor
> (MPI/mpic++) and single-processor (normal g++).  That is, if users do
> not have any MPI installed, it compiles it with g++.
>
> With #define's and compiler flags, I think that can be easily done --
> was wondering if this is something that developers using MPI do and
> whether AC/AM supports it.

The normal way to do this is by building against a serial implementation of MPI.  Lots of parallel numerical libraries bundle such an implementation, so you could just grab one of those.  For example, see PETSc's mpiuni ($PETSC_DIR/include/mpiuni/mpi.h and $PETSC_DIR/src/sys/mpiuni/mpi.c), which implements many MPI calls as macros.  Note that your serial implementation only needs to provide the subset of MPI that your program actually uses.  For instance, if you never send messages to yourself, you can implement MPI_Send as MPI_Abort, since it should never be called in serial.

Jed
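P.S. To give the flavor of such a shim (a sketch, far less complete than mpiuni, assuming the application only uses these few calls):

  /* serial "MPI": enough for an application that only queries size/rank */
  #ifndef HAVE_MPI
  #include <stdlib.h>
  typedef int MPI_Comm;
  #define MPI_COMM_WORLD 0
  #define MPI_Init(argc, argv)      0
  #define MPI_Finalize()            0
  #define MPI_Comm_size(comm, size) (*(size) = 1, 0)
  #define MPI_Comm_rank(comm, rank) (*(rank) = 0, 0)
  /* point-to-point should never run in serial; make misuse loud */
  #define MPI_Send(...)             (abort(), 0)
  #else
  #include <mpi.h>
  #endif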
Re: [OMPI users] OpenMPI runtime-specific environment variable?
On Wed 2008-10-22 00:40, Reuti wrote:
> Okay, now I see.  Why not just call MPI_Comm_size(MPI_COMM_WORLD,
> &nprocs)?  When nprocs is 1, it's a serial run.  It can also be executed
> when not running within mpirun AFAICS.

This is absolutely NOT okay.  You cannot call any MPI functions before MPI_Init (and at least OMPI 1.2+ and MPICH2 1.1a will throw an error if you try).

I'm slightly confused about the original problem.  Is the program linked against an MPI when running in serial?  You have to recompile anyway if you change MPI implementation, so if it's not linked against a real MPI then you know at compile time.  But what is the problem with calling MPI_Init for a serial job?  All implementations I've used allow you to call MPI_Init when the program is run as ./foo (no mpirun).

Jed
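P.S. Concretely, the pattern I have in mind (a sketch):

  #include <mpi.h>

  int main(int argc, char **argv)
  {
    int nprocs;
    MPI_Init(&argc, &argv);                 /* fine under plain ./foo too */
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    if (nprocs == 1) {
      /* serial path */
    } else {
      /* parallel path */
    }
    MPI_Finalize();
    return 0;
  }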
Re: [OMPI users] on SEEK_*
On Thu 2008-10-16 08:21, Jeff Squyres wrote:
> FWIW: https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/20 is a
> placemarker for discussion for the upcoming MPI Forum meeting (next
> week).
>
> Also, be aware that OMPI's 1.2.7 solution isn't perfect, either.  You
> can see from ticket 20 that it actually causes a problem if you try to
> use SEEK_SET in a switch/case statement.  But we did this a little
> better in the trunk/v1.3 (see
> https://svn.open-mpi.org/trac/ompi/changeset/19494); this solution *does*
> allow for SEEK_SET to be used in a case statement, but it does always
> bring in <stdio.h> (probably not a huge deal).

I see.

> The real solution is that we're likely going to change these names to
> something else in the MPI spec itself.  And/or drop the C++ bindings
> altogether (see http://lists.mpi-forum.org/mpi-22/2008/10/0177.php).

Radical.  I don't use the C++ bindings anyway.  I especially like proposal (4), Data in User-Defined Callbacks.

On a related note, it would be nice to be able to call an MPI_Op from user code.  For instance, I have an irregular Reduce-like operation where each proc needs to reduce data from a few other procs (much fewer than the entire communicator).  I implement this using a few nonblocking point-to-point calls followed by a local reduction.  I would like my special reduction to accept an arbitrary MPI_Op, but I currently use a function pointer.  Having a public version of ompi_op_reduce would make this much cleaner.

> Additionally -- I should have pointed this out in my first mail -- you
> can also just use MPI_SEEK_SET (and friends).  The spec defines that
> these constants must have the same values as their MPI::SEEK_*
> counterparts.

Right, MPI::SEEK_* is never used.  Thanks Jeff.

Jed
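P.S. For what it's worth, MPI-2.2 later standardized MPI_Reduce_local, which provides exactly this public hook; a sketch assuming an MPI-2.2 implementation:

  #include <mpi.h>

  /* Fold one received contribution into the accumulator with an
     arbitrary MPI_Op: accum[i] = recvbuf[i] op accum[i]. */
  void accumulate(double *recvbuf, double *accum, int n, MPI_Op op)
  {
    MPI_Reduce_local(recvbuf, accum, n, MPI_DOUBLE, op);
  }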
Re: [OMPI users] on SEEK_*
On Thu 2008-10-16 07:43, Jeff Squyres wrote:
> On Oct 16, 2008, at 6:29 AM, Jed Brown wrote:
>
> Open MPI doesn't require undef'ing of anything.  It should also not
> require any special ordering of include files.  Specifically, the
> following codes both compile fine for me with 1.2.8 and the OMPI SVN
> trunk (which is what I assume you mean by "-dev"?):

That's what I meant.  This works with 1.2.7 but not with -dev:

  #include <iostream>
  #undef SEEK_SET
  #undef SEEK_CUR
  #undef SEEK_END
  #include <mpi.h>

If iostream is replaced by stdio, then both fail.

> This is actually a problem in the MPI-2 spec; the names "MPI::SEEK_SET"
> (and friends) were unfortunately chosen poorly.  Hopefully that'll be
> fixed relatively soon, in MPI-2.2.

It wasn't addressed in the MPI-2.1 spec I was reading, hence my confusion.  Namespaces and macros don't play well together.

> MPICH chose to handle this situation a different way than we did, and
> apparently requires that you either #undef something or you #define an
> MPICH-specific macro.  I guess the portable way might be to just always
> define that MPICH-specific macro.  It should be harmless for OMPI.

I'll go with this, thanks.

> FWIW, I was chatting with the MPICH developers at the recent MPI Forum
> meeting and showed them how we did our SEEK_* solution in Open MPI.

Certainly the OMPI solution is better for users.

Jed
[OMPI users] on SEEK_*
I've just run into this chunk of code.

  /* MPICH2 will fail if SEEK_* macros are defined
   * because they are also C++ enums.  Undefine them
   * when including mpi.h and then redefine them
   * for sanity. */
  # ifdef SEEK_SET
  #  define MB_SEEK_SET SEEK_SET
  #  define MB_SEEK_CUR SEEK_CUR
  #  define MB_SEEK_END SEEK_END
  #  undef SEEK_SET
  #  undef SEEK_CUR
  #  undef SEEK_END
  # endif
  #include "mpi.h"
  # ifdef MB_SEEK_SET
  #  define SEEK_SET MB_SEEK_SET
  #  define SEEK_CUR MB_SEEK_CUR
  #  define SEEK_END MB_SEEK_END
  #  undef MB_SEEK_SET
  #  undef MB_SEEK_CUR
  #  undef MB_SEEK_END
  # endif

MPICH2 (1.1.0a1) gives these errors if SEEK_* are present:

  /opt/mpich2/include/mpicxx.h:26:2: error: #error "SEEK_SET is #defined but must not be for the C++ binding of MPI"
  /opt/mpich2/include/mpicxx.h:30:2: error: #error "SEEK_CUR is #defined but must not be for the C++ binding of MPI"
  /opt/mpich2/include/mpicxx.h:35:2: error: #error "SEEK_END is #defined but must not be for the C++ binding of MPI"

but when SEEK_* are not present and iostream has been included, OMPI-dev gives these errors.

  /home/ompi/include/openmpi/ompi/mpi/cxx/mpicxx.h:53: error: ‘SEEK_SET’ was not declared in this scope
  /home/ompi/include/openmpi/ompi/mpi/cxx/mpicxx.h:54: error: ‘SEEK_CUR’ was not declared in this scope
  /home/ompi/include/openmpi/ompi/mpi/cxx/mpicxx.h:55: error: ‘SEEK_END’ was not declared in this scope

There is a subtle difference between OMPI 1.2.7 and -dev, at least with GCC 4.3.2.  If iostream is included before mpi.h and then SEEK_* are #undef'd, then 1.2.7 succeeds while -dev fails with the message above.  If stdio.h is included and SEEK_* are #undef'd, then both OMPI versions fail.  MPICH2 requires in both cases that SEEK_* be #undef'd.

What do you recommend to remain portable?  Is this really an MPICH2 issue?  The standard doesn't seem to address it.  The MPICH2 FAQ has this:

  http://www.mcs.anl.gov/research/projects/mpich2/support/index.php?s=faqs#cxxseek

Jed
Re: [OMPI users] compilation error about Open Macro when building the code with OpenMPI on Mac OS 10.5.5
On Wed, Oct 8, 2008 at 21:19, Sudhakar Mahalingam wrote:
> I am having a problem about "Open" Macro's number of arguments, when I try
> to build a C++ code with the openmpi-1.2.7 on my Mac OS 10.5.5 machine.  The
> error message is given below.  When I look at the file.h and file_inln.h
> header files in the cxx folder, I am seeing that the "Open" function indeed
> takes four arguments, but I don't know why there is this error about the
> number of arguments of 4.  Has anyone else seen this type of error before?

MPI::File::Open is an inline function, not a macro.  You must have an unqualified Open macro defined in this compilation unit, maybe in one of the headers that were included in your code before hdf5.h.  Does it work if you include hdf5.h first?

Jed
Re: [OMPI users] Execution in multicore machines
On Mon 2008-09-29 20:30, Leonardo Fialho wrote:
> 1) If I use one node (8 cores) the "user" % is around 100% per core.  The
> execution time is around 430 seconds.
>
> 2) If I use 2 nodes (4 cores in each node) the "user" % is around 95%
> per core and the "sys" % is 5%.  The execution time is around 220 seconds.
>
> 3) If I use 4 nodes (1 cores in each node) the "user" % is around 85%
> per core and the "sys" % is 15%.  The execution time is around 200
> seconds.

Do you mean 2 cores per node (1 core per socket)?

> Well... the questions are:
>
> A) The execution time in case "1" should be smaller (only sm
> communication, no?) than case "2" and "3", no?  Cache problems?

Is this benchmark memory-bandwidth limited?  Your results are fairly typical for sparse matrix kernels.  One core can more or less saturate the bus on its own, two cores can overlap memory access so it doesn't hurt too much, but with more than two they are all waiting on memory.  The extra cores are cheaper than more sockets, but they don't do much (if any) good for many workloads.

> B) Why the "sys" time while using communication inter nodes?  NIC driver?
> Why does this time increase when I balance the load across the nodes?

Messages over Ethernet cost more than messages in shared memory.  When you only use 1 core per socket, the application is faster because the single thread has the full memory bandwidth to itself; however, MPI needs to move more data over the wire, so that phase costs more.  If your network were faster (e.g. InfiniBand), you could expect the communication to stay quite cheap even with only one process per node.

Jed
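P.S. A rough illustration of the saturation effect (not a real benchmark; the GB/s accounting is approximate and the names are mine):

  #include <mpi.h>
  #include <stdio.h>
  #include <stdlib.h>

  #define N (1<<24)  /* 128 MB per array per process */

  int main(int argc, char **argv)
  {
    int rank, size, i;
    double *a = malloc(N * sizeof(double)), *b = malloc(N * sizeof(double));
    double t;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    for (i = 0; i < N; i++) { a[i] = 0.0; b[i] = i; }  /* fault pages first */
    MPI_Barrier(MPI_COMM_WORLD);
    t = MPI_Wtime();
    for (i = 0; i < N; i++) a[i] = 2.0 * b[i];  /* bandwidth-bound loop */
    MPI_Barrier(MPI_COMM_WORLD);
    t = MPI_Wtime() - t;
    /* one read stream plus one write stream of N doubles per process */
    if (rank == 0)
      printf("%d procs: aggregate ~%.1f GB/s\n",
             size, size * 2.0 * N * sizeof(double) / t / 1e9);
    MPI_Finalize();
    free(a); free(b);
    return 0;
  }

Run it with 1, 2, 4, 8 processes per node: the per-node aggregate rate typically stops improving after two or so processes, which matches the timings above.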
Re: [OMPI users] where is mpif.h ?
On Tue 2008-09-23 08:50, Simon Hammond wrote:
> Yes, it should be there.

Shouldn't the path be automatically included by the mpif77 wrapper?  I ran into this problem when building BLACS (my default OpenMPI 1.2.7 lives in /usr, MPICH2 is at /opt/mpich2).  The build tries

  $ /usr/bin/mpif90 -c -I. -fPIC -Wno-unused-variable -g bi_f77_mpi_attr_get.f
  Error: Can't open included file 'mpif.h'

but this succeeds

  $ /usr/bin/mpif90 -c -I. -I/usr/include -fPIC -Wno-unused-variable -g bi_f77_mpi_attr_get.f

and this works fine as well

  $ /opt/mpich2/mpif90 -c -I. -fPIC -Wno-unused-variable -g bi_f77_mpi_attr_get.f

Is this the expected behavior?

Jed