from:"Brian Barrett"

Re: [OMPI devel] oshmem test suite errors

2014-02-20 Thread Brian Barrett

On Feb 20, 2014, at 7:10 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:

> For all of these, I'm using the openshmem test suite that is now committed to 
> the ompi-svn SVN repo.  I don't know if the errors are with the tests or with 
> oshmem itself.
> 
> 1. I'm running the oshmem test suite at 32 processes across 2 16-core 
> servers.  I'm seeing a segv in "examples/shmem_2dheat.x 10 10".  It seems to 
> run fine at lower np values such as 2, 4, and 8; I didn't try to determine 
> where the crossover to badness occurs.

My memory is bad and my notes are on a machine I no longer have access to, but 
I did this to the test suite run for Portals SHMEM:

Index: shmem_2dheat.c
===
--- shmem_2dheat.c  (revision 270)
+++ shmem_2dheat.c  (revision 271)
@@ -129,6 +129,11 @@
   p = _num_pes ();
   my_rank = _my_pe ();

+  if (p > 8) {
+  fprintf(stderr, "Ignoring test when run with more than 8 pes\n");
+  return 77;
+  }
+
   /* argument processing done by everyone */
   int c, errflg;
   extern char *optarg;

The commit comment said there was a scaling issue in the code itself; I 
just wish I could remember exactly what it was.

> 2. "examples/adjacent_32bit_amo.x 10 10" seems to hang with both tcp and 
> usnic BTLs, even when running at np=2 (I let it run for several minutes 
> before killing it).

If atomics aren't fast, this test can run for a very long time (also, it takes 
no arguments, so the 10 10 is being ignored).  It's essentially looking for a 
race by blasting 32-bit atomic ops at both parts of a 64-bit word.

> 3. Ditto for "example/ptp.x 10 10".
> 
> 4. "examples/shmem_matrix.x 10 10" seems to run fine at np=32 on usnic, but 
> hangs with TCP (i.e., I let it run for 8+ minutes before killing it -- 
> perhaps it would have finished eventually?).
> 
> ...there's more results (more timeouts and more failures), but they're not 
> yet complete, and I've got to keep working on my own features for v1.7.5, so 
> I need to move to other things right now.

These start to sound like issues in the code; those last two are pretty decent 
tests.

> I think I have oshmem running well enough to add these to Cisco's nightly MTT 
> runs now, so the results will start showing up there without needing my 
> manual attention.

Woot.

Brian

-- 
 Brian Barrett

 There is an art . . . to flying. The knack lies in learning how to
 throw yourself at the ground and miss.
 Douglas Adams, 'The Hitchhikers Guide to the Galaxy'

Re: [OMPI devel] RFC: new OMPI RTE define:

2014-02-18 Thread Brian Barrett

And what will you do for RTE components that aren't ORTE?  This really isn't a 
feature of a run-time, so it doesn't seem like it should be part of the RTE 
interface...

Brian

On Feb 17, 2014, at 3:03 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:

> WHAT: New OMPI_RTE_EVENT_BASE define
> 
> WHY: The usnic BTL needs to run some events asynchronously; the ORTE event 
> base already exists and is running asynchronously in MPI processes
> 
> WHERE: in ompi/mca/rte/rte.h and rte_orte.h
> 
> TIMEOUT: COB Friday, 21 Feb 2014
> 
> MORE DETAIL:
> 
> The WHY line described it pretty well: we want to run some things 
> asynchronously in the usnic BTL and we don't really want to re-invent the 
> wheel (or add yet another thread in each MPI process).  The ORTE event base 
> is already there, there's already a thread servicing it, and Ralph tells me 
> that it is safe to add our own events on to it.
> 
> The patch below adds the new OMPI_RTE_EVENT_BASE #define.
> 
> 
> diff --git a/ompi/mca/rte/orte/rte_orte.h b/ompi/mca/rte/orte/rte_orte.h
> index 3c88c6d..3ceadb8 100644
> --- a/ompi/mca/rte/orte/rte_orte.h
> +++ b/ompi/mca/rte/orte/rte_orte.h
> @@ -142,6 +142,9 @@ typedef struct {
> } ompi_orte_tracker_t;
> OBJ_CLASS_DECLARATION(ompi_orte_tracker_t);
> 
> +/* define the event base that the RTE exports */
> +#define OMPI_RTE_EVENT_BASE orte_event_base
> +
> END_C_DECLS
> 
> #endif /* MCA_OMPI_RTE_ORTE_H */
> diff --git a/ompi/mca/rte/rte.h b/ompi/mca/rte/rte.h
> index 69ad488..de10dff 100644
> --- a/ompi/mca/rte/rte.h
> +++ b/ompi/mca/rte/rte.h
> @@ -150,7 +150,9 @@
>  *a. OMPI_DB_HOSTNAME
>  *b. OMPI_DB_LOCALITY
>  *
> - * (g) Communication support
> + * (g) Asynchronous / event support
> + * 1. OMPI_RTE_EVENT_BASE - the libevent base that executes in a
> + *separate thread
>  *
>  */
> 
> @@ -162,6 +164,7 @@
> #include "opal/dss/dss_types.h"
> #include "opal/mca/mca.h"
> #include "opal/mca/base/base.h"
> +#include "opal/mca/event/event.h"
> 
> BEGIN_C_DECLS
> 
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 

-- 
 Brian Barrett

 There is an art . . . to flying. The knack lies in learning how to
 throw yourself at the ground and miss.
 Douglas Adams, 'The Hitchhikers Guide to the Galaxy'

Re: [OMPI devel] Updating the trunk

2009-06-30 Thread Brian Barrett


On Jun 30, 2009, at 5:57 PM, Ralph Castain wrote:

If you are updating a prior checkout of the OMPI trunk with r21568,  
please be aware that there is an additional step required to make it  
build. Due to a quirk of the build system, you will need to do:


rm ompi/tools/ompi_info/.deps/*

and then re-run autogen/configure in order to build.

The reason this is required is that the new ompi_info implementation  
generates .o files of the same name as the prior C++ implementation.  
As a result, the .deps files do not get updated - and therefore  
insist on looking for the old .cc files.


Removing the .deps and re-running autogen/configure will resolve the  
problem.


If you are doing a fresh checkout of the OMPI trunk, this will not  
affect you.


Slightly less safe, but you can also run the rm command Ralph gave,
followed by a "make -k", which will regenerate just that Makefile,
then update the .deps files, then build the sources.  You probably
want to run a plain old make after that to make sure nothing failed in
the build, since the "make -k" run itself will report that an error
occurred.


Brian

Re: [OMPI devel] trac ticket 1944 and pending sends

2009-06-24 Thread Brian Barrett

Or go to what I proposed and USE A LINKED LIST!  (as I said before,  
not an original idea, but one I think has merit)  Then you don't have  
to size the fifo, because there isn't a fifo.  Limit the number of  
send fragments any one proc can allocate and the only place memory can  
grow without bound is the OB1 unexpected list.  Then use SEND_COMPLETE  
instead of SEND_NORMAL in the collectives without barrier semantics  
(bcast, reduce, gather, scatter) and you effectively limit how far  
ahead any one proc can get to something that we can handle, with no  
performance hit.
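
A minimal sketch of the "limit the number of send fragments any one proc can allocate" idea, not the actual sm BTL code. All names here (sketch_frag_pool_t and friends) are hypothetical; the point is only that allocation fails once a fixed cap is reached, so the sender has to wait for a fragment to come back instead of growing memory without bound.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical fragment type: only the free-list link matters here. */
typedef struct sketch_frag {
    struct sketch_frag *next;
} sketch_frag_t;

/* Hypothetical per-process pool with a hard cap on fragments. */
typedef struct {
    sketch_frag_t *free_head;  /* fragments returned and ready for reuse */
    size_t allocated;          /* fragments created so far */
    size_t max_frags;          /* cap on fragments this process may own */
} sketch_frag_pool_t;

void sketch_pool_init(sketch_frag_pool_t *pool, size_t max_frags)
{
    assert(pool != NULL);
    pool->free_head = NULL;
    pool->allocated = 0;
    pool->max_frags = max_frags;
}

/* Returns a fragment, or NULL once the cap is reached (or malloc fails).
 * A NULL return means "back off and retry after a fragment is returned". */
sketch_frag_t *sketch_frag_alloc(sketch_frag_pool_t *pool)
{
    sketch_frag_t *frag = pool->free_head;
    if (frag != NULL) {
        pool->free_head = frag->next;
        frag->next = NULL;
        return frag;
    }
    if (pool->allocated >= pool->max_frags) {
        return NULL;
    }
    frag = malloc(sizeof(*frag));
    if (frag != NULL) {
        frag->next = NULL;
        pool->allocated++;
    }
    return frag;
}

/* Hand a fragment back so a later allocation can reuse it. */
void sketch_frag_return(sketch_frag_pool_t *pool, sketch_frag_t *frag)
{
    frag->next = pool->free_head;
    pool->free_head = frag;
}
```

With a cap in place, how far ahead any one process can run is bounded by max_frags rather than by available memory.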


Brian

On Jun 24, 2009, at 12:46 AM, George Bosilca wrote:

In other words, as long as a queue is peer based (peer not peers),  
the management of the pending send list was doing what it was  
supposed to, and there was no possibility of deadlock. With the new  
code, as a third party can fill up a remote queue, getting a  
fragment back [as you stated] became a poor indicator for retry.


I don't see how the proposed solution will solve the issue without a
significant overhead. As we only call the MCA_BTL_SM_FIFO_WRITE once
before the fragment gets into the pending list, reordering the
fragments will not solve the issue. When the peer is overloaded,
the fragments will end up in the pending list, and there is nothing
to get them out of there except a message from the peer. In some
cases, such a message might never be delivered, simply because the
peer doesn't have any data to send us.


The other solution is to always check all pending lists. While this  
might work, it will certainly add undesirable overhead to the send  
path.


Your last patch was doing the right thing. Globally decreasing the
size of the memory used by the MPI library is _the right_ way to go.
Unfortunately, your patch only addresses this at the level of the
shared memory file. Now, instead of using less memory we use even
more because we have to store that data somewhere ... in the
fragments returned by the btl_sm_alloc function. These fragments are
allocated on demand and by default there is no limit to the number
of such fragments.


Here is a simple fix for both problems. Enforce a reasonable limit  
on the number of fragments in the BTL free list (1K should be more  
than enough), and make sure the fifo has a size equal to p *  
number_of_allowed_fragments_in_the_free_list, where p is the number  
of local processes. While this solution will certainly increase
the size of the mapped file again, it will do so by a small margin
compared with what is happening today in the code. This is without
talking about the fact that it will solve the deadlock problem, by
removing the inability to return a fragment. In addition, the PML is
capable of handling such situations, so we're getting back to a
deadlock free sm BTL.
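
The sizing rule above can be worked through as plain arithmetic. This is only a sketch: the function names are hypothetical, and the 8-byte slot size used in the note below is an assumption for illustration, not a value taken from the sm BTL.

```c
#include <assert.h>
#include <stddef.h>

/* Slots one FIFO needs so that every fragment owned by every local
 * process can sit in it at the same time: p * allowed fragments. */
size_t sketch_fifo_slots(size_t local_procs, size_t max_frags_per_proc)
{
    assert(local_procs > 0);
    return local_procs * max_frags_per_proc;
}

/* Bytes for one such FIFO, given an assumed per-slot size. */
size_t sketch_fifo_bytes(size_t local_procs, size_t max_frags_per_proc,
                         size_t slot_bytes)
{
    return sketch_fifo_slots(local_procs, max_frags_per_proc) * slot_bytes;
}
```

For 16 local processes and a 1K fragment limit, each FIFO needs 16 * 1024 = 16384 slots. Assuming 8-byte slots, that is 131072 bytes (128 KiB) per FIFO, or 2 MiB across 16 receivers. Because no process can ever hold more than its limit, a FIFO of this size cannot fill up, which is why returning a fragment can no longer fail.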


 george.


On Jun 23, 2009, at 11:04 , Eugene Loh wrote:

The sm BTL used to have two mechanisms for dealing with congested  
FIFOs.  One was to grow the FIFOs.  Another was to queue pending  
sends locally (on the sender's side).  I think the grow-FIFO  
mechanism was typically invoked and the pending-send mechanism used  
only under extreme circumstances (no more memory).


With the sm makeover of 1.3.2, we dropped the ability to grow  
FIFOs.  The code added complexity and there seemed to be no need to  
have two mechanisms to deal with congested FIFOs.  In ticket 1944,  
however, we see that repeated collectives can produce hangs, and  
this seems to be due to the pending-send code not adequately  
dealing with congested FIFOs.


Today, when a process tries to write to a remote FIFO and fails, it  
queues the write as a pending send.  The only condition under which  
it retries pending sends is when it gets a fragment back from a  
remote process.


I think the logic must have been that the FIFO got congested  
because we issued too many sends.  Getting a fragment back  
indicates that the remote process has made progress digesting those  
sends.  In ticket 1944, we see that a FIFO can also get congested  
from too many returning fragments.  Further, with shared FIFOs, a  
FIFO could become congested due to the activity of a third-party  
process.


In sum, getting a fragment back from a remote process is a poor  
indicator that it's time to retry pending sends.


Maybe the real way to know when to retry pending sends is just to  
check if there's room on the FIFO.


So, I'll try modifying MCA_BTL_SM_FIFO_WRITE.  It'll start by  
checking if there are pending sends.  If so, it'll retry them  
before performing the requested write.  This should also help  
preserve ordering a little better.  I'm guessing this will not hurt  
our message latency in any meaningful way, but I'll check this out.
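
A rough sketch of the "retry pending sends before the requested write" idea, under heavy assumptions: this is not MCA_BTL_SM_FIFO_WRITE itself, the types and names are hypothetical, fragments are stood in for by plain ints, and everything is single-process with no shared memory or locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_FIFO_SLOTS 4
#define SKETCH_PENDING_MAX 16

/* Hypothetical fixed-size FIFO standing in for a remote sm FIFO. */
typedef struct {
    int slots[SKETCH_FIFO_SLOTS];
    size_t head, count;
} sketch_fifo_t;

/* Hypothetical sender-side queue of writes that did not fit. */
typedef struct {
    int items[SKETCH_PENDING_MAX];
    size_t head, count;
} sketch_pending_t;

/* Returns false when the FIFO is full. */
bool sketch_fifo_try_write(sketch_fifo_t *f, int frag)
{
    if (f->count == SKETCH_FIFO_SLOTS) {
        return false;
    }
    f->slots[(f->head + f->count) % SKETCH_FIFO_SLOTS] = frag;
    f->count++;
    return true;
}

/* Receiver side: returns false when the FIFO is empty. */
bool sketch_fifo_read(sketch_fifo_t *f, int *frag)
{
    if (f->count == 0) {
        return false;
    }
    *frag = f->slots[f->head];
    f->head = (f->head + 1) % SKETCH_FIFO_SLOTS;
    f->count--;
    return true;
}

/* First drain older pending sends in order, then try the new write.
 * The new fragment is queued if anything older is still pending, so
 * ordering is preserved. Returns true if it reached the FIFO now. */
bool sketch_send(sketch_fifo_t *f, sketch_pending_t *p, int frag)
{
    while (p->count > 0 && sketch_fifo_try_write(f, p->items[p->head])) {
        p->head = (p->head + 1) % SKETCH_PENDING_MAX;
        p->count--;
    }
    if (p->count == 0 && sketch_fifo_try_write(f, frag)) {
        return true;
    }
    assert(p->count < SKETCH_PENDING_MAX);
    p->items[(p->head + p->count) % SKETCH_PENDING_MAX] = frag;
    p->count++;
    return false;
}
```

The retry is driven by "is there room on the FIFO right now" rather than by a fragment coming back, which is the change in trigger being proposed. The cost is one extra check of the pending queue on every send.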


Meanwhile, I wanted to check in with y'all for any guidance you  
might have.


Re: [OMPI devel] CMR one-sided changes? (r21134)

2009-05-20 Thread Brian Barrett


Yeah, putting together a CMR is on the todo list :).

Brian

--  
Brian Barrett


There is an art . . . to flying. The knack lies in learning how to
throw yourself at the ground and miss.

On May 20, 2009, at 12:41, Jeff Squyres <jsquy...@cisco.com> wrote:


Brian: can we CMR over your OSC changes from 30 Apr (r21134)?

I have noticed an enormous performance difference between the pt2pt  
and rdma osc components when running the IMB-EXT benchmark over IB:


 - pt2pt: 11+ minutes
 - rdma: 43 seconds

rdma is the default on the trunk, since r21134
(https://svn.open-mpi.org/trac/ompi/changeset/21134).  pt2pt is still
the default on v1.3.


There's a conflict in ompi/mca/osc/rdma/osc_rdma_sync.c, so I don't  
quite know how to proceed...


--
Jeff Squyres
Cisco Systems


Re: [OMPI devel] RFC: Warn user about deprecated MPI functionality and "wrong" compiler usage

2009-05-18 Thread Brian Barrett

I think care must be taken on this front. While I know we don't like  
to admit it, there is no reason the C compilers have to match, and  
indeed good reasons they might not. For example, at LANL, we  
frequently compiled OMPI with GCC, then fixed up the wrapper compilers  
to use icc or whatever, to work around optimizer bugs. This is  
functionality I don't think should be lost just to warn about  
deprecated functions.


Brian

--
Brian Barrett

There is an art . . . to flying. The knack lies in learning how to
throw yourself at the ground and miss.

On May 18, 2009, at 1:34, Rainer Keller <kel...@ornl.gov> wrote:

What:  Warn user about deprecated MPI functionality and "wrong" compiler usage

Why:   Because deprecated MPI functions are ... deprecated

Where: On trunk

When:  Apply on trunk before branching for v1.5 (it is user-visible)

Timeout: 1 week - May 26, 2009 after the teleconf.

-

I'd like to propose a patch that addresses two issues:
- Users shoot themselves in the foot compiling with a different compiler
  than what was used to compile OMPI (think ABI)
- The MPI-2.1 std. defines several functions to be deprecated.

This will warn Open MPI users, when accessing deprecated functions, even
giving a proper warning such as:
"MPI_TYPE_HVECTOR is superseded by MPI_TYPE_CREATE_HVECTOR"
Also, now we may _warn_ when using a different compiler (gcc vs. intel vs.
pgcc)

This is achieved using __opal_attribute_deprecated__ and obviously needs to be
added into mpi.h, therefore being a user-visible change.

This however has a few caveats:
1.) Having Open MPI compiled with gcc and having users compiling with another
compiler, which is not supporting __attribute__((deprecated)) is going to be a
problem
2.) The attribute is most useful, when having a proper description (as above)
-- which requires support for the optional argument to __deprecate__. This
feature is offered only in gcc>4.4 (see
http://gcc.gnu.org/ml/gcc-patches/2009-04/msg00087.html).

Therefore, I added a configure-check for the compiler's support of the
optional argument.
And we need to store, which compiler is used to compile Open MPI and at
(user-app) compile-time again check (within mpi.h), which compiler (and
version!) is being used.
This is then compared at user-level compile-time.

To prevent users getting swamped with error msg. this can be turned off using
the configure-option:
 --enable-mpi-interface-warning
which turns on OMPI_WANT_MPI_INTERFACE_WARNING (default: DISabled), as
suggested by Jeff.

The user can however override that with (check mpi2basic_tests):
   mpicc -DOMPI_WANT_MPI_INTERFACE_WARNING -c lalala.c
lots of warnings follow
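
The mechanism described in this RFC could look roughly like the following header fragment. This is a sketch under assumptions, not the contents of the proposed patch: every SKETCH_ and sketch_ name is hypothetical, the recorded build compiler is hard-coded here where configure would substitute it, and the GCC 4.5 cutoff for the message argument is my reading of "gcc>4.4".

```c
#include <assert.h>
#include <string.h>

/* Hypothetical: compiler family recorded when the library was built.
 * A real build would have configure substitute this value. */
#ifndef SKETCH_BUILD_COMPILER
#define SKETCH_BUILD_COMPILER "gnu"
#endif

/* Family of the compiler that is compiling the user's code right now. */
#if defined(__INTEL_COMPILER)
#define SKETCH_USER_COMPILER "intel"
#elif defined(__PGI)
#define SKETCH_USER_COMPILER "pgi"
#elif defined(__clang__)
#define SKETCH_USER_COMPILER "clang"
#elif defined(__GNUC__)
#define SKETCH_USER_COMPILER "gnu"
#else
#define SKETCH_USER_COMPILER "unknown"
#endif

/* Warnings are opt-in, and compilers without the attribute get nothing. */
#if defined(SKETCH_WANT_INTERFACE_WARNING) && defined(__GNUC__)
#  if defined(__clang__) || __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 5)
#    define SKETCH_DEPRECATED(msg) __attribute__((deprecated(msg)))
#  else
#    define SKETCH_DEPRECATED(msg) __attribute__((deprecated))
#  endif
#else
#  define SKETCH_DEPRECATED(msg)
#endif

/* Stand-ins for a deprecated MPI call and its replacement. */
SKETCH_DEPRECATED("sketch_type_hvector is superseded by sketch_type_create_hvector")
int sketch_type_hvector(int count);
int sketch_type_create_hvector(int count);

int sketch_type_create_hvector(int count) { return count; }
int sketch_type_hvector(int count) { return sketch_type_create_hvector(count); }

/* Nonzero when the user's compiler family matches the build compiler. */
int sketch_compiler_matches(void)
{
    return strcmp(SKETCH_BUILD_COMPILER, SKETCH_USER_COMPILER) == 0;
}
```

Compiling user code with -DSKETCH_WANT_INTERFACE_WARNING would then warn at each call to sketch_type_hvector. A compiler family that lacks the attribute simply sees an empty macro.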

Please take a look into:
http://bitbucket.org/jsquyres/ompi-deprecated/


With best regards,
Rainer


PS:
Also, we need to disable the warning when building Open MPI itself ;-)


PPS:
Thanks to Paul Hargrove and Dan Bonachea for the GASnet file
portable_platform.h which offers the CPP magic to figure out compilers and
esp. compiler-versions.
--

Rainer Keller, PhD          Tel: +1 (865) 241-6293
Oak Ridge National Lab      Fax: +1 (865) 241-4811
PO Box 2008 MS 6164         Email: kel...@ornl.gov
Oak Ridge, TN 37831-2008    AIM/Skype: rusraink


Re: [OMPI devel] Revise paffinity method?

2009-05-08 Thread Brian Barrett

Jumping in late (travelling this morning). I think this is the right  
answer :).


Brian

--
Brian Barrett

There is an art . . . to flying. The knack lies in learning how to
throw yourself at the ground and miss.

On May 8, 2009, at 9:45, Ralph Castain <r...@open-mpi.org> wrote:

I think that's the way to go then - it also follows our "the user is  
always right - even when they are wrong" philosophy. I'll probably  
have to draw on others to help ensure that the paffinity modules all  
report appropriately.


Think I have enough now to start on this - probably middle of next  
week.


Thanks!

On May 8, 2009, at 8:37 AM, Jeff Squyres wrote:


On May 8, 2009, at 10:32 AM, Ralph Castain wrote:


Actually, I was wondering (hot tub thought for the night) if the
paffinity system can't just tell us if the proc has been bound or  
not?

That would remove the need for YAP (i.e., yet another param).



Yes, it can.

What it can't tell, though, is who set it.  So a user may have  
overridden the paffinity after main() starts but before MPI_INIT is  
invoked.


But perhaps that's not a crime -- users can override the paffinity  
at their own risk (we actually have no way of preventing them from  
doing so).  So perhaps just checking if affinity is already set is  
a "good enough" mechanism for the MPI_INIT-set-paffinity logic to  
determine whether it should set affinity itself or not.
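
The "just check whether affinity is already set" logic might be sketched as below. This is an illustration only: the names are hypothetical, how a paffinity module actually reports the mask is not shown in this thread, and treating "mask covers fewer CPUs than are online" as "bound" is an assumption. On Linux the two counts could come from sched_getaffinity() and sysconf(_SC_NPROCESSORS_ONLN).

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    SKETCH_BIND_LEAVE_ALONE,  /* someone already bound the process */
    SKETCH_BIND_APPLY,        /* unbound, and binding was requested */
    SKETCH_BIND_NONE          /* unbound, and nothing was requested */
} sketch_bind_action_t;

/* Assumed test for "already bound": the affinity mask covers fewer
 * CPUs than the node has online. It cannot say who set the mask. */
bool sketch_is_bound(unsigned cpus_in_mask, unsigned cpus_online)
{
    assert(cpus_online > 0 && cpus_in_mask <= cpus_online);
    return cpus_in_mask < cpus_online;
}

/* Decision an MPI_INIT-time binding step could make. */
sketch_bind_action_t sketch_init_bind_action(unsigned cpus_in_mask,
                                             unsigned cpus_online,
                                             bool binding_requested)
{
    if (sketch_is_bound(cpus_in_mask, cpus_online)) {
        return SKETCH_BIND_LEAVE_ALONE;
    }
    return binding_requested ? SKETCH_BIND_APPLY : SKETCH_BIND_NONE;
}
```

An existing binding always wins here, whether it came from the launcher or from the user, which matches the "override at their own risk" stance and avoids yet another parameter.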


--
Jeff Squyres
Cisco Systems


Re: [OMPI devel] require newer autoconf?

2009-03-17 Thread Brian Barrett

I'd rather not.  I have a couple of platforms with 2.59 installed, but  
not 2.60+.  I really don't want to have to install my own autotools  
because of some bug that doesn't affect me.


I don't, however, have a problem with forcing users to upgrade in  
order to get support for build-related issues.  The version of  
Autoconf used is in config.log, so it's not hard to find which version  
the user actually used.


Brian


On Mar 17, 2009, at 7:00 PM, Jeff Squyres wrote:


Per this thread:

   http://www.open-mpi.org/community/lists/users/2009/03/8402.php

It took a *long* time to figure out that an outdated Autoconf  
install was the culprit of the "restrict" mess.  The issue is that  
somewhere between v2.61 and v2.63, Autoconf changed the order of  
looking for "restrict"-like keywords -- AC 2.63 has the "good"  
order; AC 2.61 has the "bad" order (hence, PGI worked for me with AC  
2.63, but barfed for Mostyn with AC 2.61).


Should we have our autogen.sh force the use of AC 2.63 and above?   
(currently, it forces 2.59 and above)


--
Jeff Squyres
Cisco Systems


Re: [OMPI devel] RFC: Move of ompi_bitmap_t

2009-02-01 Thread Brian Barrett

 we need further discussions.

--td

Brian Barrett wrote:

So once again, I bring up my objection to this entire line of moving
until such time as the entire process is properly mapped out.  I
believe it's premature to begin moving around code in preparation for
a move that hasn't been proven viable yet.  Until there is concrete
evidence that such a move is possible, won't degrade application
performance, and does not make the code totally unmaintainable, I
believe that any related code changes should not be brought into the
trunk.

Brian


On Jan 30, 2009, at 12:30 PM, Rainer Keller wrote:


On behalf of Laurent Broto

RFC: Move of ompi_bitmap_t

WHAT: Move ompi_bitmap_t into opal or onet-layer

WHY: Remove dependency on ompi-layer.

WHERE: ompi/class

WHEN: Open MPI-1.4

TIMEOUT: February 3, 2009.

-
Details:
WHY:
The ompi_bitmap_t is being used in various places within opal/orte/ompi. With
the proposed splitting of BTLs into a separate library, we are currently
investigating several of the differences between ompi/class/* and
opal/class/*

One of the items is the ompi_bitmap_t which is quite similar to the
opal_bitmap_t.
The question is whether we can remove it, favoring a solution just in opal.

WHAT:
The data structures in the opal-version are the same,
so is the interface,
the implementation is *almost* the same

The difference is the Fortran handles ;-]!

Maybe we're missing something but could we have a discussion, on why Fortran
sizes are playing a role here, and if this is a hard requirement, how we could
settle that into that current interface (possibly without a notion of Fortran,
but rather, set some upper limit that the bitmap may grow to?)

With best regards,
Laurent and Rainer
--

Rainer Keller, PhD          Tel: (865) 241-6293
Oak Ridge National Lab      Fax: (865) 241-4811
PO Box 2008 MS 6164         Email: kel...@ornl.gov
Oak Ridge, TN 37831-2008    AIM/Skype: rusraink







--
Jeff Squyres
Cisco Systems




--
  Brian Barrett
  Open MPI developer
  http://www.open-mpi.org/

Re: [OMPI devel] RFC: Move of ompi_bitmap_t

2009-02-01 Thread Brian Barrett

In that case, I remove my objection to this particular RFC.  It  
remains for all other RFCs related to moving any of the BTL move code  
to the trunk before the critical issues with the BTL move have been  
sorted out in a temporary branch.  This includes renaming functions  
and such.  Perhaps we should have a discussion about those issues  
during the Forum in a couple weeks?


Brian

On Feb 1, 2009, at 5:37 AM, Jeff Squyres wrote:

I just looked through both opal_bitmap_t and ompi_bitmap_t and I  
think that the only real difference is that in the ompi version, we  
check (in various places) that the size of the bitmap never grows  
beyond OMPI_FORTRAN_HANDLE_MAX; the opal version doesn't do these  
kind of size checks.


I think it would be fairly straightforward to:

- add generic checks into the opal version, perhaps by adding a new  
API call (opal_bitmap_set_max_size())
- if the max size has been set, then ensure that the bitmap never  
grows beyond that size, otherwise let it have the same behavior as  
today (grow without bound -- assumedly until malloc() fails)


It'll take a little care to ensure to merge the functionality  
correctly, but it is possible.  Once that is done, you can:


- remove the ompi_bitmap_t class
- s/ompi_bitmap/opal_bitmap/g in the OMPI layer
- add new calls to opal_bitmap_set_max_size(,  
OMPI_FORTRAN_HANDLE_MAX) in the OMPI layer (should only be in a few  
places -- probably one for each MPI handle type...?  It's been so  
long since I've looked at that code that I don't remember offhand)


I'd generally be in favor of this because, although this is not a  
lot of repeated code, it *is* repeated code -- so cleaning it up and  
consolidating the non-Fortran stuff down in opal is not a Bad Thing.
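
A sketch of what "a bitmap with an optional maximum size" could look like. This is not the opal_bitmap_t implementation: opal_bitmap_set_max_size() is only a proposed call in this thread, and the sketch_bitmap_* names and the byte-array layout below are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable bitmap; max_bits == 0 means "no limit". */
typedef struct {
    unsigned char *bits;
    size_t nbytes;    /* bytes currently allocated */
    size_t max_bits;  /* optional cap, e.g. a Fortran handle limit */
} sketch_bitmap_t;

void sketch_bitmap_init(sketch_bitmap_t *bm)
{
    assert(bm != NULL);
    bm->bits = NULL;
    bm->nbytes = 0;
    bm->max_bits = 0;
}

/* Callers that care (the MPI layer) set a cap; others never call this. */
void sketch_bitmap_set_max_size(sketch_bitmap_t *bm, size_t max_bits)
{
    bm->max_bits = max_bits;
}

/* Sets a bit, growing the storage as needed. Returns false if the bit
 * lies at or beyond the cap, or if the allocation fails. */
bool sketch_bitmap_set_bit(sketch_bitmap_t *bm, size_t bit)
{
    if (bm->max_bits != 0 && bit >= bm->max_bits) {
        return false;
    }
    size_t need = bit / CHAR_BIT + 1;
    if (need > bm->nbytes) {
        unsigned char *grown = realloc(bm->bits, need);
        if (grown == NULL) {
            return false;
        }
        memset(grown + bm->nbytes, 0, need - bm->nbytes);
        bm->bits = grown;
        bm->nbytes = need;
    }
    bm->bits[bit / CHAR_BIT] |= (unsigned char)(1u << (bit % CHAR_BIT));
    return true;
}

bool sketch_bitmap_is_set(const sketch_bitmap_t *bm, size_t bit)
{
    size_t byte = bit / CHAR_BIT;
    return byte < bm->nbytes && ((bm->bits[byte] >> (bit % CHAR_BIT)) & 1u) != 0;
}

void sketch_bitmap_free(sketch_bitmap_t *bm)
{
    free(bm->bits);
    bm->bits = NULL;
    bm->nbytes = 0;
}
```

Keeping the cap as plain data means the lower layer needs no notion of Fortran; the upper layer would pass its own limit (OMPI_FORTRAN_HANDLE_MAX in the proposal) once per bitmap.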



On Jan 30, 2009, at 4:59 PM, Ralph Castain wrote:

The history is simple. Originally, there was one bitmap_t in orte  
that was also used in ompi. Then the folks working on Fortran found  
that they had to put a limit in the bitmap code to avoid getting  
values outside of Fortran's range. However, this introduced a  
problem - if we had the limit in the orte version, then we limited  
ourselves unnecessarily, and introduced some abstraction questions  
since orte knows nothing about Fortran.


So two were created. Then the orte_bitmap_t was blown away at a  
later time when we removed the GPR as George felt it wasn't  
necessary (which was true). It was later reborn when we needed it  
in the routed system, but this time it was done in opal as others  
indicated a potential more general use for that capability.


The problem with uniting the two is that you either have to
introduce Fortran-based limits into opal (which messes up the
non-ompi uses), or deal with the Fortran limits in some other fashion.
Neither is particularly pleasant, though it could be done.


I think it primarily is a question for the Fortran folks to address  
- can they deal with Fortran limits in some other manner without  
making the code unmanageable and/or taking a performance hit?


Ralph


On Jan 30, 2009, at 2:40 PM, Richard Graham wrote:

This should really be viewed as a code maintenance RFC.  The reason this
came up in the first place is because we are investigating the btl move, but
these are really two very distinct issues.  There are two bits of code that
have virtually the same functionality - they do have the same interface I am
told.  The question is, is there a good reason to keep two different
versions in the repository?  Not knowing the history of why a second
version was created this is an inquiry.  Is there some performance
advantage, or some other advantage to having these two versions?

Rich


On 1/30/09 3:23 PM, "Terry D. Dontje" <terry.don...@sun.com> wrote:

I second Brian's concern.  So unless this is just an announcement that
this is being done on a tmp branch only until everything is in order I
think we need further discussions.

--td

Brian Barrett wrote:
So once again, I bring up my objection to this entire line of moving
until such time as the entire process is properly mapped out.  I
believe it's premature to begin moving around code in preparation for
a move that hasn't been proven viable yet.  Until there is concrete
evidence that such a move is possible, won't degrade application
performance, and does not make the code totally unmaintainable, I
believe that any related code changes should not be brought into the
trunk.

Brian


On Jan 30, 2009, at 12:30 PM, Rainer Keller wrote:


On behalf of Laurent Broto

RFC: Move of ompi_bitmap_t

WHAT: Move ompi_bitmap_t into opal or onet-layer

WHY: Remove dependency on ompi-layer.

WHERE: ompi/class

WHEN: Open MPI-1.4

TIMEOUT: February 3, 2009.

-
Details:
WHY:
The ompi_bitmap_t is being used in various places within
opal/orte/ompi. With
the proposed splitting of BTLs into a separate library, we are  
currently

investigating several of

Re: [OMPI devel] RFC: Move of ompi_bitmap_t

2009-01-30 Thread Brian Barrett

So once again, I bring up my objection to this entire line of moving
until such time as the entire process is properly mapped out.  I
believe it's premature to begin moving around code in preparation for
a move that hasn't been proven viable yet.  Until there is concrete
evidence that such a move is possible, won't degrade application
performance, and does not make the code totally unmaintainable, I
believe that any related code changes should not be brought into the
trunk.


Brian


On Jan 30, 2009, at 12:30 PM, Rainer Keller wrote:


On behalf of Laurent Broto

RFC: Move of ompi_bitmap_t

WHAT: Move ompi_bitmap_t into opal or onet-layer

WHY: Remove dependency on ompi-layer.

WHERE: ompi/class

WHEN: Open MPI-1.4

TIMEOUT: February 3, 2009.

-
Details:
WHY:
The ompi_bitmap_t is being used in various places within opal/orte/ompi. With
the proposed splitting of BTLs into a separate library, we are currently
investigating several of the differences between ompi/class/* and
opal/class/*

One of the items is the ompi_bitmap_t which is quite similar to the
opal_bitmap_t.
The question is whether we can remove it, favoring a solution just in opal.

WHAT:
The data structures in the opal-version are the same,
so is the interface,
the implementation is *almost* the same

The difference is the Fortran handles ;-]!

Maybe we're missing something but could we have a discussion, on why Fortran
sizes are playing a role here, and if this is a hard requirement, how we could
settle that into that current interface (possibly without a notion of Fortran,
but rather, set some upper limit that the bitmap may grow to?)

With best regards,
Laurent and Rainer
--

Rainer Keller, PhD          Tel: (865) 241-6293
Oak Ridge National Lab      Fax: (865) 241-4811
PO Box 2008 MS 6164         Email: kel...@ornl.gov
Oak Ridge, TN 37831-2008    AIM/Skype: rusraink









--
  Brian Barrett
  Open MPI developer
  http://www.open-mpi.org/

Re: [OMPI devel] RFC: sm Latency

2009-01-20 Thread Brian Barrett

I unfortunately don't have time to look in depth at the patch.  But my
concern is that currently (today, not at some made up time in the
future, maybe), we use the BTLs for more than just MPI point-to-point.
The rdma one-sided component (which was added for 1.3 and hopefully
will be the default for 1.4) sends messages directly over the btls.
It would be interesting to know how that is handled.


Brian


On Jan 20, 2009, at 6:53 PM, Jeff Squyres wrote:

This all sounds really great to me.  I agree with most of what has  
been said -- e.g., benchmarks *are* important.  Improving them can  
even sometimes have the side effect of improving real  
applications.  ;-)


My one big concern is the moving of architectural boundaries of  
making the btl understand MPI match headers.  But even there, I'm  
torn:


1. I understand why it is better -- performance-wise -- to do this.   
And the performance improvement results are hard to argue with.  We  
took a similar approach with ORTE; ORTE is now OMPI-specific, and  
many, many things have become better (from the OMPI perspective, at  
least).


2. We all have the knee-jerk reaction that we don't want to have the  
BTLs know anything about MPI semantics because they've always been  
that way and it has been a useful abstraction barrier.  Now there's  
even a project afoot to move the BTLs out into a separate layer that  
cannot know about MPI (so that other things can be built upon it).   
But are we sacrificing potential MPI performance here?  I think  
that's one important question.


Eugene: you mentioned that there are other possibilities to having  
the BTL understand match headers, such as a callback into the PML.   
Have you tried this approach to see what the performance cost would  
be, perchance?


I'd like to see George's reaction to this RFC, and Brian's (if he  
has time).



On Jan 20, 2009, at 8:04 PM, Eugene Loh wrote:


Patrick Geoffray wrote:


Eugene Loh wrote:



replace the FIFOs with a single linked list per process in shared
memory, with senders to this process adding match envelopes
atomically, with each process reading its own linked list (multiple



*) Doesn't strike me as a "simple" change.



Actually, it's much simpler than trying to optimize/scale the N^2
implementation, IMHO.


1) The version I talk about is already done. Check my putbacks.  
"Already

done" is easier! :^)

2) The two ideas are largely orthogonal. The RFC talks about a  
variety

of things: cleaning up the sendi function, moving the sendi call up
higher in the PML, bypassing the PML receive-request structure  
(similar
to sendi), and stream-lining the data convertors in common cases.  
Only

one part of the RFC (directed polling) overlaps with having a single
FIFO per receiver.

*) Not sure this addresses all-to-all well.  E.g., let's say you  
post a
receive for a particular source.  Do you then wade through a long  
FIFO

to look for your match?


The tradeoff is between demultiplexing by the sender, which cost  
in time

and in space, or by the receiver, which cost an atomic inc. ANY_TAG
forces you to demultiplex on the receive side anyway. Regarding
all-to-all, it won't be more expensive if the receives are pre- 
posted,

and they should be.



Not sure I understand this paragraph. I do, however, think there are
great benefits to the single-receiver-queue model. It implies congestion
on the receiver side in the many-to-one case, but if a single receiver
is reading all those messages anyhow, message-processing is already
going to throttle the message rate. The extra "bottleneck" at the FIFO
might never be seen.
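
To make the single-receiver-queue idea concrete, here is a minimal sketch in
C11 of one list per receiving process that many senders push onto atomically
and that only the owner drains. This is not the Open MPI sm BTL code: the
envelope_t layout and the rq_init/rq_push/rq_drain names are invented for
illustration, the push uses a compare-and-swap loop rather than the atomic
increment mentioned above, and a real shared-memory version would have to
store offsets instead of raw pointers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical match envelope; the real sm BTL header differs. */
typedef struct envelope {
    struct envelope *next;
    int src;   /* sending rank */
    int tag;   /* MPI tag */
} envelope_t;

/* One queue per receiving process; every sender pushes onto it. */
typedef struct {
    _Atomic(envelope_t *) head;
} recv_queue_t;

static void rq_init(recv_queue_t *q)
{
    atomic_init(&q->head, NULL);
}

/* Sender side: lock-free push using a compare-and-swap loop. */
static void rq_push(recv_queue_t *q, envelope_t *e)
{
    assert(e != NULL);
    envelope_t *old = atomic_load_explicit(&q->head, memory_order_relaxed);
    do {
        e->next = old;  /* on CAS failure, old holds the current head */
    } while (!atomic_compare_exchange_weak_explicit(
                 &q->head, &old, e,
                 memory_order_release, memory_order_relaxed));
}

/* Receiver side: detach the whole list with one atomic exchange, then
 * reverse it so envelopes come back in the order they were pushed. */
static envelope_t *rq_drain(recv_queue_t *q)
{
    envelope_t *lifo =
        atomic_exchange_explicit(&q->head, NULL, memory_order_acquire);
    envelope_t *fifo = NULL;
    while (lifo != NULL) {
        envelope_t *next = lifo->next;
        lifo->next = fifo;
        fifo = lifo;
        lifo = next;
    }
    return fifo;
}
```

The receiver pays one atomic exchange per drain no matter how many senders
there are; matching against posted receives then walks the private list it got
back, which is where the "wade through a long FIFO" concern would show up.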


What the RFC talks about is not the last SM development we'll ever
need.  It's only supposed to be one step forward from where we are
today.  The "single queue per receiver" approach has many advantages,
but I think it's a different topic.


But is this intermediate step worth it or should we (well, you :-) ) go
directly for the single queue model?


To recap:
1) The work is already done.
2) The single-queue model addresses only one of the RFC's issues.
3) I'm a fan of the single-queue model, but it's just a separate discussion.

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] 1.3 PML default choice

2009-01-13 Thread Brian Barrett


George -

I don't care what we end up doing, but what you state is wrong.  We do  
not use the CM for all other MTLs by default.  PSM is the *ONLY* MTL  
that will cause CM to be used by default.  Portals still falls back to  
OB1 by default.  Again, don't care, don't want to change, just want  
the documentation and current behavior to match.


Brian

On Jan 13, 2009, at 6:27 PM, George Bosilca wrote:

This topic was raised on the mailing list quite a few times. There  
is a major difference between the PSM and the MX support. For PSM  
there is just an MTL, which makes everything a lot simpler. The  
problem with MX is that we have an MTL and a BTL. In order to figure  
out which one to use, we have to call the init function, and this
function initializes MX. The MTL uses the default values for this,
while the BTL gives some hints to the MX library (about how to behave
based on the support level we want, e.g., who will deal with
shared memory or self communications). As there can be only one MX
initialization, and the MTL initializes first, the BTL will always get
a wrongly initialized MX library (which can cause some
performance problems).


What Brian describes is the best compromise we managed to find a few
months ago. If you want to get the MX CM to run, you will have to
explicitly specify --mca pml cm on the command line. All other MTLs will
have the behavior described in the README.


 george.

On Jan 13, 2009, at 20:18 , Brian Barrett wrote:


On Jan 13, 2009, at 5:48 PM, Patrick Geoffray wrote:


Jeff Squyres wrote:
Gaah!  I specifically asked Patrick and George about this and  
they said that the README text was fine.  Grr...


When I looked at that time, I vaguely remember that _both_ PMLs  
were initialized but CM was eventually used because it was the  
last one. It looked broken, but it worked in the end (MTL was used  
with CM PML). I don't know if that behavior changed since.


I just tested 1.3rc4 with MX and it uses the btl by default.  The  
reason is the cm init lowers the priority to 1 unless the MTL that  
loaded is psm, in which case it stays at the higher default of 30.   
It's a fairly easy fix, I think.  But the last time this was  
discussed people in the group had objections to using the MTL by  
default with MX.


Brian


Re: [OMPI devel] 1.3 PML default choice

2009-01-13 Thread Brian Barrett


On Jan 13, 2009, at 5:48 PM, Patrick Geoffray wrote:


Jeff Squyres wrote:
Gaah!  I specifically asked Patrick and George about this and they  
said that the README text was fine.  Grr...


When I looked at that time, I vaguely remember that _both_ PMLs were  
initialized but CM was eventually used because it was the last one.  
It looked broken, but it worked in the end (MTL was used with CM  
PML). I don't know if that behavior changed since.


I just tested 1.3rc4 with MX and it uses the btl by default.  The  
reason is the cm init lowers the priority to 1 unless the MTL that  
loaded is psm, in which case it stays at the higher default of 30.   
It's a fairly easy fix, I think.  But the last time this was discussed  
people in the group had objections to using the MTL by default with MX.
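
The priority rule described above can be sketched in a few lines of C. This
is only an illustration of the behavior as stated in this message, not the
actual CM PML source: the function name cm_priority_for_mtl and the two enum
names are made up, and only the values 30 and 1 and the "psm" special case
come from the text.

```c
#include <assert.h>
#include <string.h>

enum {
    CM_DEFAULT_PRIORITY = 30, /* "higher default" quoted in the message */
    CM_LOWERED_PRIORITY = 1   /* value used when the MTL is not psm */
};

/* Hypothetical helper: the priority the CM PML would report after its
 * init, given the name of the MTL that was loaded. */
static int cm_priority_for_mtl(const char *mtl_name)
{
    assert(mtl_name != NULL);
    if (strcmp(mtl_name, "psm") == 0) {
        return CM_DEFAULT_PRIORITY;  /* PSM keeps CM at its default */
    }
    return CM_LOWERED_PRIORITY;      /* every other MTL, including MX */
}
```

Under this rule an MX build only ends up on the MTL path if the user forces
it, for example with --mca pml cm on the command line.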


Brian

Re: [OMPI devel] Fwd: [OMPI users] Onesided + derived datatypes

2008-12-13 Thread Brian Barrett

Sorry, I really won't have time to look until after Christmas.  I'll  
put it on the to-do list, but that's as soon as it has a prayer of  
reaching the top.


Brian

On Dec 13, 2008, at 1:02 PM, George Bosilca wrote:


Brian,

I found a second problem with rebuilding the datatype on the remote
side. Originally, the displacements were wrongly computed. This is now
fixed. However, the data at the end of the fence is still not
correct on the remote side.


I can confirm that the packed message contains only zeros instead of the
real values, but I couldn't figure out how these zeros got there. The
pack function works correctly for the MPI_Send function, and I don't see
any reason why it would not do the same for MPI_Put. As you're the
one-sided guy in ompi, can you take a look at MPI_Put to see why the
data is incorrect?


 george.

On Dec 11, 2008, at 19:14 , Brian Barrett wrote:

I think that's a reasonable solution.  However, the words "not it"  
come to mind.  Sorry, but I have way too much on my plate this  
month.  By the way, in case no one noticed, I had e-mailed my  
findings to devel.  Someone might want to reply to Dorian's e-mail  
on users.



Brian

On Dec 11, 2008, at 2:31 PM, George Bosilca wrote:


Brian,

You're right, the datatype is being too cautious with the
boundaries when detecting the overlap. There is no good way
to detect the overlap except parsing the whole memory layout to
check the status of every predefined type. As one can imagine, this
is a very expensive operation. This is the reason I preferred to use
the true extent and the size of the data to try to detect the
overlap. This approach is a lot faster, but has poor accuracy.


The best solution I can think of in the short term is to remove
the overlap check completely. This will have absolutely no impact
on the way we pack the data, but can lead to unexpected results
when we unpack and the data overlap. But I guess this can be
considered a user error, as the MPI standard clearly states that
the result of such an operation is ... unexpected.


george.

On Dec 10, 2008, at 22:20 , Brian Barrett wrote:


Hi all -

I looked into this, and it appears to be datatype related.  If
the displacements are set to 3, 2, 1, 0, then the datatype will
fail the type checks for one-sided because is_overlapped()
returns 1 for the datatype.  My reading of the standard seems to
indicate this should not be.  I haven't looked into the problems
with displacement set to 0, 1, 2, 3, but I'm guessing it has
something to do with the reverse problem.


This looks like a datatype issue, so it's out of my realm of  
expertise.  Can someone else take a look?


Brian

Begin forwarded message:


From: doriankrause <doriankra...@web.de>
Date: December 10, 2008 4:07:55 PM MST
To: us...@open-mpi.org
Subject: [OMPI users] Onesided + derived datatypes
Reply-To: Open MPI Users <us...@open-mpi.org>

Hi List,

I have a MPI program which uses one sided communication with derived
datatypes (MPI_Type_create_indexed_block). I developed the code with
MPICH2 and unfortunately didn't think about trying it out with
OpenMPI. Now that I'm "porting" the application to OpenMPI I'm facing
some problems. On most machines I get a SIGSEGV in MPI_Win_fence,
sometimes an invalid datatype shows up. I ran the program in Valgrind
and didn't get anything valuable. Since I can't see a reason for this
problem (at least if I understand the standard correctly), I wrote the
attached test program.

Here are my experiences:

* If I compile without ONESIDED defined, everything works and V1 and V2
give the same results
* If I compile with ONESIDED and V2 defined (MPI_Type_contiguous) it works.
* ONESIDED + V1 + O2: No errors but obviously nothing is sent? (Am I right in
assuming that V1+O2 and V2 should be equivalent?)
* ONESIDED + V1 + O1:
[m02:03115] *** An error occurred in MPI_Put
[m02:03115] *** on win
[m02:03115] *** MPI_ERR_TYPE: invalid datatype
[m02:03115] *** MPI_ERRORS_ARE_FATAL (goodbye)

I didn't get a segfault as in the "real life example" but if ompitest.cc
is correct it means that OpenMPI is buggy when it comes to onesided
communication and (some) derived datatypes, so that it is probably not
a problem in my code.

I'm using OpenMPI-1.2.8 with the newest gcc 4.3.2 but the same behaviour
can be seen with gcc-3.3.1 and intel 10.1.

Please correct me if ompitest.cc contains errors. Otherwise I would be
glad to hear how I should report these problems to the developers (if
they don't read this).

Thanks + best regards

Dorian








___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



Re: [OMPI devel] Fwd: [OMPI users] Onesided + derived datatypes

2008-12-11 Thread Brian Barrett

I think that's a reasonable solution.  However, the words "not it"  
come to mind.  Sorry, but I have way too much on my plate this month.   
By the way, in case no one noticed, I had e-mailed my findings to  
devel.  Someone might want to reply to Dorian's e-mail on users.



Brian

On Dec 11, 2008, at 2:31 PM, George Bosilca wrote:


Brian,

You're right, the datatype is being too cautious with the boundaries
when detecting the overlap. There is no good way to detect the
overlap except parsing the whole memory layout to check the status
of every predefined type. As one can imagine, this is a very
expensive operation. This is the reason I preferred to use the true
extent and the size of the data to try to detect the overlap. This
approach is a lot faster, but has poor accuracy.


The best solution I can think of in the short term is to remove
the overlap check completely. This will have absolutely no impact on
the way we pack the data, but can lead to unexpected results when we
unpack and the data overlap. But I guess this can be considered a
user error, as the MPI standard clearly states that the result of
such an operation is ... unexpected.


 george.

On Dec 10, 2008, at 22:20 , Brian Barrett wrote:


Hi all -

I looked into this, and it appears to be datatype related.  If the
displacements are set to 3, 2, 1, 0, then the datatype will fail
the type checks for one-sided because is_overlapped() returns 1 for
the datatype.  My reading of the standard seems to indicate this
should not be.  I haven't looked into the problems with
displacement set to 0, 1, 2, 3, but I'm guessing it has something
to do with the reverse problem.


This looks like a datatype issue, so it's out of my realm of  
expertise.  Can someone else take a look?


Brian

Begin forwarded message:


From: doriankrause <doriankra...@web.de>
Date: December 10, 2008 4:07:55 PM MST
To: us...@open-mpi.org
Subject: [OMPI users] Onesided + derived datatypes
Reply-To: Open MPI Users <us...@open-mpi.org>

Hi List,

I have a MPI program which uses one sided communication with derived
datatypes (MPI_Type_create_indexed_block). I developed the code with
MPICH2 and unfortunately didn't think about trying it out with
OpenMPI. Now that I'm "porting" the application to OpenMPI I'm facing
some problems. On most machines I get a SIGSEGV in MPI_Win_fence,
sometimes an invalid datatype shows up. I ran the program in Valgrind
and didn't get anything valuable. Since I can't see a reason for this
problem (at least if I understand the standard correctly), I wrote the
attached test program.

Here are my experiences:

* If I compile without ONESIDED defined, everything works and V1 and V2
give the same results
* If I compile with ONESIDED and V2 defined (MPI_Type_contiguous) it works.
* ONESIDED + V1 + O2: No errors but obviously nothing is sent? (Am I right in
assuming that V1+O2 and V2 should be equivalent?)
* ONESIDED + V1 + O1:
[m02:03115] *** An error occurred in MPI_Put
[m02:03115] *** on win
[m02:03115] *** MPI_ERR_TYPE: invalid datatype
[m02:03115] *** MPI_ERRORS_ARE_FATAL (goodbye)

I didn't get a segfault as in the "real life example" but if ompitest.cc
is correct it means that OpenMPI is buggy when it comes to onesided
communication and (some) derived datatypes, so that it is probably not
a problem in my code.

I'm using OpenMPI-1.2.8 with the newest gcc 4.3.2 but the same behaviour
can be seen with gcc-3.3.1 and intel 10.1.

Please correct me if ompitest.cc contains errors. Otherwise I would be
glad to hear how I should report these problems to the developers (if
they don't read this).

Thanks + best regards

Dorian









[OMPI devel] Fwd: [OMPI users] Onesided + derived datatypes

2008-12-10 Thread Brian Barrett


Hi all -

I looked into this, and it appears to be datatype related.  If the
displacements are set to 3, 2, 1, 0, then the datatype will fail the
type checks for one-sided because is_overlapped() returns 1 for the
datatype.  My reading of the standard seems to indicate this should
not be.  I haven't looked into the problems with displacement set to
0, 1, 2, 3, but I'm guessing it has something to do with the reverse
problem.
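
For reference, an exact overlap test for an indexed-block layout is easy to
write down if one is willing to sort the displacements. The sketch below is
not Open MPI's is_overlapped(); blocks_overlap and cmp_int are hypothetical
names, displacements are taken in units of the old type (as
MPI_Type_create_indexed_block does), and a block length of 1 is assumed for
the 3, 2, 1, 0 case because the block length used by the test program is not
stated here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Exact overlap test for `count` blocks of `blocklen` elements each,
 * placed at the given displacements (in elements). Returns true if any
 * two blocks share an element, regardless of the order they are listed in. */
static bool blocks_overlap(const int *displs, int count, int blocklen)
{
    assert(count >= 0 && blocklen >= 0);
    if (count < 2 || blocklen == 0) {
        return false;
    }
    int *sorted = malloc((size_t)count * sizeof *sorted);
    if (sorted == NULL) {
        abort();
    }
    for (int i = 0; i < count; i++) {
        sorted[i] = displs[i];
    }
    qsort(sorted, (size_t)count, sizeof *sorted, cmp_int);

    bool overlap = false;
    for (int i = 1; i < count; i++) {
        /* Neighbours in sorted order overlap when the gap between their
         * start offsets is smaller than one block. */
        if ((long)sorted[i] - (long)sorted[i - 1] < (long)blocklen) {
            overlap = true;
            break;
        }
    }
    free(sorted);
    return overlap;
}
```

With this check, displacements 3, 2, 1, 0 and 0, 1, 2, 3 both come out as
non-overlapping for single-element blocks, which matches the reading of the
standard given above. The cost is a sort per datatype, so it is a sketch of
what "correct" looks like rather than a proposal for the fast path.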


This looks like a datatype issue, so it's out of my realm of  
expertise.  Can someone else take a look?


Brian

Begin forwarded message:


From: doriankrause 
Date: December 10, 2008 4:07:55 PM MST
To: us...@open-mpi.org
Subject: [OMPI users] Onesided + derived datatypes
Reply-To: Open MPI Users 

Hi List,

I have a MPI program which uses one sided communication with derived
datatypes (MPI_Type_create_indexed_block). I developed the code with
MPICH2 and unfortunately didn't think about trying it out with
OpenMPI. Now that I'm "porting" the application to OpenMPI I'm facing
some problems. On most machines I get a SIGSEGV in MPI_Win_fence,
sometimes an invalid datatype shows up. I ran the program in Valgrind
and didn't get anything valuable. Since I can't see a reason for this
problem (at least if I understand the standard correctly), I wrote the
attached test program.

Here are my experiences:

* If I compile without ONESIDED defined, everything works and V1 and V2
give the same results
* If I compile with ONESIDED and V2 defined (MPI_Type_contiguous) it works.
* ONESIDED + V1 + O2: No errors but obviously nothing is sent? (Am I right in
assuming that V1+O2 and V2 should be equivalent?)
* ONESIDED + V1 + O1:
[m02:03115] *** An error occurred in MPI_Put
[m02:03115] *** on win
[m02:03115] *** MPI_ERR_TYPE: invalid datatype
[m02:03115] *** MPI_ERRORS_ARE_FATAL (goodbye)

I didn't get a segfault as in the "real life example" but if ompitest.cc
is correct it means that OpenMPI is buggy when it comes to onesided
communication and (some) derived datatypes, so that it is probably not
a problem in my code.

I'm using OpenMPI-1.2.8 with the newest gcc 4.3.2 but the same behaviour
can be seen with gcc-3.3.1 and intel 10.1.

Please correct me if ompitest.cc contains errors. Otherwise I would be
glad to hear how I should report these problems to the developers (if
they don't read this).

Thanks + best regards

Dorian






ompitest.tar.gz
Description: GNU Zip compressed data


Re: [OMPI devel] memcpy MCA framework

2008-08-17 Thread Brian Barrett

I obviously won't be in Dublin (I'll be in a fishing boat in the  
middle of nowhere Canada -- much better), so I'm going to chime in now.

The m4 part actually isn't too bad and is pretty simple.  I'm not sure  
other than looking at some variables set by ompi_config_asm that there  
is much to check.  The hard parts are dealing with the finer grained  
instruction set requirements.

On x86 in particular, many of the operations in the memcpy are part of  
SSE, SSE2, or SSE3.  Currently, we don't have any finer concept of a  
processor than x86 and most compilers target an instruction set that  
will run on anything considered 686, which is almost everything out  
there.  We'd have to decide how to handle instruction streams which  
are no longer going to work on every chip.  Since we know we have a  
number of users with heterogeneous x86 clusters, this is something to  
think about.
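
One way to handle the heterogeneous-cluster case is to decide at run time
which instruction-set level the local chip supports before picking a copy
routine. The sketch below assumes GCC or Clang on x86 and uses the compiler
built-in __builtin_cpu_supports; the copy_level_t enum and both function
names are invented for illustration and are not part of any existing Open MPI
framework. On other compilers or architectures it simply reports the generic
level.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical instruction-set levels a memcpy component might target. */
typedef enum {
    COPY_GENERIC = 0,  /* plain C, runs anywhere */
    COPY_SSE,
    COPY_SSE2,
    COPY_SSE3
} copy_level_t;

/* Report the highest level the CPU we are running on supports. */
static copy_level_t detect_copy_level(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    /* Compiler built-in; queries CPUID-derived feature bits at run time. */
    if (__builtin_cpu_supports("sse3")) return COPY_SSE3;
    if (__builtin_cpu_supports("sse2")) return COPY_SSE2;
    if (__builtin_cpu_supports("sse"))  return COPY_SSE;
#endif
    return COPY_GENERIC;
}

static const char *copy_level_name(copy_level_t level)
{
    static const char *const names[] = { "generic", "sse", "sse2", "sse3" };
    assert((size_t)level < sizeof names / sizeof names[0]);
    return names[level];
}
```

A component would then refuse to be selected on a node whose detected level is
below what its assembly needs, so one build could still be shipped to every
node in a mixed cluster.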

Brian

On Aug 17, 2008, at 7:57 AM, Jeff Squyres wrote:

Let's talk about this in Dublin.  I can probably help with the m4  
magic, but I need to understand exactly what needs to be done first.

On Aug 16, 2008, at 11:51 AM, Terry Dontje wrote:

George Bosilca wrote:
The intent of the memcpy framework is to allow a selection between  
several memcpy at runtime. Of course, there will be a preselection  
at compile time, but all versions that can compile on a given  
architecture will be benchmarked at runtime and the best one will  
be selected. There is a file with several versions of memcpy for  
x86 (32 and 64) somewhere around (I should have one if  
interested), that can be used as a starting point.
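
As a rough illustration of "benchmark at runtime and pick the best", here is
a small C sketch. It is not the memcpy framework's code: memcpy_fn,
bytewise_memcpy, time_candidate, and select_memcpy are hypothetical names, the
timing uses the coarse standard clock() function, and a real component would
want a better timer and several message sizes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

typedef void *(*memcpy_fn)(void *dst, const void *src, size_t n);

/* Hypothetical candidate: a plain byte loop standing in for a tuned
 * assembly or SSE variant. */
static void *bytewise_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; i++) {
        d[i] = s[i];
    }
    return dst;
}

/* Time `reps` copies of `n` bytes with one candidate, in seconds. */
static double time_candidate(memcpy_fn fn, void *dst, const void *src,
                             size_t n, int reps)
{
    clock_t start = clock();
    for (int r = 0; r < reps; r++) {
        fn(dst, src, n);
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Benchmark every candidate and return the fastest one measured. */
static memcpy_fn select_memcpy(const memcpy_fn *cands, size_t ncands,
                               size_t n, int reps)
{
    assert(cands != NULL && ncands > 0);
    unsigned char *src = malloc(n);
    unsigned char *dst = malloc(n);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return cands[0];  /* cannot benchmark; fall back to the first */
    }
    memset(src, 0xA5, n);

    memcpy_fn best = cands[0];
    double best_time = time_candidate(cands[0], dst, src, n, reps);
    for (size_t i = 1; i < ncands; i++) {
        double t = time_candidate(cands[i], dst, src, n, reps);
        if (t < best_time) {
            best_time = t;
            best = cands[i];
        }
    }
    free(src);
    free(dst);
    return best;
}
```

A caller would build the candidate array from whatever compiled on the
platform, for example { memcpy, bytewise_memcpy }, and run select_memcpy once
during startup. The concern raised in the reply below, that a short benchmark
can pick the wrong routine on some machines, applies directly to a scheme like
this one.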

Ok, I guess I need to look at this code.  I wonder if there may be  
cases for Sun's machines in which this benchmark could end up  
picking the wrong memcpy?
The only thing we need is a volunteer to build the m4 magic.
Figuring out what we can compile is kind of tricky, as some of the
functions are in assembly, some others in C, and some others a
mixture (the MMX headers).

Isn't the atomic code very similar?  If I get to this point before  
anyone else I probably will volunteer.

--td

george.

On Aug 16, 2008, at 3:19 PM, Terry Dontje wrote:

Hi Tim,
Thanks for bringing the below up and asking for a redirection to
the devel list.  I think looking at/using the MCA memcpy framework
would be a good thing to do and maybe we can work on this
together once I get out from under some commitments.  However,
one of the challenges that originally scared me away from
looking at the memcpy MCA is whether we really want all the OMPI
memcpy's to be replaced or just specific ones.  Also, I was
concerned about trying to figure out which version of memcpy I
should be using.  I believe currently things are done such that
you get one version based on which system you compile on.  For
Sun there may be several different SPARC platforms that would
need to use different memcpy code but we would like to just ship
one set of bits.
Not saying the above is not doable under the memcpy MCA framework,
just that it somewhat scared me away from thinking about it at
first glance.

--td
Date: Fri, 15 Aug 2008 12:08:18 -0400
From: "Tim Mattox"
Subject: Re: [OMPI users] SM btl slows down bandwidth?
To: "Open MPI Users"
Content-Type: text/plain; charset=ISO-8859-1

Hi Terry (and others),

I have previously explored this some on Linux/X86-64 and concluded that
Open MPI needs to supply its own memcpy routine to get good sm
performance, since the memcpy supplied by glibc is not even close to
optimal. We have an unused MCA framework already set up to supply an
opal_memcpy. AFAIK, George and Brian did the original work to set up
that framework. It has been on my to-do list for a while to start
implementing opal_memcpy components for the architectures I have access
to, and to modify OMPI to actually use opal_memcpy where it makes sense.

Terry, I presume what you suggest could be dealt with similarly when we
are running/building on SPARC. Any followup discussion on this should
probably happen on the developer mailing list.

On Thu, Aug 14, 2008 at 12:19 PM, Terry Dontje wrote:
> Interestingly enough on the SPARC platform the Solaris memcpy's actually use
> non-temporal stores for copies >= 64KB.  By default some of the mca
> parameters to the sm BTL stop at 32KB.  I've done experimentations of
> bumping the sm segment sizes to above 64K and seen incredible speedup on our
> M9000 platforms.  I am looking for some nice way to integrate a memcpy that
> lowers this boundary to 32KB or lower into Open MPI.
> I have not looked into whether Solaris x86/x64 memcpy's use the non-temporal
> stores or not.
>
> --td

>>
>> Message: 1
>> Date: Thu, 14 Aug 2008 09:28:59 -0400
>> From: Jeff Squyres 
>> Subject: Re: [OMPI users] SM btl slows

Re: [OMPI devel] if btl->add_procs() fails...?

2008-08-04 Thread Brian Barrett


On Aug 4, 2008, at 9:40 AM, Jeff Squyres wrote:


On Aug 2, 2008, at 2:34 PM, Brian Barrett wrote:

I am curious how all of the above affects client/server or spawned  
jobs.  If you finalize a BTL then do a connect to a process that  
would use that BTL would it reinitialize itself?


To deal with all the dynamics issues, I wouldn't finalize the BTL.
The BML should handle the progress stuff, just as if the add_procs
succeeded but returned no active peers. But I'd guess that's part
of the bit that doesn't work today. I would further suspect that a
BTL will need to have a working progress function in the face of
add_procs failures to cope with all the dynamics options. I'm
travelling this weekend, so I can't verify any of this at the moment.



This seems a little different than the rest of the code base --  
we're talking about having the BTL return an error but have the  
upper level not treat it as a fatal error.


I think we actually have a few different situations ("fail" means  
"not returning OMPI_SUCCESS"):


1. btl component init fails (only during MPI_INIT): the API supports  
no notion of failure -- it either returns modules or not (i.e.,  
valid pointers or NULL).  If NULL is returned, the component is  
ignored and unloaded.

2. btl add_procs during MPI_INIT fails: this is under debate
3. btl add_procs during MPI-2 dynamics fails: this is under debate

For #2 and #3, I suspect that only the BTL knows if it can continue  
or not.  For example, a failure in #3 may cause the entire BTL to be  
hosed such that it can no longer communicate with procs that it  
previously successfully added (e.g., in MPI_INIT).  So we really  
need add_procs to be able to return multiple things:


A. OMPI_SUCCESS / all was good
B. a non-fatal error occurred such that this BTL cannot communicate  
with the desired peers, but the upper level PML can continue
C. a fatal error has occurred such that the upper level should abort  
(or, more specifically, do whatever the error manager says)


I think that for B in both #2 and #3, we can just have the BTL set  
all the reachability bits to 0 and return OMPI_SUCCESS.  But for C,  
the BTL should return != OMPI_SUCCESS.  The PML should treat it as a  
fatal error and therefore call the error manager.
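
The A/B/C split above can be written down as a small decision function. The
sketch below is only a model of the proposal, not Open MPI code: the
SKETCH_SUCCESS and SKETCH_ERROR constants stand in for OMPI_SUCCESS and any
other return value, a plain bool array stands in for the reachability bitmap,
and btl_state_t, sketch_add_procs, pml_action_t, and pml_handle_add_procs are
all hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_SUCCESS 0     /* stands in for OMPI_SUCCESS */
#define SKETCH_ERROR   (-1)  /* stands in for any other return value */

/* Hypothetical internal state of a BTL when add_procs is called. */
typedef enum { PEERS_OK, PEERS_UNREACHABLE, PEERS_FATAL } btl_state_t;

/* BTL side: case A sets the bits, case B clears them and still reports
 * success, case C clears them and reports an error. */
static int sketch_add_procs(btl_state_t state, bool *reachable, size_t nprocs)
{
    assert(reachable != NULL || nprocs == 0);
    for (size_t i = 0; i < nprocs; i++) {
        reachable[i] = (state == PEERS_OK);
    }
    return (state == PEERS_FATAL) ? SKETCH_ERROR : SKETCH_SUCCESS;
}

typedef enum { PML_USE_BTL, PML_IGNORE_BTL, PML_CALL_ERRMGR } pml_action_t;

/* PML side: any non-success return is fatal; success with no reachable
 * peers means the BTL is simply never used. */
static pml_action_t pml_handle_add_procs(int rc, const bool *reachable,
                                         size_t nprocs)
{
    if (rc != SKETCH_SUCCESS) {
        return PML_CALL_ERRMGR;
    }
    for (size_t i = 0; i < nprocs; i++) {
        if (reachable[i]) {
            return PML_USE_BTL;
        }
    }
    return PML_IGNORE_BTL;
}
```

The point of the split is that only the BTL decides between B and C; the
upper layer never has to guess whether a failure left the BTL usable.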


I think that this is in-line with Brian's original comments, right?


I suppose, but that's a pain when you just want to say "I don't  
support calling add_procs a second time" :).  But I'm not going to fix  
all the BTLs to make that work right, so I suppose in the end I really  
don't have a strong opinion.


Brian

Re: [OMPI devel] if btl->add_procs() fails...?

2008-08-02 Thread Brian Barrett

My thought is that if add_procs fails, then that BTL should be removed  
(as if init failed) and things should continue on.  If that BTL was  
the only way to reach another process, we'll catch that later and abort.


There are always going to be errors that can't be detected until the  
device is actually used, so I think that add_procs errors should be  
treated the same as init errors.  The error_cb is a red herring, as  
that's supposed to be used in situations where an error can't directly  
be returned to the upper layers (like the progress function).  In this  
case, we can directly return an error, so we should do so (and I  
believe we do, it's the BML/PML that's the problem).
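
A sketch of what "treat add_procs errors like init errors" could look like at
the BML/PML level follows. It is a model under stated assumptions, not the
real r2 BML: sketch_btl_t, prune_failed_btls, and first_unreachable_peer are
invented names, 0 stands in for OMPI_SUCCESS, and reachability is a plain
bool array per BTL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-BTL record; not the real mca_btl_base_module_t. */
typedef struct {
    const char *name;       /* e.g. "openib", "tcp" */
    int add_procs_rc;       /* 0 stands in for OMPI_SUCCESS */
    const bool *reachable;  /* reachable[p]: can this BTL reach peer p? */
    bool active;            /* set by prune_failed_btls() */
} sketch_btl_t;

/* Step 1: a BTL whose add_procs failed is dropped, as if init had failed. */
static void prune_failed_btls(sketch_btl_t *btls, size_t nbtls)
{
    assert(btls != NULL || nbtls == 0);
    for (size_t b = 0; b < nbtls; b++) {
        btls[b].active = (btls[b].add_procs_rc == 0);
    }
}

/* Step 2: the "catch that later" check. Returns the index of the first
 * peer that no remaining BTL can reach, or -1 if every peer is covered. */
static long first_unreachable_peer(const sketch_btl_t *btls, size_t nbtls,
                                   size_t npeers)
{
    for (size_t p = 0; p < npeers; p++) {
        bool covered = false;
        for (size_t b = 0; b < nbtls; b++) {
            if (btls[b].active && btls[b].reachable[p]) {
                covered = true;
                break;
            }
        }
        if (!covered) {
            return (long)p;
        }
    }
    return -1;
}
```

In this model a failing openib module costs nothing as long as tcp still
reaches every peer; the job only aborts when first_unreachable_peer returns a
real index.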


Brian


On Aug 1, 2008, at 8:03 PM, Jeff Squyres wrote:

I wasted a bunch of time today debugging a scenario where
openib->add_procs() was (legitimately) failing during MPI_INIT.
Specifically: an openib BTL module had successfully been
initialized, but then was failing during add_procs().  I say
"legitimately" failing because something external was causing
add_procs to fail (i.e., a misconfiguration on my cluster).  By
"fail", I mean add_procs() returned != OMPI_SUCCESS.


The problem is that OMPI does not handle this situation gracefully;  
every MPI process dumped core.


My question is: what exactly should happen when BTL add_procs()  
fails?  Is the BTL expected to recover?  What if the BTL has no  
procs as a result of this failure; should the PML (or BML) remove it  
from progress loops?  Or should the BTL be able to handle if  
progress is called on its component?  (which seems kinda wasteful)


Or should the failure of add_procs() be a fatal error?  If so, what  
should the BTL do?  The PML's error_cb has not yet been registered,  
and returning != OMPI_SUCCESS does not [currently] cause the PML to  
abort.  This fact seems to indicate to me that the PML/BTL designers  
envisioned that the MPI process should be able to continue.  But I'm  
not sure that I agree with that assessment: we have a successfully  
initialized BTL module, but an error occurred during add_procs().   
Shouldn't we gracefully abort?


My $0.02:

- if the BTL returns != OMPI_SUCCESS from add_procs(), the PML  
should gracefully abort.
- if a BTL fails add_procs() in a non-fatal way, it can set all  
reachable bits to 0 and return OMPI_SUCCESS.  The PML will therefore  
effectively ignore it.


Comments?  I'd like to fix the openib btl's add_procs() one way or  
another for v1.3.


--
Jeff Squyres
Cisco Systems


Re: [OMPI devel] Need v1.3 RM ruling (was: Help on building openmpi with "-Wl, --as-needed -Wl, --no-undefined")

2008-07-24 Thread Brian Barrett


George -

When I looked at the same problem in LAM, I couldn't get the  
dependencies right between libraries when only one makefile was used.  
It worked up until I would do a parallel build, then would die because  
the libraries weren't ready at the right time.  There's probably a  
way, but I ended up with Jeff's approach.


Brian

--
Brian Barrett

There is an art . . . to flying. The knack lies in learning how to
throw yourself at the ground and miss.

On Jul 24, 2008, at 2:23, George Bosilca <bosi...@eecs.utk.edu> wrote:

I tend to agree with Brian's comments, I would like to see this  
pushed into the 1.3 branch asap. However, I'm concerned with the  
impact on the code base of the required modifications as described  
on the TRAC ticket and on the email thread.


I wonder if we cannot use the same technique that we use for  
improving the build time, i.e. getting information from the  
Makefile.am in the subdirs and adding it in the upper level  
Makefile.am. As an example for the F77 build tree:

- if we create the following directory structure:
  -> ompi
     -> mpi
        -> f77
           -> global (this is new and will contain the 5 files
              actually in the f77 base)
           -> profiling
- then we include in the ompi/Makefile.am: include mpi/f77/global/Makefile.am
- and in the mpi/f77/global/Makefile.am we add the 5 C files in the
  SOURCES.
- the compiling of the f77 bindings and profiling information will
  then depend on the libmpi, as long as we enforce the building of the
  f77 library after the libmpi.so.


With this approach, all files related to f77 will stay in the f77  
directory (and the same will apply for cxx and f90), and the  
required modifications to the makefiles are minimal.


Auto* gurus, would such a solution work?

 Thanks,
   george.

On Jul 23, 2008, at 6:52 PM, Brian Barrett wrote:

First, sorry about the previous message - I'm incapable of using my  
e-mail apparently.


Based on discussions with people when this came up for LAM, it
sounds like this will become common for the next set of major  
releases from the distros.  The feature is fairly new to GNU ld,  
but has some nice features for the OS, which I don't totally  
understand.


Because this problem will only become more common during the  
lifespan of 1.3.x , it makes sense to put this in v1.3, in my  
opinion.


Brian

--
Brian Barrett

There is an art . . . to flying. The knack lies in learning how to
throw yourself at the ground and miss.

On Jul 23, 2008, at 9:32, Jeff Squyres <jsquy...@cisco.com> wrote:

Release managers: I have created ticket 1409 for this issue.  I  
need a ruling: do you want this fixed for v1.3?


 https://svn.open-mpi.org/trac/ompi/ticket/1409

PRO: It's not too heinous to fix, but it does require moving some  
code around.

CON: This is the first time anyone has ever run into this issue.
???: I don't know if this is a trend where distros will start  
wanting to compile with -Wl,--no-undefined.




On Jul 23, 2008, at 10:15 AM, Jeff Squyres wrote:


On Jul 23, 2008, at 10:08 AM, Ralf Wildenhues wrote:


Is the attached patch what you're talking about?

If so, I'll commit to trunk, v1.2, and v1.3.


Can you verify that it works with a pristine build?  The dependencies as
such look sane to me, also the cruft removal, but I fail to see how
your directory ordering can work:


You're right; I tested only in an already-built tree.  I also  
didn't run "make install" to an empty tree, which also shows the  
problem.


Let me twonk around with this a bit...

--
Jeff Squyres
Cisco Systems




--
Jeff Squyres
Cisco Systems


Re: [OMPI devel] Need v1.3 RM ruling (was: Help on building openmpi with "-Wl, --as-needed -Wl, --no-undefined")

2008-07-23 Thread Brian Barrett

First, sorry about the previous message - I'm incapable of using my
e-mail apparently.


Based on discussions with people when this came up for LAM, it sounds
like this will become common for the next set of major releases from  
the distros.  The feature is fairly new to GNU ld, but has some nice  
features for the OS, which I don't totally understand.


Because this problem will only become more common during the lifespan  
of 1.3.x , it makes sense to put this in v1.3, in my opinion.


Brian

--
Brian Barrett

There is an art . . . to flying. The knack lies in learning how to
throw yourself at the ground and miss.

On Jul 23, 2008, at 9:32, Jeff Squyres <jsquy...@cisco.com> wrote:

Release managers: I have created ticket 1409 for this issue.  I need  
a ruling: do you want this fixed for v1.3?


   https://svn.open-mpi.org/trac/ompi/ticket/1409

PRO: It's not too heinous to fix, but it does require moving some  
code around.

CON: This is the first time anyone has ever run into this issue.
???: I don't know if this is a trend where distros will start  
wanting to compile with -Wl,--no-undefined.




On Jul 23, 2008, at 10:15 AM, Jeff Squyres wrote:


On Jul 23, 2008, at 10:08 AM, Ralf Wildenhues wrote:


Is the attached patch what you're talking about?

If so, I'll commit to trunk, v1.2, and v1.3.


Can you verify that it works with a pristine build?  The dependencies as
such look sane to me, also the cruft removal, but I fail to see how
your directory ordering can work:


You're right; I tested only in an already-built tree.  I also  
didn't run "make install" to an empty tree, which also shows the  
problem.


Let me twonk around with this a bit...

--
Jeff Squyres
Cisco Systems

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



--
Jeff Squyres
Cisco Systems


Re: [OMPI devel] Need v1.3 RM ruling (was: Help on building openmpi with "-Wl, --as-needed -Wl, --no-undefined")

2008-07-23 Thread Brian Barrett




Brian

--  
Brian Barrett


There is an art . . . to flying. The knack lies in learning how to
throw yourself at the ground and miss.


[OMPI devel] Two large patches in trunk

2008-06-13 Thread Brian Barrett


Hi all -

I just pushed two rather large patches into the trunk for v1.3 (with  
George and Brad's blessing).


First, the ptmalloc2 changes are in the trunk.  So going forward,  
ptmalloc2 will not be linked into the application binary by default.   
You will have to set libs to -lopenmpi-malloc to get ptmalloc2.   
However, leave_pinned will turn on mallopt by default now, so for most  
users there will be no visible change.  There is also a configure flag  
if users really want the old behavior.  Nothing substantial has  
changed on this front since my more detailed e-mail last week.


Second, Open MPI now provides the option of using Perl-based wrapper
compilers instead of the traditional C-based ones.  The Perl-based
ones do not have nearly as much functionality as the C-based ones,
lacking multilib, installdirs, and multi-project (i.e., opalcc/ortecc)
support (in addition to not supporting many of the -showme options).
The C versions are still the default and are intended to remain that
way for the foreseeable future.  The Perl compilers are intended to be
used for cross-compile installs, which seems to be the bulk of my use
of Open MPI these days.  Specifying --enable-script-wrapper-compilers
will disable the C-based compilers and enable the Perl-based
compilers.  Finally, --enable-script-wrapper-compilers combined with
--disable-binaries will still result in the Perl-based wrapper
compilers being installed.


As always, let me know what I broke.

Brian


--
  Brian Barrett

  There is an art . . . to flying. The knack lies in learning how to
  throw yourself at the ground and miss.
  Douglas Adams, 'The Hitchhikers Guide to the Galaxy'

Re: [OMPI devel] Memory hooks change testing

2008-06-11 Thread Brian Barrett

Did anyone get a chance to test (or think about testing) this?  I'd  
like to commit the changes on Friday evening, if I haven't heard any  
negative feedback.


Brian

On Jun 9, 2008, at 8:56 PM, Brian Barrett wrote:


Hi all -

Per the RFC I sent out last week, I've implemented a revised
behavior of the memory hooks for high speed networks.  It's a bit
different than the RFC proposed, but still very minor and fairly
straightforward.


The default is to build ptmalloc2 support, but as an almost complete  
standalone library.  If the user wants to use ptmalloc2, he only has  
to add -lopenmpi-malloc to his link line.  Even when standalone and  
openmpi-malloc is not linked in, we'll still intercept munmap as  
it's needed for mallopt (below) and we've never had any trouble with  
that part of ptmalloc2 (it's easy to intercept).


As a *CHANGE* in behavior, if leave_pinned support is turned on and
there's no ptmalloc2 support, we will automatically enable mallopt.
As a *CHANGE* in behavior, if the user disables mallopt or mallopt
is not available and leave_pinned is requested, we'll abort.  I
think these both make sense and are closest to expected behavior,
but wanted to point them out.  It is possible for the user to
disable mallopt and enable leave_pinned, but that will *only* work
if there is some other mechanism for intercepting free (basically,
it allows a way to ensure you're using ptmalloc2 instead of mallopt).


There is also a new memory component, mallopt, which only intercepts  
munmap and exists only to allow users to enable mallopt while not  
building the ptmalloc2 component at all.  Previously, our mallopt  
support was lacking in that it didn't cover the case where users  
explicitly called munmap in their applications.  Now, it does.
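
For anyone unfamiliar with the mallopt approach, here is a minimal
sketch, assuming glibc on Linux.  The function name is hypothetical,
and the two settings shown (never trim the heap, never use mmap for
large allocations) are an assumption about what "enabling mallopt"
amounts to, not a copy of the Open MPI code.  The idea is that if
free() never returns memory to the kernel, registered (pinned) buffers
stay mapped and a registration cache cannot go stale behind our back.

```c
#include <malloc.h>   /* glibc-specific: mallopt, M_TRIM_THRESHOLD, M_MMAP_MAX */

/* Hypothetical helper: ask glibc malloc never to hand memory back to the
 * kernel, so that pinned buffers stay mapped after free().
 * Returns 1 if both settings were accepted, 0 otherwise. */
int keep_freed_memory_mapped(void)
{
    /* Never shrink the heap on free(). */
    int trim_ok = mallopt(M_TRIM_THRESHOLD, -1);
    /* Never satisfy large requests with mmap(), so free() never munmap()s. */
    int mmap_ok = mallopt(M_MMAP_MAX, 0);
    return trim_ok && mmap_ok;
}
```

This only covers memory released through free(); an application that
calls munmap() directly still has to be intercepted separately, which
is the gap the new component described above is meant to close.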


The changes are fairly small and can be seen/tested in the HG  
repository bwb/mem-hooks, URL below.  I think this would be a good  
thing to push to 1.3, as it will solve an ongoing problem on Linux  
(basically, users getting screwed by our ptmalloc2 implementation).


   http://www.open-mpi.org/hg/hgwebdir.cgi/bwb/mem-hooks/

Brian

--
 Brian Barrett
 Open MPI developer
 http://www.open-mpi.org/




--
  Brian Barrett
  Open MPI developer
  http://www.open-mpi.org/

Re: [OMPI devel] Notes from mem hooks call today

2008-05-28 Thread Brian Barrett


On May 28, 2008, at 5:09 PM, Roland Dreier wrote:

I think Patrick's point is that it's not too much more expensive to
do the syscall on Linux vs just doing the cache lookup, particularly
in the context of a long message.  And it means that upper layer
protocols like MPI don't have to deal with caches (and since MPI
implementors hate registration caches only slightly less than we hate
MPI_CANCEL, that will make us happy).


Stick in a separate library then?

I don't think we want the complexity in the kernel -- I personally
would argue against merging it upstream; and given that the userspace
solution is actually faster, it becomes pretty hard to justify.


If someone would like to pull registration cache into OFED, that would  
be great.  But something tells me they won't want to.  It's a pain, it  
screws up users, and it only works about 50% of the time.


It's a support issue -- pushing it in a separate library doesn't help  
anyone unless someone's willing to handle the support.  I sure as heck  
don't want to do the support anymore, particularly since OFED is the  
*ONLY* major software stack that requires such evil hacks.  MX handles  
it at the lower layer.  Portals is specified such that the hardware  
and/or Portals library must handle it (by specifying semantics that  
require registration per message).  Quadrics (with tports) handles it  
in a combination of the kernel and library.  TCP doesn't require  
pinning and/or registration.


Brian

--
  Brian Barrett
  Open MPI developer
  http://www.open-mpi.org/

Re: [OMPI devel] undefined references for rdma_get_peer_addr & rdma_get_local_addr

2008-05-04 Thread Brian Barrett

hat others are aware)
>>> Crud.  Can you send me your config.log?  I don't know why it's able
>>> to  find rdma_get_peer_addr() in configure, but then later not able
>>> to  find it during the build - I'd like to see what happened
>>> during  configure.
>>> On May 2, 2008, at 7:09 PM, Pak Lui wrote:
>>>> Hi Jeff,
>>>>
>>>> It seems that the cpc3 merge causes my Ranger build to break. I
>>>> believe it is using OFED 1.2 but I don't know how to check. It
>>>> passes the ompi_check_openib.m4 that you added in for the
>>>> rdma_get_peer_addr. Is there a missing #include for openib/ofed
>>>> related somewhere?
>>>>
>>>>
>>>>  1236 checking rdma/rdma_cma.h usability... yes
>>>>  1237 checking rdma/rdma_cma.h presence... yes
>>>>  1238 checking for rdma/rdma_cma.h... yes
>>>>  1239 checking for rdma_create_id in -lrdmacm... yes
>>>>  1240 checking for rdma_get_peer_addr... yes
>>>>
>>>>
>>>> pgCC -DHAVE_CONFIG_H -I. -I../../../../ompi/tools/ompi_info -
>>>> I../../../opal/include -I../../../orte/include -I../../../ompi/
>>>> include -I../../../opal/mca/paffinity/linux/plpa/src/libplpa -
>>>> DOMPI_CONFIGURE_USER="\"paklui\"" -
>>>> DOMPI_CONFIGURE_HOST="\"login4.ranger.tacc.utexas.edu\"" -
>>>> DOMPI_CONFIGURE_DATE="\"Fri May  2 17:07:01 CDT 2008\"" -
>>>> DOMPI_BUILD_USER="\"$USER\"" -DOMPI_BUILD_HOST="\"`hostname`\"" -
>>>> DOMPI_BUILD_DATE="\"`date`\"" -DOMPI_BUILD_CFLAGS="\"-O -DNDEBUG
>>>> \"" -DOMPI_BUILD_CPPFLAGS="\"-I../../../.. -I../../.. -
>>>> I../../../../ opal/include -I../../../../orte/include -
>>>> I../../../../ompi/include  - D_REENTRANT\"" -
>>>> DOMPI_BUILD_CXXFLAGS="\"-O -DNDEBUG  \"" -
>>>> DOMPI_BUILD_CXXCPPFLAGS="\"-I../../../.. -I../../.. -I../../../../
>>>> opal/include -I../../../../orte/include -I../../../../ompi/
>>>> include  - D_REENTRANT\"" -DOMPI_BUILD_FFLAGS="\"\"" -
>>>> DOMPI_BUILD_FCFLAGS="\"\"" -DOMPI_BUILD_LDFLAGS="\" \"" -
>>>> DOMPI_BUILD_LIBS="\"-lnsl -lutil  -lpthread\"" -
>>>> DOMPI_CC_ABSOLUTE="\"/opt/apps/pgi/7.1/linux86-64/7.1-2/bin/pgcc
>>>> \"" - DOMPI_CXX_ABSOLUTE="\"/opt/apps/pgi/7.1/linux86-64/7.1-2/bin/
>>>> pgCC\""  -DOMPI_F77_ABSOLUTE="\"/opt/apps/pgi/7.1/linux86-64/7.1-2/
>>>> bin/ pgf77\"" -DOMPI_F90_ABSOLUTE="\"/opt/apps/pgi/7.1/
>>>> linux86-64/7.1-2/ bin/pgf95\"" -DOMPI_F90_BUILD_SIZE="\"small\"" -
>>>> I../../../.. - I../../.. -I../../../../opal/include -I../../../../
>>>> orte/include - I../../../../ompi/include  -D_REENTRANT  -O -
>>>> DNDEBUG   -c -o  version.o ../../../../ompi/tools/ompi_info/
>>>> version.cc
>>>> /bin/sh ../../../libtool --tag=CXX   --mode=link pgCC  -O -DNDEBUG
>>>> - o ompi_info components.o ompi_info.o output.o param.o
>>>> version.o ../../../ompi/libmpi.la -lnsl -lutil  -lpthread
>>>> libtool: link: pgCC -O -DNDEBUG -o .libs/ompi_info components.o
>>>> ompi_info.o output.o param.o version.o  ../../../ompi/.libs/
>>>> libmpi.so -L/opt/ofed/lib64 -libcm -lrdmacm -libverbs -lrt /share/
>>>> home/00951/paklui/ompi-trunk5/config-data1/orte/.libs/libopen-
>>>> rte.so /share/home/00951/paklui/ompi-trunk5/config-data1/
>>>> opal/.libs/ libopen-pal.so -lnuma -ldl -lnsl -lutil -lpthread -
>>>> Wl,--rpath -Wl,/ share/home/00951/paklui/ompi-trunk5/shared-
>>>> install1/lib
>>>>
>>>> [1]Exit 2make install >&
>>>> make.install.log.0
>>>> ../../../ompi/.libs/libmpi.so: undefined reference to
>>>> `rdma_get_peer_addr'
>>>> ../../../ompi/.libs/libmpi.so: undefined reference to
>>>> `rdma_get_local_addr'
>>>> make[2]: *** [ompi_info] Error 2
>>>> make[2]: Leaving directory `/share/home/00951/paklui/ompi-trunk5/
>>>> config-data1/ompi/tools/ompi_info'
>>>> make[1]: *** [install-recursive] Error 1
>>>> make[1]: Leaving directory `/share/home/00951/paklui/ompi-trunk5/
>>>> config-data1/ompi'
>>>> make: *** [install-recursive] Error 1
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> - Pak Lui
>>>> pak@sun.com
>>
>> --
>>
>>
>> - Pak Lui
>> pak@sun.com
>> 
>
>


--


- Pak Lui
pak@sun.com
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



--
  Brian Barrett
  Open MPI developer
  http://www.open-mpi.org/

Re: [OMPI devel] FreeBSD timer_base_open error?

2008-03-25 Thread Brian Barrett


On Mar 25, 2008, at 6:16 PM, Jeff Squyres wrote:

"linux" is the name of the component.  It looks like opal/mca/timer/
linux/timer_linux_component.c is doing some checks during component
open() and returning an error if it can't be used (e.g., if it's not
on linux).

The timer components are a little different than normal MCA
frameworks; they *must* be compiled in libopen-pal statically, and
there will only be one of them built.

In this case, I'm guessing that linux was built simply because nothing
else was selected to be built, but then its component_open() function
failed because it didn't find /proc/cpuinfo.



This is actually incorrect.  The linux component looks for
/proc/cpuinfo and builds if it finds that file.  There's a base
component that's built if nothing else is found.  The configure logic
for the linux component is probably not the right thing to do -- it
should probably be modified to check both that the file is readable
(there are systems that call themselves "linux" but don't have a
/proc/cpuinfo) and that we're actually on Linux.
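
The real fix belongs in the component's configure logic, but the two
conditions are easy to show in C.  This is only a sketch of the idea:
the function name is made up, and it is a run-time check rather than
the configure-time test discussed above.

```c
#include <string.h>
#include <sys/utsname.h>
#include <unistd.h>

/* Hypothetical check: returns 1 only if the kernel reports itself as
 * Linux AND /proc/cpuinfo is readable; returns 0 otherwise. */
int linux_timer_usable(void)
{
    struct utsname u;

    if (uname(&u) != 0 || strcmp(u.sysname, "Linux") != 0) {
        return 0;   /* not Linux, or we cannot tell */
    }
    return access("/proc/cpuinfo", R_OK) == 0;
}
```

Testing both conditions covers a FreeBSD box with a Linux-style /proc
as well as a system that calls itself Linux but has no /proc/cpuinfo.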


Brian

--
  Brian Barrett

  There is an art . . . to flying. The knack lies in learning how to
  throw yourself at the ground and miss.
  Douglas Adams, 'The Hitchhikers Guide to the Galaxy'

Re: [OMPI devel] [RFC] Non-blocking collectives (LibNBC) merge to trunk

2008-02-07 Thread Brian Barrett

Let me start by reminding everyone that I have no vote, so this should  
probably be sent to /dev/null.


I think Ralph raised some good points.  I'd like to raise another.

Does it make sense to bring LibNBC into the release at this point,  
given the current standardization process of non-blocking collectives?


My feeling is no, based on the long term support costs.  We had this  
problem with a function in LAM/MPI -- MPIL_SPAWN, I believe it was --  
that was almost but not quite MPI_COMM_SPAWN.  It was added to allow  
spawn before the standard was finished for dynamics.  The problem is,  
it wasn't quite MPI_COMM_SPAWN, so we were now stuck with yet another  
function to support (in a touchy piece of code) for infinity and beyond.


I worry that we'll have the same with LibNBC -- a piece of code that  
solves an immediate problem (no non-blocking collectives in MPI) but  
will become a long-term support anchor.  Since this is something we'll  
be encouraging users to write code to, it's not like support for  
mvapi, where we can just deprecate it and users won't really notice.   
It's one thing to tell them to update their cluster software stack --  
it's another to tell them to rewrite their applications.



Just my $0.02,

Brian

--
  Brian Barrett
  Open MPI developer
  http://www.open-mpi.org/

Re: [OMPI devel] SDP support for OPEN-MPI

2007-12-31 Thread Brian Barrett

   if (ORTE_SUCCESS != rc &&
+  (EAFNOSUPPORT != opal_socket_errno ||
+  mca_oob_tcp_component.tcp_debug >= OOB_TCP_DEBUG_CONNECT)) {
+  opal_output(0,
+ "mca_oob_tcp_init: unable to create IPv6 listen socket: %s\n",
+ opal_strerror(rc));
+   }
 #endif
+ }
 if (mca_oob_tcp_component.tcp_debug >= OOB_TCP_DEBUG_INFO) {
 opal_output(0, "%s accepting connections via event library",
 ORTE_NAME_PRINT(orte_process_info.my_name));
Index: orte/mca/oob/tcp/oob_tcp.h
===
--- orte/mca/oob/tcp/oob_tcp.h  (revision 17027)
+++ orte/mca/oob/tcp/oob_tcp.h  (working copy)
@@ -217,6 +217,9 @@
 int tcp6_port_min;    /**< Minimum allowed port for the OOB listen socket */
 int tcp6_port_range;  /**< Range of allowed TCP ports */
 #endif  /* OPAL_WANT_IPV6 */
+#if OPAL_WANT_SDP
+int sdp_enable;   /**< support for SDP */
+#endif /* OAP_WANT_SDP */
 opal_mutex_t   tcp_lock; /**< lock for accessing module state */
 opal_list_t tcp_events;   /**< list of pending events (accepts) */
 opal_list_t tcp_msg_post; /**< list of recieves user has posted */







Thanks,

Verkhovsky Lenny.



___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


--
  Brian Barrett
  Open MPI developer
  http://www.open-mpi.org/

Re: [OMPI devel] collective problems

2007-11-07 Thread Brian Barrett

As it stands today, the problem is that we can inject things into the  
BTL successfully that are not injected into the NIC (due to software  
flow control).  Once a message is injected into the BTL, the PML marks  
completion on the MPI request.  If it was a blocking send that got  
marked as complete, but the message isn't injected into the NIC/NIC  
library, and the user doesn't re-enter the MPI library for a  
considerable amount of time, then we have a problem.

Personally, I'd rather just not mark MPI completion until a local  
completion callback from the BTL.  But others don't like that idea, so  
we came up with a way for back pressure from the BTL to say "it's not  
on the wire yet".  This is more complicated than just not marking MPI  
completion early, but why would we do something that helps real apps  
at the expense of benchmarks?  That would just be silly!
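
To make the back-pressure idea concrete, here is a minimal sketch in
C.  Everything in it is hypothetical: the enum values, request_t, and
both function names are stand-ins invented for illustration and are
not the Open MPI BTL/PML interfaces.  It only shows MPI completion
being withheld while a send is buffered, then set from a local
completion callback.

```c
#include <stdbool.h>

/* Hypothetical return codes; only the idea of a third state comes from
 * this thread. */
typedef enum { SEND_DONE, SEND_NOT_ON_WIRE, SEND_ERROR } send_status_t;

typedef struct {
    bool mpi_complete;   /* what the user would see via MPI_Wait/MPI_Test */
    bool failed;
} request_t;

/* Called right after the (hypothetical) BTL send returns. */
void pml_handle_send_status(request_t *req, send_status_t status)
{
    switch (status) {
    case SEND_DONE:          /* handed to the NIC: safe to complete */
        req->mpi_complete = true;
        break;
    case SEND_NOT_ON_WIRE:   /* buffered by flow control: not complete yet */
        req->mpi_complete = false;
        break;
    case SEND_ERROR:
        req->failed = true;
        break;
    }
}

/* Called from the BTL's local completion callback once the buffered
 * message has actually been injected. */
void pml_on_local_completion(request_t *req)
{
    req->mpi_complete = true;
}
```

With only the first and third states, a buffered send is
indistinguishable from a finished one, which is exactly the case that
bites applications that rarely re-enter the MPI library.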

Brian

On Nov 7, 2007, at 7:56 PM, Richard Graham wrote:

Does this mean that we don’t have a queue to store btl level
descriptors that are only partially complete?  Do we do an all or
nothing with respect to btl level requests at this stage?

Seems to me like we want to mark things complete at the MPI level
ASAP, and that this proposal is not to do that – is this correct?

Rich

On 11/7/07 11:26 PM, "Jeff Squyres"  wrote:

On Nov 7, 2007, at 9:33 PM, Patrick Geoffray wrote:

>> Remember that this is all in the context of Galen's proposal for
>> btl_send() to be able to return NOT_ON_WIRE -- meaning that the send
>> was successful, but it has not yet been sent (e.g., openib BTL
>> buffered it because it ran out of credits).
>
> Sorry if I miss something obvious, but why does the PML have to be
> aware of the flow control situation of the BTL? If the BTL cannot send
> something right away for any reason, it should be the responsibility
> of the BTL to buffer it and to progress on it later.

That's currently the way it is.  But the BTL currently only has the
option to say two things:

1. "ok, done!" -- then the PML will think that the request is complete

2. "doh -- error!" -- then the PML thinks that Something Bad
Happened(tm)

What we really need is for the BTL to have a third option:

3. "not done yet!"

So that the PML knows that the request is not yet done, but will allow
other things to progress while we're waiting for it to complete.
Without this, the openib BTL currently replies "ok, done!", even when
it has only buffered a message (rather than actually sending it out).
This optimization works great (yeah, I know...) except for apps that
don't dip into the MPI library frequently.  :-\

--
Jeff Squyres
Cisco Systems

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


Re: [OMPI devel] bml_btl->btl_alloc() instead of mca_bml_base_alloc() in OSC

2007-10-28 Thread Brian Barrett


Can't think of any good reason -- the patch should be fine.

Thanks,

Brian

On Oct 28, 2007, at 7:13 AM, Gleb Natapov wrote:


Hi Brian,

 Is there a special reason why you call btl functions directly instead
of using bml wrappers? What about applying this patch?


diff --git a/ompi/mca/osc/rdma/osc_rdma_component.c b/ompi/mca/osc/rdma/osc_rdma_component.c
index 2d0dc06..302dd9e 100644
--- a/ompi/mca/osc/rdma/osc_rdma_component.c
+++ b/ompi/mca/osc/rdma/osc_rdma_component.c
@@ -1044,9 +1044,8 @@ rdma_send_info_send(ompi_osc_rdma_module_t *module,
ompi_osc_rdma_rdma_info_header_t *header = NULL;

bml_btl = peer_send_info->bml_btl;
-descriptor = bml_btl->btl_alloc(bml_btl->btl,
-MCA_BTL_NO_ORDER,
-sizeof(ompi_osc_rdma_rdma_info_header_t));
+mca_bml_base_alloc(bml_btl, &descriptor, MCA_BTL_NO_ORDER,
+sizeof(ompi_osc_rdma_rdma_info_header_t));
if (NULL == descriptor) {
ret = OMPI_ERR_TEMP_OUT_OF_RESOURCE;
goto cleanup;
diff --git a/ompi/mca/osc/rdma/osc_rdma_data_move.c b/ompi/mca/osc/rdma/osc_rdma_data_move.c
index e9fd17c..e7b5813 100644
--- a/ompi/mca/osc/rdma/osc_rdma_data_move.c
+++ b/ompi/mca/osc/rdma/osc_rdma_data_move.c
@@ -454,10 +454,10 @@ ompi_osc_rdma_sendreq_send(ompi_osc_rdma_module_t *module,
/* get a buffer... */
endpoint = (mca_bml_base_endpoint_t*) sendreq->req_target_proc->proc_bml;
bml_btl = mca_bml_base_btl_array_get_next(&endpoint->btl_eager);
-descriptor = bml_btl->btl_alloc(bml_btl->btl,
-MCA_BTL_NO_ORDER,
-module->m_use_buffers ? bml_btl->btl_eager_limit : needed_len < bml_btl->btl_eager_limit ? needed_len :
-bml_btl->btl_eager_limit);
+mca_bml_base_alloc(bml_btl, &descriptor, MCA_BTL_NO_ORDER,
+module->m_use_buffers ? bml_btl->btl_eager_limit :
+needed_len < bml_btl->btl_eager_limit ? needed_len :
+bml_btl->btl_eager_limit);
if (NULL == descriptor) {
ret = OMPI_ERR_TEMP_OUT_OF_RESOURCE;
goto cleanup;
@@ -698,9 +698,8 @@ ompi_osc_rdma_replyreq_send(ompi_osc_rdma_module_t *module,
/* Get a BTL and a fragment to go with it */
endpoint = (mca_bml_base_endpoint_t*) replyreq->rep_origin_proc->proc_bml;
bml_btl = mca_bml_base_btl_array_get_next(&endpoint->btl_eager);
-descriptor = bml_btl->btl_alloc(bml_btl->btl,
-MCA_BTL_NO_ORDER,
-bml_btl->btl_eager_limit);
+mca_bml_base_alloc(bml_btl, &descriptor, MCA_BTL_NO_ORDER,
+bml_btl->btl_eager_limit);
if (NULL == descriptor) {
ret = OMPI_ERR_TEMP_OUT_OF_RESOURCE;
goto cleanup;
@@ -1260,9 +1259,8 @@ ompi_osc_rdma_control_send(ompi_osc_rdma_module_t *module,
/* Get a BTL and a fragment to go with it */
endpoint = (mca_bml_base_endpoint_t*) proc->proc_bml;
bml_btl = mca_bml_base_btl_array_get_next(&endpoint->btl_eager);
-descriptor = bml_btl->btl_alloc(bml_btl->btl,
-MCA_BTL_NO_ORDER,
-sizeof(ompi_osc_rdma_control_header_t));
+mca_bml_base_alloc(bml_btl, &descriptor, MCA_BTL_NO_ORDER,
+sizeof(ompi_osc_rdma_control_header_t));
if (NULL == descriptor) {
ret = OMPI_ERR_TEMP_OUT_OF_RESOURCE;
goto cleanup;
@@ -1322,9 +1320,8 @@ ompi_osc_rdma_rdma_ack_send(ompi_osc_rdma_module_t *module,
ompi_osc_rdma_control_header_t *header = NULL;

/* Get a BTL and a fragment to go with it */
-descriptor = bml_btl->btl_alloc(bml_btl->btl,
-rdma_btl->rdma_order,
-sizeof(ompi_osc_rdma_control_header_t));
+mca_bml_base_alloc(bml_btl, &descriptor, rdma_btl->rdma_order,
+sizeof(ompi_osc_rdma_control_header_t));
if (NULL == descriptor) {
ret = OMPI_ERR_TEMP_OUT_OF_RESOURCE;
goto cleanup;
--
Gleb.

Re: [OMPI devel] problem in runing MPI job through XGrid

2007-10-26 Thread Brian Barrett

XGrid does not forward X11 credentials, so you would have to setup an  
X11 environment by yourself.  Using ssh or a local starter does  
forward X11 credentials, which is why it works in that case.
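
One cheap way to see this from inside the job is to look at the
DISPLAY environment variable before calling XOpenDisplay(NULL), which
relies on it to find the X server.  A minimal sketch follows; the
helper names are made up, and DISPLAY is only part of the picture
(authorization still has to be in place as well).

```c
#include <stdlib.h>

/* Hypothetical helper: 1 if the given DISPLAY value is non-empty,
 * 0 if it is NULL or an empty string. */
int display_value_ok(const char *display)
{
    return display != NULL && display[0] != '\0';
}

/* Hypothetical helper: checks the DISPLAY variable of this process. */
int have_display_env(void)
{
    return display_value_ok(getenv("DISPLAY"));
}
```

If have_display_env() returns 0 in a process started through XGrid,
XOpenDisplay(NULL) has nothing to connect to, whatever the process
count.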


Brian

On Oct 25, 2007, at 10:23 PM, Jinhui Qin wrote:


Hi Brian,
   I got another problem in running an MPI job through XGrid.
During the execution of this MPI job it will call Xlib functions
(i.e. XOpenDisplay()) to open an X window.  The XOpenDisplay()
function call failed (returned NULL); it cannot open a display no
matter how many processors I request.


However, when I turned off the xgrid controller and used "mpirun -n 4  "
to start the job again, four X windows opened properly, but the four
processes were all running on the local machine instead of on any
remote nodes.


I have also tested using "ssh -x" from a terminal on my local
machine to log in to any other node in the cluster to run the job
(I have copies of the same job on all nodes and in the same
path); the X window displays on my local machine properly. I
know it is the "-x" option that sets up the environment properly for
starting the X window. With only "ssh" and no "-x" option, it won't work.


I am wondering why the X window cannot open if the job is started
through Xgrid.  How does the Xgrid controller contact each agent
node?


Is there anyone who has seen a similar problem?

I have installed X11 and OpenMPI on all 8 mac mini nodes in my  
cluster, and have also tested running an  MPI job,  which  has no X  
window function calls, through XGrid, it worked perfectly fine on  
all nodes.


Thanks a lot for any suggestions!

Jane


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

Re: [OMPI devel] PML cm and heterogeneous support

2007-10-25 Thread Brian Barrett

No, it's because the CM PML was never designed to be used in a  
heterogeneous environment :).  While the MX BTL does support  
heterogeneous operations (at one point, I believe I even had it  
working), none of the MTLs have ever been tested in heterogeneous  
environments and it's known the datatype usage in the CM PML won't  
support heterogeneous operation.


Brian

On Oct 24, 2007, at 6:21 PM, Jeff Squyres wrote:


George / Patrick / Rich / Christian --

Any idea why that's there?  Is that because portals, MX, and PSM all
require homogeneous environments?


On Oct 18, 2007, at 3:59 PM, Sajjad Tabib wrote:



Hi,

I tried to run an MPI program in a heterogeneous environment
using the pml cm component. However, Open MPI returned an
error message indicating that PML add procs returned "Not
supported". I dived into the cm code to see what was wrong and
came upon the code below, which basically shows that if the
processes are running on different architectures, it returns "not
supported". Now, I'm wondering whether my interpretation is correct
or not. Is it true that the cm component does not support a
heterogeneous environment? If so, will the developers support this
in the future? How could I get around this while still using the cm
component? What will happen if I rebuild openmpi without these
statements?

I would appreciate your help.

 Code:

mca_pml_cm_add_procs(){

#if OMPI_ENABLE_HETEROGENEOUS_SUPPORT
    for (i = 0 ; i < nprocs ; ++i) {
        if (procs[i]->proc_arch != ompi_proc_local()->proc_arch) {
            return OMPI_ERR_NOT_SUPPORTED;
        }
    }
#endif
.
.
.
}

Sajjad Tabib
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



--
Jeff Squyres
Cisco Systems

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

Re: [OMPI devel] PathScale 3.0 problems with Open MPI 1.2.[34]

2007-10-23 Thread Brian Barrett


On Oct 23, 2007, at 10:58 AM, Patrick Geoffray wrote:


Bogdan Costescu wrote:

I made some progress: if I configure with "--without-memory-manager"
(along with all other options that I mentioned before), then it works.
This was inspired by the fact that the segmentation fault occurred in
ptmalloc2. I have previously tried to remove the MX support without
any effect; with ptmalloc2 out of the picture I have had test runs
over MX and TCP without problems.


We have had portability problems using ptmalloc2 in MPICH-GM,
especially relative to threads. In MX, we choose to use dlmalloc
instead. It is not as optimized and its thread-safety has a coarser
grain, but it is much more portable.

Disabling the memory manager in OpenMPI is not a bad thing for MX, as
its own dlmalloc-based registration cache will operate transparently
with MX_RCACHE=1 (default).


If you're not packaging Open MPI with MX support, I'd configure Open  
MPI with the extra parameters:


  --without-memory-manager --enable-mca-static=btl-mx,mtl-mx

This will provide the least possibility of something getting in the  
way of MX doing its thing with its memory hooks.  It causes libmpi.so  
to depend on libmyriexpress.so, which is both a good and bad thing.   
Good because the malloc hooks in libmyriexpress aren't "seen" when we  
dlopen the OMPI MX drivers to suck in libmyriexpress, but they would  
be with this configuration.  Bad in that libmpi.so now depends on  
libmyriexpress, so packaging for multiple machines could be more  
difficult.


Brian

Re: [OMPI devel] RFC: versioning OMPI libraries

2007-10-15 Thread Brian Barrett


BTW, Here's the documentation I was referring to:

  http://www.gnu.org/software/libtool/manual.html#Versioning

Now, the problem Open MPI faces is that while our MPI interface  
rarely changes  (and almost never in a backwards-incompatible way),  
the interface between components and libraries does.  So that could  
cause some interesting heartaches.
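
For reference, Libtool's -version-info takes current:revision:age, and
on Linux the resulting file name is NAME.so.(current-age).age.revision.
A small sketch of that mapping, with a made-up function name; other
platforms use different naming schemes, so treat this as Linux-only.

```c
#include <stdio.h>

/* Format the Linux shared-library file name Libtool derives from
 * -version-info current:revision:age, i.e.
 * NAME.so.(current-age).age.revision.  Returns the snprintf() result. */
int libtool_linux_filename(char *buf, size_t len, const char *name,
                           unsigned current, unsigned revision, unsigned age)
{
    return snprintf(buf, len, "%s.so.%u.%u.%u",
                    name, current - age, age, revision);
}
```

With 0:0:0 this gives the libmpi.so.0.0.0 we ship today.  With 3:1:2
it gives libmpi.so.1.2.1, which shows that the numbers in the file
name are not a release number.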


Good luck,

Brian

On Oct 15, 2007, at 1:56 PM, Jeff Squyres wrote:


Ok, having read the libtool docs now, I see why the release number is
a bad idea.  :-)

I'm assuming that:

- The libmpi interface will rarely change, but we may add to it over
time (there's a specific point about this in the libtool docs -- no
problem)
- The libopen-rte interface historically has had major changes
between major releases and may have interface changes between minor
releases, too
- The libopen-pal interface is relatively stable -- I actually
haven't been checking how often it changes

So if we do this, I think the RM's would need to be responsible for
marshaling this information and setting the appropriate values.  I
can convert the build system to do use this kind of information; the
real question is whether the community wants to utilize it or not
(and whether the RM's will take on the responsibility of gathering
this data for each release).


On Oct 15, 2007, at 1:16 PM, Christian Bell wrote:


On Mon, 15 Oct 2007, Brian Barrett wrote:


No! :)

It would be good for everyone to read the Libtool documentation to
see why versioning on the release number would be a really bad idea.
Then comment.  But my opinion would be that you should change based
on interface changes, not based on release numbers.


Yes, I second Brian.  Notwithstanding what the popular vote is on MPI
ABI compatibility across MPI implementations, I think that
major/minor numbering within an implementation should be used to
indicate when interfaces break, not give hints as to what release
they pertain to.

. . christian

--
christian.b...@qlogic.com
(QLogic Host Solutions Group, formerly Pathscale)
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



--
Jeff Squyres
Cisco Systems


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

Re: [OMPI devel] RFC: versioning OMPI libraries

2007-10-15 Thread Brian Barrett


No! :)

It would be good for everyone to read the Libtool documentation to  
see why versioning on the release number would be a really bad idea.   
Then comment.  But my opinion would be that you should change based  
on interface changes, not based on release numbers.
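
To spell out what "change based on interface changes" means, here is
a sketch of the update rules from the Libtool manual written as a C
function.  The type and function names are made up for illustration;
the rules themselves are Libtool's.

```c
/* Libtool -version-info triple. */
typedef struct { unsigned current, revision, age; } version_info_t;

typedef enum {
    CHANGE_NONE,          /* nothing changed since the last release */
    CHANGE_CODE_ONLY,     /* source changed, interfaces untouched */
    CHANGE_IFACE_ADDED,   /* interfaces were only added */
    CHANGE_IFACE_BROKEN   /* interfaces were removed or changed */
} change_kind_t;

/* Hypothetical helper applying the Libtool manual's update rules. */
version_info_t next_version_info(version_info_t v, change_kind_t kind)
{
    switch (kind) {
    case CHANGE_NONE:
        break;
    case CHANGE_CODE_ONLY:      /* bump revision only */
        v.revision++;
        break;
    case CHANGE_IFACE_ADDED:    /* new interface, still backward compatible */
        v.current++;
        v.revision = 0;
        v.age++;
        break;
    case CHANGE_IFACE_BROKEN:   /* backward compatibility is gone */
        v.current++;
        v.revision = 0;
        v.age = 0;
        break;
    }
    return v;
}
```

Nothing in these rules looks at the package release number, which is
the whole point.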



Brian

On Oct 15, 2007, at 12:29 PM, Jeff Squyres wrote:


WHAT: Add versioning to all OMPI libraries so that shared libraries
use the real version number in the filename (vs. the current
"*.so.0.0.0")

WHY: It's a Good Thing(tm) to do.

WHERE: Minor changes in a few Makefile.am's; probably some small
tweaking to top-level configure.ac and/or some support m4 files.

WHEN: After timeout.

TIMEOUT: COB, Tuesday Oct 23rd, 2007

-

Currently, all OMPI shared libraries are created with the extension
".so.0.0.0".  We have long discussed using Libtool properly to use a
real/meaningful version number instead of "0.0.0" but no one has ever
gotten a round tuit.

I propose that v1.3 is [finally] the time to do this properly.  I'm
trolling through the configure/build system for a few other issues; I
could pick this up along the way.  My specific proposal is that all
shared libraries be suffixed the numeric version number of Open MPI
itself.  For example, the first release that uses this functionality
will have libmpi.so.1.3.0.

Note that this still does not enable installing multiple versions of
OMPI into the same prefix (for lots of other reasons not covered
here), but at least it does allow multiple libraries in the same tree
for backwards binary compatibility issues, and gives a visual
reference of the library's version number in its filename.

DSOs will remain un-suffixed (e.g., mca_btl_openib.so).

--
Jeff Squyres
Cisco Systems


Re: [OMPI devel] problem in runing MPI job through XGrid

2007-10-09 Thread Brian Barrett


On Oct 4, 2007, at 3:06 PM, Jinhui Qin wrote:

sib:sharcnet$ mpirun -n 3 ~/openMPI_stuff/Hello

Process 0.1.1 is unable to reach 0.1.2 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.



This is very odd -- it looks like two of the processes don't think  
they can talk to each other.  Can you try running with:


  mpirun -n 3 -mca btl tcp,self 

If that fails, then the next piece of information that would be  
useful is the IP addresses and netmasks for all the nodes in your  
cluster.  We have some logic in our TCP communication system that can  
cause some interesting results for some network topologies.
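
To illustrate why the addresses and netmasks matter, here is a minimal sketch of a same-subnet test for IPv4.  It is not Open MPI's actual TCP reachability code, and all helper names are hypothetical; it only shows the kind of comparison that can go wrong when nodes carry unexpected netmasks.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a dotted quad into a host-order 32-bit value. */
static uint32_t ipv4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
           ((uint32_t)c << 8) | (uint32_t)d;
}

/* Build a host-order netmask from a prefix length (0..32). */
static uint32_t netmask_from_prefix(unsigned prefix_len)
{
    assert(prefix_len <= 32);
    /* Shifting a 32-bit value by 32 is undefined, so handle /0 apart. */
    return prefix_len == 0 ? 0u : 0xFFFFFFFFu << (32 - prefix_len);
}

/* Two addresses are on the same subnet if their network parts match. */
static int same_subnet(uint32_t a, uint32_t b, uint32_t mask)
{
    return (a & mask) == (b & mask);
}
```

With a /24 mask, 192.168.1.10 and 192.168.2.10 look unreachable from each other; with a /16 mask the same pair is on one subnet.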


Just to verify it's not an XGrid problem, you might want to try  
running with a hostfile -- I think you'll find that the results are  
the same, but it's always good to verify.


Brian

Re: [OMPI devel] OpenIB component static compile failures

2007-10-02 Thread Brian Barrett


By the way, I filed a bug on this issue:

  https://svn.open-mpi.org/trac/ompi/ticket/1155

Brian


On Oct 2, 2007, at 8:57 AM, Brian Barrett wrote:

No, actually, my report isn't about that issue at all.  I'm not
talking about making an entirely statically built application.  I'm
talking about a statically compiled Open MPI with a dynamically
linked application and OFED.  Take a look at the output of
mpicc -showme -- it's not adding *ANY* -l or -L options for InfiniBand.
This is something wrong with Open MPI's configure, which has
changed since v1.2.  On v1.2, the same commands result in:


[10:54] brbarret@odin:pts/27 v1.2> mpicc -showme
gcc -I/u/brbarret/Software/x86_64-unknown-linux-gnu/ompi/devel/include
    -pthread -L/usr/local/ofed/lib64
    -L/u/brbarret/Software/x86_64-unknown-linux-gnu/ompi/devel/lib
    -lmpi -lopen-rte -lopen-pal -libverbs -lrt -lnuma -ldl
    -Wl,--export-dynamic -lnsl -lutil -lm -ldl

[10:55] brbarret@odin:pts/27 examples> make ring_c
mpicc -g ring_c.c -o ring_c
[10:55] brbarret@odin:pts/27 examples>


Brian


On Oct 1, 2007, at 10:21 PM, Jeff Squyres wrote:

This is a known issue; no one had expressed any desire to have it  
fixed:


   http://www.open-mpi.org/faq/?category=mpi-apps#static-mpi-apps
   http://www.open-mpi.org/faq/?category=mpi-apps#static-ofa-mpi-apps

Feel free to file a ticket and fix if you'd like...


On Oct 1, 2007, at 11:56 PM, Brian Barrett wrote:


Hi all -

There's a problem with the OpenIB components when statically
linking.  For whatever reason, the configure logic is not adding the
right -L and -l flags to the mpicc wrapper flags.

[17:26] brbarret@odin:pts/8 examples> mpicc -showme
gcc -I/u/brbarret/Software/x86_64-unknown-linux-gnu/ompi/devel/include
    -pthread
    -L/u/brbarret/Software/x86_64-unknown-linux-gnu/ompi/devel/lib
    -lmpi -lopen-rte -lopen-pal -lnuma -ldl
    -Wl,--export-dynamic -lnsl -lutil -lm -ldl
[17:42] brbarret@odin:pts/8 examples> make hello_c
mpicc -g hello_c.c -o hello_c
/u/brbarret/Software/x86_64-unknown-linux-gnu/ompi/devel/lib/libmpi.a(btl_openib_component.o)(.text+0x895): In function `openib_reg_mr':
/u/brbarret/odin/ompi/trunk/ompi/mca/btl/openib/btl_openib_component.c:304: undefined reference to `ibv_reg_mr'

and many more, obviously.

Good luck,

Brian



--
Jeff Squyres
Cisco Systems


Re: [OMPI devel] OpenIB component static compile failures

2007-10-02 Thread Brian Barrett

No, actually, my report isn't about that issue at all.  I'm not
talking about making an entirely statically built application.  I'm
talking about a statically compiled Open MPI with a dynamically
linked application and OFED.  Take a look at the output of
mpicc -showme -- it's not adding *ANY* -l or -L options for InfiniBand.
This is something wrong with Open MPI's configure, which has changed
since v1.2.  On v1.2, the same commands result in:


[10:54] brbarret@odin:pts/27 v1.2> mpicc -showme
gcc -I/u/brbarret/Software/x86_64-unknown-linux-gnu/ompi/devel/include
    -pthread -L/usr/local/ofed/lib64
    -L/u/brbarret/Software/x86_64-unknown-linux-gnu/ompi/devel/lib
    -lmpi -lopen-rte -lopen-pal -libverbs -lrt -lnuma -ldl
    -Wl,--export-dynamic -lnsl -lutil -lm -ldl

[10:55] brbarret@odin:pts/27 examples> make ring_c
mpicc -g ring_c.c -o ring_c
[10:55] brbarret@odin:pts/27 examples>


Brian


On Oct 1, 2007, at 10:21 PM, Jeff Squyres wrote:

This is a known issue; no one had expressed any desire to have it  
fixed:


   http://www.open-mpi.org/faq/?category=mpi-apps#static-mpi-apps
   http://www.open-mpi.org/faq/?category=mpi-apps#static-ofa-mpi-apps

Feel free to file a ticket and fix if you'd like...


On Oct 1, 2007, at 11:56 PM, Brian Barrett wrote:


Hi all -

There's a problem with the OpenIB components when statically
linking.  For whatever reason, the configure logic is not adding the
right -L and -l flags to the mpicc wrapper flags.

[17:26] brbarret@odin:pts/8 examples> mpicc -showme
gcc -I/u/brbarret/Software/x86_64-unknown-linux-gnu/ompi/devel/include
    -pthread
    -L/u/brbarret/Software/x86_64-unknown-linux-gnu/ompi/devel/lib
    -lmpi -lopen-rte -lopen-pal -lnuma -ldl
    -Wl,--export-dynamic -lnsl -lutil -lm -ldl
[17:42] brbarret@odin:pts/8 examples> make hello_c
mpicc -g hello_c.c -o hello_c
/u/brbarret/Software/x86_64-unknown-linux-gnu/ompi/devel/lib/libmpi.a(btl_openib_component.o)(.text+0x895): In function `openib_reg_mr':
/u/brbarret/odin/ompi/trunk/ompi/mca/btl/openib/btl_openib_component.c:304: undefined reference to `ibv_reg_mr'

and many more, obviously.

Good luck,

Brian



--
Jeff Squyres
Cisco Systems


[OMPI devel] OpenIB component static compile failures

2007-10-01 Thread Brian Barrett


Hi all -

There's a problem with the OpenIB components when statically  
linking.  For whatever reason, the configure logic is not adding the  
right -L and -l flags to the mpicc wrapper flags.


[17:26] brbarret@odin:pts/8 examples> mpicc -showme
gcc -I/u/brbarret/Software/x86_64-unknown-linux-gnu/ompi/devel/include
    -pthread
    -L/u/brbarret/Software/x86_64-unknown-linux-gnu/ompi/devel/lib
    -lmpi -lopen-rte -lopen-pal -lnuma -ldl
    -Wl,--export-dynamic -lnsl -lutil -lm -ldl

[17:42] brbarret@odin:pts/8 examples> make hello_c
mpicc -g hello_c.c -o hello_c
/u/brbarret/Software/x86_64-unknown-linux-gnu/ompi/devel/lib/libmpi.a(btl_openib_component.o)(.text+0x895): In function `openib_reg_mr':
/u/brbarret/odin/ompi/trunk/ompi/mca/btl/openib/btl_openib_component.c:304: undefined reference to `ibv_reg_mr'


and many more, obviously.

Good luck,

Brian

Re: [OMPI devel] Malloc segfaulting?

2007-09-20 Thread Brian Barrett


On Sep 20, 2007, at 7:02 AM, Tim Prins wrote:

In our nightly runs with the trunk I have started seeing cases where we
appear to be segfaulting within/below malloc. Below is a typical output.

Note that this appears to only happen on the trunk, when we use openib,
and are in 32 bit mode. It seems to happen randomly at a very low
frequency (59 out of about 60,000 32 bit openib runs).

This could be a problem with our machine, and has shown up since I
started testing 32bit ofed 10 days ago.

Anyways, just curious if anyone had any ideas.


As someone else said, this usually points to a duplicate free or the
like in malloc.  You might want to try compiling with
--without-memory-manager, as the ptmalloc2 in glibc frequently is more
verbose about where errors occurred than is the one in Open MPI.


Brian

Re: [OMPI devel] FreeBSD Support?

2007-09-19 Thread Brian Barrett


On Sep 19, 2007, at 4:11 PM, Tim Prins wrote:

Here is where it gets nasty. On FreeBSD, /usr/include/string.h includes
strings.h in some cases. But there is a strings.h in the ompi/mpi/f77
directory, so that is getting included instead of the proper
/usr/include/strings.h.

I suppose we could rename our strings.h to f77_strings.h, or something
similar. Does anyone have an opinion on this?


I think this is the best path forward.  Ugh.

Brian

Re: [OMPI devel] Maximum Shared Memory Segment - OK to increase?

2007-08-28 Thread Brian Barrett


On Aug 28, 2007, at 9:05 AM, Li-Ta Lo wrote:


On Mon, 2007-08-27 at 15:10 -0400, Rolf vandeVaart wrote:

We are running into a problem when running on one of our larger SMPs
using the latest Open MPI v1.2 branch.  We are trying to run a job
with np=128 within a single node.  We are seeing the following error:

"SM failed to send message due to shortage of shared memory."

We then increased the allowable maximum size of the shared segment to
2Gigabytes-1 which is the maximum allowed on 32-bit application.  We
used the mca parameter to increase it as shown here.

-mca mpool_sm_max_size 2147483647

This allowed the program to run to completion.  Therefore, we would
like to increase the default maximum from 512Mbytes to 2Gigabytes-1.

Does anyone have an objection to this change?  Soon we are going to
have larger CPU counts and would like to increase the odds that things
work "out of the box" on these large SMPs.




There is a serious problem with the 1.2 branch: it does not allocate
any SM area for each process at the beginning. SM areas are allocated
on demand and if some of the processes are more aggressive than the
others, it will cause starvation. This problem is fixed in the trunk
by assigning at least one SM area for each process. I think this is what
you saw (starvation) and an increase of max size may not be necessary.


Although I'm pretty sure this is fixed in the v1.2 branch already.

I don't think we should raise that ceiling at this point.  We create  
the file in /tmp, and if someone does -np 32 on a single, small node  
(not unheard of), it'll do really evil things.


Personally, I don't think we need nearly as much shared memory as  
we're using.  It's a bad design in terms of its unbounded memory  
usage.  We should fix that, rather than making the file bigger.  But  
I'm not going to fix it, so take my opinion with a grain of salt.


Brian

Re: [OMPI devel] ompi_mpi_abort

2007-08-25 Thread Brian Barrett


On Aug 25, 2007, at 7:10 AM, Jeff Squyres wrote:


1. We have logic in ompi_mpi_abort to prevent recursive invocation
(ompi_mpi_abort.c:60):

 /* Protection for recursive invocation */
 if (have_been_invoked) {
 return OMPI_SUCCESS;
 }
 have_been_invoked = true;


This, IMHO, is the wrong thing to do.  The intent of ompi_mpi_abort()
was that it never returned.  But now it does?  That seems wrong to me.
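
A minimal sketch of one way to keep a recursion guard without ever returning to the caller.  This is not the ompi_mpi_abort() code; every name below is hypothetical, and the only point is that the re-entrant path can terminate immediately instead of returning success.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Which termination path an abort call should take. */
enum abort_path {
    ABORT_FULL_CLEANUP,  /* first call: run normal cleanup, then exit */
    ABORT_IMMEDIATE      /* re-entered: skip cleanup, terminate right away */
};

/* Decide the path and record that an abort is now in progress. */
static enum abort_path abort_choose_path(bool *have_been_invoked)
{
    assert(have_been_invoked != NULL);
    if (*have_been_invoked) {
        return ABORT_IMMEDIATE;
    }
    *have_been_invoked = true;
    return ABORT_FULL_CLEANUP;
}

/* Hypothetical abort entry point: both paths terminate the process. */
static void sketch_abort(int errcode)
{
    static bool have_been_invoked = false;

    if (abort_choose_path(&have_been_invoked) == ABORT_FULL_CLEANUP) {
        fprintf(stderr, "aborting with error code %d\n", errcode);
        exit(errcode);   /* runs atexit handlers, which might re-enter */
    }
    _Exit(errcode);      /* re-entrant call: no handlers, no return */
}
```

If an atexit handler calls sketch_abort() again during the first call's cleanup, the second call takes the _Exit() path, so no caller ever sees the function return.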


Brian

--
  Brian W. Barrett
  Networking Team, CCS-1
  Los Alamos National Laboratory

Re: [OMPI devel] [devel-core] [RFC] Runtime Services Layer

2007-08-24 Thread Brian Barrett


On Aug 24, 2007, at 9:08 AM, George Bosilca wrote:


By heterogeneous RTE I was talking about what will happen once we
have the RSL. Different back-ends will support different features, so
from the user perspective we will not provide a homogeneous execution
environment in all situations. On the other hand, focusing our
efforts in ORTE will guarantee this homogeneity in all cases.


Is this a good thing?  I think no, and we already don't have it.  On  
Cray, we don't use mpirun but yod.  Livermore wants us to use SLURM  
directly instead of our mpirun kludge.  Those are heterogeneous from  
the user perspective.  But are also what the user expects on those  
platforms.


Brian

Re: [OMPI devel] Question on NX bit patch in Debian

2007-08-18 Thread Brian Barrett

You are correct -- we now add a .note.GNU-stack to the assembly file  
if the assembler supports it, so that patch should no longer be needed.


Brian

On Aug 18, 2007, at 9:03 AM, Manuel Prinz wrote:


Hi everyone,

in the Debian package of OpenMPI there has been a patch [1] for some
time which I think is obsolete. I did some reading on that topic but I'm
not very familiar with assembler, so I'm asking you here.

As far as I can see, removing the patch doesn't change the binaries
much. Neither scanelf nor readelf show something I'd consider as
suspicious. I think that the .note.GNU-stack instruction is added to the
assembler files by generate-asm.pl, so everything's set properly.

But as I said, I'm not very familiar with the matter and it would be
great to get a statement on that issue from you. (We could drop a rather
large patch along with this one, if it's obsolete.) Thanks in advance!

Best regards
Manuel

Footnote:
 1. http://svn.debian.org/wsvn/pkg-openmpi/openmpi/trunk/debian/patches/10opal_noexecstack.dpatch?op=file=0=0




Re: [OMPI devel] [OMPI svn] svn:open-mpi r15903

2007-08-17 Thread Brian Barrett

Fixed.  Sorry about the configure change mid-day, but it seemed like  
the right thing to do.


Brian


On Aug 17, 2007, at 10:37 AM, Brian Barrett wrote:


Oh, crud.  I forgot to fix that issue.  Will fix asap.

Brian

On Aug 17, 2007, at 10:12 AM, George Bosilca wrote:


This patch breaks the trunk. It looks like the LT_PACKAGE_VERSION
wasn't defined before the 2.x version. The autogen fails with the
following error:

*** Running GNU tools
[Running] autom4te --language=m4sh ompi_get_version.m4sh -o
ompi_get_version.sh
[Running] aclocal
configure.ac:998: error: m4_defn: undefined macro: LT_PACKAGE_VERSION
configure.ac:998: the top level
autom4te: /usr/bin/m4 failed with exit status: 1
aclocal: autom4te failed with exit status: 1

   george.

On Aug 17, 2007, at 12:08 AM, brbar...@osl.iu.edu wrote:


Author: brbarret
Date: 2007-08-17 00:08:23 EDT (Fri, 17 Aug 2007)
New Revision: 15903
URL: https://svn.open-mpi.org/trac/ompi/changeset/15903

Log:
Support versions of the Libtool 2.1a snapshots after the lt_dladvise code
was brought in.  This supersedes the GLOBL patch that we had been using
with Libtool 2.1a versions prior to the lt_dladvise code.  Autogen
tries to figure out which version you're on, so either will now work with
the trunk.

Text files modified:
   trunk/configure.ac                                  | 18 ++++--
   trunk/opal/mca/base/mca_base_component_find.c       |  8 ++++
   trunk/opal/mca/base/mca_base_component_repository.c | 24 ++++
   3 files changed, 48 insertions(+), 2 deletions(-)

Modified: trunk/configure.ac
===================================================================
--- trunk/configure.ac  (original)
+++ trunk/configure.ac  2007-08-17 00:08:23 EDT (Fri, 17 Aug 2007)
@@ -995,10 +995,15 @@

 ompi_show_subtitle "Libtool configuration"

+m4_if(m4_version_compare(m4_defn([LT_PACKAGE_VERSION]), 2.0), -1, [
 AC_LIBLTDL_CONVENIENCE(opal/libltdl)
 AC_LIBTOOL_DLOPEN
 AC_PROG_LIBTOOL
-
+], [
+LT_CONFIG_LTDL_DIR([opal/libltdl], [subproject])
+LTDL_CONVENIENCE
+LT_INIT([dlopen win32-dll])
+])
 ompi_show_subtitle "GNU libltdl setup"

 # AC_CONFIG_SUBDIRS appears to be broken for non-gcc compilers
(i.e.,
@@ -1038,6 +1043,13 @@
 if test "$HAPPY" = "1"; then
 LIBLTDL_SUBDIR=libltdl

+CPPFLAGS_save="$CPPFLAGS"
+CPPFLAGS="-I."
+AC_EGREP_HEADER([lt_dladvise_init], [opal/libltdl/ltdl.h],
+[OPAL_HAVE_LTDL_ADVISE=1],
+[OPAL_HAVE_LTDL_ADVISE=0])
+CPPFLAGS="$CPPFLAGS"
+
 # Arrgh.  This is gross.  But I can't think of any other
way to do
 # it.  :-(

@@ -1057,7 +1069,7 @@
 AC_MSG_WARN([libltdl support disabled (by --disable-dlopen)])

 LIBLTDL_SUBDIR=
-LIBLTDL=
+OPAL_HAVE_LTDL_ADVISE=0

 # append instead of prepend, since LIBS are going to be system
 # type things needed by everyone.  Normally, libltdl will push
@@ -1073,6 +1085,8 @@
 AC_DEFINE_UNQUOTED(OMPI_WANT_LIBLTDL, $OMPI_ENABLE_DLOPEN_SUPPORT,
 [Whether to include support for libltdl or not])

+AC_DEFINE_UNQUOTED(OPAL_HAVE_LTDL_ADVISE, $OPAL_HAVE_LTDL_ADVISE,
+[Whether libltdl appears to have the lt_dladvise interface])

 ##
 # visibility

Modified: trunk/opal/mca/base/mca_base_component_find.c
===================================================================
--- trunk/opal/mca/base/mca_base_component_find.c   (original)
+++ trunk/opal/mca/base/mca_base_component_find.c   2007-08-17
00:08:23 EDT (Fri, 17 Aug 2007)
@@ -75,6 +75,10 @@
   char name[MCA_BASE_MAX_COMPONENT_NAME_LEN];
 };
 typedef struct ltfn_data_holder_t ltfn_data_holder_t;
+
+#if OPAL_HAVE_LTDL_ADVISE
+extern lt_dladvise opal_mca_dladvise;
+#endif
 #endif /* OMPI_WANT_LIBLTDL */


@@ -387,7 +391,11 @@

   /* Now try to load the component */

+#if OPAL_HAVE_LTDL_ADVISE
+  component_handle = lt_dlopenadvise(target_file->filename,
opal_mca_dladvise);
+#else
   component_handle = lt_dlopenext(target_file->filename);
+#endif
   if (NULL == component_handle) {
 err = strdup(lt_dlerror());
 if (0 != show_errors) {

Modified: trunk/opal/mca/base/mca_base_component_repository.c
===================================================================
--- trunk/opal/mca/base/mca_base_component_repository.c (original)
+++ trunk/opal/mca/base/mca_base_component_repository.c 2007-08-17
00:08:23 EDT (Fri, 17 Aug 2007)
@@ -85,6 +85,10 @@
 static repository_item_t *find_component(const char *type, const
char *name);
 static int link_items(repository_item_t *src, repository_item_t
*depend);

+#if OPAL_HAVE_LTDL_ADVISE
+lt_dladvise opal_mca_dladvise;
+#endif
+
 #endif /* OMPI_WANT_LIBLTDL */


@@ -103,6 +107,20 @@
   return OPAL_ERR_OUT_OF_RESOURCE;
 }

+#if OPAL_HAVE_LTDL_ADVISE
+if (lt_dladvise_init(&opal_mca_dladvise)) {
+retu

Re: [OMPI devel] [OMPI svn] svn:open-mpi r15903

2007-08-17 Thread Brian Barrett


Oh, crud.  I forgot to fix that issue.  Will fix asap.

Brian

On Aug 17, 2007, at 10:12 AM, George Bosilca wrote:


This patch breaks the trunk. It looks like the LT_PACKAGE_VERSION
wasn't defined before the 2.x version. The autogen fails with the
following error:

*** Running GNU tools
[Running] autom4te --language=m4sh ompi_get_version.m4sh -o
ompi_get_version.sh
[Running] aclocal
configure.ac:998: error: m4_defn: undefined macro: LT_PACKAGE_VERSION
configure.ac:998: the top level
autom4te: /usr/bin/m4 failed with exit status: 1
aclocal: autom4te failed with exit status: 1

   george.

On Aug 17, 2007, at 12:08 AM, brbar...@osl.iu.edu wrote:


Author: brbarret
Date: 2007-08-17 00:08:23 EDT (Fri, 17 Aug 2007)
New Revision: 15903
URL: https://svn.open-mpi.org/trac/ompi/changeset/15903

Log:
Support versions of the Libtool 2.1a snapshots after the lt_dladvise code
was brought in.  This supersedes the GLOBL patch that we had been using
with Libtool 2.1a versions prior to the lt_dladvise code.  Autogen
tries to figure out which version you're on, so either will now work with
the trunk.

Text files modified:
   trunk/configure.ac                                  | 18 ++++--
   trunk/opal/mca/base/mca_base_component_find.c       |  8 ++++
   trunk/opal/mca/base/mca_base_component_repository.c | 24 ++++
   3 files changed, 48 insertions(+), 2 deletions(-)

Modified: trunk/configure.ac
===================================================================
--- trunk/configure.ac  (original)
+++ trunk/configure.ac  2007-08-17 00:08:23 EDT (Fri, 17 Aug 2007)
@@ -995,10 +995,15 @@

 ompi_show_subtitle "Libtool configuration"

+m4_if(m4_version_compare(m4_defn([LT_PACKAGE_VERSION]), 2.0), -1, [
 AC_LIBLTDL_CONVENIENCE(opal/libltdl)
 AC_LIBTOOL_DLOPEN
 AC_PROG_LIBTOOL
-
+], [
+LT_CONFIG_LTDL_DIR([opal/libltdl], [subproject])
+LTDL_CONVENIENCE
+LT_INIT([dlopen win32-dll])
+])
 ompi_show_subtitle "GNU libltdl setup"

 # AC_CONFIG_SUBDIRS appears to be broken for non-gcc compilers  
(i.e.,

@@ -1038,6 +1043,13 @@
 if test "$HAPPY" = "1"; then
 LIBLTDL_SUBDIR=libltdl

+CPPFLAGS_save="$CPPFLAGS"
+CPPFLAGS="-I."
+AC_EGREP_HEADER([lt_dladvise_init], [opal/libltdl/ltdl.h],
+[OPAL_HAVE_LTDL_ADVISE=1],
+[OPAL_HAVE_LTDL_ADVISE=0])
+CPPFLAGS="$CPPFLAGS"
+
 # Arrgh.  This is gross.  But I can't think of any other
way to do
 # it.  :-(

@@ -1057,7 +1069,7 @@
 AC_MSG_WARN([libltdl support disabled (by --disable-dlopen)])

 LIBLTDL_SUBDIR=
-LIBLTDL=
+OPAL_HAVE_LTDL_ADVISE=0

 # append instead of prepend, since LIBS are going to be system
 # type things needed by everyone.  Normally, libltdl will push
@@ -1073,6 +1085,8 @@
 AC_DEFINE_UNQUOTED(OMPI_WANT_LIBLTDL, $OMPI_ENABLE_DLOPEN_SUPPORT,
 [Whether to include support for libltdl or not])

+AC_DEFINE_UNQUOTED(OPAL_HAVE_LTDL_ADVISE, $OPAL_HAVE_LTDL_ADVISE,
+[Whether libltdl appears to have the lt_dladvise interface])

 ##
 # visibility

Modified: trunk/opal/mca/base/mca_base_component_find.c
===================================================================
--- trunk/opal/mca/base/mca_base_component_find.c   (original)
+++ trunk/opal/mca/base/mca_base_component_find.c   2007-08-17
00:08:23 EDT (Fri, 17 Aug 2007)
@@ -75,6 +75,10 @@
   char name[MCA_BASE_MAX_COMPONENT_NAME_LEN];
 };
 typedef struct ltfn_data_holder_t ltfn_data_holder_t;
+
+#if OPAL_HAVE_LTDL_ADVISE
+extern lt_dladvise opal_mca_dladvise;
+#endif
 #endif /* OMPI_WANT_LIBLTDL */


@@ -387,7 +391,11 @@

   /* Now try to load the component */

+#if OPAL_HAVE_LTDL_ADVISE
+  component_handle = lt_dlopenadvise(target_file->filename,
opal_mca_dladvise);
+#else
   component_handle = lt_dlopenext(target_file->filename);
+#endif
   if (NULL == component_handle) {
 err = strdup(lt_dlerror());
 if (0 != show_errors) {

Modified: trunk/opal/mca/base/mca_base_component_repository.c
===================================================================
--- trunk/opal/mca/base/mca_base_component_repository.c (original)
+++ trunk/opal/mca/base/mca_base_component_repository.c 2007-08-17
00:08:23 EDT (Fri, 17 Aug 2007)
@@ -85,6 +85,10 @@
 static repository_item_t *find_component(const char *type, const
char *name);
 static int link_items(repository_item_t *src, repository_item_t
*depend);

+#if OPAL_HAVE_LTDL_ADVISE
+lt_dladvise opal_mca_dladvise;
+#endif
+
 #endif /* OMPI_WANT_LIBLTDL */


@@ -103,6 +107,20 @@
   return OPAL_ERR_OUT_OF_RESOURCE;
 }

+#if OPAL_HAVE_LTDL_ADVISE
+if (lt_dladvise_init(&opal_mca_dladvise)) {
+return OPAL_ERR_OUT_OF_RESOURCE;
+}
+
+if (lt_dladvise_ext(&opal_mca_dladvise)) {
+return OPAL_ERROR;
+}
+
+if (lt_dladvise_global(&opal_mca_dladvise)) {
+return OPAL_ERROR;
+}
+#endif
+
 OBJ_CONSTRUCT(,

Re: [OMPI devel] openib btl header caching

2007-08-13 Thread Brian Barrett


On Aug 13, 2007, at 9:33 AM, George Bosilca wrote:


On Aug 13, 2007, at 11:28 AM, Pavel Shamis (Pasha) wrote:


Jeff Squyres wrote:

I guess reading the graph that Pasha sent is difficult; Pasha -- can
you send the actual numbers?


Ok, here are the numbers on my machines:
0 bytes
mvapich with header caching: 1.56
mvapich without header caching: 1.79
ompi 1.2: 1.59

So at zero bytes ompi is not so bad. We can also see that header caching
decreases the mvapich latency by 0.23.

1 byte
mvapich with header caching: 1.58
mvapich without header caching: 1.83
ompi 1.2: 1.73

And here ompi makes a latency jump.

In mvapich, header caching decreases the header size from 56 bytes to
12 bytes.
What is the header size (pml + btl) in ompi?


The match header size is 16 bytes, so it looks like ours is already
optimized ...


Pasha -- Is your build of Open MPI built with --disable-heterogeneous?
If not, our headers all grow slightly to support heterogeneous
operations.  For the heterogeneous case, a 1 byte message includes:


  16 bytes for the match header
  4 bytes for the Open IB header
  1 byte for the payload
 
  21 bytes total

If you are using eager RDMA, there's an extra 4 bytes for the RDMA  
length in the footer.  Without heterogeneous support, 2 bytes get  
knocked off the size of the match header, so the whole thing will be  
19 bytes (+ 4 for the eager RDMA footer).
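
The arithmetic above, written out as a small sketch.  The constant and function names are invented for illustration; the sizes are the ones quoted in this message, not values read from the Open MPI headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Sizes in bytes, as quoted in the message (names are hypothetical). */
enum {
    MATCH_HDR_HETEROGENEOUS = 16,  /* match header, heterogeneous build */
    MATCH_HDR_HOMOGENEOUS   = 14,  /* 2 bytes smaller without hetero support */
    OPENIB_HDR              = 4,   /* Open IB BTL header */
    EAGER_RDMA_FOOTER       = 4    /* RDMA length footer, eager RDMA only */
};

/* Total bytes on the wire for one eager message of `payload` bytes. */
static size_t wire_bytes(size_t payload, bool heterogeneous, bool eager_rdma)
{
    size_t total = payload + OPENIB_HDR;

    total += heterogeneous ? MATCH_HDR_HETEROGENEOUS : MATCH_HDR_HOMOGENEOUS;
    if (eager_rdma) {
        total += EAGER_RDMA_FOOTER;
    }
    return total;
}

/* Check the figures for a 1 byte message against the message text. */
static void wire_bytes_selfcheck(void)
{
    assert(wire_bytes(1, true, false) == 21);
    assert(wire_bytes(1, false, false) == 19);
    assert(wire_bytes(1, true, true) == 25);
    assert(wire_bytes(1, false, true) == 23);
}
```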


There are also considerably more ifs in the code if heterogeneous is  
used, especially on x86 machines.


Brian

[OMPI devel] Collectives interface change

2007-08-09 Thread Brian Barrett


Hi all -

There was significant discussion this week at the collectives meeting
about improving the selection logic for collective components.  While
we'd like the automated collectives selection logic laid out in the
Collv2 document, it was decided that as a first step, we would allow
more than one + basic components to be used for a given communicator.


This mandated the change of a couple of things in the collectives
interface, namely how collectives module data is found (passed into a
function, rather than a static pointer on the component) and a bit of
the initialization sequence.


The revised interface and the rest of the code is available in an svn  
temp branch:


https://svn.open-mpi.org/svn/ompi/tmp/bwb-coll-select

Thus far, most of the components in common use have been updated.   
The notable exception is the tuned collectives routine, which Ollie  
is updating in the near future.


If you have any comments on the changes, please let me know.  If not,
the changes will move to the trunk once Ollie has finished updating
the tuned component.


Brian

Re: [OMPI devel] Startup failure on mixed IPv4/IPv6 environment (oob tcp bug?)

2007-08-06 Thread Brian Barrett


On Aug 5, 2007, at 3:05 PM, dispan...@sobel.ls.la wrote:

I fixed the problem by setting the peer_state to MCA_OOB_TCP_CONNECTING
after creating the socket, which works for me.  I'm not sure if this is
always correct, though.


Can you try the attached patch?  It's pretty close to what you've  
suggested, but should eliminate one corner case that you could, in  
theory, run into with your solution.  You are using a nightly tarball  
from the trunk, correct?


Thanks,

Brian



oob_ipv6.diff
Description: Binary data

Re: [OMPI devel] MPI_Win_get_group

2007-07-28 Thread Brian Barrett


On Jul 28, 2007, at 6:27 AM, Jeff Squyres wrote:


On Jul 27, 2007, at 8:27 PM, Lisandro Dalcin wrote:

MPI_WIN_GET_GROUP returns a duplicate of the group of the communicator
used to create the window associated with win. The group is returned
in group.

Well, it seems OMPI (v1.2 svn) is not returning a duplicate; comparing
the handles with the C == operator gives true. Can you confirm this?
Should the word 'duplicate' be interpreted as 'a new reference to' ?


I would tend to agree with this wording; I think we're doing the
wrong thing.

Brian -- what do you think?


In my opinion, we conform to the standard.  We reference count the
group, it's incremented on each call to MPI_WIN_GET_GROUP, and you can
safely call MPI_GROUP_FREE on the group returned from MPI_WIN_GET_GROUP.
Groups are essentially immutable, so there's no way I can think of that
we violate the MPI standard.
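
A minimal sketch of the reference-counting behaviour described above.  The types and functions are hypothetical stand-ins, not Open MPI's ompi_group_t or window code; they only show why handing back the same handle is safe when the object is immutable and counted.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical immutable group with a reference count. */
struct sketch_group {
    int refcount;
    int size;
};

/* Hypothetical window that holds one reference to its group. */
struct sketch_window {
    struct sketch_group *group;
};

/* Create a group with a single reference; returns NULL on failure. */
static struct sketch_group *sketch_group_create(int size)
{
    struct sketch_group *g = malloc(sizeof(*g));
    if (g != NULL) {
        g->refcount = 1;
        g->size = size;
    }
    return g;
}

/* "Duplicate" by taking a new reference: same handle, count + 1. */
static struct sketch_group *sketch_win_get_group(struct sketch_window *win)
{
    win->group->refcount++;
    return win->group;
}

/* Drop one reference, free on the last one, and null the caller's handle. */
static void sketch_group_free(struct sketch_group **group)
{
    assert(group != NULL && *group != NULL && (*group)->refcount > 0);
    if (--(*group)->refcount == 0) {
        free(*group);
    }
    *group = NULL;
}
```

The caller gets a handle that compares equal to the window's own, yet freeing it cannot invalidate the window, because the window still holds its reference.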


Others are, of course, free to disagree with me.

Brian

--
  Brian W. Barrett
  Networking Team, CCS-1
  Los Alamos National Laboratory

Re: [OMPI devel] [RFC] Sparse group implementation

2007-07-26 Thread Brian Barrett


On Jul 26, 2007, at 1:01 PM, Mohamad Chaarawi wrote:



On Thu, July 26, 2007 1:18 pm, Brian Barrett wrote:

On Jul 26, 2007, at 12:00 PM, Mohamad Chaarawi wrote:


2) I think it would be better to always have the flags and macros
available (like OMPI_GROUP_SPORADIC and OMPI_GROUP_IS_SPORADIC) even
when sparse groups are disabled.  They don't take up any space, and
mean fewer #ifs in the general code base


That's what i actually was proposing.. keep the flags (there are no
macros, just the GROUP_GET_PROC_POINTER) and the sparse parameters in the
group structure, and this will mean, only 1 maybe 2 #ifs..


Why would this mean having the sparse parameters in the group structure?


not sure if i understood your question right, but in the group struct we
added 5 integers and 3 pointers.. if we want to compile these out, we would
then need all the #ifs around the code where we use these parameters..


I don't follow why you would need all the sparse stuff in
ompi_group_t when OMPI_GROUP_SPARSE is 0.  The OMPI_GROUP_IS and
OMPI_GROUP_SET macros only modify grp_flags, which is always present.


Like the ompi_group_peer_lookup, much can be hidden inside the  
functions rather than exposed through the interface, if you're  
concerned about the other functionality currently #if'ed in the code.


Brian

Re: [OMPI devel] [RFC] Sparse group implementation

2007-07-26 Thread Brian Barrett


On Jul 26, 2007, at 12:00 PM, Mohamad Chaarawi wrote:


2) I think it would be better to always have the flags and macros
available (like OMPI_GROUP_SPORADIC and OMPI_GROUP_IS_SPORADIC) even
when sparse groups are disabled.  They don't take up any space, and
mean fewer #ifs in the general code base


That's what i actually was proposing.. keep the flags (there are no
macros, just the GROUP_GET_PROC_POINTER) and the sparse parameters in the
group structure, and this will mean, only 1 maybe 2 #ifs..


Why would this mean having the sparse parameters in the group structure?


3) Instead of the GROUP_GET_PROC_POINTER macro, why not just change
the behavior of the ompi_group_peer_lookup() function, so that there
is symmetry with how you get a proc from a communicator?  static
inline functions (especially short ones like that) are basically
free.  We'll still have to fix all the places in the code where the
macro is used or people poke directly at the group structure, but I
like static inline over macros whenever possible.  So much easier to
debug.


Actually i never knew till this morning that this function was in the
code.. I have an inline function ompi_group_lookup (which does the same
thing), that actually checks if the group is dense or not and acts
accordingly.. but to use the inline function instead of the macro means
again that we need to compile in all the sparse parameters/code, which im
for..


No, it doesn't.  Just have something like:

static inline ompi_proc_t*
ompi_group_lookup(ompi_group_t *group, int peer)
{
#if OMPI_GROUP_SPARSE
    /* big long lookup function for sparse groups here */
#else
    return group->grp_proc_pointers[peer];
#endif
}
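
A compilable version of the same idea, as a sketch only.  The sketch_* types, the SKETCH_GROUP_SPARSE switch, and the strided sparse layout are all assumptions made up for illustration; the real ompi_group_t and the real sparse formats are not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>

#ifndef SKETCH_GROUP_SPARSE
#define SKETCH_GROUP_SPARSE 0   /* stand-in for the configure-time switch */
#endif

typedef struct sketch_proc {
    int rank;
} sketch_proc_t;

typedef struct sketch_group {
    int grp_proc_count;
    sketch_proc_t **grp_proc_pointers;   /* dense array, NULL if sparse */
#if SKETCH_GROUP_SPARSE
    struct sketch_group *grp_parent;     /* hypothetical strided layout */
    int grp_offset;
    int grp_stride;
#endif
} sketch_group_t;

/* One lookup entry point for all callers; the #if lives only here. */
static inline sketch_proc_t *
sketch_group_lookup(sketch_group_t *group, int peer)
{
    assert(peer >= 0 && peer < group->grp_proc_count);
#if SKETCH_GROUP_SPARSE
    if (NULL == group->grp_proc_pointers) {
        /* Sparse group: translate the index and ask the parent. */
        return sketch_group_lookup(group->grp_parent,
                                   group->grp_offset +
                                   peer * group->grp_stride);
    }
#endif
    /* Dense group (always the case when sparse support is off). */
    return group->grp_proc_pointers[peer];
}
```

With the switch at 0 the function is a plain array index with no conditional, which is the property being asked for in the critical path.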

Brian

Re: [OMPI devel] [RFC] Sparse group implementation

2007-07-26 Thread Brian Barrett


Mohamad -

A couple of comments / questions:

1) Why do you need the #if OMPI_GROUP_SPARSE in communicator/comm.c?
That seems like part of the API that should under no conditions change
based on sparse/not sparse


2) I think it would be better to always have the flags and macros
available (like OMPI_GROUP_SPORADIC and OMPI_GROUP_IS_SPORADIC) even
when sparse groups are disabled.  They don't take up any space, and
mean fewer #ifs in the general code base


3) Instead of the GROUP_GET_PROC_POINTER macro, why not just change
the behavior of the ompi_group_peer_lookup() function, so that there
is symmetry with how you get a proc from a communicator?  static
inline functions (especially short ones like that) are basically
free.  We'll still have to fix all the places in the code where the
macro is used or people poke directly at the group structure, but I
like static inline over macros whenever possible.  So much easier to
debug.


Other than that, I think you've got my concerns pretty much addressed.

Brian

On Jul 25, 2007, at 8:45 PM, Mohamad Chaarawi wrote:


In the current code, almost all #ifs are due to the fact that we don't
want to add the extra memory by the sparse parameters that are added to
the group structure.
The additional parameters are 5 pointers and 3 integers.
If nobody objects, i would actually keep those extra parameters, even if
sparse groups are disabled (in the default case on configure), because it
would reduce the number of #ifs in the code to only 2 (as i recall that i
had it before) ..

Thank you,

Mohamad

On Wed, July 25, 2007 4:23 pm, Brian Barrett wrote:

On Jul 25, 2007, at 3:14 PM, Jeff Squyres wrote:


On Jul 25, 2007, at 5:07 PM, Brian Barrett wrote:


It just adds a lot of #if's throughout the code.  Other than that,
there's no reason to remove it.


I agree, lots of #ifs are bad.  But I guess I don't see the problem.
The only real important thing people were directly accessing in the
ompi_group_t is the array of proc pointers.  Indexing into them could
be done with a static inline function that just has slightly
different time complexity based on compile options.  Static inline
function that just does an index in the group proc pointer would have
almost no added overhead (none if the compiler doesn't suck).


Ya, that's what I proposed.  :-)

But I did also propose removing the extra #if's so that the sparse
group code would be available and we'd add an extra "if" in the
critical code path.

But we can do it this way instead:

Still use the MACRO to access proc_t's.  In the --disable-sparse-groups
scenario, have it map to comm.group.proc[i].  In the
--enable-sparse-groups scenario, have it like I listed in the original
proposal:

 static inline ompi_proc_t *lookup_group(ompi_group_t *group, int index) {
     if (group_is_dense(group)) {
         return group->procs[index];
     } else {
         return sparse_group_lookup(group, index);
     }
 }

With:

a) groups are always dense if --enable and an MCA parameter turns off
sparse groups, or
b) there's an added check in the inline function for whether the MCA
parameter is on

I'm personally in favor of a) because it means only one conditional
in the critical path.


I don't really care about the sparse groups turned on case.  I just
want minimal #ifs in the global code and to not have an if() { ... }
in the critical path when sparse groups are disabled :).

Brian
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




--
Mohamad Chaarawi
Instructional Assistant   http://www.cs.uh.edu/~mschaara
Department of Computer ScienceUniversity of Houston
4800 Calhoun, PGH Room 526Houston, TX 77204, USA


Re: [OMPI devel] [RFC] Sparse group implementation

2007-07-25 Thread Brian Barrett


On Jul 25, 2007, at 3:14 PM, Jeff Squyres wrote:


On Jul 25, 2007, at 5:07 PM, Brian Barrett wrote:


It just adds a lot of #if's throughout the code.  Other than that,
there's no reason to remove it.


I agree, lots of #ifs are bad.  But I guess I don't see the problem.
The only real important thing people were directly accessing in the
ompi_group_t is the array of proc pointers.  Indexing into them could
be done with a static inline function that just has slightly
different time complexity based on compile options.  Static inline
function that just does an index in the group proc pointer would have
almost no added overhead (none if the compiler doesn't suck).


Ya, that's what I proposed.  :-)

But I did also propose removing the extra #if's so that the sparse
group code would be available and we'd add an extra "if" in the
critical code path.

But we can do it this way instead:

Still use the MACRO to access proc_t's.  In the --disable-sparse-groups
scenario, have it map to comm.group.proc[i].  In the
--enable-sparse-groups scenario, have it like I listed in the original
proposal:


 static inline ompi_proc_t *lookup_group(ompi_group_t *group, int index) {
     if (group_is_dense(group)) {
         return group->procs[index];
     } else {
         return sparse_group_lookup(group, index);
     }
 }

With:

a) groups are always dense if --enable and an MCA parameter turns off
sparse groups, or
b) there's an added check in the inline function for whether the MCA
parameter is on

I'm personally in favor of a) because it means only one conditional
in the critical path.


I don't really care about the sparse groups turned on case.  I just  
want minimal #ifs in the global code and to not have an if() { ... }  
in the critical path when sparse groups are disabled :).
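
To make that concrete, here is a rough sketch of what I mean, assuming a hypothetical TOY_SPARSE_GROUPS compile-time switch and toy types (none of these names are real Open MPI symbols).  The #if lives in exactly one place, inside the inline function, and the disabled build reduces to a bare array index with no branch:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the configure-time switch; build with
 * -DTOY_SPARSE_GROUPS=1 to compile the sparse path in. */
#ifndef TOY_SPARSE_GROUPS
#define TOY_SPARSE_GROUPS 0
#endif

typedef struct toy_proc { int rank; } toy_proc_t;

typedef struct toy_group {
    int                     is_dense;
    int                     size;
    toy_proc_t            **procs;   /* valid when is_dense */
    const struct toy_group *parent;  /* valid when !is_dense */
    int                     first;   /* toy sparse format: window into parent */
} toy_group_t;

#if TOY_SPARSE_GROUPS
static inline toy_proc_t *toy_lookup(const toy_group_t *group, int index);

/* Toy sparse resolution: the group is a contiguous window of its parent. */
static toy_proc_t *toy_sparse_lookup(const toy_group_t *group, int index)
{
    return toy_lookup(group->parent, group->first + index);
}
#endif

static inline toy_proc_t *toy_lookup(const toy_group_t *group, int index)
{
    assert(index >= 0 && index < group->size);
#if TOY_SPARSE_GROUPS
    /* The only conditional, and it exists only in sparse-enabled builds. */
    if (!group->is_dense) {
        return toy_sparse_lookup(group, index);
    }
#endif
    return group->procs[index];
}
```

The rest of the code base calls toy_lookup() unconditionally and never sees the #if.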


Brian

Re: [OMPI devel] [RFC] Sparse group implementation

2007-07-25 Thread Brian Barrett


On Jul 25, 2007, at 2:56 PM, Jeff Squyres wrote:


On Jul 25, 2007, at 10:39 AM, Brian Barrett wrote:


I have an even bigger objection than Rich.  It's near impossible to
measure the latency impact of something like this, but it does have
an additive effect.  It doesn't make sense to have all that code in
the critical path for systems where it's not needed.  We should leave
the compile time decision available, unless there's a very good
reason (which I did not see in this e-mail) to remove it.


It just adds a lot of #if's throughout the code.  Other than that,
there's no reason to remove it.


I agree, lots of #ifs are bad.  But I guess I don't see the problem.   
The only real important thing people were directly accessing in the  
ompi_group_t is the array of proc pointers.  Indexing into them could  
be done with a static inline function that just has slightly  
different time complexity based on compile options.  Static inline  
function that just does an index in the group proc pointer would have  
almost no added overhead (none if the compiler doesn't suck).


Brian

Re: [OMPI devel] [RFC] Sparse group implementation

2007-07-25 Thread Brian Barrett

I have an even bigger objection than Rich.  It's near impossible to  
measure the latency impact of something like this, but it does have  
an additive effect.  It doesn't make sense to have all that code in  
the critical path for systems where it's not needed.  We should leave  
the compile time decision available, unless there's a very good  
reason (which I did not see in this e-mail) to remove it.


Brian

On Jul 25, 2007, at 8:00 AM, Richard Graham wrote:

This is good work, so I am happy to see it come over.  My initial
understanding was that there would be compile time protection for this.
In the absence of this, I think we need to see performance data on a
variety of communication substrates.  It seems like a latency
measurement is, perhaps, the most sensitive measurement, and should be
sufficient to see the impact on the critical path.

Rich


On 7/25/07 9:04 AM, "Jeff Squyres"  wrote:

WHAT:    Merge the sparse groups work to the trunk; get the community's
         opinion on one remaining issue.
WHY:     For large MPI jobs, it can be memory-prohibitive to fully
         represent dense groups; you can save a lot of space by having
         "sparse" representations of groups that are (for example)
         derived from MPI_COMM_WORLD.
WHERE:   Main changes are (might have missed a few in this analysis,
         but this is 99% of it):
         - Big changes in ompi/group
         - Moderate changes in ompi/comm
         - Trivial changes in ompi/mpi/c, ompi/mca/pml/[dr|ob1],
           ompi/mca/comm/sm
WHEN:    The code is ready now in /tmp/sparse-groups (it is passing
         all Intel and IBM tests; see below).
TIMEOUT: We'll merge all the work to the trunk and enable the
         possibility of using sparse groups (dense will still be the
         default, of course) if no one objects by COB Tuesday, 31 Aug
         2007.

========================================================================

The sparse groups work from U. Houston is ready to be brought into the
trunk.  It is built on the premise that for very large MPI jobs, you
don't want to fully represent MPI groups in memory if you don't have
to.  Specifically, you can save memory for communicators/groups that
are derived from MPI_COMM_WORLD by representing them in a sparse
storage format.

The sparse groups work gives ompi_group_t the following storage formats
(the 3 sparse ones are new):

* dense (i.e., what it is today -- an array of ompi_proc_t pointers)
* sparse, where the current group's contents are based on the group
   from which the child was derived:
   1. range: a series of (offset,length) tuples
   2. stride: a single (first,stride,last) tuple
   3. bitmap: a bitmap

Currently, all the sparse groups code must be enabled by configuring
with --enable-sparse-groups.  If sparse groups are enabled, each MPI
group that is created will automatically use the storage format that
takes the least amount of space.
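
As a rough illustration of "use the format that takes the least space", the sketch below compares the four representations for a child group described by its members' ranks in the parent.  The size formulas and validity rules are assumptions made for this example (one pointer per member for dense, two ints per contiguous run for range, three ints for stride, one bit per parent member for bitmap), and every toy_/TOY_ name is hypothetical; the real selection logic may differ.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    TOY_FMT_DENSE, TOY_FMT_RANGE, TOY_FMT_STRIDE, TOY_FMT_BITMAP
} toy_format_t;

/* ranks[i] is the parent-group rank of member i of the new group. */
static toy_format_t toy_pick_format(const int *ranks, int n, int parent_size)
{
    assert(ranks != NULL && n >= 1 && parent_size >= n);

    int runs = 1;        /* number of contiguous (offset,length) runs */
    int increasing = 1;  /* assumed: bitmap/stride cannot record reordering */
    int const_step = 1;  /* a single stride tuple needs equal gaps */
    for (int i = 1; i < n; ++i) {
        int step = ranks[i] - ranks[i - 1];
        if (step != 1) runs++;
        if (step <= 0) increasing = 0;
        if (step != ranks[1] - ranks[0]) const_step = 0;
    }

    /* Start from dense and switch only when something is strictly smaller. */
    toy_format_t best = TOY_FMT_DENSE;
    size_t best_bytes = (size_t) n * sizeof(void *);

    size_t range_bytes = (size_t) runs * 2 * sizeof(int);
    if (range_bytes < best_bytes) {
        best = TOY_FMT_RANGE;
        best_bytes = range_bytes;
    }

    size_t stride_bytes = 3 * sizeof(int);
    if (increasing && const_step && stride_bytes < best_bytes) {
        best = TOY_FMT_STRIDE;
        best_bytes = stride_bytes;
    }

    size_t bitmap_bytes = ((size_t) parent_size + 7) / 8;
    if (increasing && bitmap_bytes < best_bytes) {
        best = TOY_FMT_BITMAP;
        best_bytes = bitmap_bytes;
    }
    return best;
}
```

Under these assumptions a contiguous slice of a large parent comes out as range, every 100th rank of a 1000-process parent as stride, and an irregular subset of a small parent as bitmap.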

The Big Issue with the sparse groups is that getting a pointer to an
ompi_proc_t may no longer be an O(1) operation -- you can't just
access it via comm->group->procs[i].  Instead, you have to call a
macro.  If sparse groups are enabled, this will call a function to do
the resolution and return the proc pointer.  If sparse groups are not
enabled, the macro currently resolves to group->procs[i].

When sparse groups are enabled, looking up a proc pointer is an
iterative process; you have to traverse up through one or more parent
groups until you reach a "dense" group to get the pointer.  So the
time to lookup the proc pointer (essentially) depends on the group and
how many times it has been derived from a parent group (there are
corner cases where the lookup time is shorter).  Lookup times in
MPI_COMM_WORLD are O(1) because it is dense, but it now requires an
inline function call rather than directly accessing the data
structure (see below).
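
The iterative lookup can be sketched as follows.  This is only an illustration with hypothetical toy types, and it models just the stride format, reduced to a (first, stride) pair plus the group size: each step translates the rank into the parent's rank space until a dense group is reached, so the cost grows with the number of derivations.

```c
#include <assert.h>
#include <stddef.h>

typedef struct toy_proc { int id; } toy_proc_t;

typedef enum { TOY_DENSE, TOY_STRIDE } toy_kind_t;

typedef struct toy_group {
    toy_kind_t              kind;
    int                     size;
    toy_proc_t            **procs;   /* TOY_DENSE: array of proc pointers */
    const struct toy_group *parent;  /* TOY_STRIDE: group we derive from */
    int                     first;   /* TOY_STRIDE: parent rank of our rank 0 */
    int                     stride;  /* TOY_STRIDE: parent-rank gap */
} toy_group_t;

static toy_proc_t *toy_group_lookup(const toy_group_t *group, int rank)
{
    /* Walk up one derivation level per iteration. */
    while (group->kind != TOY_DENSE) {
        assert(rank >= 0 && rank < group->size);
        rank  = group->first + rank * group->stride;
        group = group->parent;
    }
    /* Dense group (e.g. the MPI_COMM_WORLD group): plain O(1) index. */
    assert(rank >= 0 && rank < group->size);
    return group->procs[rank];
}
```

For a dense group the loop body never runs, which matches the O(1) MPI_COMM_WORLD case described above.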

Note that the code in /tmp/sparse-groups is currently out-of-date with
respect to the orte and opal trees due to SVN merge mistakes and
problems.  Testing has occurred by copying full orte/opal branches
from a trunk checkout into the sparse group tree, so we're confident
that it's compatible with the trunk.  Full integration will occur
before committing to the trunk, of course.

The proposal we have for the community is as follows:

1. Remove the --enable-sparse-groups configure option
2. Default to use only dense groups (i.e., same as today)
3. If the new MCA parameter "mpi_use_sparse_groups" is enabled, enable
the use of sparse groups
4. Eliminate the current macro used for group proc lookups and instead
use an inline function of the form:

static inline ompi_proc_t *lookup_group(ompi_group_t *group, int index) {
    if (group_is_dense(group)) {
        return group->procs[index];
    } else {
        return sparse_group_lookup(group, index);
    }
}

*** NOTE: This design adds a single "if" in some

Re: [OMPI devel] Fwd: [Open MPI] #1101: MPI_ALLOC_MEM with 0 size must be valid

2007-07-24 Thread Brian Barrett


On Jul 24, 2007, at 8:28 AM, Gleb Natapov wrote:


On Tue, Jul 24, 2007 at 11:20:11AM -0300, Lisandro Dalcin wrote:

On 7/23/07, Jeff Squyres  wrote:

Does anyone have any opinions on this?  If not, I'll go implement
option #1.


Sorry, Jeff... just reading this. I think your option #1 is the
better. However, I want to warn you about two issues:

* In my Linux FC6 box, malloc(0) returns different pointers for each
call. In fact, I believe this is a requirement for malloc, in the case

man malloc tells me this:
"If size was equal to 0, either NULL or a pointer suitable to be passed to free()
is returned". So maybe we should just return NULL and be done with it?


Which is also what POSIX says:

  http://www.opengroup.org/onlinepubs/009695399/functions/malloc.html

I vote with Gleb -- return NULL, don't set errno, and be done with
it.  The way I read the advice to implementors, this would be a legal  
response to a 0 byte request.
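
In code, the behavior I'm voting for looks roughly like the sketch below.  The function name and return codes are made up for illustration (this is not the actual MPI_ALLOC_MEM implementation); the point is only that a 0-byte request succeeds, hands back NULL, and leaves errno alone:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical return codes for this sketch. */
#define TOY_SUCCESS     0
#define TOY_ERR_NO_MEM  1

static int toy_alloc_mem(size_t size, void **baseptr)
{
    assert(baseptr != NULL);

    if (size == 0) {
        /* A 0-byte request is valid: succeed with NULL, which is safe
         * to pass to free(), and do not touch errno. */
        *baseptr = NULL;
        return TOY_SUCCESS;
    }

    *baseptr = malloc(size);
    return (*baseptr != NULL) ? TOY_SUCCESS : TOY_ERR_NO_MEM;
}
```

The matching free routine then only has to accept NULL, which free() already does.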


Brian

Re: [OMPI devel] [OMPI svn] svn:open-mpi r15492

2007-07-19 Thread Brian Barrett

Sigh.  Thanks.  Should probably have tested that code ;).  And the
Solaris code.  And the Windows code.


Brian

On Jul 19, 2007, at 7:37 AM, Jeff Squyres wrote:


Thanks!

Ralph got it this morning in
https://svn.open-mpi.org/trac/ompi/changeset/15501.


On Jul 19, 2007, at 5:34 AM, Bert Wesarg wrote:


Hello,


Author: brbarret
Date: 2007-07-18 16:23:45 EDT (Wed, 18 Jul 2007)
New Revision: 15492
URL: https://svn.open-mpi.org/trac/ompi/changeset/15492

Log:
add ability to have thread-specific data on windows, pthreads, solaris
threads, and non-threaded builds

+int
+opal_tsd_key_create(opal_tsd_key_t *key,
+                    opal_tsd_destructor_t destructor)
+{
+    int i;
+
+    if (!atexit_registered) {
+        atexit_registered = true;
+        if (0 != atexit(run_destructors)) {
+            return OPAL_ERR_TEMP_OUT_OF_RESOURCE;
+        }
+    }
+
+    for (i = 0 ; i < TSD_ENTRIES ; ++i) {
+        if (entries[i].used == false) {
+            entries[i].used = true;
+            entries[i].value = NULL;
+            entries[i].destructor = destructor;
+            *key = i;

             break;

+        }
+    }
+    if (i == TSD_ENTRIES) return ENOMEM;
+
+    return OPAL_SUCCESS;
+}


Bert




--
Jeff Squyres
Cisco Systems


[OMPI devel] RML/OOB change heads up

2007-07-16 Thread Brian Barrett


Hey all -

Thought I would give you guys a heads up on some code that will be  
coming into the trunk in the not too distant future (hopefully  
tomorrow?).  The changes revolve around the RML/OOB interface and  
include:


  * General TCP cleanup for OPAL / ORTE
  * Simplifying the OOB by moving much of the logic into the RML
  * Allowing the OOB RML component to do routing of messages
  * Adding a component framework for handling routing tables
  * Moving the xcast functionality from the OOB base to its own framework


The IPv6 code did some things that I (and I know George) didn't  
like.  Some functions had their interface change depending on whether  
IPv6 support was enabled (taking either a sockaddr_in or  
sockaddr_in6 instead of just a sockaddr) and we were inconsistent  
about storage sizes.  I've added a bunch of compatibility code to  
opal_config_bottom.h so that we can always have sockaddr_storage and  
some of the required IPv6 defines, which drastically simplified the  
IPv6 code in the TCP OOB.
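
To illustrate the interface style I'm after (this is a standalone sketch, not code from the branch, and toy_addr_port is a made-up name): functions take a generic struct sockaddr * and dispatch on the address family, and callers keep addresses in a sockaddr_storage, so the signature is the same whether or not IPv6 support is enabled.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* One signature for both families; returns the port in host byte order,
 * or -1 for an address family this sketch does not handle. */
static int toy_addr_port(const struct sockaddr *addr)
{
    assert(addr != NULL);

    switch (addr->sa_family) {
    case AF_INET:
        return ntohs(((const struct sockaddr_in *) addr)->sin_port);
    case AF_INET6:
        return ntohs(((const struct sockaddr_in6 *) addr)->sin6_port);
    default:
        return -1;
    }
}
```

A caller declares a struct sockaddr_storage, fills it in as either a sockaddr_in or a sockaddr_in6, and passes it cast to struct sockaddr *; the storage is large enough for both.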


Previously, the OOB and RML component interfaces were essentially  
equivalent.  This isn't surprising, as the RML was added at the last  
minute as a wrapper around the OOB as a forward looking way of  
solving multi-cell architectures.  The interface into the OOB was  
also strange, requiring the upper layer (the RML) to call base  
functions that did a bit of work, then called the component.  With  
this change, all the base code has been moved into the RML, and the  
OOB interface has been simplified by removing all the blocking and  
dss buffer communication.  The RML now handles the implementation of  
blocking sends and dss buffer communication.  This not only greatly  
simplifies writing an OOB component, but removes the base code in the  
oob, which was causing problems as it implied that there was one and  
only one oob component active at a time, which some people are  
apparently trying to break (by having multiple OOB components alive).
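
The layering can be sketched like this, with a toy transport standing in for an OOB component (all names and signatures here are hypothetical and much simpler than the real interfaces).  The lower layer offers only a non-blocking send with a completion callback; the upper layer builds the blocking send out of it by spinning on progress until the callback fires:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*toy_send_cb_t)(int status, void *cbdata);

/* Toy "OOB": remembers one pending send and completes it on the next
 * call to toy_progress(). */
static struct {
    bool          pending;
    toy_send_cb_t cb;
    void         *cbdata;
} toy_pending;

static int toy_oob_send_nb(const void *buf, size_t len,
                           toy_send_cb_t cb, void *cbdata)
{
    (void) buf;
    (void) len;
    if (toy_pending.pending) {
        return -1;                 /* toy limit: one send in flight */
    }
    toy_pending.pending = true;
    toy_pending.cb = cb;
    toy_pending.cbdata = cbdata;
    return 0;
}

static void toy_progress(void)
{
    if (toy_pending.pending) {
        toy_pending.pending = false;
        toy_pending.cb(0, toy_pending.cbdata);   /* report completion */
    }
}

/* Toy "RML": the blocking send lives here, not in the transport. */
typedef struct { bool done; int status; } toy_wait_t;

static void toy_blocking_cb(int status, void *cbdata)
{
    toy_wait_t *wait = cbdata;
    wait->status = status;
    wait->done = true;
}

static int toy_rml_send(const void *buf, size_t len)
{
    toy_wait_t wait = { false, 0 };
    int rc = toy_oob_send_nb(buf, len, toy_blocking_cb, &wait);
    if (rc != 0) {
        return rc;
    }
    while (!wait.done) {
        toy_progress();            /* drive the transport until done */
    }
    return wait.status;
}
```

With this split, a new transport component only has to supply the non-blocking call and its progress hook.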


The OOB RML can now also route messages, using a new framework (the  
routed framework) for determining how a message should be routed.   
Currently, only direct routing is supported, although that will  
change in the near future.  The not-so-long term goal is to allow MPI  
processes to talk to each other and to the HNP through their local  
daemon, rather than directly.  This will drastically reduce the  
number of sockets open in the system, which can only help with the  
speed thing.


Finally, we moved the xcast functionality out of the OOB base and  
into its own framework.  It really didn't make sense to have it in  
the OOB base, as it didn't do anything OOB specific and just utilized  
the RML to move data around.  By moving it to its own framework, we  
can more easily experiment with new xcast protocols (using the  
component infrastructure, rather than the games Ralph currently has  
to play using MCA parameters and if statements).  It also makes a  
clearer distinction as to which components are responsible for which  
functionality.


Anyway, that's where we're at.  You can take a look at the code in  
the temporary branch bwb-oob-rml-cleanup, although it currently does  
not work for singletons due to some merge conflicts from last night.   
This will be resolved before the merge back into the trunk, obviously.



Brian

--
  Brian W. Barrett
  Networking Team, CCS-1
  Los Alamos National Laboratory

Re: [OMPI devel] Build failures of 1.2.3 on Debian hppa, mips, mipsel, s390, m68k

2007-07-14 Thread Brian Barrett

the availability of functionality is set by the header files for each  
platform, not by configure.  So we'd have to play some games to get  
at the information, but it should be possible.


Brian

On Jul 14, 2007, at 12:41 PM, George Bosilca wrote:


Brian,

We should be able to use these defines in the configure.m4 files for
each component, right?  I think the asm section is detected before we
go in the component configuration.

So far we know about the following components that have to disable
themselves if no atomic or memory barrier is detected:
  - MPOOL: sm
  - BTL: sm, openib (completely or partially?)

Does anybody know about any other components with atomic requirements?

   george.

On Jul 14, 2007, at 1:59 PM, Brian Barrett wrote:


On Jul 14, 2007, at 11:51 AM, Gleb Natapov wrote:


On Sat, Jul 14, 2007 at 01:16:42PM -0400, George Bosilca wrote:

Instead of failing at configure time, we might want to disable the
threading features and the shared memory device if we detect that we
don't have support for atomics on a specified platform. In a non
threaded build, the shared memory device is the only place where we
need support for memory barrier. I'll look in the code to see why we
need support for compare-and-swap on a non threaded build.

Proper memory barrier is also needed for openib BTL eager RDMA
support.


Removed all the platform lists, since they won't care about this
part :).

Ah, true.  The eager RDMA code should check that the preprocessor
symbol OPAL_HAVE_ATOMIC_MEM_BARRIER is 1 and disable itself if that
isn't the case.  All the "sections" of ASM support (memory barriers,
locks, compare-and-swap, and atomic math) have preprocessor symbols
indicating whether support exists or not in the current build.  These
should really be used :).

Brian

Re: [OMPI devel] Build failures of 1.2.3 on Debian hppa, mips, mipsel, s390, m68k

2007-07-14 Thread Brian Barrett


On Jul 14, 2007, at 11:51 AM, Gleb Natapov wrote:


On Sat, Jul 14, 2007 at 01:16:42PM -0400, George Bosilca wrote:

Instead of failing at configure time, we might want to disable the
threading features and the shared memory device if we detect that we
don't have support for atomics on a specified platform. In a non
threaded build, the shared memory device is the only place where we
need support for memory barrier. I'll look in the code to see why we
need support for compare-and-swap on a non threaded build.
Proper memory barrier is also needed for openib BTL eager RDMA  
support.


Removed all the platform lists, since they won't care about this  
part :).


Ah, true.  The eager RDMA code should check that the preprocessor  
symbol OPAL_HAVE_ATOMIC_MEM_BARRIER is 1 and disable itself if that  
isn't the case.  All the "sections" of ASM support (memory barriers,  
locks, compare-and-swap, and atomic math) have preprocessor symbols  
indicating whether support exists or not in the current build.  These  
should really be used :).
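
A sketch of the kind of check I mean is below.  OPAL_HAVE_ATOMIC_MEM_BARRIER is the symbol named above; the fallback definition is there only so the sketch compiles on its own, and toy_use_eager_rdma is a made-up function, not the actual openib code.

```c
#include <assert.h>

/* In a real build this symbol comes from the atomics headers; the
 * fallback exists only to make this sketch self-contained. */
#ifndef OPAL_HAVE_ATOMIC_MEM_BARRIER
#define OPAL_HAVE_ATOMIC_MEM_BARRIER 0
#endif

/* `requested` is whatever the user asked for (hypothetically, a
 * run-time parameter).  Returns 1 if eager RDMA may be used. */
static int toy_use_eager_rdma(int requested)
{
#if OPAL_HAVE_ATOMIC_MEM_BARRIER
    return requested != 0;
#else
    (void) requested;
    return 0;   /* no memory barrier in this build: disable ourselves */
#endif
}
```

The same pattern applies to the other ASM "sections" (locks, compare-and-swap, atomic math), each gated on its own symbol.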


Brian

Re: [OMPI devel] Build failures of 1.2.3 on Debian hppa, mips, mipsel, s390, m68k

2007-07-14 Thread Brian Barrett


On Jul 14, 2007, at 10:53 AM, Dirk Eddelbuettel wrote:


Methinks we need to fill in a few blanks here, or make do with non-asm
solutions. I don't know the problem space that well (being a maintainer
rather than upstream developer) and am looking for guidance.


Either way is an option.  There are really only a couple of functions  
that have to be implemented:


  * atomic word-size compare and swap
  * memory barrier

We'll emulate atomic adds and spin-locks with compare and swap if not  
directly implemented.  The memory barrier functions have to exist,  
even if they don't do anything.  We require compare-and-swap for a  
couple of pieces of code, which is why we lost our Sparc v8 support a  
couple of releases ago.
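
For porters who haven't seen the trick, this is roughly how the emulation works.  The sketch uses C11 <stdatomic.h> as a stand-in for a platform's compare-and-swap primitive (an assumption made so the example is portable; Open MPI's own atomics are per-platform assembly), and the toy_ names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Atomic add built from compare-and-swap: retry until no other thread
 * changed the value between our read and our swap.  Returns the new
 * value. */
static int32_t toy_atomic_add_32(_Atomic int32_t *addr, int32_t delta)
{
    int32_t old = atomic_load(addr);
    while (!atomic_compare_exchange_weak(addr, &old, old + delta)) {
        /* On failure, `old` was refreshed with the current value. */
    }
    return old + delta;
}

/* Spin-lock built from the same primitive: 0 = free, 1 = held. */
typedef struct { atomic_int locked; } toy_lock_t;

static void toy_lock(toy_lock_t *lock)
{
    int expected = 0;
    while (!atomic_compare_exchange_weak(&lock->locked, &expected, 1)) {
        expected = 0;   /* the failed CAS overwrote it; reset and retry */
    }
}

static void toy_unlock(toy_lock_t *lock)
{
    atomic_store(&lock->locked, 0);
}
```

So a new port really only needs a working word-size compare-and-swap and the memory barrier functions; everything else can be layered on top like this.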



For what it's worth, lam (7.1.2, currently) is available on all build
architectures for Debian, but it may not push the (hardware) envelope as
hard.


Correct, LAM only had very limited ASM requirements (basically,  
memory barrier on platforms that required it -- like PowerPC).


Brian

Re: [OMPI devel] Build failures of 1.2.3 on Debian hppa, mips, mipsel, s390, m68k

2007-07-14 Thread Brian Barrett


On Jul 14, 2007, at 8:26 AM, Dirk Eddelbuettel wrote:

Please let us (ie Debian's openmpi maintainers) know how else we can
help.  I am ccing the porters lists (for hppa, m68k, mips) too to invite
them to help. I hope that doesn't get the spam filters going...  I may
contact the 'arm' porters once we have a failure; s390 and sparc
activity are not as big these days.


Open MPI uses some assembly for things like atomic locks, atomic  
compare and swap, memory barriers, and the like.  We currently have  
support for:


  * x86 (32 bit)
  * x86_64 / amd64 (32 or 64 bit)
  * UltraSparc (v8plus and v9* targets)
  * IA64
  * PowerPC (32 or 64 bit)

We also have code for:

  * Alpha
  * MIPS (32 bit NEW ABI & 64 bit)

This support hasn't been well tested in a while and it sounds like it
doesn't work for MIPS.  At one time, we supported the sparc v8
target, but that support has since been dropped.  The other platforms
(hppa, mipsel (how is this different than MIPS?), s390, m68k) aren't at
all supported by Open MPI.  If you can get the real error messages, I
can help on the MIPS issue, although it'll have to be a low priority.


We don't currently have support for a non-assembly code path.  We  
originally planned on having one, but the team went away from that  
route over time and there's no way to build Open MPI without assembly  
support right now.



Brian

--
  Brian W. Barrett
  Networking Team, CCS-1
  Los Alamos National Laboratory

Re: [OMPI devel] Notes on building and running Open MPI on Red Storm

2007-07-12 Thread Brian Barrett

Do you have a Subversion account?  If so, feel free to update the  
wiki ;).  If not, we should probably get you an account.  Then feel  
free to update the wiki ;).  But thanks for the notes!


Brian

On Jul 11, 2007, at 4:47 PM, Glendenning, Lisa wrote:


Some supplementary information to the wiki at
https://svn.open-mpi.org/trac/ompi/wiki/CrayXT3.


I. Accessing the Open MPI source:

  * Subversion is installed on redstorm in /projects/unsupported/bin

  * Reddish has subversion in the default path (you don't have to load a
module)

  * The proxy information in the wiki is accurate, and works on both
redstorm and reddish


II. Building Open MPI on reddish:

  * 'module load PrgEnv-pgi-xc' to cross compile for redstorm

  * Reddish and redstorm do not have a recent enough version of autotools,
so you must build your own (currently available in
/project/openmpi/rbbrigh/tools)

  * Suggested configuration: 'configure CC=qk-gcc CXX=qk-pgCC
F77=qk-pgf77 FC=qk-pgf90 --disable-mpi-profile --with-platform=redstorm
--host=x86_64-cray-linux-gnu --build=x86_64-unknown-linux-gnu
--disable-mpi-f90 --disable-mem-debug --disable-mem-profile
--disable-debug build_alias=x86_64-unknown-linux-gnu
host_alias=x86_64-cray-linux-gnu'


III. Building with Open MPI:

  * No executables will be installed in $PREFIX/bin, so must compile
without mpicc, e.g. 'qk-gcc -I$PREFIX/include *.c -L$PREFIX/lib -lmpi
-lopen-rte -lopen-pal'

  * When linking with libopen-pal, the following warning is normal: 'In
function `checkpoint_response': warning: mkfifo is not implemented and
will always fail'


IV. Running on Redstorm:

  * scp your executable from reddish to redstorm

  * Use 'qsub' to submit job and 'yod' to launch job (if you do an
interactive session, you can bypass PBS)

  * qsub expects project/task information - you can either provide this
with -A option or set it in $XT_ACCOUNT environmental variable

  * Sample job script for qsub with 8 nodes/processes and 10 minute
runtime:

#!/bin/sh
#PBS -lsize=8
#PBS -lwalltime=10:00
cd $PBS_O_WORKDIR
yod a.out

  * Use 'xtshowmesh' and 'qstat' to query job status and cluster
configuration


Re: [OMPI devel] Multi-environment builds

2007-07-10 Thread Brian Barrett


On Jul 10, 2007, at 7:09 AM, Tim Prins wrote:


Jeff Squyres wrote:

2. The "--enable-mca-no-build" option takes a comma-delimited list of
components that will then not be built.  Granted, this option isn't
exactly intuitive, but it was the best that we could think of at the
time to present a general solution for inhibiting the build of a
selected list of components.  Hence,
"--enable-mca-no-build=pls-slurm,ras-slurm" would inhibit building the
SLURM RAS and PLS components (note that the SLURM components currently
do not require any additional libraries, so a) there is no
corresponding --with[out]-slurm option, and b) they are usually always
built).


Actually, there are --with-slurm/--without-slurm options. We default to
building slurm support automatically on linux and aix, but not on other
platforms.


On a mostly unrelated note...  We should probably also now build the  
SLURM component for OS X, since SLURM is now available for OS X as  
well.  And probably should also check for SLURM's srun and build if  
we find it even if we aren't on Linux, AIX, or OS X.


Brian

[OMPI devel] fake rdma flag again?

2007-07-09 Thread Brian Barrett


Hi all -

I've finally committed a version of the rdma one-sided component that  
1) works and 2) in certain situations actually does rdma.  I'll make  
it the default when the BTLs are used as soon as one last bug is  
fixed in the DDT engine.


However, there is still one outstanding issue.  Some BTLs (like  
Portals or MX) advertise the ability to do a put but place  
restrictions on the put that only work for OB1.  For example, both  
can only do an RDMA that starts where the prepare_dst() / prepare_src()  
call said the target buffer was.  This isn't a problem for OB1,  
but kind of defeats the purpose of one-sided ;). There's also a  
reference count (I believe) in the Portals put/get code that would  
make life interesting if a descriptor was doing multiple RDMA ops at  
once.


I was thinking that the easy way to solve this was to add a flag  
(FAKE_RDMA was the current running favorite, since we've used it  
before for different meaning :) ) to the components that have  
behaviors that work for OB1, but not a generalized rdma interface.  I  
was wondering what people thought of this idea and if they had any  
preference for naming the flag.
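
To make the proposal concrete, here is a sketch of how the one-sided component might consult such a flag.  Every name and value below is hypothetical (including the flag name itself, which is the thing I'm asking about), and the real BTL flag set looks different:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits for this sketch. */
#define TOY_BTL_FLAGS_PUT        0x1u
#define TOY_BTL_FLAGS_GET        0x2u
#define TOY_BTL_FLAGS_FAKE_RDMA  0x4u  /* put/get only valid as OB1 uses them */

typedef enum { TOY_PATH_SEND_RECV, TOY_PATH_RDMA } toy_path_t;

/* Pick the transfer path for a one-sided put on a given BTL. */
static toy_path_t toy_osc_pick_put_path(uint32_t btl_flags)
{
    if ((btl_flags & TOY_BTL_FLAGS_PUT) != 0 &&
        (btl_flags & TOY_BTL_FLAGS_FAKE_RDMA) == 0) {
        return TOY_PATH_RDMA;       /* unrestricted put: use true rdma */
    }
    return TOY_PATH_SEND_RECV;      /* fall back to send/receive */
}
```

OB1 would simply ignore the new bit, so BTLs that set it keep working exactly as they do today.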


Brian

Re: [OMPI devel] One-sided operations with Portals

2007-07-05 Thread Brian Barrett


On Jul 5, 2007, at 4:16 PM, Glendenning, Lisa wrote:


Ron Brightwell at SNL has asked me to look into optimizing Open MPI's
one-sided operations over Portals.  Does anyone have any guidance or
thoughts for this?


Hi Lisa -

There are currently two implementations of the one-sided interface  
for Open MPI: pt2pt and rdma.


The pt2pt component is implemented entirely over the interfaces used  
to implement the MPI-1 point-to-point interface.  So it ends up doing  
lots of copies and is entirely two-sided.  It could support async  
progress with threads, but that doesn't help the XT platform all that  
much.  It was the first one-sided component implemented, mostly  
because we needed to support protocols like MX and PSM that don't  
really expose one-sided semantics, and I only wanted to support one  
new component per release.


The rdma component is implemented over our BTL (byte transport layer  
-- the device driver our communication is written over), and can  
either use call-back based send/receive or true rdma.  The true rdma  
is only for put/get for contiguous datatypes.  The performance on  
OpenIB is ok, but not great (I'll send you some more details off  
list).  I'd assume that the performance on Portals would be similar.   
However, the btl_put and btl_get implementation for the Portals BTL  
was implemented assuming it would only be used the way the PML (the  
MPI-1 point-to-point implementation) used it.  It won't work with the  
rdma one-sided component at this time.  I can go into more details if  
you decide that fixing the Portals BTL to support the rdma component  
is a path you want to look at.


Then, of course, there's the option of writing a Portals-specific  
one-sided component.  The component interface is pretty straight-forward  
-- it's the MPI-2 one-sided chapter interface functions, plus an  
initialization function.  This is the path towards best performance,  
but also means the most code to write.  The existing code in Open MPI  
handles the attribute management, but that's about it if you go this  
route.  Of course, you can always copy freely from the rdma and pt2pt  
components.  There used to be a document somewhere describing how to  
add a new component, but I think it is horribly out of date.  I'll  
see if I can find it and send it your way.


Of course, the first starting point is to get a checkout of the code  
and get it built.  There are instructions for getting an SVN checkout  
of Open MPI (and how to get it built from there) available on the web  
page:


http://www.open-mpi.org/svn/

Building on the XT platform (if you're going that route) is slightly  
more complicated, and you probably want to take a look at the  
horribly out of date wiki page on the subject here:


  https://svn.open-mpi.org/trac/ompi/wiki/CrayXT3


Hopefully, that's enough to get you started.  If you have any  
questions, ask away.


Brian

--
  Brian W. Barrett
  Networking Team, CCS-1
  Los Alamos National Laboratory

Re: [OMPI devel] Modex

2007-06-27 Thread Brian Barrett

The function name changes are pretty obvious (s/mca_pml_base/ompi/),  
and I thought I'd try something new and actually document the  
interface in the header file :).  So we should be good on that front.


Brian

On Jun 27, 2007, at 6:38 AM, Terry D. Dontje wrote:


I am ok with the following as long as we can have some sort of
documentation describing what changed like which old functions
are replaced with newer functions and any description of changed
assumptions.

--td
Brian Barrett wrote:


On Jun 26, 2007, at 6:08 PM, Tim Prins wrote:




Some time ago you were working on moving the modex out of the pml
and cleaning it up a bit. Is this work still ongoing? The reason I ask
is that I am currently working on integrating the RSL, and would rather
build on the new code rather than the old...




Tim Prins brings up a point I keep meaning to ask the group about.  A
long time ago in a galaxy far, far away (aka, last fall), Galen and I
started working on the BTL / PML redesign that morphed into some
smaller changes, including some interesting IB work.  Anyway, I
rewrote large chunks of the modex, which did a couple of things:

* Moved the modex out of the pml base and into the general OMPI code
  (renaming the functions in the process)
* Fixed the hang if a btl doesn't publish contact information (we
  wait until we receive a key pushed into the modex at the end of
  MPI_INIT)
* Tried to reduce the number of required memory copies in the interface

It's a fairly big change, in that all the BTLs have to be updated due
to the function name differences.  It's fairly well tested, and would
be really nice for dealing with platforms where there are different
networks on different machines.  If no one has any objections, I'll
probably do this next week...

Brian


Re: [OMPI devel] Modex

2007-06-26 Thread Brian Barrett


On Jun 26, 2007, at 6:08 PM, Tim Prins wrote:

Some time ago you were working on moving the modex out of the pml
and cleaning it up a bit. Is this work still ongoing? The reason I ask
is that I am currently working on integrating the RSL, and would rather
build on the new code rather than the old...


Tim Prins brings up a point I keep meaning to ask the group about.  A  
long time ago in a galaxy far, far away (aka, last fall), Galen and I  
started working on the BTL / PML redesign that morphed into some  
smaller changes, including some interesting IB work.  Anyway, I  
rewrote large chunks of the modex, which did a couple of things:


* Moved the modex out of the pml base and into the general OMPI code
  (renaming the functions in the process)
* Fixed the hang if a btl doesn't publish contact information (we wait
  until we receive a key pushed into the modex at the end of MPI_INIT)
* Tried to reduce the number of required memory copies in the interface

It's a fairly big change, in that all the BTLs have to be updated due  
to the function name differences.  It's fairly well tested, and would  
be really nice for dealing with platforms where there are different  
networks on different machines.  If no one has any objections, I'll  
probably do this next week...


Brian

Re: [OMPI devel] Patch to fix cross-compile failure

2007-06-11 Thread Brian Barrett

Argonne used AC_TRY_RUN instead of AC_TRY_COMPILE (I believe) because  
there are some places where aio functions behaved badly (returned  
ENOTIMPL or something).  I was going to make it call AC_TRY_RUN if we  
weren't cross-compiling and AC_TRY_COMPILE if we were.  I'll commit  
something this evening.


Brian


On Jun 11, 2007, at 6:07 AM, Jeff Squyres wrote:


Paul -- Excellent; many thanks!

Brian: this patch looks good to me, but I defer to the unOfficial
OMPI ROMIO Maintainer (uOORM)...


On Jun 8, 2007, at 3:33 PM, Paul Henning wrote:



I've attached a patch relative to the revision 14962 version of

ompi/mca/io/romio/romio/configure.in

that fixes configuration errors when doing a cross-compile.  It
simply changes some tests for the number of parameters to
aio_suspend and aio_write from AC_TRY_RUN to AC_TRY_COMPILE.

Paul






--
Jeff Squyres
Cisco Systems


Re: [OMPI devel] [OMPI svn] svn:open-mpi r14768

2007-06-07 Thread Brian Barrett


I'm available this afternoon...

Brian

On Jun 7, 2007, at 12:39 PM, George Bosilca wrote:


I'm available this afternoon.

  george.

On Jun 7, 2007, at 2:35 PM, Galen Shipman wrote:



Are people available today to discuss this over the phone?

- Galen



On Jun 7, 2007, at 11:28 AM, Gleb Natapov wrote:


On Thu, Jun 07, 2007 at 11:11:12AM -0400, George Bosilca wrote:

) I expect you to revise the patch in order to propose a generic
solution or I'll trigger a vote against the patch. I vote for it to be
backed out of the trunk as it exports way too much knowledge from the
Open IB BTL into the PML layer.

The patch solves a real problem. If we want to back it out we need to find
another solution. I also didn't like this change too much, but I thought
about other solutions and haven't found something better than what
Galen did. If you have something in mind, let's discuss it.

As a general comment this kind of discussion is why I prefer to send
significant changes as a patch to the list for discussion before
committing.



  george.

PS: With Gleb's changes the problem is the same. The following snippet
reflects exactly the same behavior as the original patch.

I didn't try to change the semantics. I just made the code match the
semantics that Galen described.

--
Gleb.





Re: [OMPI devel] [OMPI svn] svn:open-mpi r14897

2007-06-06 Thread Brian Barrett


Yup, thanks.

Brian

On Jun 6, 2007, at 2:27 AM, Bert Wesarg wrote:




+#ifdef HAVE_REGEXEC
+args_count = opal_argv_count(options_data[i].compiler_args);
+for (j = 0 ; j < args_count ; ++j) {
+if (0 != regcomp(, options_data[i].compiler_args[j], REG_NOSUB)) {

+return -1;
+}
+
+if (0 == regexec(, arg, (size_t) 0, NULL, 0)) {

missing regfree();?


+return i;
+}
+
+regfree();
+}
+#else


regards
Bert
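
The leak Bert points out comes from returning on a match before the compiled expression is released. A minimal sketch of the usual shape, using only the POSIX regcomp/regexec/regfree calls, is below. The function name, the variable name `re`, and the return codes are hypothetical and are not taken from the Open MPI wrapper code.

```c
#include <regex.h>
#include <stddef.h>

/* Return the index of the first pattern that matches arg, -1 if no
 * pattern matches, or -2 if a pattern fails to compile.
 * (Hypothetical helper, for illustration only.) */
int find_matching_pattern(const char *const patterns[], size_t count,
                          const char *arg)
{
    for (size_t i = 0; i < count; ++i) {
        regex_t re;

        if (0 != regcomp(&re, patterns[i], REG_NOSUB)) {
            return -2;          /* compile failed, nothing to free */
        }

        int rc = regexec(&re, arg, (size_t) 0, NULL, 0);

        /* Free before looking at the result, so the match path and the
         * no-match path both release the compiled expression. */
        regfree(&re);

        if (0 == rc) {
            return (int) i;
        }
    }
    return -1;
}
```

Calling regfree() immediately after regexec() keeps the cleanup in one place, which is easier to get right than adding a second regfree() in front of the early return.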

Re: [OMPI devel] undefined environ symbol on Darwin

2007-05-29 Thread Brian Barrett


On May 29, 2007, at 7:35 AM, Jack Howarth wrote:


and if you see environ undefined, identify which library
it is in and which object file it came from. I would also
note that my patch reveals several instances of the
environ variable being declared that are missing the Windows
wrappers. So if anything, adding the Darwin patch will
increase the probability that both targets are properly
maintained.


Yes, there are significant portions of the code base that are "Unix- 
only" and not built on Windows.  There are regular builds of Open MPI  
on Windows to ensure that problems are quickly resolved when they  
creep into the code base.  The places where the Windows environ fixes  
are missing are likely that way because they are in parts of the code  
that doesn't build on Windows.


As I've said, I'd be happy to commit a Mac OS X-specific fix for the  
environ problem if we can find a test case where it fails without the  
fix.  I'm not going to commit portability fixes to Open MPI for a  
problem that we can't replicate.  Based on what Peter said on the  
apple list, there is no problem with having an undefined symbol in a  
shared library (other than the fact that *that* shared library must  
be built with a flat namespace).  I'm working with someone here to  
get ParaView built on my Mac so I can trace down the problem and  
figure out if Open MPI is responsible for the missing symbol.
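
For reference, one common shape for a Mac OS X-specific environ fix is sketched below. This is an assumption about what such a fix might look like, not the patch discussed in this thread: it relies on `_NSGetEnviron()` from `<crt_externs.h>`, which Apple documents as the way for shared libraries to reach the environment, and the macro and function names are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

#if defined(__APPLE__)
/* On Darwin, ask the runtime for the address of the environment
 * instead of referencing the environ symbol directly. */
#include <crt_externs.h>
#define portable_environ (*_NSGetEnviron())
#else
/* Elsewhere, the traditional Unix declaration is enough. */
extern char **environ;
#define portable_environ environ
#endif

/* Count the entries in the process environment (hypothetical helper
 * that exists only to exercise portable_environ). */
size_t count_environment_entries(void)
{
    size_t n = 0;
    char **env = portable_environ;

    if (NULL == env) {
        return 0;
    }
    while (NULL != env[n]) {
        ++n;
    }
    return n;
}
```

Code built into a shared library would use `portable_environ` wherever it previously used `environ`, so the library no longer carries an undefined reference to that symbol on Darwin.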


Brian

Re: [OMPI devel] undefined environ symbol on Darwin

2007-05-28 Thread Brian Barrett


On May 28, 2007, at 4:57 PM, Jack Howarth wrote:


   I have been told that Paraview is one package that
exhibits this problem with undefined environ symbols.
This will occur in any package which creates its own
shared libraries that link in any openmpi shared library
that contains the undefined environ symbol. I think it
is unreasonably restrictive to force all the application
developers who use openmpi to avoid creating shared libs
that use openmpi shared libraries. Again from the
response on the Darwin mailing list this is expected
behavior on Darwin. I will send two patches shortly
that address this without needing to touch configure.


Have you tried it?  I ask because I have.  I created a shared library  
that called MPI_COMM_SPAWN (to make sure that it called a function  
that needed environ).  Then created an application that called the  
function in that shared library.  Both the new shared library and the  
application were able to link without problems.


Both the Fink page and the Apple list post indicate that there's a  
problem *creating* a shared library with undefined symbols.  There  
appears to be no evidence to date that there's a problem creating a  
shared library that itself does not have undefined symbols but links  
to an application that does.  Given that I was unable to make it  
fail, I question whether this is a problem.


I'm hesitant to make this change because these types of things are  
hard to maintain.  Since we don't have a test case that fails, it's
impossible to properly test.  And since it's obscure and works in the  
common case, it's unlikely to be properly maintained over the long  
run.  If an example where this fails is presented, I'm happy to make  
the changes.  Until then, it just doesn't make sense.  I'm not trying  
to be unreasonable, but I don't want to add unmaintainable code  
without at least a direct example of failure.


Brian

Re: [OMPI devel] [RFC] Send data from the end of a buffer during pipeline proto

2007-05-17 Thread Brian Barrett

On the other hand, since the MPI standard explicitly says you're not  
allowed to call fork() or system() during the MPI application and  
since the network should really cope with this in some way, if it
further complicates the code *at all*, I'm strongly against it.   
Especially since it won't really solve the problem.  For example,  
with one-sided, I'm not going to go out of my way to send the first  
and last bit of the buffer so the user can touch those pages while  
calling fork.


Also, if I understand the leave_pinned protocol, this still won't  
really solve anything for the general case -- leave pinned won't send  
any data eagerly if the buffer is already pinned, so there are still  
going to be situations where the user can cause problems.  Now we  
have a situation where sometimes it works and sometimes it doesn't  
and we pretend to support fork()/system() in certain cases.  Seems  
like actually fixing the problem the "right way" would be the right  
path forward...


Brian

On May 17, 2007, at 10:10 AM, Jeff Squyres wrote:


Moving to devel; this question seems worthwhile to push out to the
general development community.

I've been coming across an increasing number of customers and other
random OMPI users who use system().  So if there's zero impact on
performance and it doesn't make the code [more] incredibly horrible
[than it already is], I'm in favor of this change.



On May 17, 2007, at 7:00 AM, Gleb Natapov wrote:


Hi,

 I thought about changing the pipeline protocol to send data from the end
of the message instead of the middle like it does now. The rationale behind
this is better fork() support. When an application forks, the child doesn't
inherit registered memory, so IB providers educate users not to touch
buffers that were owned by MPI before the fork in a child process. The
problem is that the granularity of registration is the HW page (4K), so the
last page of the buffer may also contain other application data, and the
user may be unaware of this and be very surprised by a SIGSEGV. If the
pipeline protocol sends data from the end of a buffer, then the last page
of the buffer will not be registered (and the first page is never
registered because we send the beginning of the buffer eagerly with the
rendezvous packet), so this situation will be avoided. It should have zero
impact on performance. What do you think? How common is it for MPI
applications to fork()?
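
The page-granularity point can be made concrete with a little address arithmetic. The sketch below is illustrative only and does not touch any registration API: the type and function names are hypothetical, and it simply computes which pages a buffer spans and how many bytes on its last page lie past the end of the buffer, which is where unrelated application data may live.

```c
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Page-aligned addresses of the first and last page touched by a buffer. */
typedef struct {
    uintptr_t first_page;
    uintptr_t last_page;
    size_t    page_size;
} page_span_t;

page_span_t buffer_page_span(const void *buf, size_t len)
{
    page_span_t span;
    long ps = sysconf(_SC_PAGESIZE);

    /* Fall back to 4K if the page size cannot be queried. */
    span.page_size = (ps > 0) ? (size_t) ps : 4096;

    uintptr_t mask  = ~((uintptr_t) span.page_size - 1);
    uintptr_t start = (uintptr_t) buf;
    uintptr_t last  = start + (len > 0 ? len - 1 : 0);  /* last byte */

    span.first_page = start & mask;
    span.last_page  = last & mask;
    return span;
}

/* Bytes on the buffer's last page that lie beyond the buffer itself.
 * If that page is registered, these bytes are affected as well. */
size_t trailing_bytes_on_last_page(const void *buf, size_t len)
{
    page_span_t span = buffer_page_span(buf, len);
    uintptr_t end      = (uintptr_t) buf + len;   /* one past last byte */
    uintptr_t page_end = span.last_page + span.page_size;

    return (size_t) (page_end - end);
}
```

A non-zero result from `trailing_bytes_on_last_page()` is the case described above: the tail of the registered page holds data the user does not think of as belonging to the MPI buffer.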

--
Gleb.
___
devel-core mailing list
devel-c...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel-core



--
Jeff Squyres
Cisco Systems


[OMPI devel] Autotools Upgrade Time

2007-05-08 Thread Brian Barrett


Hi all -

As was discussed on the telecon a couple of weeks ago, to try to  
lower the maintenance cost of the build system, starting this  
Saturday Autoconf 2.60 and Automake 1.10 will be required to  
successfully run autogen.sh on the trunk.  As I mentioned in a  
previous e-mail, the required versions of the autotools will be:


        Autoconf    Automake    Libtool
v1.1    2.57-2.59   1.9.6       1.5.22
v1.2    2.57-new    1.9.6-new   1.5.22-new
trunk   2.60-new    1.10.0-new  1.5.22-new


This means that there's no set of autotools that will be able to  
build all three versions of Open MPI, but since very few people  
currently spend time on v1.1, this should not present a major problem.


Brian

Re: [OMPI devel] Fancy ORTE/MPIRUN bugs

2007-04-20 Thread Brian Barrett


On Apr 19, 2007, at 8:38 AM, Aurelien Bouteiller wrote:


Hi,
I am experiencing several fancy bugs with ORTE.

All bugs occur on Intel 32 bits architecture under Mac OS X using gcc
4.2. The tested version is today's trunk (it has also occurred for at
least three weeks)

First occurs when compiling in "optimized"  mode (aka configure
--disable-debug --with-platform=optimized) and does not occur in debug
mode.


Fixed as of r14440.  Was caused by a faulty compiler hint that  
allowed the compiler to optimize out some much needed checks on the  
input.



The other one occurs when running MPI program without mpirun (I know
this is pretty useless but still ;) ). This bug does not require
specific compilation options to occur. Running mpirun -np 1 mympiprogram
is fine, but running mympiprogram fails with segfault in MPI_Finalize:

~/ompi$ mpirun -np 1 mpiself
~/ompi$ gdb mpiself
(gdb) r
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x77767578
0x90002e46 in szone_malloc ()


As of r14440, I'm unable to replicate, but it could have been one of  
those getting lucky issues.  Can you see if the problem is still  
occurring?


Brian

Re: [OMPI devel] SOS!! Run-time error

2007-04-16 Thread Brian Barrett

Wow, it appears everything aborts when opal_event_loop() is called.   
Did you make any changes to the event library code in opal/event/?   
If not, that might indicate a mismatch between the binaries and  
libraries (ie, binaries from one build vs. libraries from another).   
This will cause random segfaults, possibly like this.


If that's no help, can you run ompi_info under gdb and generate a  
detailed stack trace?


Thanks,

Brian

On Apr 15, 2007, at 11:40 AM, chaitali dherange wrote:

  I have downloaded the developer version of the source code by downloading a
nightly Subversion snapshot tarball, and have installed Open MPI using:

./configure --prefix=/net/hc293/chaitali/openmpi_dev
(lots of output... without errors)
make all install.
(lots of output... without errors)

then I have tried to run the example provided in this version of  
source code... the ring_c.c file... I first copied it to my home  
directory... /net/hc293/chaitali

now when inside my home directory... i did

set path=($path /net.hc293/chaitali/openmpi_dev/bin)
set $LD_LIBRARY_PATH = ( /net/hc293/chaitali/dev_openmpi/lib )
mpicc -o chaitali_test ring_c.c
(This gave no errors at all)
mpirun --prefix /net/hc293/chaitali/openmpi_dev -np 3 --hostfile /net/hc293/chaitali/machinefile ./test_chaitali

(This gave foll errors..)
[oolong:09783] *** Process received signal ***
[oolong:09783] Signal: Segmentation fault (11)
[oolong:09783] Signal code:  (128)
[oolong:09783] Failing at address: (nil)
[oolong:09783] [ 0] /lib64/tls/libpthread.so.0 [0x2a95e01430]
[oolong:09783] [ 1] /net/hc293/chaitali/openmpi_dev/lib/libopen-pal.so.0(opal_event_init+0x166) [0x2a957d9e16]
[oolong:09783] [ 2] /net/hc293/chaitali/openmpi_dev/lib/libopen-rte.so.0(orte_init_stage1+0x168) [0x2a95680638]
[oolong:09783] [ 3] /net/hc293/chaitali/openmpi_dev/lib/libopen-rte.so.0(orte_system_init+0xa) [0x2a9568375a]
[oolong:09783] [ 4] /net/hc293/chaitali/openmpi_dev/lib/libopen-rte.so.0(orte_init+0x49) [0x2a95680329]
[oolong:09783] [ 5] mpirun(orterun+0x155) [0x4029fd]
[oolong:09783] [ 6] mpirun(main+0x1b) [0x4028a3]
[oolong:09783] [ 7] /lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x2a95f273fb]
[oolong:09783] [ 8] mpirun [0x4027fa]
[oolong:09783] *** End of error message ***
Segmentation fault

I understand that the [5] and [6] are the actual errors. But dont  
understand why? or how to overcome this error?


Please find attached the following files:
- 'ring_c.c' file which I am trying to run.
- 'config.log' file from the openmpi-1.2.1a0r14362 folder
- 'ompi_info --all.txt' which is the output of ompi_info --all...
  This contains the above mentioned errors.


Thanks and Regards,
Chaitali


Re: [OMPI devel] Is it possible to get BTL transport work directly with MPI level

2007-04-04 Thread Brian Barrett


On Apr 2, 2007, at 10:23 AM, Jeff Squyres wrote:


On Apr 1, 2007, at 3:12 PM, Ralph Castain wrote:


I can't help you with the BTL question. On the others:


2. Go through the BML instead -- the BTL Management Layer.  This is
essentially a multiplexor for all the BTLs that have been
instantiated.  I'm guessing that this is what you want to do
(remember that OMPI has true multi-device support; using the BML and
multiple BTLs is one of the ways that we do this).  Have a look at
ompi/mca/bml/bml.h for the interface.

There is also currently no mechanism to get the BML and BTL pointers
that were instantiated by the PML.  However, if you're just doing
proof-of-concept code, you can extract these directly from the MPI
layer's global variables to see how this stuff works.

To have full interoperability of the underlying BTLs and between
multiple upper-layer communication libraries (e.g., between OMPI and
something else) is something that we have talked about a little, but
have not done much work on.

To see the BTL interface (just for completeness), see ompi/mca/btl/
btl.h.


Jumping in late to the conversation, and on an unimportant point for  
what Pooja really wants to do, but...


The BTL really can't be used directly at this point -- you have to  
use the BML interface to get data pointers and the like.  There's  
never any need to grab anything from the PML or global structures.   
The BML information is contained on a pointer on the ompi_proc_t  
structure associated with each peer.  The list of peers can be  
accessed with the ompi_proc_world() call.


Hope this helps,

Brian

Re: [OMPI devel] comment on wiki/PrintfCodes

2007-02-26 Thread Brian Barrett


On Feb 26, 2007, at 1:54 PM, Bert Wesarg wrote:

I can only speak for a 3 year old linux system, but I just read the wiki
page https://svn.open-mpi.org/trac/ompi/wiki/PrintfCodes and I wonder if
someone tried this code. On my system the PRId32 is defined as "d" for
example. So to use this you need to write something like this:

printf("foo: %" PRIu32 ", bar: %ld\n", foo, bar);
             ^
note this extra '%'.

On the other hand, printf has an extra length specifier for size_t,
'z', so a minimal size_t printf conversion is "%zu".


Thanks, I've fixed the PRI usage case.  Unfortunately, %zu isn't  
recognized by some versions of printf, so we can't use it in Open MPI.
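
A minimal sketch of both points follows: the PRI macros from `<inttypes.h>` expand to the conversion letters only, so the `%` has to be written by the caller, and where `%zu` is not available a size_t can be printed by casting to unsigned long and using `%lu`. The function name is hypothetical, and the cast is an assumption about an acceptable fallback: it truncates on platforms where size_t is wider than unsigned long.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Format a fixed-width integer and a size_t without relying on %zu.
 * Returns what snprintf returns: the length of the formatted text. */
int format_counts(char *out, size_t out_len, uint32_t foo, size_t bar)
{
    /* PRIu32 supplies only the letters (for example "u"), so the '%'
     * is written explicitly in front of it. */
    return snprintf(out, out_len,
                    "foo: %" PRIu32 ", bar: %lu",
                    foo, (unsigned long) bar);
}
```

With `foo = 7` and `bar = 42` the buffer receives `foo: 7, bar: 42`.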


BTW: are there any plans to provide mpi datatypes for these stdint.h types
like {,u}int{8,16,32,64,max,ptr}_t?


Not at this time.

Brian

Re: [OMPI devel] [PATCH] ompi_get_libtool_linker_flags.m4: fix $extra_ldflags detection

2007-02-26 Thread Brian Barrett


Very true, thanks.  I'll fix this evening.

Brian

On Feb 25, 2007, at 4:51 AM, Bert Wesarg wrote:


Hallo,

ok the sed should be even more portable. but the problem with a CC like
"gcc  -m32" isn't solved, so you should add this line and use the $tmpCC
in the sed expression, to get "gcc -m32" removed:

tmpCC=`echo $CC`

Bert

Brian W. Barrett wrote:

Thanks for the bug report and the patch.  Unfortunately, the remove
smallest prefix pattern syntax doesn't work with Solaris /bin/sh
(standards would be better if everyone followed them...), but I
committed something to our development trunk that handles the issue.
It should be released as part of v1.2.1 (we're too far in testing to
make it part of v1.2).

Thanks,

Brian


On Feb 15, 2007, at 9:12 AM, Bert Wesarg wrote:


Hello,

when using a multi token CC variable (like "gcc -m32"), the logic to
extract $extra_ldflags from libtool doesn't work. So here is a little hack
to remove the $CC prefix from the libtool-link cmd.

Bert Wesarg
diff -ur openmpi-1.1.4/config/ompi_get_libtool_linker_flags.m4 openmpi-1.1.4-extra_ldflags-fix/config/ompi_get_libtool_linker_flags.m4
--- openmpi-1.1.4/config/ompi_get_libtool_linker_flags.m4    2006-04-12 18:12:28.0 +0200
+++ openmpi-1.1.4-extra_ldflags-fix/config/ompi_get_libtool_linker_flags.m4    2007-02-15 15:11:28.285844893 +0100
@@ -76,11 +76,15 @@
 cmd="$libtool --dry-run --mode=link --tag=CC $CC bar.lo libfoo.la -o bar $extra_flags"
 ompi_check_linker_flags_work yes

+# use array initializer to remove multiple spaces in $CC
+tempCC=($CC)
+tempCC="${tempCC[@]}"
+output="${output#$tempCC}"
+unset tempCC
 eval "set $output"
 extra_ldflags=
 while test -n "[$]1"; do
 case "[$]1" in
-$CC) ;;
 *.libs/bar*) ;;
 bar*) ;;
 -I*) ;;




Re: [OMPI devel] installed wrappers

2007-02-15 Thread Brian Barrett


On Feb 15, 2007, at 2:54 AM, Bert Wesarg wrote:

why are the mpiCC, mpif77, and mpif90 wrappers installed, when i specify
--disable-mpi-cxx, --disable-mpi-f77, and --disable-mpi-f90 for the
./configure?


The Fortran 77 and Fortran 90 compilers will be disabled and return  
an error if those language bindings are disabled.  This seemed to be  
easier for users to deal with than sometimes not having the wrapper  
compilers available.  And also made it more clear to users when they  
were using a build of Open MPI without those bindings, which removed  
support cost from us.


The C++ wrapper is a slightly more complicated issue.  Many users  
want to compile C++ code, but still use the C bindings.  So they  
expect mpiCC/mpic++ to work even when the C++ bindings aren't  
installed (just without linking in the C++ bindings).


Brian

Re: [OMPI devel] build problem with 1.1.4

2007-02-15 Thread Brian Barrett


On Feb 15, 2007, at 3:07 AM, Bert Wesarg wrote:

I encounter a build problem with openmpi 1.1.4, which doesn't show up with
version 1.1.2.

After a simple ./configure, the variable OPAL_DATADIR in
opal/include/opal/install_dirs.h shows this:

$ grep '^#define OPAL_DATADIR' openmpi-1.1.2/opal/include/opal/install_dirs.h
#define OPAL_DATADIR "/usr/local/share"

$ grep '^#define OPAL_DATADIR' openmpi-1.1.4/opal/include/opal/install_dirs.h
#define OPAL_DATADIR "${prefix}/share"

This results in the problem that the opal_wrapper can't find the wrapper
data files in /share/openmpi/.


Is this with a SVN checkout or the release tarball?  The issue you  
are seeing is a known issue if you use Autoconf 2.60 or higher to  
create the build system for Open MPI 1.1.x.  The release tarball is  
built with Autoconf 2.59 and I just checked to verify that 1.1.4 was  
in fact using AC 2.59 and not creating the bad datadir defines.  You  
might want to make sure that some part of your build was not  
rerunning autoconf in the release source code.


Brian

[OMPI devel] (no subject)

2006-10-18 Thread Brian Barrett


Hi all -

I have four changes I'd like to make to the wrapper compilers (well,  
more the build system).  I figured I'd make them available for public  
comment before I did any of them, as they would change how things got  
installed and what the user sees:


1) Only install opal{cc,c++} and orte{cc,c++} if configured with
--with-devel-headers.  Right now, they are always installed, but there
are no header files installed for either project, so there's really
not much way for a user to actually compile an OPAL / ORTE application.


2) Drop support for opalCC and orteCC.  It's a pain to setup all the  
symlinks (indeed, they are currently done wrong for opalCC) and  
there's no history like there is for mpiCC.  This isn't a big deal,  
but would make two Makefiles easier to deal with.  And since about  
every 3 months, I have to fix the Makefiles after they get borked up  
a little bit, it makes my life easier.


3) Change what is currently opalcc.1 (name it something generic,  
probably opal_wrapper.1) and add some macros that get sed'ed so that  
the man pages appear to be customized for the given command.  Josh  
and I had talked about this long ago, but time prevented us from  
actually doing anything.


4) Install the wrapper data files even if we compiled with
--disable-binaries.  This is for the use case of doing multi-lib builds,
where one word size will only have the library built, but we need both sets
of wrapper data files to piece together to activate the multi-lib
support in the wrapper compilers.


Comments?


Brian



--
  Brian Barrett
  Open MPI developer
  http://www.open-mpi.org/

[OMPI devel] configure changes (ooops!)

2006-10-13 Thread Brian Barrett


Hi all -

At the last minute last night I wanted to change one small detail in  
the wrapper compiler code.  Then, as is typical with me, I got  
distracted.  As some of you noticed, none of the configure changes  
made it into the trunk last night.  Should happen this weekend.   
Sorry about that!


Brian

Re: [OMPI devel] Buffer Overflow Error

2006-08-31 Thread Brian Barrett

What facilities are you using to detect the buffer overflow?  We've seen
no such issues in our testing and I'd be surprised if there was an issue
in that code path.  Valgrind and friends don't show any issues on our
test machines, so without more detail, I'm afraid we really can't fix
the issue you are seeing.

Brian


On Thu, 2006-08-24 at 13:53 -0400, Dave Rogers wrote:
> I just compiled the latest version on my machine and ran a dumb test -
> mpirun without any arguments.
> This generated a buffer overflow error!
> 
> Error message (reproducible with different mem. addr.s):
> [ /home/dave/rpmbuild ] $ mpirun 
> *** buffer overflow detected ***: mpirun terminated
> === Backtrace: =
> /lib64/libc.so.6(__chk_fail+0x2f)[0x31669dee3f]
> /lib64/libc.so.6[0x31669de69b]
> /lib64/libc.so.6(__snprintf_chk+0x7b)[0x31669de56b] 
> /usr/lib64/libopal.so.0(opal_cmd_line_get_usage_msg+0x20a)[0x2ac1088a]
> mpirun[0x403c53]
> mpirun(orterun+0xa0)[0x402798]
> mpirun(main+0x1b)[0x4026f3]
> /lib64/libc.so.6(__libc_start_main+0xf4)[0x316691d084] 
> mpirun[0x402649]
> === Memory map: 
> 0040-00408000 r-xp  09:01
> 2697992/usr/bin/orterun
> ...
> 7fff20e92000-7fff20ea8000 rw-p 7fff20e92000 00:00 0
> [stack] 
> ff60-ffe0 ---p  00:00 0
> [vdso]
> Aborted
> 
> Installation details: System: FC5 AMD Opteron x86_64
> downloaded SRPM version 1.1.1
> 
> rpm -ivh /usr/local/src/dist/libs/openmpi-1.1-1.src.rpm
> rpmbuild -ba SPECS/openmpi-1.1.spec --target x86_64
>  - generates an error from check-rpaths stating that the /usr/lib64
> prefix is unnecessary and may cause problems
> QA_RPATHS=$[ 0x0001|0x0010 ] rpmbuild -ba SPECS/openmpi-1.1.spec
> --target x86_64
>  - suggested workaround - ignores as warnings
> rpm -ivh ~dave/rpmbuild/RPMS/x86_64/openmpi-1.1-1.x86_64.rpm
>  - generates a package conflict -- file /usr/lib64/libopal.so from
> install of openmpi-1.1-1 conflicts with file from package opal-2.2.1-1
>  - apparently, this comes from opal, the open phone abstraction
> library... so I uninstalled opal
> rpm -ivh ~dave/rpmbuild/RPMS/x86_64/openmpi-1.1-1.x86_64.rpm 
>  - worked! 
> 
> The strange thing is that mpirun with normal arguments works as
> expected without any sorts of mem. errors.
> mpirun with flags -h or --help also buffer overflows, but not mpirun
> with an unrecognized argument, to which it spits out a "you must
> specify how many processes to launch, via the -np argument." error. 
> 
> I hope this gets fixed soon, buffer overflows are potential security
> vulnerabilities.
> 
> ~ David Rogers

Re: [OMPI devel] Stack trace printing

2006-08-30 Thread Brian Barrett

Yes.  It's always the trampoline, the signal handler, and the stack
trace printer.

Brian


On Wed, 2006-08-30 at 17:37 -0400, Jeff Squyres wrote:
> As long as it's always 3 function calls -- do we know that it will be?
> 
> 
> On 8/30/06 5:32 PM, "Brian Barrett" <brbar...@open-mpi.org> wrote:
> 
> > Hi all-
> > 
> > A question about stack tracing.  Currently, we have it setup so that,
> > say, a segfault results in:
> > 
> > [0] func:/u/jjhursey/local/odin/ompi/devel/lib/libopal.so.0(opal_backtrace_print+0x2b) [0x2a959166ab]
> > [1] func:/u/jjhursey/local/odin/ompi/devel/lib/libopal.so.0 [0x2a959150bb]
> > [2] func:/lib64/tls/libpthread.so.0 [0x345cc0c420]
> > [3] func:/san/homedirs/jjhursey/local/odin//ompi/devel/lib/openmpi/mca_oob_tcp.so(mca_oob_tcp_recv+0x480) [0x2a95fd6354]
> > [4] func:/u/jjhursey/local/odin/ompi/devel/lib/liborte.so.0(mca_oob_recv_packed+0x46) [0x2a957a96a3]
> > [5] func:/u/jjhursey/local/odin/ompi/devel/lib/libmpi.so.0(ompi_comm_connect_accept+0x1d8) [0x2a955a29dc]
> > [6] func:/u/jjhursey/local/odin/ompi/devel/lib/libmpi.so.0(ompi_comm_dyn_init+0x110) [0x2a955a49e0]
> > 
> > This seems to result in confusion from some users (not josh, I was just
> > reading his latest bug when I thought of this) that the error must be in
> > OMPI because that's where it segfaulted.  It would be fairly trivial (at
> > least, on Linux and OS X) to not print the last 3 lines such that the
> > error looked like:
> > 
> > [0] func:/san/homedirs/jjhursey/local/odin//ompi/devel/lib/openmpi/mca_oob_tcp.so(mca_oob_tcp_recv+0x480) [0x2a95fd6354]
> > [1] func:/u/jjhursey/local/odin/ompi/devel/lib/liborte.so.0(mca_oob_recv_packed+0x46) [0x2a957a96a3]
> > [2] func:/u/jjhursey/local/odin/ompi/devel/lib/libmpi.so.0(ompi_comm_connect_accept+0x1d8) [0x2a955a29dc]
> > [3] func:/u/jjhursey/local/odin/ompi/devel/lib/libmpi.so.0(ompi_comm_dyn_init+0x110) [0x2a
> > 
> > Would anyone object to such a change?
> > 
> > Brian
> > 
> 
>

[OMPI devel] Stack trace printing

2006-08-30 Thread Brian Barrett

Hi all-

A question about stack tracing.  Currently, we have it setup so that,
say, a segfault results in:

[0] func:/u/jjhursey/local/odin/ompi/devel/lib/libopal.so.0(opal_backtrace_print+0x2b) [0x2a959166ab]
[1] func:/u/jjhursey/local/odin/ompi/devel/lib/libopal.so.0 [0x2a959150bb]
[2] func:/lib64/tls/libpthread.so.0 [0x345cc0c420]
[3] func:/san/homedirs/jjhursey/local/odin//ompi/devel/lib/openmpi/mca_oob_tcp.so(mca_oob_tcp_recv+0x480) [0x2a95fd6354]
[4] func:/u/jjhursey/local/odin/ompi/devel/lib/liborte.so.0(mca_oob_recv_packed+0x46) [0x2a957a96a3]
[5] func:/u/jjhursey/local/odin/ompi/devel/lib/libmpi.so.0(ompi_comm_connect_accept+0x1d8) [0x2a955a29dc]
[6] func:/u/jjhursey/local/odin/ompi/devel/lib/libmpi.so.0(ompi_comm_dyn_init+0x110) [0x2a955a49e0]

This seems to result in confusion from some users (not josh, I was just
reading his latest bug when I thought of this) that the error must be in
OMPI because that's where it segfaulted.  It would be fairly trivial (at
least, on Linux and OS X) to not print the last 3 lines such that the
error looked like:

[0] func:/san/homedirs/jjhursey/local/odin//ompi/devel/lib/openmpi/mca_oob_tcp.so(mca_oob_tcp_recv+0x480) [0x2a95fd6354]
[1] func:/u/jjhursey/local/odin/ompi/devel/lib/liborte.so.0(mca_oob_recv_packed+0x46) [0x2a957a96a3]
[2] func:/u/jjhursey/local/odin/ompi/devel/lib/libmpi.so.0(ompi_comm_connect_accept+0x1d8) [0x2a955a29dc]
[3] func:/u/jjhursey/local/odin/ompi/devel/lib/libmpi.so.0(ompi_comm_dyn_init+0x110) [0x2a

Would anyone object to such a change?
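
A rough sketch of the trimming idea, assuming the glibc / OS X `backtrace()` and `backtrace_symbols_fd()` calls from `<execinfo.h>`. The function name and the `skip` parameter are hypothetical, and the output format is whatever the C library produces rather than the "[n] func:" format shown above.

```c
#include <execinfo.h>

#define MAX_FRAMES 64

/* Write the current call stack to fd, leaving out the innermost
 * `skip` frames.  Returns the number of frames written.
 * (Hypothetical helper, for illustration only.) */
int print_trimmed_backtrace(int fd, int skip)
{
    void *frames[MAX_FRAMES];
    int depth = backtrace(frames, MAX_FRAMES);

    if (skip < 0) {
        skip = 0;
    }
    if (skip >= depth) {
        return 0;               /* nothing left to print */
    }

    /* frames[0] is the innermost frame, so dropping the printer and
     * its callers means starting further into the array. */
    backtrace_symbols_fd(frames + skip, depth - skip, fd);
    return depth - skip;
}
```

Called from a signal handler with `skip` set to 3, this would omit the printer, the handler, and the trampoline, provided those really are the three innermost frames on the platform in question.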

Brian

Re: [OMPI devel] OpenRTE and Threads

2006-08-25 Thread Brian Barrett

In general, I think making the Public interface to OpenRTE not thread
safe is a reasonable thing to do.  However, I have some concern over how
this would work with the event library.  When the project is compiled
with progress threads, the event library runs in its own thread.  More
important to this discussion, all callbacks from the event library are
triggered in the callback thread (not the thread that registered the
event), meaning that it's very likely the GPR could get a callback from
a non-blocking OOB receive in a thread that is other than the main
thread of the application and that it could happen while the main thread
of the application is already in the GPR.

Not sure what the best way to handle this would be, but I don't think
you could do it from the event level without making adjustments that
would prohibit concurrency at the MPI layer, which is obviously
sub-optimal.

Of course, we could modify the code so that non-OMPI applications didn't
start the event progress thread, but that wouldn't solve the MPI-layer
issues.

Brian

On Fri, 2006-08-25 at 14:14 -0600, Ralph Castain wrote:

> There has been ongoing discussion for some time about the thread safety of
> OpenRTE. This note proposes a solution to that problem that has been kicked
> around for the last several months, and that Jeff and I feel makes a certain
> degree of sense.
> 
> Short description
> -----------------
> We propose to make OpenRTE appear "single-threaded" to outside users. By
> that we do not mean that OpenRTE may not have some internal threads in
> operation. Instead, we mean that thread locking would be the responsibility
> of anyone calling an OpenRTE function - as opposed to built into the OpenRTE
> system itself.
> 
> Explanation
> -----------
> Most of the logic inside of OpenRTE is serial in nature and therefore
> resistant to the use of threads. Accordingly, we find ourselves putting
> giant thread locks around virtually every function in the code base. This
> wastes our time, complicates the code (we all keep forgetting to unlock when
> exiting due to errors), and basically eliminates any benefits from threading
> anyway.
> 
> Those few places where threading is possible are actually involved in
> OpenRTE-internal operations. For example, we now use a thread to accept
> out-of-band communication socket connections. These operations, however, are
> transparent to the user level (i.e., any code that calls OpenRTE).
> 
> It seems, therefore, that the simplest solution is to place the
> responsibility for thread locking onto the calling programs. Unthreaded
> programs need do nothing. Programs utilizing threads, however, would need to
> thread lock prior to calling OpenRTE functions.
> 
> Any comments on this idea? If not, or if there is general consensus on this
> approach, then we would gradually remove the current thread locks as code is
> revised - this isn't a high priority issue requiring an immediate scrub of
> the code.

[OMPI devel] LANL ORTE todo / milestones

2006-08-24 Thread Brian Barrett

Hi all -

LANL had an internal meeting yesterday trying to classify a number of
issues we're having with the run-time environment for Open MPI and how
to best prioritize team resources.  We thought it would be good to
both share the list (with priorities) with the group and to ask the
group if there were other issues that need to be addressed (either
short or long term).  We've categorized the issues as performance
related, robustness, and feature / platform support.  The numbers are
the current priority on our list, and items within a category are
sorted by priority.


PERFORMANCE:

5) 50% scale factor in process startup

   Start-up of non-MPI jobs has a strange bend in the timing curve
   when the number of processes we are trying to start is greater than
   or equal to 50% of the current allocation.  It appears that
   starting a 16 process (1 ppn) job takes longer if there are 32
   nodes in the allocation than if there are 64 nodes in the
   allocation.

   Assigned to: Galen

6) MPI_INIT startup timings

   In addition to seeming to suffer from the same 50% issue as the
   previous issue, there also appears to be a number of places in
   MPI_INIT where we spend a considerable amount of time when at
   scale, leading to startup times much worse than LA-MPI or
   MPIEXEC/MVAPICH.

   Assigned to: Galen


ROBUSTNESS:

1) MPI process aborting issue

   This is the orted spin, MPI processes don't die, etc. issue that
   occurs when some process dies unexpectedly.  Ralph has already sent
   a detailed e-mail to devel about this issue.

   Assigned to: Ralph

1.5) MPI_ABORT rework

   The MPI process aborting issue is going to require a rework of
   MPI_ABORT so that it uses the error manager instead of calling
   terminate_proc/terminate_job.

   Assigned to: Brian

2) ORTE hangs when start-up fails

   If an orted fails to start or fails to connect back to the HNP, the
   system hangs waiting for the callback.  If an orted process fails to
   start entirely, we sometimes catch this.  But we need a better
   mechanism for handling the general failure case.

   Assigned to: Ralph

3) Hardened cleanup of session directory

   While #1 should greatly help in ensuring that the session directory
   is cleaned up every time, there are still a number of race
   conditions that need to be sorted out.  The goal is to develop a
   plan that ensures files that need to be removed are removed
   automatically a high percentage of the time, that there is a way to
   allow a tool like orte_clean to clean up everything it should clean
   up, and that there is a way to make sure files that should not be
   automatically removed aren't automatically removed.

   Assigned to: Brian

3.5) Process not found hangs

   See https://svn.open-mpi.org/trac/ompi/ticket/245

   Assigned to: Ralph

7) Node death failures / hangs

   With the exception of BProc, if a node fails, we don't detect the
   failure.  Even if we did detect the failure, we have no general
   mechanism for dealing with that failure.  The bulk of this project
   is going to be adding a general SOH/SMR component that uses the OOB
   for timeout pings.

   Assigned to: Brian

15) More friendly error messages

   There are situations where we give something south of a useful
   error message when an error is found.  We should play nicer with
   users.

   Assigned to:

16) Consistent error checking

   We've had a number of recent instances of errors occurring, but not
   being propagated / returned to the user simply because no one ever
   checked the return code.  We need to audit most of ORTE to always
   check return codes.

   Assigned to:


FEATURE / PLATFORM SUPPORT:

4) TM error handling

   TM, while used on a number of large systems LANL needs to support,
   is not exactly friendly to usage at scale.  It seems that it likes
   to go away and cry to mamma for a couple seconds, returning system
   error messages, only to come back and be ok a second later.  This
   means that every TM call needs to be handled as if it's going to
   fail, and we need to be prepared to re-initialize the system (if
   possible) when failures occur.  In testing on t-bird, launching was
   usually pretty stable, but the calls to get the node allocations
   tended to result in the strange behavior.  These should definitely
   be re-startable type errors.

   Assigned to: Brian

8) Heterogeneous Issues

   Assigned to:

9) External connections

   This covers issues like those the Eclipse team is experiencing.
   If, for example, a TCP connection to the seed is severed, it causes
   Open RTE to call abort, which means Eclipse just aborted.  That's
   not so good.  There are other naming / status issues that also need
   to be handled here.

   Assigned to:

9.5) Fix/Complete orte-ps and friends

orte-ps / orte-clean / etc. all depend on being able to make a
connection to the orte universe that doesn't result in bad things
happening.  We should finish these things for

Re: [OMPI devel] exit declaration in configure tests

2006-08-21 Thread Brian Barrett

On Mon, 2006-08-21 at 09:38 +0200, Ralf Wildenhues wrote:
> Revision 11268 makes me curious:
> 
> |M /trunk/config/ompi_setup_cxx.m4
> | 
> | Reorder the C++ compiler discovery stages. Check first the compiler vendor
> | before checking if we are able to compile the test program. This is required
> | for windows as the C++ conftest.c file generated by configure cannot be
> | compiled with the Microsoft cl.exe compiler (because of the exit function
> | prototype). So if we detect a vendor equal to microsoft we will assume
> | that the compiler is correctly installed (which is true on Windows most
> | of the time anyway).
> 
> I believe to have killed all problematic exit cases from the OpenMPI
> configury some time ago.  Did I miss any, or did the remaining ones
> come from some other package (so we can fix that one)?  Which Autoconf
> version was used (2.60 should not use any exit declarations itself any
> more)?

For one, I think I forgot to commit the patch you sent (shame on me!).
But I know George wasn't using AC 2.60 at the time.  He was going to try
that and see if it helped.

Brian

Re: [OMPI devel] one-sided communication implementation

2006-08-14 Thread Brian Barrett

On Thu, 2006-07-20 at 11:56 +1000, gh rory wrote:

> In the process of trying to create a wrapper for open mpi to another
> language.  Specifically, I am trying to understand how the remote
> memory access/one-sided communication works in open mpi 1.1, and I am
> having some trouble.  
> 
> I have begun by trying to trace the steps in a simple MPI_Get call.
> It seems that ompi_osc_pt2pt_replyreq_recv in
> ompi/mca/osc/pt2pt/osc_pt2pt_data_move.c is the function that receives
> the data for the requesting process, however I have not been able to
> find the part of the code that receives the request at the other end.
> It looks like ompi_osc_pt2pt_component_fragment_cb in
> osc_pt2pt_component.c sends the data back to the requesting process,
> but I can't see where the data is actually copied.  
> 
> Can someone please point me in the right direction?  Is there any
> documentation on the one-sided communication implementation that I
> should be reading?  

The one-sided component is layered on top of our BTL transport layer,
which uses an active message callback on message arrival.  The
ompi_osc_pt2pt_component_fragment_cb() call is called whenever a new
message has arrived.  The function then dispatches based on message
type.  If you look at the case for OMPI_OSC_PT2PT_HDR_PUT, you see a
call to ompi_osc_pt2pt_sendreq_recv_put(), which either uses the
convertor (our datatype engine) to unpack the data in the
ompi_convertor_unpack() call or posts a long message to receive the
data.
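
As a rough illustration of that dispatch pattern (not the actual
osc_pt2pt code), here is a minimal sketch.  Every name in it is
hypothetical, the header is far simpler than the real one, and a plain
memcpy() stands in for the convertor unpack step.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified message types and header. */
typedef enum {
    SKETCH_HDR_PUT = 1,
    SKETCH_HDR_GET = 2
} sketch_hdr_type_t;

typedef struct {
    sketch_hdr_type_t type;
    size_t target_offset;   /* byte offset into the target window */
    size_t length;          /* number of payload bytes */
} sketch_hdr_t;

/* Called when a fragment arrives at the target.  Dispatches on the
 * header type: a put copies the payload into the window, a get copies
 * window contents into `reply`.  Returns 0 on success, -1 on an
 * unknown type or an out-of-bounds request. */
static int sketch_fragment_cb(unsigned char *window, size_t window_size,
                              const sketch_hdr_t *hdr,
                              const unsigned char *payload,
                              unsigned char *reply)
{
    assert(window != NULL && hdr != NULL);

    if (hdr->target_offset > window_size ||
        hdr->length > window_size - hdr->target_offset) {
        return -1;
    }

    switch (hdr->type) {
    case SKETCH_HDR_PUT:
        /* Stand-in for unpacking through the datatype engine. */
        memcpy(window + hdr->target_offset, payload, hdr->length);
        return 0;
    case SKETCH_HDR_GET:
        /* Stand-in for building the reply sent to the origin. */
        memcpy(reply, window + hdr->target_offset, hdr->length);
        return 0;
    default:
        return -1;
    }
}
```

The sketch leaves out the long-message path, where the data does not
fit in the fragment and a separate receive is posted instead.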

Hope this helps,

Brian

[OMPI devel] trunk changes: F90 shared libraries / New one-sided component

2006-08-02 Thread Brian Barrett

Hi all -

Two large changes to the SVN trunk just occurred which require an
autogen.sh on your part.  First, we now (mostly) support building the
Fortran 90 MPI bindings library as a shared library.  This has been
something Dan and I have been working on since the Burlington meeting,
and it's ready for wider testing.  There are some things to pay
attention to with this change:

  1) If your Fortran 77 and Fortran 90 compilers have different names,
     you *MUST* update to libtool 2.0 or disable F90 support.

  2) If your Fortran 77 and Fortran 90 compilers have the same name,
     you can continue using Libtool 1.5.22.

  3) On all platforms other than OS X, the f90 support library is built
     as a shared library by default (following the way the other
     libraries are built).  OS X always builds a static library due to
     common block issues.

Configure will detect when you are using an older version of Libtool
and your Fortran compilers will cause a problem.  Libtool 2.0 isn't at a
stable release yet, but we need to provide a shared library for the
bindings as part of the 1.2 release, so we'll have to deal with the
pre-releases of Libtool.  The nightly tarballs of the SVN trunk have
been created using a pre-release of LT for about the last 2 weeks, so we
don't anticipate any problems with this.

Second, there are now two one-sided communication components.  The one
previously known as "pt2pt" has been renamed "rdma" and there is now a
new component "pt2pt".  The new "pt2pt" component is entirely (and
somewhat inefficiently) implemented over the PML (two-sided) interface
and was added to support the use of the CM PML / MTLs, which will be
part of the 1.2 release.  The "rdma" component will be preferred over
the "pt2pt" component, but will only allow itself to be activated when a
PML using the BML/BTL infrastructure is being used.  While the "rdma"
component doesn't use any of the BTL rdma interface at the moment, this
is something I will be changing in the near future.  So eventually, the
name will be more fitting than it is right now.

Both of these changes will require a full autogen.sh ; configure ; make
cycle when you next SVN update.

Brian

Re: [OMPI devel] progress thread check

2006-07-31 Thread Brian Barrett

On Thu, 2006-07-27 at 07:49 -0400, Graham E Fagg wrote:
> Hi all
>   is there a single function call that components can use to check that the 
> progress thread is up and running ?

Not really.  But if the define OMPI_ENABLE_PROGRESS_THREADS is 1 and
opal_using_threads() returns true, then it can be assumed the event
progress thread is running.
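
A minimal sketch of that check follows.  OMPI_ENABLE_PROGRESS_THREADS
and opal_using_threads() are the names mentioned above; the definitions
below are stand-ins so the fragment compiles by itself, and
progress_thread_assumed_running is a made-up helper name.

```c
#include <stdbool.h>

/* Stand-in for the configure-time define; a real build sets this. */
#ifndef OMPI_ENABLE_PROGRESS_THREADS
#define OMPI_ENABLE_PROGRESS_THREADS 1
#endif

/* Stand-in for the real opal_using_threads(), driven by a flag so the
 * sketch can be exercised without Open MPI. */
static bool stub_using_threads = false;

static bool opal_using_threads(void)
{
    return stub_using_threads;
}

/* Hypothetical helper: true when it is reasonable to assume the event
 * progress thread is running. */
static bool progress_thread_assumed_running(void)
{
#if OMPI_ENABLE_PROGRESS_THREADS
    return opal_using_threads();
#else
    return false;
#endif
}
```

This is an assumption-based check, not a direct query of the thread's
state.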

brian

Re: [OMPI devel] universal / "fat" binary support?

2006-07-31 Thread Brian Barrett

On Thu, 2006-07-27 at 15:21 -0700, Ben Byer wrote:
> I'd like to be able to build OpenMPI "fat" -- for multiple
> architectures in one pass, so I can create a universal binary for
> OSX.  I see that it was mentioned last year, at
> http://www.open-mpi.org/community/lists/users/2005/06/0087.php as
> something that was "a ways off".
> 
> Has any progress been made on that front, or do you still plan to  
> support this?

We currently can't build a universal binary in "one pass".  We actually
had a long discussion with some Apple engineers about the issue and what
it came down to was that supporting a "one pass" build of Open MPI as a
universal binary would take lots of development effort in making
Autoconf, Automake, and Libtool smarter.  We don't have the resources to
do that work on the autotools and it doesn't sound like there is enough
demand on the autotools authors for them to do the work, so it's
unlikely we'll progress on that front for some time.

We do provide a script in contrib/dist/macosx/ that will take a tarball
and build a universal binary .pkg file.  It ends up running the
configure / compile sequence three times (PPC, PPC64, and x86), but it
works quite well.  Mostly because it works so well, it is very difficult
to make further work on our build system to support a "one pass" build
of a Universal Binary a high priority.

Brian

[OMPI devel] SVN breakage / new event library committed

2006-07-27 Thread Brian Barrett


Hi all -

I just finished committing the event library into the trunk.   
Unfortunately, because the event library was not imported using a  
vendor import 2 years ago, I had to do some things that made SVN a  
little unhappy.  The good news is that the next libevent update will  
not require these changes.  The bad news is that you have to follow  
some special instructions to properly update your SVN checkout.  In  
particular, you need to completely delete the opal/event directory,  
then run svn up.  If you svn up'ed before reading this e-mail, just  
rm -rf opal/event and svn up again.  All should be good.


After updating, you *MUST* re-run autogen.sh and configure (sorry!).

Because I was already making everyone re-run autogen.sh, I also
committed some code to opal that turns the backtrace-printing code
from a set of #ifs in opal/util/stacktrace.c into a full-blown framework.
Terry added support for Solaris the other day, and I figured out how  
to support OS X.  This made three possible setups, and OS X required  
a bunch of files, so it seemed that a framework was needed.


Two notes about the OS X stacktrace support.  First, it doesn't print  
a useful stack for 64 bit binaries yet, but I'm working on it.   
Second, there are some warnings about C++ comments in the code.   
PLEASE DO NOT FIX THESE.  I will be fixing them shortly, but need to  
find a way that doesn't make future updates impossibly difficult.


Brian

Re: [OMPI devel] Problem compiling openmpi 1.1

2006-07-11 Thread Brian Barrett

On Mon, 2006-07-10 at 17:44 +0200, Pierre wrote:

> rtsig.c:365: error: `EV_SIGNAL' undeclared (first use in this function)
> rtsig.c:392: error: dereferencing pointer to incomplete type
> rtsig.c:392: error: `EV_PERSIST' undeclared (first use in this function)
> make[3]: *** [rtsig.lo] Error 1
> make[3]: Leaving directory `/tmp/openmpi-1.1/opal/event'
> make[2]: *** [all-recursive] Error 1
> make[2]: Leaving directory `/tmp/openmpi-1.1/opal/event'
> make[1]: *** [all-recursive] Error 1
> make[1]: Leaving directory `/tmp/openmpi-1.1/opal'
> make: *** [all-recursive] Error 1

That's a bit unexpected.  Can you please send us the information
requested in our "Getting Help" section of the web page:

   http://www.open-mpi.org/community/help/

It will help immensely in determining what went wrong.

Thanks,

Brian
