Re: [OMPI devel] 1.3.1rc3 was borked; 1.3.1rc4 is out

2009-03-03 Thread Brian W. Barrett

On Tue, 3 Mar 2009, Brian W. Barrett wrote:


On Tue, 3 Mar 2009, Jeff Squyres wrote:

1.3.1rc3 had a race condition in the ORTE shutdown sequence.  The only 
difference between rc3 and rc4 was a fix for that race condition.  Please 
test ASAP:


  http://www.open-mpi.org/software/ompi/v1.3/


I'm sorry, I've failed to test rc1 & rc2 on Catamount.  I'm getting a compile 
failure in the ORTE code.  I'll do a bit more testing and send Ralph an 
e-mail this afternoon.



Attached is a patch against v1.3 branch that makes it work on Red Storm. 
I'm not sure it's right, so I'm just e-mailing it rather than committing it. 
Sorry Ralph, but can you take a look? :(


Brian

Index: orte/mca/odls/base/base.h
===
--- orte/mca/odls/base/base.h	(revision 20705)
+++ orte/mca/odls/base/base.h	(working copy)
@@ -29,9 +29,10 @@
 #include "opal/mca/mca.h"
 #include "opal/class/opal_list.h"
 
+#if !ORTE_DISABLE_FULL_SUPPORT
 #include "orte/mca/odls/odls.h"
+#endif
 
-
 BEGIN_C_DECLS
 
 /**
Index: orte/mca/grpcomm/grpcomm.h
===
--- orte/mca/grpcomm/grpcomm.h	(revision 20705)
+++ orte/mca/grpcomm/grpcomm.h	(working copy)
@@ -44,7 +44,6 @@
 
 #include "orte/mca/rmaps/rmaps_types.h"
 #include "orte/mca/rml/rml_types.h"
-#include "orte/mca/odls/odls_types.h"
 
 #include "orte/mca/grpcomm/grpcomm_types.h"
 
Index: orte/runtime/orte_globals.c
===
--- orte/runtime/orte_globals.c	(revision 20705)
+++ orte/runtime/orte_globals.c	(working copy)
@@ -40,11 +40,11 @@
 #include "orte/runtime/runtime_internals.h"
 #include "orte/runtime/orte_globals.h"
 
+#if !ORTE_DISABLE_FULL_SUPPORT
+
 /* need the data type support functions here */
 #include "orte/runtime/data_type_support/orte_dt_support.h"
 
-#if !ORTE_DISABLE_FULL_SUPPORT
-
 /* globals used by RTE */
 bool orte_timing;
 bool orte_debug_daemons_file_flag = false;
@@ -135,7 +135,8 @@
 opal_output_set_verbosity(orte_debug_output, 1);
 }
 }
-
+
+#if !ORTE_DISABLE_FULL_SUPPORT
 /** register the base system types with the DSS */
 tmp = ORTE_STD_CNTR;
 if (ORTE_SUCCESS != (rc = opal_dss.register_type(orte_dt_pack_std_cntr,
@@ -192,7 +193,6 @@
 return rc;
 }
 
-#if !ORTE_DISABLE_FULL_SUPPORT
 /* get a clean output channel too */
 {
 opal_output_stream_t lds;
Index: orte/runtime/data_type_support/orte_dt_support.h
===
--- orte/runtime/data_type_support/orte_dt_support.h	(revision 20705)
+++ orte/runtime/data_type_support/orte_dt_support.h	(working copy)
@@ -30,7 +30,9 @@
 
 #include "opal/dss/dss_types.h"
 #include "orte/mca/grpcomm/grpcomm_types.h"
+#if !ORTE_DISABLE_FULL_SUPPORT
 #include "orte/mca/odls/odls_types.h"
+#endif
 #include "orte/mca/plm/plm_types.h"
 #include "orte/mca/rmaps/rmaps_types.h"
 #include "orte/mca/rml/rml_types.h"


Re: [OMPI devel] calling sendi earlier in the PML

2009-03-03 Thread Eugene Loh

Jeff Squyres wrote:

How about an MCA parameter to switch between this mechanism (early  
sendi) and the original behavior (late sendi)?


This is the usual way that we resolve "I want to do X / I want to do  
Y" disputes.  :-)


I see the smiley face, but am unsure how much of the message to apply it to.

Assuming the MCA proposal is genuine (easy for implementor consensus? or 
easy for the user?  gee, lemme see, that's a tough choice...), I'll note 
that it's easy enough to do.  I've implemented the early-sendi-check by 
adding a function to ob1 to do the right thing.  So, I can call that 
function as soon as one enters the PML send.  The "late sendi" call is 
at a different call site.  So, one call site for "early sendi" and 
another for "late sendi".  Easy to turn on/off.  We're not talking about 
many small code changes pervading the source base.
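
A minimal self-contained sketch of the two call sites and the on/off switch
described above -- all names here are hypothetical stand-ins rather than the
real ob1 symbols, and the MCA parameter is reduced to a plain flag:

#include <stddef.h>
#include <stdio.h>
#include <stdbool.h>

static bool use_early_sendi = true;      /* would come from an MCA parameter */

/* stand-in for "loop over the eager BTLs and try sendi"; 0 == success */
static int try_early_sendi(const void *buf, size_t len)
{
    (void) buf;
    return (len <= 64) ? 0 : -1;         /* pretend only tiny messages qualify */
}

/* stand-in for the heavyweight path: build the send request and descend
 * into the usual PML machinery, where the "late sendi" call site lives */
static int start_send_request(const void *buf, size_t len)
{
    (void) buf;
    (void) len;
    return 0;
}

static int pml_send(const void *buf, size_t len)
{
    /* early call site: nothing has been initialized yet */
    if (use_early_sendi && 0 == try_early_sendi(buf, len)) {
        return 0;
    }
    /* original behavior; the late sendi check happens further down */
    return start_send_request(buf, len);
}

int main(void)
{
    char msg[16] = "hello";
    printf("pml_send returned %d\n", pml_send(msg, sizeof(msg)));
    return 0;
}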


Re: [OMPI devel] calling sendi earlier in the PML

2009-03-03 Thread Eugene Loh

Brian W. Barrett wrote:


On Tue, 3 Mar 2009, Eugene Loh wrote:

First, this behavior is basically what I was proposing and what 
George didn't feel comfortable with.  It is arguably no compromise at 
all.  (Uggh, why must I be so honest?)  For eager messages, it favors 
BTLs with sendi functions, which could lead to those BTLs becoming 
overloaded.  I think favoring BTLs with sendi for short messages is 
good.  George thinks that load balancing BTLs is good.


I have two thoughts on the issue:


I'll see your two thoughts and raise you...  Oh wait.  Maybe I'd win 
more consensus if I were less shrill/insistent!  :^)


1) How often are a btl with a sendi and a btl without a sendi going to 
be used together?  Keep in mind, this is two BTLs with the same 
priority and connectivity to the same peer.  My thought is that given 
the very few heterogeneous networked machines (yes, I know UTK has 
one, but we're talking percentages), optimizing for that case at the 
cost of the much more common case is a poor choice.


Today, the only sendi code I see is:

*) mx (could potentially coexist with another BTL)
*) sm (was turned off, but I turned it back on... anyhow sm never 
coexists with another BTL)

*) portals (turned off, and presumably unlikely to coexist with another BTL)

2) It seems like a much better idea would be to add sendi calls to all 
btls that are likely to be used at the same priority.  This seems like 
good long-term form anyway, so why not optimize the PML for the long 
term rather than the short term and assume all BTLs will have a sendi 
function?


I wouldn't assume all BTLs will have a sendi function, but only that 
low-latency BTLs would.  But, maybe that's what you meant.


Re: [OMPI devel] calling sendi earlier in the PML

2009-03-03 Thread Brian W. Barrett

On Tue, 3 Mar 2009, Jeff Squyres wrote:


On Mar 3, 2009, at 3:31 PM, Eugene Loh wrote:

First, this behavior is basically what I was proposing and what George 
didn't feel comfortable with.  It is arguably no compromise at all.  (Uggh, 
why must I be so honest?)  For eager messages, it favors BTLs with sendi 
functions, which could lead to those BTLs becoming overloaded.  I think 
favoring BTLs with sendi for short messages is good.  George thinks that 
load balancing BTLs is good.


Second, the implementation can be simpler than you suggest:

*) You don't need a separate list since testing for a sendi-enabled BTL is 
relatively cheap (I think... could verify).
*) You don't need to shuffle the list.  The mechanism used by ob1 just 
resumes the BTL search from the last BTL used.  E.g., check 
https://svn.open-mpi.org/source/xref/ompi_1.3/ompi/mca/pml/ob1/pml_ob1_sendreq.h#mca_pml_ob1_send_request_start 
.  You use mca_bml_base_btl_array_get_next(&btl_eager) to roundrobin over 
BTLs in a totally fair manner (remembering where the last loop left off), 
and using mca_bml_base_btl_array_get_size(&btl_eager) to make sure you 
don't loop endlessly.


Cool / fair enough.

How about an MCA parameter to switch between this mechanism (early sendi) and 
the original behavior (late sendi)?


This is the usual way that we resolve "I want to do X / I want to do Y" 
disputes.  :-)


Of all the options presented, this is the one I dislike most :).

This is *THE* critical path of the OB1 PML.  It's already horribly complex 
and hard to follow (as Eugene is finding out the hard way).  Making it 
more complex as a way to settle this argument is pain and suffering just 
to avoid conflict.


However, another possible option just occurred to me.  If (AND ONLY IF) 
ob1/r2 detects that there are at least 
two BTLs to the same peer at the same priority and at least one has a 
sendi and at least one does not have a sendi, what about an MCA parameter 
to disable all sendi functions to that peer?
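
A rough self-contained sketch of the detection step in that proposal
(hypothetical types, not r2's real structures): walk the eager BTLs for one
peer and report whether sendi support is mixed.

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

typedef int (*sendi_fn)(const void *buf, size_t len);

struct eager_btl {
    sendi_fn sendi;    /* NULL when the BTL offers no send-immediate path */
};

/* true when at least one BTL has sendi and at least one does not */
static bool sendi_support_is_mixed(const struct eager_btl *btls, size_t n)
{
    size_t with_sendi = 0;
    for (size_t i = 0; i < n; i++) {
        if (NULL != btls[i].sendi) {
            with_sendi++;
        }
    }
    return n >= 2 && with_sendi > 0 && with_sendi < n;
}

int main(void)
{
    /* two same-priority BTLs to one peer, neither with sendi: not mixed,
     * so nothing would be disabled for this peer */
    struct eager_btl peer_btls[2] = { { NULL }, { NULL } };
    printf("mixed sendi support: %s\n",
           sendi_support_is_mixed(peer_btls, 2) ? "yes" : "no");
    return 0;
}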


There's only a 1% gain in the FAIR protocol Eugene proposed, so we'd lose 
that 1% in the heterogeneous multi-nic case (the least common case). 
There would be a much bigger gain for the sendi homogeneous multi-nic / 
all single-nic cases (much more common), because the FAST protocol would 
be used.


That way, we get the FAST protocol in all cases for sm, which is what I 
really want ;).


Brian


Re: [OMPI devel] calling sendi earlier in the PML

2009-03-03 Thread Eugene Loh

Terry Dontje wrote:


Eugene Loh wrote:

I'm on the verge of giving up moving the sendi call in the PML.  I 
will try one or two last things, including this e-mail asking for 
feedback.


The idea is that when a BTL goes over a very low-latency interconnect 
(like sm), we really want to shave off whatever we can from the 
software stack.  One way of doing so is to use a "send-immediate" 
function, which a few BTLs (like sm) provide.  The problem is 
avoiding a bunch of overhead introduced by the PML before checking 
for a "sendi()" call.


Currently, the PML does something like this:

   for ( btl = ... ) {
   if ( SUCCESS == btl->sendi() ) return SUCCESS;
   if ( SUCCESS == btl->send() ) return SUCCESS;
   }
   return ERROR;

That is, it roundrobins over all available BTLs, for each one trying 
sendi() and then send().  If ever a sendi or send completes 
successfully, we exit the loop successfully.


The problem is that this loop is buried several function calls deep in 
the PML.  Before it reaches this far, the PML has initialized a large 
"send request" data structure while traversing some (to me) 
complicated call graph of functions.  This introduces a lot of 
overhead that mitigates much of the speedup we might hope to see with 
the sendi function.  That overhead is unnecessary for a sendi call, 
but necessary for a send call.  I've tried reorganizing the code to 
defer as much of that work as possible -- performing that overhead 
only if it's needed to perform a send call -- but I've gotten 
braincramp every time I've tried this reorganization.


I think these are the options:

Option A) Punt!

Option B) Have someone more familiar with the PML make these changes.

Option C) Have Eugene keep working at this because he'll learn more 
about the PML and it's good for his character.


Option D) Go to a strategy in which all BTLs are tried for sendi 
before any of them is tried for a send.  The code would look like this:


   for ( BTL = ... ) if ( SUCCESS == btl_sendi() ) return SUCCESS;
   for ( BTL = ... ) if ( SUCCESS == btl_send() ) return SUCCESS;
   return ERROR;

The reason this is so much easier to achieve is that we can put that 
first loop way up high in the PML (as soon as a send enters the PML, 
avoiding all that expensive overhead) and leave the second loop 
several layers down, where it is today.  George is against this new 
loop structure because he thinks round robin selection of BTLs is 
most fair and distributes the load over BTLs as evenly as possible.  
(In contrast, the proposed loop would favor BTLs with sendi 
functions.)  It seems to me, however, that favoring BTLs that have 
sendi functions is exactly the right thing to do!  I'm not even 
convinced that the conditions he's worried about are that common:  
multiple eager BTLs to poll, one has a sendi, and that sendi is not 
very good or that BTL is getting overloaded.


I guess I agree with Eugene's points above.  Since we are dealing 
mainly with latency-bound messages and not bandwidth, spreading the 
messages among btls really shouldn't provide much/any advantage.


I think that's right, but to be fair to George, I think his point is 
that even short messages can congest a BTL.


Maybe there is a range of sizes that could provide more bandwidth with 
striped IB or RNIC  connections.  But with the OpenIB multi-frags is 
there a way to section out that message size such that it wouldn't be 
considered for sendi?


I'm not sure I understand the question.  A message longer than the eager 
size automatically does not qualify for sendi.


Also, the existence of a sendi path has to do with the BTL component, 
not with a particular NIC or something.  Not sure if that's relevant or not.


So let's say we are still inclined to write fastpath messages to BTLs 
evenly.  Maybe one modification to the above is to check whether the 
connection we are writing to has only one BTL and, for that case, try the 
btl_sendi higher in the stack.  This would help with the SM BTL 
but certainly striped OpenIB connections would not gain.  I don't 
believe other BTLs like TCP would matter either way.


One can special-case sm.  E.g., if there is only one BTL, try sendi 
early.  Or, try sendi (early) only for the next BTL... if none, then 
dive down into the rest of the code.


I'm still not sure about the striped-openib point you're making.  The 
following may or may not make sense depending on how ridiculously off 
base my understanding or nomenclature is.  Let's start with an example 
Jeff brought up recently:


Jeff Squyres wrote:

Example: if I have a dual-port IB HCA, Open MPI will make 2 different  
openib BTL modules.  In this case, the openib BTL will need to know  
exactly which module the PML is trying to sendi on.


So, here there are two modules the PML could send on.  They're both 
openib modules.  So, we either define an openib sendi function (in which 
case short messages will be distributed equally over both connections) 
or we don't (i

Re: [OMPI devel] calling sendi earlier in the PML

2009-03-03 Thread Jeff Squyres

On Mar 3, 2009, at 3:31 PM, Eugene Loh wrote:

First, this behavior is basically what I was proposing and what  
George didn't feel comfortable with.  It is arguably no compromise  
at all.  (Uggh, why must I be so honest?)  For eager messages, it  
favors BTLs with sendi functions, which could lead to those BTLs  
becoming overloaded.  I think favoring BTLs with sendi for short  
messages is good.  George thinks that load balancing BTLs is good.


Second, the implementation can be simpler than you suggest:

*) You don't need a separate list since testing for a sendi-enabled  
BTL is relatively cheap (I think... could verify).
*) You don't need to shuffle the list.  The mechanism used by ob1  
just resumes the BTL search from the last BTL used.  E.g., check https://svn.open-mpi.org/source/xref/ompi_1.3/ompi/mca/pml/ob1/pml_ob1_sendreq.h#mca_pml_ob1_send_request_start .  You use  
mca_bml_base_btl_array_get_next(&btl_eager) to roundrobin over BTLs  
in a totally fair manner (remembering where the last loop left off),  
and using mca_bml_base_btl_array_get_size(&btl_eager) to make sure  
you don't loop endlessly.


Cool / fair enough.

How about an MCA parameter to switch between this mechanism (early  
sendi) and the original behavior (late sendi)?


This is the usual way that we resolve "I want to do X / I want to do  
Y" disputes.  :-)


I've been toying with two implementations.  The one I described in  
San Jose was called FAST, so let's still call it that.  It tests for  
sendi early in the PML, calling traditional send only if no sendi is  
found for any BTL.  To preserve the BTL ordering George favors  
(always roundrobinning over BTLs, looking only secondarily for  
sendi), I tried another implementation I'll call FAIR.  It attempts  
to initialize the send request only very minimally.  One still makes  
a number of function calls and goes "deep" into the PML, but defers  
as much send-request initialization as late as possible.  I can't  
promise that both implementations FAST and FAIR are equally rock  
solid or optimized, but this is where I am so far.  The differences  
are:


*) FAST involves far fewer code changes.
*) FAST produces faster latencies.  E.g., for 0-byte OSU latencies,  
FAST is 8-10% better than OMPI while FAIR is only 1-3% (or 2-3%...  
something like that).  (The improvements I showed in San Jose for  
FAST were more dramatic than 8-10%, but that's because there were  
optimizations on the receive side and in the data convertors as  
well.  For the e-mail you're reading right now, I'm talking just  
about send-request optimizations.)
*) Theoretically, FAIR is broader reaching.  E.g., if persistent  
sends can always use a sendi path, they will all potentially  
benefit.  (This is theory.  I haven't actually observed such a speed- 
up yet and it might just end up getting lost in the noise.)




--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] calling sendi earlier in the PML

2009-03-03 Thread Brian W. Barrett

On Tue, 3 Mar 2009, Eugene Loh wrote:

First, this behavior is basically what I was proposing and what George didn't 
feel comfortable with.  It is arguably no compromise at all.  (Uggh, why must 
I be so honest?)  For eager messages, it favors BTLs with sendi functions, 
which could lead to those BTLs becoming overloaded.  I think favoring BTLs 
with sendi for short messages is good.  George thinks that load balancing 
BTLs is good.


I have two thoughts on the issue:

1) How often are a btl with a sendi and a btl without a sendi going to be 
used together?  Keep in mind, this is two BTLs with the same priority and 
connectivity to the same peer.  My thought is that given the very few 
heterogeneous networked machines (yes, I know UTK has one, but we're 
talking percentages), optimizing for that case at the cost of the much 
more common case is a poor choice.


2) It seems like a much better idea would be to add sendi calls to all 
btls that are likely to be used at the same priority.  This seems like 
good long-term form anyway, so why not optimize the PML for the long term 
rather than the short term and assume all BTLs will have a sendi function?


Brian


Re: [OMPI devel] calling sendi earlier in the PML

2009-03-03 Thread Eugene Loh

Jeff Squyres wrote:


How about a compromise...

Keep a separate list somewhere of the sendi-enabled BTLs (this avoids  
looping over all the btl's and testing -- you can just loop over the  
btl's that you *know* have a sendi).  Put that at the top of the PML  
and avoid the costly overhead, yadda yadda yadda.


But instead of having a static list of sendi-enabled BTLs, rotate 
them  if there's >1.  For example, say I have 3 sendi-enabled BTL 
modules:  A, B, C.


In the first send, A->sendi() is used and it succeeds, so we shuffle  
the list and return.
In the next send, B->sendi() is used and it succeeds, so we shuffle  
the list and return.
In the next send, C->sendi() is used but it fails, so we shuffle the  
list and fall through to normal ->send() processing.


"shuffle the list" can be as simple as opal_list_remove_first() and  
opal_list_append() -- both of which should be O(1).


This should distribute the load across sendi-enabled BTLs, and if  
those ever get "overloaded" (such that sendi fails), we fall through  
to normal load-balanced PML sending.


First, this behavior is basically what I was proposing and what George 
didn't feel comfortable with.  It is arguably no compromise at all.  
(Uggh, why must I be so honest?)  For eager messages, it favors BTLs 
with sendi functions, which could lead to those BTLs becoming 
overloaded.  I think favoring BTLs with sendi for short messages is 
good.  George thinks that load balancing BTLs is good.


Second, the implementation can be simpler than you suggest:

*) You don't need a separate list since testing for a sendi-enabled BTL 
is relatively cheap (I think... could verify).
*) You don't need to shuffle the list.  The mechanism used by ob1 just 
resumes the BTL search from the last BTL used.  E.g., check 
https://svn.open-mpi.org/source/xref/ompi_1.3/ompi/mca/pml/ob1/pml_ob1_sendreq.h#mca_pml_ob1_send_request_start 
.  You use mca_bml_base_btl_array_get_next(&btl_eager) to roundrobin 
over BTLs in a totally fair manner (remembering where the last loop left 
off), and mca_bml_base_btl_array_get_size(&btl_eager) to make sure 
you don't loop endlessly (a small sketch of this loop follows).
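
Concretely, with stand-ins for the two functions named above (their real
signatures are not reproduced here), the bounded walk looks like this:

#include <stddef.h>

/* Hypothetical stand-ins for mca_bml_base_btl_array_get_size() and
 * mca_bml_base_btl_array_get_next(): get_next() hands out the next index,
 * remembering where the previous loop left off and wrapping around. */
struct btl_array {
    size_t size;
    size_t next;
};

static size_t btl_array_get_size(const struct btl_array *a)
{
    return a->size;
}

static size_t btl_array_get_next(struct btl_array *a)
{
    size_t idx = a->next;
    a->next = (a->next + 1) % a->size;
    return idx;
}

/* hypothetical per-BTL attempt (sendi and/or send); 0 == success */
static int try_one_btl(size_t idx)
{
    return (0 == idx) ? 0 : -1;   /* pretend only BTL 0 accepts the message */
}

static int send_over_eager_btls(struct btl_array *eager)
{
    /* read the size once so the walk is bounded and cannot loop endlessly */
    size_t attempts = btl_array_get_size(eager);

    while (attempts-- > 0) {
        if (0 == try_one_btl(btl_array_get_next(eager))) {
            return 0;             /* this BTL took the message */
        }
    }
    return -1;                    /* every BTL refused */
}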


I've been toying with two implementations.  The one I described in San 
Jose was called FAST, so let's still call it that.  It tests for sendi 
early in the PML, calling traditional send only if no sendi is found for 
any BTL.  To preserve the BTL ordering George favors (always 
roundrobinning over BTLs, looking only secondarily for sendi), I tried 
another implementation I'll call FAIR.  It attempts to initialize the 
send request only very minimally.  One still makes a number of function 
calls and goes "deep" into the PML, but defers as much send-request 
initialization as late as possible.  I can't promise that both 
implementations FAST and FAIR are equally rock solid or optimized, but 
this is where I am so far.  The differences are:


*) FAST involves far fewer code changes.
*) FAST produces faster latencies.  E.g., for 0-byte OSU latencies, FAST 
is 8-10% better than OMPI while FAIR is only 1-3% (or 2-3%... something 
like that).  (The improvements I showed in San Jose for FAST were more 
dramatic than 8-10%, but that's because there were optimizations on the 
receive side and in the data convertors as well.  For the e-mail you're 
reading right now, I'm talking just about send-request optimizations.)
*) Theoretically, FAIR is broader reaching.  E.g., if persistent sends 
can always use a sendi path, they will all potentially benefit.  (This 
is theory.  I haven't actually observed such a speed-up yet and it might 
just end up getting lost in the noise.)


Re: [OMPI devel] 1.3.1rc3 was borked; 1.3.1rc4 is out

2009-03-03 Thread Brian W. Barrett

On Tue, 3 Mar 2009, Jeff Squyres wrote:

1.3.1rc3 had a race condition in the ORTE shutdown sequence.  The only 
difference between rc3 and rc4 was a fix for that race condition.  Please 
test ASAP:


  http://www.open-mpi.org/software/ompi/v1.3/


I'm sorry, I've failed to test rc1 & rc2 on Catamount.  I'm getting a 
compile failure in the ORTE code.  I'll do a bit more testing and send 
Ralph an e-mail this afternoon.


Brian


[OMPI devel] 1.3.1rc3 was borked; 1.3.1rc4 is out

2009-03-03 Thread Jeff Squyres
1.3.1rc3 had a race condition in the ORTE shutdown sequence.  The only  
difference between rc3 and rc4 was a fix for that race condition.   
Please test ASAP:


http://www.open-mpi.org/software/ompi/v1.3/

--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] How to configure Open MPI on multi-port IB HCA cluster

2009-03-03 Thread Jeff Squyres

On Mar 3, 2009, at 2:48 AM, Jie Cai wrote:

We have installed a dual-port ConnectX HCA cluster with PCIe 2.0 slots,
and each port is represented as an individual interface.

How do we configure Open MPI and the hardware system
to correctly use both ports for communication?


Open MPI should just see and use both ports automatically (assuming  
that they are ACTIVE).



Should we expect to see higher bandwidth with Open MPI?


It depends on both your server and network setup.

The transfer rate across PCIe 2.0 is not enough to keep 2 DDR HCA ports 
full, so it is unlikely that you will see much of a bandwidth improvement.


Assuming that your 2 HCA ports are plugged into either 2 separate IB  
subnets or different locations in the same subnet, you'll get a wider  
dispersion of fragments across your network, potentially avoiding some  
network congestion.  But this behavior is very much dependent on what  
else is occurring simultaneously elsewhere in your IB subnet, which is  
likely to be application- / cluster-specific behavior.



In order to see the bandwidth improvement, do I need to specifically
configure Open MPI and the hardware?



To really get 2 x DDR performance, you likely need two separate busses  
with two separate HCAs.


--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] calling sendi earlier in the PML

2009-03-03 Thread Jeff Squyres

How about a compromise...

Keep a separate list somewhere of the sendi-enabled BTLs (this avoids  
looping over all the btl's and testing -- you can just loop over the  
btl's that you *know* have a sendi).  Put that at the top of the PML  
and avoid the costly overhead, yadda yadda yadda.


But instead of having a static list of sendi-enabled BTLs, rotate them  
if there's >1.  For example, say I have 3 sendi-enabled BTL modules:  
A, B, C.


In the first send, A->sendi() is used and it succeeds, so we shuffle  
the list and return.
In the next send, B->sendi() is used and it succeeds, so we shuffle  
the list and return.
In the next send, C->sendi() is used but it fails, so we shuffle the  
list and fall through to normal ->send() processing.


"shuffle the list" can be as simple as opal_list_remove_first() and  
opal_list_append() -- both of which should be O(1).


This should distribute the load across sendi-enabled BTLs, and if  
those ever get "overloaded" (such that sendi fails), we fall through  
to normal load-balanced PML sending.
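
A self-contained sketch of that rotation, using a plain array rather than an
opal_list (hypothetical names; unlike the O(1) list operations above, the
array copy here is purely illustrative):

#include <stddef.h>

typedef int (*sendi_fn)(const void *buf, size_t len);

/* hypothetical per-peer cache of the sendi-enabled BTLs */
struct sendi_cache {
    sendi_fn fns[8];
    size_t   count;
};

/* array analogue of opal_list_remove_first() + opal_list_append() */
static void rotate(struct sendi_cache *c)
{
    if (c->count < 2) {
        return;
    }
    sendi_fn first = c->fns[0];
    for (size_t i = 1; i < c->count; i++) {
        c->fns[i - 1] = c->fns[i];
    }
    c->fns[c->count - 1] = first;
}

static int try_sendi_cache(struct sendi_cache *c, const void *buf, size_t len)
{
    if (0 == c->count) {
        return -1;                 /* no sendi-enabled BTLs for this peer */
    }
    int rc = c->fns[0](buf, len);  /* always try the current head */
    rotate(c);                     /* spread the load across the cache */
    return rc;                     /* on failure, the caller falls through
                                      to normal ->send() processing */
}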


Howzat?



On Mar 2, 2009, at 1:37 PM, Eugene Loh wrote:

I'm on the verge of giving up moving the sendi call in the PML.  I  
will try one or two last things, including this e-mail asking for  
feedback.


The idea is that when a BTL goes over a very low-latency  
interconnect (like sm), we really want to shave off whatever we can  
from the software stack.  One way of doing so is to use a "send- 
immediate" function, which a few BTLs (like sm) provide.  The  
problem is avoiding a bunch of overhead introduced by the PML before  
checking for a "sendi()" call.


Currently, the PML does something like this:

  for ( btl = ... ) {
  if ( SUCCESS == btl->sendi() ) return SUCCESS;
  if ( SUCCESS == btl->send() ) return SUCCESS;
  }
  return ERROR;

That is, it roundrobins over all available BTLs, for each one trying  
sendi() and then send().  If ever a sendi or send completes  
successfully, we exit the loop successfully.


The problem is that this loop is buried several function calls deep  
in the PML.  Before it reaches this far, the PML has initialized a  
large "send request" data structure while traversing some (to me)  
complicated call graph of functions.  This introduces a lot of  
overhead that mitigates much of the speedup we might hope to see  
with the sendi function.  That overhead is unnecessary for a sendi  
call, but necessary for a send call.  I've tried reorganizing the  
code to defer as much of that work as possible -- performing that  
overhead only if it's needed to perform a send call -- but I've gotten  
braincramp every time I've tried this reorganization.


I think these are the options:

Option A) Punt!

Option B) Have someone more familiar with the PML make these changes.

Option C) Have Eugene keep working at this because he'll learn more  
about the PML and it's good for his character.


Option D) Go to a strategy in which all BTLs are tried for sendi  
before any of them is tried for a send.  The code would look like  
this:


  for ( BTL = ... ) if ( SUCCESS == btl_sendi() ) return SUCCESS;
  for ( BTL = ... ) if ( SUCCESS == btl_send() ) return SUCCESS;
  return ERROR;

The reason this is so much easier to achieve is that we can put that  
first loop way up high in the PML (as soon as a send enters the PML,  
avoiding all that expensive overhead) and leave the second loop  
several layers down, where it is today.  George is against this new  
loop structure because he thinks round robin selection of BTLs is  
most fair and distributes the load over BTLs as evenly as possible.   
(In contrast, the proposed loop would favor BTLs with sendi  
functions.)  It seems to me, however, that favoring BTLs that have  
sendi functions is exactly the right thing to do!  I'm not even  
convinced that the conditions he's worried about are that common:   
multiple eager BTLs to poll, one has a sendi, and that sendi is not  
very good or that BTL is getting overloaded.


Anyhow, I like Option D, but George does not.

Option E) Go to a strategy in which the next BTL is tested for a  
sendi function.  If there is one, use it.  If not, just continue  
with the usual heavyweight PML procedure.  This feels a little  
hackish to me, but it'll mean that most of the time that sendi can  
be called, the heavyweight PML overhead will be avoided, while at  
the same time "fair" roundrobin polling over the BTLs is maintained.


I'll proceed with Option C for the time being.  If I don't announce  
success or surrender in the next few days, please write to me at the  
insane asylum.




--
Jeff Squyres
Cisco Systems



[OMPI devel] Writeup of new release methodology

2009-03-03 Thread Jeff Squyres

Sorry I missed the call this morning.

I wrote up the new release methodology, including the bootstrapping-to- 
the-v1.3-series stuff on this wiki page:


https://svn.open-mpi.org/trac/ompi/wiki/ReleaseMethodology

--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] PML/ob1 problem

2009-03-03 Thread Lenny Verkhovsky
sorry, missed this commit.
Thanks, George,

On 3/3/09, George Bosilca  wrote:
> Which solution seems to be working ?
>
>  This bug was fixed a while ago in the trunk
> (https://svn.open-mpi.org/trac/ompi/changeset/20591) and in
> the 1.3 branch. It even made it in the 1.3.2.
>
>   george.
>
>
>  On Mar 3, 2009, at 05:01 , Lenny Verkhovsky wrote:
>
>
> > Seems to be working.
> > George, can you commit it, pls.
> >
> > Thanks
> > Lenny.
> >
> >
> > On Thu, Feb 19, 2009 at 3:05 PM, Jeff Squyres  wrote:
> >
> > > George -- any thoughts on this one?
> > >
> > > On Feb 11, 2009, at 1:01 AM, Mike Dubman wrote:
> > >
> > >
> > > >
> > > > Hello guys,
> > > >
> > > > I'm running some experimental tcp btl which implements rdma GET method
> and
> > > > advertises it in its flags of the btl API.
> > > > The btl`s send() method returns rc=1 to select fast path for PML.
> (this
> > > > optimization was added in revision 18551 in v1.3)
> > > >
> > > > It seems that in PML/ob1,
> mca_pml_ob1_send_request_start_rdma() function
> > > > does not treat right such combination (btl GET + fastpath rc>0) and
> going
> > > > into deadlock, i.e.
> > > >
> > > > +++ pml_ob1_sendreq.c +670
> > > > At this line, sendreq->req_state is 0
> > > >
> > > > +++ pml_ob1_sendreq.c +800
> > > > At this line, if btl has GET method and btl`s send() returned fastpath
> > > > hint - the call to
> mca_pml_ob1_rndv_completion_request() will decrement
> > > > sendreq->req_state by one, leaving it to -1.
> > > >
> > > > This value of -1 will keep
> send_request_pml_complete_check() from
> > > > completing request on PML level.
> > > >
> > > > The PML logic (in
> mca_pml_ob1_send_request_start_rdma) for PUT operation
> > > > initializes req_state to "2" in pml_ob1_sendreq.c +791, but leaves
> req_state
> > > > to 0 for GET operations.
> > > >
> > > > Please suggest.
> > > >
> > > > Thanks
> > > >
> > > > Mike.
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > ___
> > > > devel mailing list
> > > > de...@open-mpi.org
> > > > http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > > >
> > >
> > >
> > > --
> > > Jeff Squyres
> > > Cisco Systems
> > >
> > > ___
> > > devel mailing list
> > > de...@open-mpi.org
> > > http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > >
> > >
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >
>
>  ___
>  devel mailing list
>  de...@open-mpi.org
>  http://www.open-mpi.org/mailman/listinfo.cgi/devel
>


Re: [OMPI devel] [PATCH 3/4] opal-ps: Use the return value from asprintf as the header length.

2009-03-03 Thread Jeff Squyres

Done.

On Feb 19, 2009, at 7:29 AM, Bert Wesarg wrote:


From: Bert Wesarg 

asprintf returns the length of the written header, use this as the  
length.


Regards,
Bert Wesarg

---

orte/tools/orte-ps/orte-ps.c |3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --quilt old/orte/tools/orte-ps/orte-ps.c new/orte/tools/orte-ps/orte-ps.c

--- old/orte/tools/orte-ps/orte-ps.c
+++ new/orte/tools/orte-ps/orte-ps.c
@@ -392,8 +392,7 @@ static int pretty_print(orte_ps_mpirun_i
/*
 * Print header
 */
-asprintf(&header, "Information from mpirun %s", ORTE_JOBID_PRINT(hnpinfo->hnp->name.jobid));
-len_hdr = strlen(header);
+len_hdr = asprintf(&header, "Information from mpirun %s", ORTE_JOBID_PRINT(hnpinfo->hnp->name.jobid));


printf("\n\n%s\n", header);
free(header);
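
One standalone aside, not part of the patch: asprintf() returns -1 on failure
and leaves the target pointer undefined, so a caller that reuses the return
value as a length should check it first.  A minimal example (the jobid string
is a made-up placeholder):

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char *header = NULL;
    int len_hdr = asprintf(&header, "Information from mpirun %s", "[12345,0]");

    if (len_hdr < 0) {
        return 1;              /* allocation failed; header must not be used */
    }
    printf("\n\n%s\n", header);
    for (int i = 0; i < len_hdr; i++) {
        printf("%c", '-');
    }
    printf("\n");
    free(header);
    return 0;
}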



--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] [PATCH 1/4] opal-ps: fix memory leak

2009-03-03 Thread Jeff Squyres

I committed the rest of these in 20697.

Thanks!

On Feb 19, 2009, at 7:29 AM, Bert Wesarg wrote:


From: Bert Wesarg 


---

orte/tools/orte-ps/orte-ps.c |4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --quilt old/orte/tools/orte-ps/orte-ps.c new/orte/tools/orte-ps/orte-ps.c

--- old/orte/tools/orte-ps/orte-ps.c
+++ new/orte/tools/orte-ps/orte-ps.c
@@ -392,10 +392,10 @@ static int pretty_print(orte_ps_mpirun_i
/*
 * Print header
 */
-asprintf(&header, "\n\nInformation from mpirun %s", ORTE_JOBID_PRINT(hnpinfo->hnp->name.jobid));
+asprintf(&header, "Information from mpirun %s", ORTE_JOBID_PRINT(hnpinfo->hnp->name.jobid));

len_hdr = strlen(header);

-printf("%s\n", header);
+printf("\n\n%s\n", header);
free(header);
for (i=0; i < len_hdr; i++) {
printf("%c", '-');



--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] PML/ob1 problem

2009-03-03 Thread George Bosilca

Which solution seems to be working ?

This bug was fixed a while ago in the trunk (https://svn.open-mpi.org/trac/ompi/changeset/20591) and in the 1.3 branch.  It even made it into 1.3.2.


  george.

On Mar 3, 2009, at 05:01 , Lenny Verkhovsky wrote:


Seems to be working.
George, can you commit it, pls.

Thanks
Lenny.


On Thu, Feb 19, 2009 at 3:05 PM, Jeff Squyres   
wrote:

George -- any thoughts on this one?

On Feb 11, 2009, at 1:01 AM, Mike Dubman wrote:



Hello guys,

I'm running some experimental tcp btl which implements rdma GET  
method and

advertises it in its flags of the btl API.
The btl`s send() method returns rc=1 to select fast path for PML.  
(this

optimization was added in revision 18551 in v1.3)

It seems that in PML/ob1, mca_pml_ob1_send_request_start_rdma()  
function
does not treat right such combination (btl GET + fastpath rc>0)  
and going

into deadlock, i.e.

+++ pml_ob1_sendreq.c +670
At this line, sendreq->req_state is 0

+++ pml_ob1_sendreq.c +800
At this line, if btl has GET method and btl`s send() returned  
fastpath
hint - the call to mca_pml_ob1_rndv_completion_request() will  
decrement

sendreq->req_state by one, leaving it to -1.

This value of -1 will keep send_request_pml_complete_check() from
completing request on PML level.

The PML logic (in mca_pml_ob1_send_request_start_rdma) for PUT  
operation
initializes req_state to "2" in pml_ob1_sendreq.c +791, but leaves  
req_state

to 0 for GET operations.

Please suggest.

Thanks

Mike.
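
To make the counting problem concrete, here is a self-contained illustration
with hypothetical structures (not the actual ob1 code):

#include <stdio.h>
#include <stdbool.h>

/* req_state counts events that must still complete before the request can
 * be marked complete at the PML level */
struct send_request {
    int req_state;
};

static bool pml_complete_check(const struct send_request *req)
{
    /* mirrors the idea behind send_request_pml_complete_check():
     * only a counter of exactly zero allows completion */
    return 0 == req->req_state;
}

int main(void)
{
    /* GET path as described above: req_state is left at 0 ... */
    struct send_request req = { .req_state = 0 };

    /* ... the fast-path completion callback then decrements it to -1 ... */
    req.req_state--;

    /* ... so the completion check never succeeds and the request hangs */
    printf("complete? %s (req_state = %d)\n",
           pml_complete_check(&req) ? "yes" : "no", req.req_state);

    /* the PUT path avoids this by initializing req_state to 2; the fix for
     * GET is to initialize the counter so the decrement cannot drive it
     * below zero */
    return 0;
}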










--
Jeff Squyres
Cisco Systems







Re: [OMPI devel] [PATCH 1/4] opal-ps: fix memory leak

2009-03-03 Thread Jeff Squyres

Oops; this got missed.

Thanks for the reminder; applied in r20694.


On Mar 3, 2009, at 2:26 AM, Bert Wesarg wrote:


2009/2/19 Bert Wesarg :

From: Bert Wesarg 

Free the memory allocated by the call to asprintf.

Regards,
Bert Wesarg

---

 orte/tools/orte-ps/orte-ps.c |1 +
 1 file changed, 1 insertion(+)

diff --quilt old/orte/tools/orte-ps/orte-ps.c new/orte/tools/orte-ps/orte-ps.c

--- old/orte/tools/orte-ps/orte-ps.c
+++ new/orte/tools/orte-ps/orte-ps.c
@@ -396,6 +396,7 @@ static int pretty_print(orte_ps_mpirun_i
len_hdr = strlen(header);

printf("%s\n", header);
+free(header);
for (i=0; i < len_hdr; i++) {
printf("%c", '-');
}



Ping.




--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] help-orte-top.txt: add missing [

2009-03-03 Thread Jeff Squyres

Done; thanks.

On Mar 3, 2009, at 2:17 AM, Bert Wesarg wrote:


Regards,
Bert

Index: orte/tools/orte-top/help-orte-top.txt
===
--- orte/tools/orte-top/help-orte-top.txt   (revision 20692)
+++ orte/tools/orte-top/help-orte-top.txt   (working copy)
@@ -46,7 +46,7 @@
keyword "file". Please use the --help option for more information on
the correct format for this command line option.
#
-orte-top:hnp-filename-access]
+[orte-top:hnp-filename-access]
We are unable to access the filename where contact info for the
mpirun to be contacted was to be found. The filename we were given  
was:





--
Jeff Squyres
Cisco Systems



[OMPI devel] 1.3.1rc3 escapes

2009-03-03 Thread Jeff Squyres

The only difference between 1.3.1rc2 and rc3 is George's datatype fix:

https://svn.open-mpi.org/trac/ompi/changeset/20684

Please test it ASAP:

http://www.open-mpi.org/software/ompi/v1.3/

--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] calling sendi earlier in the PML

2009-03-03 Thread Terry Dontje

Eugene Loh wrote:
I'm on the verge of giving up moving the sendi call in the PML.  I 
will try one or two last things, including this e-mail asking for 
feedback.


The idea is that when a BTL goes over a very low-latency interconnect 
(like sm), we really want to shave off whatever we can from the 
software stack.  One way of doing so is to use a "send-immediate" 
function, which a few BTLs (like sm) provide.  The problem is avoiding 
a bunch of overhead introduced by the PML before checking for a 
"sendi()" call.


Currently, the PML does something like this:

   for ( btl = ... ) {
   if ( SUCCESS == btl->sendi() ) return SUCCESS;
   if ( SUCCESS == btl->send() ) return SUCCESS;
   }
   return ERROR;

That is, it roundrobins over all available BTLs, for each one trying 
sendi() and then send().  If ever a sendi or send completes 
successfully, we exit the loop successfully.


The problem is that this loop is buried several function calls deep in 
the PML.  Before it reaches this far, the PML has initialized a large 
"send request" data structure while traversing some (to me) 
complicated call graph of functions.  This introduces a lot of 
overhead that mitigates much of the speedup we might hope to see with 
the sendi function.  That overhead is unnecessary for a sendi call, 
but necessary for a send call.  I've tried reorganizing the code to 
defer as much of that work as possible -- performing that overhead 
only if it's needed to perform a send call -- but I've gotten braincramp 
every time I've tried this reorganization.


I think these are the options:

Option A) Punt!

Option B) Have someone more familiar with the PML make these changes.

Option C) Have Eugene keep working at this because he'll learn more 
about the PML and it's good for his character.


Option D) Go to a strategy in which all BTLs are tried for sendi 
before any of them is tried for a send.  The code would look like this:


   for ( BTL = ... ) if ( SUCCESS == btl_sendi() ) return SUCCESS;
   for ( BTL = ... ) if ( SUCCESS == btl_send() ) return SUCCESS;
   return ERROR;

The reason this is so much easier to achieve is that we can put that 
first loop way up high in the PML (as soon as a send enters the PML, 
avoiding all that expensive overhead) and leave the second loop 
several layers down, where it is today.  George is against this new 
loop structure because he thinks round robin selection of BTLs is most 
fair and distributes the load over BTLs as evenly as possible.  (In 
contrast, the proposed loop would favor BTLs with sendi functions.)  
It seems to me, however, that favoring BTLs that have sendi functions 
is exactly the right thing to do!  I'm not even convinced that the 
conditions he's worried about are that common:  multiple eager BTLs to 
poll, one has a sendi, and that sendi is not very good or that BTL is 
getting overloaded.


I guess I agree with Eugene's points above.  Since we are dealing mainly 
with latency-bound messages and not bandwidth, spreading the messages 
among btls really shouldn't provide much/any advantage.   Maybe there is 
a range of sizes that could provide more bandwidth with striped IB or 
RNIC  connections.  But with the OpenIB multi-frags is there a way to 
section out that message size such that it wouldn't be considered for sendi?


So let's say we are still inclined to write fastpath messages to BTLs 
evenly.  Maybe one modification to the above is to check whether the 
connection we are writing to has only one BTL and, for that case, try the 
btl_sendi higher in the stack.  This would help with the SM BTL but 
certainly striped OpenIB connections would not gain.  I don't believe 
other BTLs like TCP would matter either way.


--td

Anyhow, I like Option D, but George does not.

Option E) Go to a strategy in which the next BTL is tested for a sendi 
function.  If there is one, use it.  If not, just continue with the 
usual heavyweight PML procedure.  This feels a little hackish to me, 
but it'll mean that most of the time that sendi can be called, the 
heavyweight PML overhead will be avoided, while at the same time 
"fair" roundrobin polling over the BTLs is maintained.


I'll proceed with Option C for the time being.  If I don't announce 
success or surrender in the next few days, please write to me at the 
insane asylum.





Re: [OMPI devel] PML/ob1 problem

2009-03-03 Thread Lenny Verkhovsky
Seems to be working.
George, can you commit it, pls.

Thanks
Lenny.


On Thu, Feb 19, 2009 at 3:05 PM, Jeff Squyres  wrote:
> George -- any thoughts on this one?
>
> On Feb 11, 2009, at 1:01 AM, Mike Dubman wrote:
>
>>
>> Hello guys,
>>
>> I'm running some experimental tcp btl which implements rdma GET method and
>> advertises it in its flags of the btl API.
>> The btl`s send() method returns rc=1 to select fast path for PML. (this
>> optimization was added in revision 18551 in v1.3)
>>
>> It seems that in PML/ob1, mca_pml_ob1_send_request_start_rdma() function
>> does not treat right such combination (btl GET + fastpath rc>0) and going
>> into deadlock, i.e.
>>
>> +++ pml_ob1_sendreq.c +670
>> At this line, sendreq->req_state is 0
>>
>> +++ pml_ob1_sendreq.c +800
>> At this line, if btl has GET method and btl`s send() returned fastpath
>> hint - the call to mca_pml_ob1_rndv_completion_request() will decrement
>> sendreq->req_state by one, leaving it to -1.
>>
>> This value of -1 will keep send_request_pml_complete_check() from
>> completing request on PML level.
>>
>> The PML logic (in mca_pml_ob1_send_request_start_rdma) for PUT operation
>> initializes req_state to "2" in pml_ob1_sendreq.c +791, but leaves req_state
>> to 0 for GET operations.
>>
>> Please suggest.
>>
>> Thanks
>>
>> Mike.
>>
>>
>>
>>
>>
>>
>>
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
>
> --
> Jeff Squyres
> Cisco Systems
>
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>


[OMPI devel] How to configure Open MPI on multi-port IB HCA cluster

2009-03-03 Thread Jie Cai

We have installed a dual-port ConnectX HCA cluster with PCIe 2.0 slots,
and each port is represented as an individual interface.

How do we configure Open MPI and the hardware system
to correctly use both ports for communication?

Should we expect to see higher bandwidth with Open MPI?
In order to see the bandwidth improvement, do I need to specifically
configure Open MPI and the hardware?

Thanks

--
Mr. Jie Cai





Re: [OMPI devel] [PATCH 1/4] opal-ps: fix memory leak

2009-03-03 Thread Bert Wesarg
2009/2/19 Bert Wesarg :
> From: Bert Wesarg 
>
> Free the memory allocated by the call to asprintf.
>
> Regards,
> Bert Wesarg
>
> ---
>
>  orte/tools/orte-ps/orte-ps.c |    1 +
>  1 file changed, 1 insertion(+)
>
> diff --quilt old/orte/tools/orte-ps/orte-ps.c new/orte/tools/orte-ps/orte-ps.c
> --- old/orte/tools/orte-ps/orte-ps.c
> +++ new/orte/tools/orte-ps/orte-ps.c
> @@ -396,6 +396,7 @@ static int pretty_print(orte_ps_mpirun_i
>     len_hdr = strlen(header);
>
>     printf("%s\n", header);
> +    free(header);
>     for (i=0; i < len_hdr; i++) {
>         printf("%c", '-');
>     }
>

Ping.



[OMPI devel] help-orte-top.txt: add missing [

2009-03-03 Thread Bert Wesarg
Regards,
Bert

Index: orte/tools/orte-top/help-orte-top.txt
===
--- orte/tools/orte-top/help-orte-top.txt   (revision 20692)
+++ orte/tools/orte-top/help-orte-top.txt   (working copy)
@@ -46,7 +46,7 @@
 keyword "file". Please use the --help option for more information on
 the correct format for this command line option.
 #
-orte-top:hnp-filename-access]
+[orte-top:hnp-filename-access]
 We are unable to access the filename where contact info for the
 mpirun to be contacted was to be found. The filename we were given was:



Re: [OMPI devel] ompi v1.3 compilation problem on ia64/gcc/rhel4.7

2009-03-03 Thread Mike Dubman
thanks. We will test it and update you promptly.

On Mon, Mar 2, 2009 at 10:28 PM, Jeff Squyres  wrote:

> Disregard -- it looks like the VT guys have fixed this issue.
>
> Can you test 1.3.1rc2 or later?
>
>
>
> On Feb 24, 2009, at 2:02 AM, Mike Dubman wrote:
>
>  I searched for similar problems reported to the list and have not found
>> any.  (I only found ones related to the icc compiler, which are irrelevant.)
>> Which discussed problems are you referring to?
>>
>> regards
>>
>> Mike
>>
>>
>> On Thu, Feb 19, 2009 at 3:04 PM, Jeff Squyres  wrote:
>> Could this pertain to the other itanium compilation problems that were
>> discussed (and not yet resolved) earlier?
>>
>>
>>
>> On Feb 19, 2009, at 3:52 AM, Mike Dubman wrote:
>>
>>
>> Hello guys,
>>
>> We have a compilation problem with ompi v1.3 on Itanium ia64 + gcc + rhel 4.7.
>> It seems that vt_pform_linux.c:46 includes asm/intrinsics.h which is
>> unavailable on rhel47/ia64 in /usr/include/asm but is a part of
>> kernel-headers rpm
>> (in /usr/src/kernels/2.6.9-78.EL-ia64/include/asm-ia64/)
>>
>>
>> We compile ompi v1.3 from srpm with a command:
>>
>> configure_options="--define 'configure_options
>> --enable-orterun-prefix-by-default --with-openib
>> --enable-mpirun-prefix-by-default'"
>> rpmbuild_options="--define 'install_in_opt 1' --define
>> 'use_default_rpm_opt_flags 0' --define 'ofed 1' --define 'mflags -j4'
>> --define '_vendor Voltaire' --define 'packager Voltaire'"
>> rpmbuild --rebuild $configure_options $rpmbuild_options
>> /path/to/openmpi_v1.3_src.rpm
>>
>> and getting the following error:
>>
>> tlib/otf/otflib -D_GNU_SOURCE -DBINDIR=\"/opt/openmpi/1.3/bin\" -DDATADIR=\"/opt/openmpi/1.3/share\" -DRFG -DVT_BFD -DVT_MEMHOOK -DVT_IOWRAP -MT vt_pform_linux.o -MD -MP -MF .deps/vt_pform_linux.Tpo -c -o vt_pform_linux.o vt_pform_linux.c
>> vt_pform_linux.c:46:31: asm/intrinsics.h: No such file or directory
>> vt_pform_linux.c: In function `vt_pform_wtime':
>> vt_pform_linux.c:172: error: `_IA64_REG_AR_ITC' undeclared (first use in this function)
>> vt_pform_linux.c:172: error: (Each undeclared identifier is reported only once
>> vt_pform_linux.c:172: error: for each function it appears in.)
>> make[5]: *** [vt_pform_linux.o] Error 1
>> make[5]: *** Waiting for unfinished jobs
>> mv -f .deps/vt_otf_trc.Tpo .deps/vt_otf_trc.Po
>> make[5]: *** Waiting for unfinished jobs
>> mv -f .deps/vt_otf_gen.Tpo .deps/vt_otf_gen.Po mv -f .deps/vt_iowrap.Tpo
>> .deps/vt_iowrap.Po
>> make[5]: Leaving directory `/tmp/buildopenmpi-30371/BUILD/openmpi-1.3/ompi/contrib/vt/vt/vtlib'
>> make[4]: Leaving directory `/tmp/buildopenmpi-30371/BUILD/openmpi-1.3/ompi/contrib/vt/vt'
>> make[4]: *** [all-recursive] Error 1
>>
>>
>> Please suggest.
>>
>> Thanks
>>
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>
>>
>> --
>> Jeff Squyres
>> Cisco Systems
>>
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>
>
>
> --
> Jeff Squyres
> Cisco Systems
>
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>