Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r24395

2011-02-16 Thread Don Kerr

You would think so, but I did not want to speculate about what OFED might do.

I'm fine with skipping the Solaris check; if OFED does include it, things may
have to change at that point anyway.


On 02/16/11 09:41, Jeff Squyres wrote:

If OFED includes that constant, wouldn't we want to use it?

PCI ordering is PCI ordering (i.e., unreliable) on all hardware -- or am I 
wrong?

On Feb 16, 2011, at 8:59 AM, Don Kerr wrote:


I considered that but I wanted to guard against future OFED inclusion. Removing 
the Solaris check is easy enough.


On 02/16/11 08:49, Jeff Squyres wrote:

On Feb 16, 2011, at 8:29 AM, Don Kerr wrote:

Yes, this is Solaris only. OFED has not picked up the IBV_ACCESS_SO flag; not
sure it ever will.

It should be sufficient to AC_CHECK_DECLS then -- no need for the additional 
Solaris check.
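
For reference, a minimal C-side sketch of how an AC_CHECK_DECLS result could be
consumed. The assumption here is that configure ran
AC_CHECK_DECLS([IBV_ACCESS_SO], [], [], [[#include <infiniband/verbs.h>]]),
which always defines HAVE_DECL_IBV_ACCESS_SO to 0 or 1; the committed code
below uses its own HAVE_IBV_ACCESS_SO guard instead.

#include <infiniband/verbs.h>

/* Compute ibv_reg_mr() access flags; the strong-ordering bit is added only
 * where the declaration check found IBV_ACCESS_SO (currently Solaris only). */
static int eager_rdma_access_flags(int want_strong_ordering)
{
    int flags = IBV_ACCESS_LOCAL_WRITE |
                IBV_ACCESS_REMOTE_WRITE | IBV_ACCESS_REMOTE_READ;
#if defined(HAVE_DECL_IBV_ACCESS_SO) && HAVE_DECL_IBV_ACCESS_SO
    if (want_strong_ordering) {
        flags |= IBV_ACCESS_SO;
    }
#else
    (void)want_strong_ordering;   /* flag is unavailable on this platform */
#endif
    return flags;
}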





Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r24395

2011-02-16 Thread Don Kerr
Yes, this is Solaris only. OFED has not picked up the IBV_ACCESS_SO flag; not
sure it ever will.


On 02/16/11 08:15, Jeff Squyres wrote:

Oracle --

Is this really only specific to Solaris?  More comments below about 
configure.m4.


On Feb 16, 2011, at 12:37 AM, dk...@osl.iu.edu wrote:


Author: dkerr
Date: 2011-02-16 00:37:22 EST (Wed, 16 Feb 2011)
New Revision: 24395
URL: https://svn.open-mpi.org/trac/ompi/changeset/24395

Log:
on Solaris, when IBV_ACCESS_SO is available, use strong ordered memory region 
for eager rdma connection
Text files modified: 
  trunk/ompi/mca/btl/openib/btl_openib_component.c |13 ++---   
  trunk/ompi/mca/btl/openib/btl_openib_endpoint.c  |19 +-- 
  trunk/ompi/mca/btl/openib/configure.m4   |16 +++-
  3 files changed, 42 insertions(+), 6 deletions(-)


Modified: trunk/ompi/mca/btl/openib/btl_openib_component.c
======================================================================
--- trunk/ompi/mca/btl/openib/btl_openib_component.c(original)
+++ trunk/ompi/mca/btl/openib/btl_openib_component.c2011-02-16 00:37:22 EST 
(Wed, 16 Feb 2011)
@@ -15,7 +15,7 @@
 * Copyright (c) 2006-2007 Los Alamos National Security, LLC.  All rights
 * reserved.
 * Copyright (c) 2006-2007 Voltaire All rights reserved.
- * Copyright (c) 2009-2010 Oracle and/or its affiliates.  All rights reserved.
+ * Copyright (c) 2009-2011 Oracle and/or its affiliates.  All rights reserved.
 * $COPYRIGHT$
 *
 * Additional copyrights may follow
@@ -527,9 +527,16 @@
{
mca_btl_openib_device_t *device = (mca_btl_openib_device_t*)reg_data;
mca_btl_openib_reg_t *openib_reg = (mca_btl_openib_reg_t*)reg;
+enum ibv_access_flags access_flag = IBV_ACCESS_LOCAL_WRITE |
+IBV_ACCESS_REMOTE_WRITE | IBV_ACCESS_REMOTE_READ;

-openib_reg->mr = ibv_reg_mr(device->ib_pd, base, size, IBV_ACCESS_LOCAL_WRITE |
-IBV_ACCESS_REMOTE_WRITE | IBV_ACCESS_REMOTE_READ);
+#if defined(HAVE_IBV_ACCESS_SO)
+if (reg->flags & MCA_MPOOL_FLAGS_SO_MEM) {
+access_flag |= IBV_ACCESS_SO;
+}
+#endif
+
+openib_reg->mr = ibv_reg_mr(device->ib_pd, base, size, access_flag);

if (NULL == openib_reg->mr) {
return OMPI_ERR_OUT_OF_RESOURCE;

Modified: trunk/ompi/mca/btl/openib/btl_openib_endpoint.c
======================================================================
--- trunk/ompi/mca/btl/openib/btl_openib_endpoint.c (original)
+++ trunk/ompi/mca/btl/openib/btl_openib_endpoint.c 2011-02-16 00:37:22 EST 
(Wed, 16 Feb 2011)
@@ -16,7 +16,7 @@
 * Copyright (c) 2006-2007 Voltaire All rights reserved.
 * Copyright (c) 2006-2009 Mellanox Technologies, Inc.  All rights reserved.
 * Copyright (c) 2010  IBM Corporation.  All rights reserved.
- * Copyright (c) 2010  Oracle and/or its affiliates.  All rights reserved
+ * Copyright (c) 2010-2011 Oracle and/or its affiliates.  All rights reserved
 *
 * $COPYRIGHT$
 *
@@ -911,6 +911,7 @@
char *buf;
mca_btl_openib_recv_frag_t *headers_buf;
int i;
+uint32_t flag = MCA_MPOOL_FLAGS_CACHE_BYPASS;

/* Set local rdma pointer to 1 temporarily so other threads will not try
 * to enter the function */
@@ -925,11 +926,25 @@
if(NULL == headers_buf)
   goto unlock_rdma_local;

+#if defined(HAVE_IBV_ACCESS_SO)
+/* Solaris implements the Relaxed Ordering feature defined in the
+   PCI Specification. With this in mind any memory region which
+   relies on a buffer being written in a specific order, for
+   example the eager rdma connections created in this routine,
+   must set a strong order flag when registering the memory for
+   rdma operations.
+
+   The following flag will be interpreted and the appropriate
+   steps will be taken when the memory is registered in
+   openib_reg_mr(). */
+flag |= MCA_MPOOL_FLAGS_SO_MEM;
+#endif
+
buf = (char *) 
openib_btl->super.btl_mpool->mpool_alloc(openib_btl->super.btl_mpool,
openib_btl->eager_rdma_frag_size *
mca_btl_openib_component.eager_rdma_num,
mca_btl_openib_component.buffer_alignment,
-MCA_MPOOL_FLAGS_CACHE_BYPASS,
+flag,
(mca_mpool_base_registration_t**)&openib_btl->eager_rdma_local.reg);

if(!buf)

Modified: trunk/ompi/mca/btl/openib/configure.m4
======================================================================
--- trunk/ompi/mca/btl/openib/configure.m4  (original)
+++ trunk/ompi/mca/btl/openib/configure.m4  2011-02-16 00:37:22 EST (Wed, 
16 Feb 2011)
@@ -12,6 +12,7 @@
# All rights reserved.
# Copyright (c) 2007-2010 Cisco Systems, Inc.  All rights reserved.
# Copyright (c) 2008  Mellanox Technologies.  All rights reserved.
+# Copyright (c) 2011  Oracle and/or its affiliates.  All rights reserved.
# $COPYRIGHT$
# 
# Additional copyrights may follow

Re: [OMPI devel] trac #2034 : single rail openib btl shows better bandwidth than dual rail (12k< x < 128k)

2009-10-09 Thread Don Kerr

On 10/08/09 17:14, Don Kerr wrote:

George,

This is an interesting approach, although I am guessing the changes would be
widespread and have many performance implications. Am I wrong in this belief?
My point here is that if this is going to have as many performance
implications as I think it will, it probably makes sense to investigate the
potentially bigger dual-rail issue and consider the "never share" approach in
the larger context.


-DON



-DON

On 10/08/09 11:45, George Bosilca wrote:

Don,

I think we can do something slightly different that will satisfy 
everybody.


How about a solution where each BTL will define a limit where a 
message will never be shared with another BTL? We can have two such 
limits, one for the send protocol and one for the RMA (it will apply 
either to PUT or GET operations based on the BTL support and PML 
decision).


  george.
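
For illustration, here is a hedged C sketch of the per-BTL "never share below
this size" idea George describes above. The type, field, and function names
are made up for the example; they are not Open MPI structures or MCA
parameters.

#include <stddef.h>

typedef struct {
    const char *name;
    size_t      rdma_never_share_limit;  /* below this, keep the RGET on one rail */
} btl_limit_t;

/* Decide how many rails to stripe an RDMA get across (sketch only). */
static int rails_for_rget(const btl_limit_t *btls, int nbtls, size_t msg_len)
{
    /* Honor the most restrictive (largest) limit any selected BTL declares. */
    size_t limit = 0;
    for (int i = 0; i < nbtls; i++) {
        if (btls[i].rdma_never_share_limit > limit) {
            limit = btls[i].rdma_never_share_limit;
        }
    }
    return (msg_len <= limit) ? 1 : nbtls;
}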

On Oct 8, 2009, at 11:01 , Don Kerr wrote:




On 10/07/09 13:52, George Bosilca wrote:

Don,

The problem is that a particular BTL doesn't have the knowledge about the
other selected BTLs, so allowing the BTLs to set this limit is not as easy as
it sounds. However, in the case where two identical BTLs are selected and they
are the only ones, this is clearly the better approach.


If this parameter is set at the PML level, I can't imagine how we 
figure out the correct value depending on the BTLs.


I see this as a pretty strong restriction. How do we know we set a value that
makes sense?
OK, I now see why setting it at the btl level is difficult. And for the case
of multiple btls that are also different component types, however unlikely
that is, a pml setting will not be optimal for both.


-DON




 george.

On Oct 7, 2009, at 10:19 , Don Kerr wrote:


George,

Were you suggesting that the proposed new parameter "max_rdma_single_rget" be
set by the individual btls, similar to "btl_eager_limit"? It seems to me that
is the better approach if I am to move forward with this.


-DON

On 10/06/09 11:14, Don Kerr wrote:
I agree there is probably a larger issue here, and yes, this is somewhat
specific, but whereas OB1 appears to have multiple protocols depending on the
capabilities of the BTLs, I would not characterize this as an IB-centric
problem -- maybe an OB1 RDMA problem. There is a clear benefit from modifying
this specific case. Do you think it's not worth making incremental
improvements while also attacking a potentially bigger issue?


-DON

On 10/06/09 10:52, George Bosilca wrote:

Don,

This seems like a very IB-centric problem (and solution) going up into the
PML. Moreover, I noticed that, independent of the BTL, we have some problems
with multi-rail performance. As an example, on a cluster with 3 GB cards we
get the same performance whether I enable 2 or 3. I didn't have time to look
into the details, but this might be a more general problem.


george.

On Oct 6, 2009, at 09:51 , Don Kerr wrote:



I intend to make the change suggested in this ticket to the trunk. The change
does not impact the single-rail case (tested with the openib btl) and does
improve the dual-rail case. Since it does involve performance and I am adding
an OB1 MCA parameter, I just wanted to check whether anyone was interested or
had an issue with it before I committed the change.


-DON
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


Re: [OMPI devel] trac #2034 : single rail openib btl shows better bandwidth than dual rail (12k< x < 128k)

2009-10-08 Thread Don Kerr

George,

This is an interesting approach, although I am guessing the changes would be
widespread and have many performance implications. Am I wrong in this belief?


-DON

On 10/08/09 11:45, George Bosilca wrote:

Don,

I think we can do something slightly different that will satisfy 
everybody.


How about a solution where each BTL will define a limit where a 
message will never be shared with another BTL? We can have two such 
limits, one for the send protocol and one for the RMA (it will apply 
either to PUT or GET operations based on the BTL support and PML 
decision).


  george.

On Oct 8, 2009, at 11:01 , Don Kerr wrote:




On 10/07/09 13:52, George Bosilca wrote:

Don,

The problem is that a particular BTL doesn't have the knowledge about the
other selected BTLs, so allowing the BTLs to set this limit is not as easy as
it sounds. However, in the case where two identical BTLs are selected and they
are the only ones, this is clearly the better approach.


If this parameter is set at the PML level, I can't imagine how we 
figure out the correct value depending on the BTLs.


I see this as a pretty strong restriction. How do we know we set a value that
makes sense?
OK, I now see why setting it at the btl level is difficult. And for the case
of multiple btls that are also different component types, however unlikely
that is, a pml setting will not be optimal for both.


-DON




 george.

On Oct 7, 2009, at 10:19 , Don Kerr wrote:


George,

Were you suggesting that the proposed new parameter "max_rdma_single_rget" be
set by the individual btls, similar to "btl_eager_limit"? It seems to me that
is the better approach if I am to move forward with this.


-DON

On 10/06/09 11:14, Don Kerr wrote:
I agree there is probably a larger issue here, and yes, this is somewhat
specific, but whereas OB1 appears to have multiple protocols depending on the
capabilities of the BTLs, I would not characterize this as an IB-centric
problem -- maybe an OB1 RDMA problem. There is a clear benefit from modifying
this specific case. Do you think it's not worth making incremental
improvements while also attacking a potentially bigger issue?


-DON

On 10/06/09 10:52, George Bosilca wrote:

Don,

This seems like a very IB-centric problem (and solution) going up into the
PML. Moreover, I noticed that, independent of the BTL, we have some problems
with multi-rail performance. As an example, on a cluster with 3 GB cards we
get the same performance whether I enable 2 or 3. I didn't have time to look
into the details, but this might be a more general problem.


george.

On Oct 6, 2009, at 09:51 , Don Kerr wrote:



I intend to make the change suggested in this ticket to the trunk. The change
does not impact the single-rail case (tested with the openib btl) and does
improve the dual-rail case. Since it does involve performance and I am adding
an OB1 MCA parameter, I just wanted to check whether anyone was interested or
had an issue with it before I committed the change.


-DON
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


Re: [OMPI devel] trac #2034 : single rail openib btl shows better bandwidth than dual rail (12k< x < 128k)

2009-10-08 Thread Don Kerr



On 10/07/09 13:52, George Bosilca wrote:

Don,

The problem is that a particular BTL doesn't have the knowledge about the
other selected BTLs, so allowing the BTLs to set this limit is not as easy as
it sounds. However, in the case where two identical BTLs are selected and they
are the only ones, this is clearly the better approach.


If this parameter is set at the PML level, I can't imagine how we 
figure out the correct value depending on the BTLs.


I see this as a pretty strong restriction. How do we know we set a value that
makes sense?
OK, I now see why setting it at the btl level is difficult. And for the case
of multiple btls that are also different component types, however unlikely
that is, a pml setting will not be optimal for both.


-DON




  george.

On Oct 7, 2009, at 10:19 , Don Kerr wrote:


George,

Were you suggesting that the proposed new parameter "max_rdma_single_rget" be
set by the individual btls, similar to "btl_eager_limit"? It seems to me that
is the better approach if I am to move forward with this.


-DON

On 10/06/09 11:14, Don Kerr wrote:
I agree there is probably a larger issue here, and yes, this is somewhat
specific, but whereas OB1 appears to have multiple protocols depending on the
capabilities of the BTLs, I would not characterize this as an IB-centric
problem -- maybe an OB1 RDMA problem. There is a clear benefit from modifying
this specific case. Do you think it's not worth making incremental
improvements while also attacking a potentially bigger issue?


-DON

On 10/06/09 10:52, George Bosilca wrote:

Don,

This seems like a very IB-centric problem (and solution) going up into the
PML. Moreover, I noticed that, independent of the BTL, we have some problems
with multi-rail performance. As an example, on a cluster with 3 GB cards we
get the same performance whether I enable 2 or 3. I didn't have time to look
into the details, but this might be a more general problem.


 george.

On Oct 6, 2009, at 09:51 , Don Kerr wrote:



I intend to make the change suggested in this ticket to the trunk. The change
does not impact the single-rail case (tested with the openib btl) and does
improve the dual-rail case. Since it does involve performance and I am adding
an OB1 MCA parameter, I just wanted to check whether anyone was interested or
had an issue with it before I committed the change.


-DON
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


Re: [OMPI devel] trac #2034 : single rail openib btl shows better bandwidth than dual rail (12k< x < 128k)

2009-10-07 Thread Don Kerr

George,

Were you suggesting that the proposed new parameter "max_rdma_single_rget" be
set by the individual btls, similar to "btl_eager_limit"? It seems to me that
is the better approach if I am to move forward with this.


-DON

On 10/06/09 11:14, Don Kerr wrote:
I agree there is probably a larger issue here, and yes, this is somewhat
specific, but whereas OB1 appears to have multiple protocols depending on the
capabilities of the BTLs, I would not characterize this as an IB-centric
problem -- maybe an OB1 RDMA problem. There is a clear benefit from modifying
this specific case. Do you think it's not worth making incremental
improvements while also attacking a potentially bigger issue?


-DON

On 10/06/09 10:52, George Bosilca wrote:

Don,

This seems like a very IB-centric problem (and solution) going up into the
PML. Moreover, I noticed that, independent of the BTL, we have some problems
with multi-rail performance. As an example, on a cluster with 3 GB cards we
get the same performance whether I enable 2 or 3. I didn't have time to look
into the details, but this might be a more general problem.


  george.

On Oct 6, 2009, at 09:51 , Don Kerr wrote:



I intend to make the change suggested in this ticket to the trunk. The change
does not impact the single-rail case (tested with the openib btl) and does
improve the dual-rail case. Since it does involve performance and I am adding
an OB1 MCA parameter, I just wanted to check whether anyone was interested or
had an issue with it before I committed the change.


-DON
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


[OMPI devel] trac #2034 : single rail openib btl shows better bandwidth than dual rail (12k< x < 128k)

2009-10-06 Thread Don Kerr


I intend to make the change suggested in this ticket to the trunk. The change
does not impact the single-rail case (tested with the openib btl) and does
improve the dual-rail case. Since it does involve performance and I am adding
an OB1 MCA parameter, I just wanted to check whether anyone was interested or
had an issue with it before I committed the change.


-DON


Re: [OMPI devel] BTL receive callback

2009-07-22 Thread Don Kerr

Hello Sebastian,

Sounds like you are using the openib btl as a starting point, which is a 
good place to start. I am curious if you are indeed using a new 
interconnect (new hardware and protocol) or if it is requirements of the 
3D-torus network that are not addressed by the openib btl that are 
driving the need for a new btl?


-DON

On 07/21/09 11:55, Sebastian Rinke wrote:

Hello,
I am developing a new BTL component (Open MPI v1.3.2) for a new 
3D-torus interconnect. During a simple message transfer of 16362 B 
between two nodes with MPI_Send(), MPI_Recv() I encounter the following:


The sender:
---

1. prepare_src() size: 16304 reserve: 32
-> alloc() size: 16336
-> ompi_convertor_pack(): 16304
2. send()
3. component_progress()
-> send cb ()
-> free()
4. component_progress()
-> recv cb ()
-> prepare_src() size: 58 reserve: 32
-> alloc() size: 90
-> ompi_convertor_pack(): 58
-> free() size: 90 Send is missing !!!
5. NO PROGRESS

The receiver:
-

1. component_progress()
-> recv cb ()
-> alloc() size: 32
-> send()
2. component_progress()
-> send cb ()
-> free() size: 32
3. component_progress() for ever !!!

The problem is that after prepare_src() for the 2nd fragment, the
sender calls free() instead of send() in its recv cb. Thus, the 2nd
fragment is not being transmitted.
As a consequence, the receiver waits for the 2nd fragment.

I have found that mca_pml_ob1_recv_frag_callback_ack() is the
corresponding recv cb. Before diving into the ob1 code,
could you tell me under which conditions this cb calls free() instead 
of send()
so that I can get an idea of where to look for errors in my BTL 
component.


Thank you very much in advance.

Sebastian Rinke

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


Re: [OMPI devel] Trunk Heads Up

2008-08-07 Thread Don Kerr

Sorry. I missed my lashing while on the phone.  Thanks George, Thanks Jeff.

George Bosilca wrote:

r19218 fixes this problem. I couldn't wait, so I fixed it myself.

  george.

On Aug 7, 2008, at 7:38 PM, Jeff Squyres wrote:

There's a missing $2 in the configure.m4.  Don actually did ask for a 
review from Brian and me, and I gave a quick one.  My bad for missing 
it.


I'm testing to ensure the fix is right, and then I'll commit.


On Aug 7, 2008, at 1:05 PM, George Bosilca wrote:

Well, the commit itself doesn't modify the build process, as you just added a
new component. However, if people autogen, your component doesn't correctly
disable itself when not on Solaris. As a result, the build fails on Mac OS X.


Here is the error I get at build time:

ranlib: file: .libs/libmca_memchecker.a(memchecker_base_wrappers.o) 
has no symbols
../../../../../ompi/opal/mca/memory/malloc_solaris/memory_malloc_solaris_component.c:94: 
error: conflicting types for ‘munmap’
/usr/include/sys/mman.h:212: error: previous declaration of ‘munmap’ 
was here
../../../../../ompi/opal/mca/memory/malloc_solaris/memory_malloc_solaris_component.c:118:6: 
error: #error "Can not determine how to call munmap"


And here is a snippet from the config.log:

configure:78271: checking for Solaris
configure:78988: result: no
configure:79050: checking if MCA component memory:malloc_solaris can 
compile

configure:79052: result: yes

george.

On Aug 7, 2008, at 6:07 PM, Jeff Squyres wrote:


Eh.  Damage is done.  Leave it in.

We'll whip you later.  ;-)


On Aug 7, 2008, at 12:04 PM, Don Kerr wrote:


All,

I just did a commit (-r19217) which I believe will require an 
autogen. Since I was reminded that this is not good citizen 
behavior for the middle of the day I will now start figuring out 
how to back this out unless someone beats me to it.


-DON (with head hung low)
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



--
Jeff Squyres
Cisco Systems

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



--
Jeff Squyres
Cisco Systems


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
  


[OMPI devel] Trunk Heads Up

2008-08-07 Thread Don Kerr

All,

I just did a commit (-r19217) which I believe will require an autogen. 
Since I was reminded that this is not good citizen behavior for the 
middle of the day I will now start figuring out how to back this out 
unless someone beats me to it.


-DON (with head hung low)


Re: [OMPI devel] IBCM error

2008-07-16 Thread Don Kerr



Jeff Squyres wrote:

On Jul 15, 2008, at 7:30 AM, Ralph Castain wrote:


Minor clarification: we did not test RDMACM on RoadRunner.


Just for further clarification - I did, and it wasn't a particularly 
good
experience. Encountered several problems, none of them overwhelming, 
hence

my comments.


Ah -- I didn't know this.  What went wrong?  We need to fix it if 
there are problems.



RDMACM, on the other hand, is *necessary* for iWARP connections.  We
know it won't scale well because of ARP issues, to which the iWARP
vendors are publishing their own solutions (pre-populating ARP caches,
etc.).  Even when built and installed, RDMACM will not be used by
default for IB hardware (you have to specifically ask for it).  Since
it's necessary for iWARP, I think we need to build and install it by
default.  Most importantly: production IB users won't be disturbed.


If it is necessary for iWARP, then fine - so long as it is only used if
specifically requested.

However, I would also ask that we be able to -not- build it upon 
request so
we can be certain a user doesn't attempt to use it by mistake ("gee, 
that

looks interesting - let Mikey try it!"). Ditto for ibcm support.


Pasha added configure switches for this about a week ago:

--en|disable-openib-ibcm
--en|disable-openib-rdmacm
I like these flags but I thought there was going to be a run time check 
for cases where Open MPI is built on a system that has ibcm support but 
is later run on a system without ibcm support.
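
For illustration, a hedged sketch of the kind of run-time check Don is asking
about: probe for the library with dlopen() before enabling the feature. This
is only an example of the idea, not what Open MPI actually does, and the
library soname is an assumption. Link with -ldl on Linux.

#include <dlfcn.h>
#include <stdio.h>

/* Return 1 if libibcm can be loaded on this node, 0 otherwise. */
static int ibcm_runtime_available(void)
{
    void *handle = dlopen("libibcm.so.1", RTLD_LAZY | RTLD_LOCAL);
    if (NULL == handle) {
        return 0;                /* library not installed here */
    }
    dlclose(handle);
    return 1;
}

int main(void)
{
    printf("IBCM %savailable at run time\n",
           ibcm_runtime_available() ? "" : "not ");
    return 0;
}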


-DON



Re: [OMPI devel] PLM consistency: launch agent param

2008-07-11 Thread Don Kerr
For something as fundamental as launch, do we still need to specify the
component? Could it just be "launch_agent"?


Jeff Squyres wrote:
Sounds good to me.  We've done similar things in other frameworks -- 
put in MCA base params for things that all components could use.  How 
about plm_base_launch_agent?



On Jul 11, 2008, at 10:17 AM, Ralph H Castain wrote:


Since the question of backward compatibility of params came up... ;-)

I've been perusing the various PLM modules to check consistency. One 
thing I
noted right away is that -every- PLM module registers an MCA param to 
let

the user specify an orted cmd. I believe this specifically was done so
people could insert their favorite debugger in front of the "orted" 
on the

spawned command line - e.g., "valgrind orted".

The problem is that this forces the user to have to figure out the 
name of
the PLM module being used as the param is called "-mca 
plm_rsh_agent", or

"-mca plm_lsf_orted", or...you name it.

For users that only ever operate in one environment, who cares. However,
many users (at least around here) operate in multiple environments, 
and this

creates confusion.

I propose to create a single MCA param name for this value - 
something like
"-mca plm_launch_agent" or whatever - and get rid of all these 
individual

registrations to reduce the user confusion.

Comments? I'll put my helmet on
Ralph


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel





[OMPI devel] Open IB BTL and iWARP

2008-07-09 Thread Don Kerr
Last I looked, the OpenIB BTL relied on the short eager RDMA buffers being
written in order. Is this still the case?


If so, how is this handled when iWARP is underneath the user verbs API rather
than Mellanox IB HCAs?
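
For context, a hedged sketch of why write ordering matters for the eager RDMA
path: the receiver polls a flag at the end of the buffer and assumes the
payload written before it is already visible. The layout below is purely
illustrative, not the openib btl's actual fragment format.

#include <stdint.h>

typedef struct {
    uint8_t           payload[4096];
    volatile uint8_t  done;   /* written last as part of the RDMA write */
} eager_frag_t;

/* Receiver side: spin until the tail flag flips, then read the payload.
 * If the NIC/PCI path reorders the writes, the payload bytes may still be
 * stale when 'done' becomes visible -- the ordering assumption in question. */
static const uint8_t *wait_for_frag(const eager_frag_t *frag)
{
    while (!frag->done) {
        /* poll */
    }
    return frag->payload;
}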


Re: [OMPI devel] open ib dependency question

2008-07-03 Thread Don Kerr

capturing in the bug is good enough for me at this point, thanks Jeff

Jeff Squyres wrote:

Ok:

https://svn.open-mpi.org/trac/ompi/ticket/1375

I think any of us could do this -- it's pretty straightforward. No 
guarantees on when I can get to it; my 1.3 list is already pretty long...



On Jul 3, 2008, at 6:20 AM, Pavel Shamis (Pasha) wrote:


Jeff Squyres wrote:

Do you need configury to disable building ibcm / rdmacm support?

The more I think about it, the more I think that these would be good 
features to have for v1.3...
I had a similar issue recently. It would be nice to have an option to
disable/enable *CM via config flags.



On Jul 3, 2008, at 2:52 AM, Don Kerr wrote:

I did not think it was required, but it hung me up when I built ompi on one
system which had the ibcm libraries and then ran on a system without the ibcm
libs. I had another issue on the system without ibcm libs which prevented me
from building there, but I will go down that path again. Thanks.


Jeff Squyres wrote:
That is the IBCM library for the IBCM CPC -- IB connection manager 
stuff.


It's not *necessary*; you could use the OOB CPC if you want to.

That being said, OMPI currently builds support for it (and links 
it in) if it finds the right headers and library files. We don't 
currently have configury to disable this behavior (and *not* build 
RDMACM and/or IBCM support).


Do you have a problem / need to disable building support for IBCM?



On Jul 2, 2008, at 12:02 PM, Don Kerr wrote:



It appears that the mca_btl_openib.so has a dependency on 
libibcm.so. Is this necessary?



___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel





___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel





Re: [OMPI devel] open ib dependency question

2008-07-03 Thread Don Kerr
I did not think it was required, but it hung me up when I built ompi on one
system which had the ibcm libraries and then ran on a system without the ibcm
libs. I had another issue on the system without ibcm libs which prevented me
from building there, but I will go down that path again. Thanks.


Jeff Squyres wrote:

That is the IBCM library for the IBCM CPC -- IB connection manager stuff.

It's not *necessary*; you could use the OOB CPC if you want to.

That being said, OMPI currently builds support for it (and links it 
in) if it finds the right headers and library files. We don't 
currently have configury to disable this behavior (and *not* build 
RDMACM and/or IBCM support).


Do you have a problem / need to disable building support for IBCM?



On Jul 2, 2008, at 12:02 PM, Don Kerr wrote:



It appears that the mca_btl_openib.so has a dependency on libibcm.so. 
Is this necessary?



___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel





[OMPI devel] open ib dependency question

2008-07-02 Thread Don Kerr


It appears that the mca_btl_openib.so  has a dependency on libibcm.so.  
Is this necessary?  






[OMPI devel] Open MPI Linux Expectations

2008-05-22 Thread Don Kerr
Can anyone set my expectations, based on real-world experience, with building
Open MPI on one release of Linux and running it on another?


If I were to...
Build OMPI on Redhat 4, will it run on later releases of Redhat, e.g. 
Redhat 5?

Build OMPI on Suse 9, will it run on later releases of Suse, e.g. Suse 10?
Build OMPI on Redhat, will it run on Suse?
Build OMPI on Suse, will it run on Redhat?

Thanks in advance for your insights.
-DON


Re: [OMPI devel] openib btl build question

2008-05-22 Thread Don Kerr

Thanks Jeff. Thanks Brian.

I ran into this because I was specifically trying to configure with
"--disable-progress-threads --disable-mpi-threads", at which point I figured I
might as well turn off all threads, so I added "--without-threads" as well.
But I can't live without mpi_leave_pinned, so threads are back.



Jeff Squyres wrote:

On May 21, 2008, at 4:37 PM, Brian W. Barrett wrote:

  

ptmalloc2 is not *required* by the openib btl.  But it is required on
Linux if you want to use the mpi_leave_pinned functionality.  I see
one function call to __pthread_initialize in the ptmalloc2 code -- it
*looks* like it's a function of glibc, but I don't know for sure.
  
There's actually more than that, it's just buried a bit.  There's a  
whole
bunch of thread-specific data stuff, which is wrapped so that  
different
thread packages can be used (although OMPI only supports pthreads).   
The

wrappers are in ptmalloc2/sysdeps/pthreads.




Doh!  I didn't "grep -r"; my bad...

  


[OMPI devel] openib btl build question

2008-05-21 Thread Don Kerr


Just want to make sure what I think I see is true:

Linux build.  openib btl requires ptmalloc2 and ptmalloc2 requires posix 
threads, is that correct?


[OMPI devel] btl_openib_iwarp.c : making platform specific calls

2008-05-13 Thread Don Kerr
I believe btl_openib_iwarp.c is making platform-specific calls. I don't have
jdmason's email and wanted to send this note out before today's con call.


btl_openib_iwarp.c
#include <ifaddrs.h>
getifaddrs()
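
For reference, a minimal example of the call in question: getifaddrs() and
freeifaddrs() come from <ifaddrs.h> and walk the local interface list, which
is the platform-specific dependency being flagged above.

#include <ifaddrs.h>
#include <stdio.h>
#include <sys/socket.h>

int main(void)
{
    struct ifaddrs *ifs, *ifa;

    if (getifaddrs(&ifs) != 0) {
        perror("getifaddrs");
        return 1;
    }
    for (ifa = ifs; ifa != NULL; ifa = ifa->ifa_next) {
        /* Print only interfaces with an IPv4 address, as an example filter. */
        if (ifa->ifa_addr != NULL && ifa->ifa_addr->sa_family == AF_INET) {
            printf("interface: %s\n", ifa->ifa_name);
        }
    }
    freeifaddrs(ifs);
    return 0;
}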


Re: [OMPI devel] 32 bit udapl warnings

2008-01-31 Thread Don Kerr
This was brought to my attention once before, but I don't see this message, so
I just plain forgot about it. :-(
uDAPL defines its pointers as uint64 ("typedef DAT_UINT64 DAT_VADDR"), and
pval is a "void *", which is why the message comes up. If I remove the cast, I
believe I get a different warning, and I just haven't stopped to think of a
way around this.
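
For illustration, one common way around this class of warning is to widen the
pointer through uintptr_t before converting to the 64-bit uDAPL type, rather
than casting void * directly to a 64-bit integer. The DAT_UINT64 typedef below
is a local stand-in so the sketch is self-contained.

#include <stdint.h>

typedef uint64_t DAT_UINT64;   /* stand-in for the uDAPL typedef quoted above */

static DAT_UINT64 ptr_to_dat_vaddr(void *pval)
{
    /* void* -> uintptr_t always matches the pointer width, so the subsequent
     * widening to 64 bits does not trigger the different-size cast warning
     * on 32-bit builds. */
    return (DAT_UINT64)(uintptr_t)pval;
}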


Tim Prins wrote:

Hi,

I am seeing some warnings on the trunk when compiling udapl in 32 bit 
mode with OFED 1.2.5.1:


btl_udapl.c: In function 'udapl_reg_mr':
btl_udapl.c:95: warning: cast from pointer to integer of different size
btl_udapl.c: In function 'mca_btl_udapl_alloc':
btl_udapl.c:852: warning: cast from pointer to integer of different size
btl_udapl.c: In function 'mca_btl_udapl_prepare_src':
btl_udapl.c:959: warning: cast from pointer to integer of different size
btl_udapl.c:1008: warning: cast from pointer to integer of different size
btl_udapl_component.c: In function 'mca_btl_udapl_component_progress':
btl_udapl_component.c:871: warning: cast from pointer to integer of 
different size

btl_udapl_endpoint.c: In function 'mca_btl_udapl_endpoint_write_eager':
btl_udapl_endpoint.c:130: warning: cast from pointer to integer of 
different size

btl_udapl_endpoint.c: In function 'mca_btl_udapl_endpoint_finish_max':
btl_udapl_endpoint.c:775: warning: cast from pointer to integer of 
different size

btl_udapl_endpoint.c: In function 'mca_btl_udapl_endpoint_post_recv':
btl_udapl_endpoint.c:864: warning: cast from pointer to integer of 
different size
btl_udapl_endpoint.c: In function 
'mca_btl_udapl_endpoint_initialize_control_message':
btl_udapl_endpoint.c:1012: warning: cast from pointer to integer of 
different size



Thanks,

Tim
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
  


Re: [OMPI devel] open ib btl and xrc

2008-01-18 Thread Don Kerr

Those pointers were perfect, thanks.

It is easy to see the benefit of fewer QPs (per node instead of per peer) and
of consuming fewer resources, but I am curious about the actual percentage
decrease in memory footprint. I am thinking that the largest portion of the
footprint comes from the fragments. Do you have any numbers showing the actual
memory footprint savings when using XRC? Just to be clear, I am not asking for
you or anyone else to generate these numbers, but if you already have them I
would be curious to know the overall savings.
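
As a rough, hedged illustration of the "per node instead of per peer" point
(not a measurement of Open MPI's footprint; the cluster numbers are made up):

#include <stdio.h>

int main(void)
{
    int nodes          = 128;  /* hypothetical cluster size   */
    int procs_per_node = 8;    /* hypothetical ranks per node */

    /* Plain RC: roughly one QP per remote rank, per local rank. */
    long rc_qps_per_rank  = (long)(nodes - 1) * procs_per_node;
    /* XRC: roughly one QP per remote node, per local rank. */
    long xrc_qps_per_rank = nodes - 1;

    printf("RC : %ld QPs per rank\n", rc_qps_per_rank);
    printf("XRC: %ld QPs per rank (%.1fx fewer)\n", xrc_qps_per_rank,
           (double)rc_qps_per_rank / (double)xrc_qps_per_rank);
    return 0;
}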


-DON

Pavel Shamis (Pasha) wrote:

Here is a paper from OpenIB: http://www.openib.org/archives/nov2007sc/XRC.pdf
and here is an MVAPICH presentation:
http://mvapich.cse.ohio-state.edu/publications/ofa_nov07-mvapich-xrc.pdf


Bottom line: XRC decreases the number of QPs that ompi opens and, as a result,
decreases ompi's memory footprint.
In the openib paper you may see more details about XRC. If you need more
details about the XRC implementation in the openib btl, please let me know.


Don Kerr wrote:
  

Hi,

After searching, about the only thing I can find on XRC is what it stands for.
Can someone explain the benefits of Open MPI's use of XRC, point me to a
paper, or both?


TIA
-DON

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

  




  


[OMPI devel] open ib btl and xrc

2008-01-17 Thread Don Kerr

Hi,

After searching, about the only thing I can find on XRC is what it stands for.
Can someone explain the benefits of Open MPI's use of XRC, point me to a
paper, or both?


TIA
-DON



Re: [OMPI devel] Open IB BTL development question

2008-01-17 Thread Don Kerr
Thanks Steve, Jeff, Pasha, this is the kind of information I was looking 
for.


-DON

Pavel Shamis (Pasha) wrote:

I plan to add IB APM support  (not something specific to OFED)

Don Kerr wrote:
  
Looking at the list of new features for OFED 1.3, and seeing that support for
XRC went into the trunk, I am curious whether support for additional OFED 1.3
features will be included, or is planned to be included, in Open MPI.

I am looking at the list of features here: 
http://64.233.167.104/search?q=cache:RXXOrY36QHcJ:www.openib.org/archives/nov2007sc/OFED%25201.3%2520status.ppt+ofed+1.3+feature=en=clnk=3=us=firefox-a
but I do not have any specific feature in mind, just wanted to get an 
idea what others are planning.


Thanks
-DON
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

  




  


[OMPI devel] Open IB BTL development question

2008-01-16 Thread Don Kerr


Looking at the list of new features for OFED 1.3, and seeing that support for
XRC went into the trunk, I am curious whether support for additional OFED 1.3
features will be included, or is planned to be included, in Open MPI.

I am looking at the list of features here: 
http://64.233.167.104/search?q=cache:RXXOrY36QHcJ:www.openib.org/archives/nov2007sc/OFED%25201.3%2520status.ppt+ofed+1.3+feature=en=clnk=3=us=firefox-a
but I do not have any specific feature in mind, just wanted to get an 
idea what others are planning.


Thanks
-DON


Re: [OMPI devel] Multi-Rail and Open IB BTL

2007-11-14 Thread Don Kerr



Jeff Squyres wrote:


On Nov 9, 2007, at 1:24 PM, Don Kerr wrote:

 

Both -- I was thinking of listing what I think are the multi-rail
requirements, but wanted to understand what the current state of things is.
   



I believe the OF portion of the FAQ describes what we do in the v1.2  
series (right Gleb?); I honestly don't remember what we do today on  
the trunk (I'm pretty sure that Gleb has tweaked it recently).
 


Gleb's response answered this.


As for what we *should* do, it's a very complicated question.  :-\
 

OK. I knew the "close to NIC" issue was a concern, but I was not aware that an
attempt to tackle it had begun. I will look at the "carto" framework.


Thanks
-DON

This is where all these discussions regarding affinity, NUMA, and NUNA  
(non uniform network architecture) come into play.  A "very simple"  
scenario may be something like this:


- host A is UMA (perhaps even a uniprocessor) with 2 ports that are  
equidistant from the 1 MPI process on that host
- host B is the same, except it only has 1 active port on the same IB  
subnet as host A's 2 ports

- the ports on both hosts are all the same speed (e.g., DDR)
- the ports all share a single, common, non-blocking switch

But even with this "simple" case, the answer as to what you should do  
is still unclear.  If host A is able to drive both of its DDR links at  
full speed, you could cause congestion at the link to host B if the
MPI process on host A opens two connections.  But if host A is only  
able to drive the same effective bandwidth out of its two ports as it  
is through a single port, then the end effect is probably fairly  
negligible -- it might not make much of a difference at all as to  
whether the MPI process A opens 1 or 2 connections to host B.


But then throw in other effects that I mentioned above (NUMA, NUNA,  
etc.), and the equation becomes much more complex.  In some cases, it  
may be good to open 1 connection (e.g., bandwidth load balancing); in  
other cases it may be good to open 2 (e.g., congestion avoidance /  
spreading traffic around the network, particularly in the presence of  
other MPI jobs on the network).  :-\


Such NUNA architectures may sound unusual to some, but both IBM and HP  
sell [many] blade-based HPC solutions with NUNA internal IB networks.   
Specifically: this is a fairly common scenario.


So this is a difficult question without a great answer.  The hope is  
that the new carto framework that Sharon sent requirements around for  
will be able to at least make topology information available from both  
the host and the network so that BTLs can possibly make some  
intelligent decisions about what to do in these kinds of scenarios.


 



Re: [OMPI devel] Multi-Rail and Open IB BTL

2007-11-09 Thread Don Kerr
Both -- I was thinking of listing what I think are the multi-rail
requirements, but wanted to understand what the current state of things is.


Jeff Squyres wrote:


Don --

Are you asking what *does* it do, or what *should* a BTL do?

On Nov 9, 2007, at 1:09 PM, Don Kerr wrote:

 


Gleb,

Another question.  What about the case of one node with 2 ports and  
one
node with one port.  Does the open ib btl allow the side with 2  
ports to

establish two  endpoints to the single remote port?

-DON

Gleb Natapov wrote:

   


On Thu, Nov 01, 2007 at 11:15:21AM -0400, Don Kerr wrote:


 


How would the openib btl handle the following scenario:
Two nodes, each with two ports, all ports are on the same subnet  
and switch.


Would striping occur over 4 connections or 2?


   


Only two connections will be created.



 

If 2, is it equal distribution, or are both local ports connected to the
same remote port?



   


Equal distribution.

--
Gleb.
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


 


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
   




 



Re: [OMPI devel] Multi-Rail and Open IB BTL

2007-11-09 Thread Don Kerr

Gleb,

Another question.  What about the case of one node with 2 ports and one 
node with one port.  Does the open ib btl allow the side with 2 ports to 
establish two  endpoints to the single remote port?


-DON

Gleb Natapov wrote:


On Thu, Nov 01, 2007 at 11:15:21AM -0400, Don Kerr wrote:
 


How would the openib btl handle the following scenario:
Two nodes, each with two ports, all ports are on the same subnet and switch.

Would striping occur over 4 connections or 2?
   


Only two connections will be created.

 

If 2, is it equal distribution, or are both local ports connected to the
same remote port?


   


Equal distribution.

--
Gleb.
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
 



Re: [OMPI devel] openib currently broken

2007-11-02 Thread Don Kerr

Rich,

Do the ompi_free_list changes impact the sm btl?  The Solaris SPARC sm btl
seems to have an issue starting with last night's putback, but I have not
looked into it yet.


-DON

Richard Graham wrote:

R16641 should have fixed the regression.  Anyone using 
ompi_free_list_t_ex() and providing
 a memory allocator would have been bitten by this, since I did not 
update this function
 (which will be deprecated in favor of a version parallel to 
ompi_free_list_t_new) to initialize
 the new fields defined.  From looking through the btls, this seems to 
be only the openib btl.


Rich


On 11/2/07 12:31 PM, "Richard Graham"  wrote:




On 11/2/07 12:21 PM, "Jeff Squyres"  wrote:

The freelist changes from yesterday appear to have broken the openib
btl.  We didn't get lots of test failures in MTT last night only
because there was a separate (unrelated) typo in the ofud BTL that
prevented the nightly tarball from building on any IB-capable
machines.  :-)

Rich hopes to look into fixing the openib BTL problem today; he
thinks it's a case of a simple oversight: the openib BTL is not using
the new freelist init functions.

Rich: are there other places that are not using the new init
functions that need to?


The ompi free list has two init functions; I changed just one.  The
IB btl uses the one I have not yet changed, but the pml uses the one
I did change.


rich


--
Jeff Squyres
Cisco Systems

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
 



[OMPI devel] Multi-Rail and Open IB BTL

2007-11-01 Thread Don Kerr

How would the openib btl handle the following scenario:
Two nodes, each with two ports, all ports are on the same subnet and switch.

Would striping occur over 4 connections or 2?

If 2, is it equal distribution, or are both local ports connected to the
same remote port?


Thanks
-DON


[OMPI devel] v1.2 branch mpi_preconnect_all

2007-10-17 Thread Don Kerr

All,

I have noticed an issue in the 1.2 branch when mpi_preconnect_all=1. The
one-way communication pattern (ranks either send to or receive from each
other) may not fully establish connections with peers. For example, if I have
a 3-process MPI job and rank 0 does not do any MPI communication after
MPI_Init(), the other ranks' attempts to connect will not be progressed (I
have seen this with tcp and udapl).
The preconnect pattern has changed slightly in the trunk, but essentially it
is still one-way communication (either a send or a receive with each rank), so
although the issue I see in the 1.2 branch does not appear in the trunk, I
wonder whether it will show up again.


An alternative to the preconnect pattern that comes to mind would be to
perform both a send and a receive between all ranks to ensure that connections
have been fully established.
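
For illustration, a hedged sketch of that symmetric warm-up: every rank posts
one zero-byte receive and one zero-byte send for each peer, so connections are
driven from both sides. This is only an example of the pattern, not Open MPI's
preconnect code.

#include <mpi.h>
#include <stdlib.h>

static void preconnect_all(MPI_Comm comm)
{
    int rank, size, nreqs = 0;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    /* Two requests (one recv, one send) per peer, then wait for all. */
    MPI_Request *reqs = malloc(2 * (size_t)size * sizeof(*reqs));

    for (int peer = 0; peer < size; peer++) {
        if (peer == rank) {
            continue;
        }
        MPI_Irecv(NULL, 0, MPI_BYTE, peer, 0, comm, &reqs[nreqs++]);
        MPI_Isend(NULL, 0, MPI_BYTE, peer, 0, comm, &reqs[nreqs++]);
    }
    MPI_Waitall(nreqs, reqs, MPI_STATUSES_IGNORE);
    free(reqs);
}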


Does anyone have thoughts or comments on this, or reasons not to have 
all ranks send and receive from all?


-DON


Re: [OMPI devel] OpenIB BTL and SRQs

2007-07-13 Thread Don Kerr



Jeff Squyres wrote:


On Jul 12, 2007, at 1:18 PM, Don Kerr wrote:

 


- So if you want to simply eliminate the flow control, choose M high
enough (or just a total number of receive buffers to post to the SRQ)
that you won't ever run out of resources and you should see some
speedup from lack of flow control.  This obviously mainly helps apps
with lots of small messages; it may not help in many other cases.

 


Is there any distinction by the size of the message? If the "M" parameter is
set high, does the openib btl post this many recv buffers for the SRQ on both
QPs, or are SRQs only created on one of the QPs?
   



Keep in mind that the SRQs are only for send/receive messages, not  
RDMA messages.
 

That is obvious enough, but isn't there a window for MPI messages that are
greater than the eager limit but less than where the RDMA protocol kicks in,
where the fragments for messages of this size are larger than the eager size?


Maybe this is where openib's high- and low-priority QPs differ from udapl,
which chooses which endpoint to use based on the size of the fragment. That is
why I was curious whether openib was using SRQs on both queue pairs.
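
For illustration, a small sketch of the size window described above: three
regimes split by an eager limit and an RDMA threshold. The names and the
example values are assumptions, not the openib btl's actual parameters.

#include <stddef.h>

typedef enum { PROTO_EAGER, PROTO_SEND_FRAGMENTS, PROTO_RDMA_PIPELINE } proto_t;

/* Pick a protocol purely by message length; thresholds are illustrative. */
static proto_t choose_protocol(size_t len,
                               size_t eager_limit,       /* e.g. 12 * 1024  */
                               size_t rdma_pipeline_min) /* e.g. 128 * 1024 */
{
    if (len <= eager_limit)       return PROTO_EAGER;           /* copy in/out     */
    if (len <  rdma_pipeline_min) return PROTO_SEND_FRAGMENTS;  /* the "window"    */
    return PROTO_RDMA_PIPELINE;                                 /* registered RDMA */
}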


Each receive buffer has a max size (the eager limit, IIRC).  So if  
the message is larger than that, we'll fragment per the pipeline  
protocol, possibly subject to doing RDMA if the message is large  
enough, yadda yadda yadda.  More specifically, the size of the buffer  
is not dependent upon an individual message that is being sent or  
received (since they're pre-posted -- we have no idea what the  
message sizes will be).


As for whether the SRQ is on both QP's, this is a Galen/George/Gleb  
(G^3) question...


 



Re: [OMPI devel] OpenIB BTL and SRQs

2007-07-12 Thread Don Kerr



Jeff Squyres wrote:


There's a few benefits:

- Remember that you post a big pool of buffers instead of num_peers  
individual sets of receive buffers.  Hence, if you post M buffers for  
each of N peers, each peer -- due to flow control -- can only have M  
outstanding sends at a time.  So if you have apps sending lots of  
small messages, you can get better utilization of buffer space  
because a single peer has more than M buffers to receive into.


- You can also post less than M*N buffers by playing the statistics  
of your app -- if you know that you won't have more than M*N messages  
outstanding at any given time, you can post fewer receive buffers.
 

- At the same time, there's a problem with flow control (meaning that  
there is none): how can a sender know when they have overflowed the  
receiver (other than an RNR)?  So it's not necessarily as safe.


- So if you want to simply eliminate the flow control, choose M high  
enough (or just a total number of receive buffers to post to the SRQ)  
that you won't ever run out of resources and you should see some  
speedup from lack of flow control.  This obviously mainly helps apps  
with lots of small messages; it may not help in many other cases. 
 

Is there any distinction by the size of the message? If the "M" parameter is
set high, does the openib btl post this many recv buffers for the SRQ on both
QPs, or are SRQs only created on one of the QPs?




On Jul 12, 2007, at 12:29 PM, Don Kerr wrote:

 


Through MCA parameters one can select the use of shared receive queues in the
openib btl. Other than having fewer queues, I am wondering what the benefits
of using this option are. Can anyone elaborate on using them vs. the default?

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
   




 



Re: [OMPI devel] OpenIB BTL and SRQs

2007-07-12 Thread Don Kerr
Interesting. So with SRQs there is no flow control; I am guessing the btl sets
some reasonable default, but essentially it is relying on the user to adjust
other parameters so that the buffers are not overrun.


And yes, Galen, I would like to read your paper.
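
As a rough, hedged illustration of the buffer accounting described in the
quoted text below (per-peer receive queues post M buffers for each of N peers,
while an SRQ posts one shared pool that can be smaller than M*N); all numbers
are made up:

#include <stdio.h>

int main(void)
{
    int n_peers    = 256;  /* hypothetical job size                  */
    int m_per_peer = 8;    /* per-peer credits with individual RQs   */
    int srq_pool   = 512;  /* one shared pool sized by expected load */

    printf("per-peer RQs: %d posted receive buffers (flow controlled)\n",
           n_peers * m_per_peer);
    printf("SRQ         : %d posted receive buffers (no per-peer credits)\n",
           srq_pool);
    return 0;
}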

Jeff Squyres wrote:


There's a few benefits:

- Remember that you post a big pool of buffers instead of num_peers  
individual sets of receive buffers.  Hence, if you post M buffers for  
each of N peers, each peer -- due to flow control -- can only have M  
outstanding sends at a time.  So if you have apps sending lots of  
small messages, you can get better utilization of buffer space  
because a single peer has more than M buffers to receive into.


- You can also post less than M*N buffers by playing the statistics  
of your app -- if you know that you won't have more than M*N messages  
outstanding at any given time, you can post fewer receive buffers.


- At the same time, there's a problem with flow control (meaning that  
there is none): how can a sender know when they have overflowed the  
receiver (other than an RNR)?  So it's not necessarily as safe.


- So if you want to simply eliminate the flow control, choose M high  
enough (or just a total number of receive buffers to post to the SRQ)  
that you won't ever run out of resources and you should see some  
speedup from lack of flow control.  This obviously mainly helps apps  
with lots of small messages; it may not help in many other cases.



On Jul 12, 2007, at 12:29 PM, Don Kerr wrote:

 


Through MCA parameters one can select the use of shared receive queues in the
openib btl. Other than having fewer queues, I am wondering what the benefits
of using this option are. Can anyone elaborate on using them vs. the default?

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
   




 



[OMPI devel] OpenIB BTL and SRQs

2007-07-12 Thread Don Kerr
Through MCA parameters one can select the use of shared receive queues in the
openib btl. Other than having fewer queues, I am wondering what the benefits
of using this option are. Can anyone elaborate on using them vs. the default?




Re: [OMPI devel] opal_output_verbose usage guidelines

2007-07-09 Thread Don Kerr
Yes, I use opal_show_help in other places, but that is an all-or-nothing
proposition. I think the ability to be verbose or quiet can be very useful to
end users, and that is what I need at the moment.


-DON

Jeff Squyres wrote:


On Jul 9, 2007, at 9:58 AM, Don Kerr wrote:

 


You want a warning to show when:

1. the udapl btl is used
2. --enable-debug was not configured
3. the user specifies btl_*_verbose (or btl_*_debug) >= some_value

Is that right?  If so, is the intent to warn that some checks are not being
performed that one would otherwise assume are being performed (because of #3)?

 


#1 and #2 are just to convey the environment I expect the user to be running
in, not the error case. Interpretation of #3 is a little askew.

uDAPL gets its HCA information from /etc/dat.conf. This file has an entry for
each HCA, even those that are potentially not "UP". Also, it appears the OFED
stack includes by default an entry for "OpenIB-bond", which I have not figured
out yet. In any case, uDAPL has trouble distinguishing whether an HCA is down
intentionally or down because something is wrong, so the uDAPL BTL attempts to
open all of the entries in this file.
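
For illustration, a hedged sketch of the probing described above: attempt
dat_ia_open() on each interface-adapter name from /etc/dat.conf and treat a
failure as "unusable". The IA names are placeholders and error handling is
reduced to the bare minimum; this is not the uDAPL BTL's actual code.

#include <dat/udat.h>
#include <stdio.h>

static int ia_usable(char *ia_name)
{
    DAT_IA_HANDLE  ia  = DAT_HANDLE_NULL;
    DAT_EVD_HANDLE evd = DAT_HANDLE_NULL;   /* let the provider create one */

    if (DAT_SUCCESS != dat_ia_open(ia_name, 8, &evd, &ia)) {
        return 0;            /* down, misconfigured, or simply absent */
    }
    dat_ia_close(ia, DAT_CLOSE_ABRUPT_FLAG);
    return 1;
}

int main(void)
{
    char *names[] = { "OpenIB-cma", "OpenIB-bond" };  /* example dat.conf entries */
    for (unsigned i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
        printf("%-12s %s\n", names[i], ia_usable(names[i]) ? "opened" : "failed");
    }
    return 0;
}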
   



You might want to ping the OFA general mailing list or the DAT  
mailing lists with these kinds of questions...?


 


And the issue becomes how much information to toss back to the user. If a node
has two IB interfaces but only one is up, do they want to see a warning
message about one of the interfaces being down when they already know this by
looking at "ifconfig"?  I think not.

But this could be valuable information if there is a real problem.
   



True.  FWIW, in the openib btl, we only use HCA ports that are active  
(i.e., have a link signal and have been recognized/allowed on the  
network by the SM); we silently ignore those that are not active.  We  
do not currently have a diagnostic that shows which ports are ignored  
because they are not active, IIRC.


 

Since it's just one message at this point, I think I will go with the base
output_id, and if I need more I will look at creating a component-specific
id.  Thanks, Jeff.
   



FWIW, we always treat the opal_output_verbose output as optional  
output.  If there's something that you definitely want to toss back  
to the user, use opal_show_help.


 


I expect to pursue this in order to find a better way to distinguish
between an interface that is up or down but I don't have a solution at
the moment.

-DON


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
   




 



[OMPI devel] udapl v1.2.4 merge

2007-06-18 Thread Don Kerr

Just a heads up.

I have merged the uDAPL BTL from the trunk to a tmp repository of v1.2 
branch. Can be found in 
https://svn.open-mpi.org/svn/ompi/tmp/dkerr_udaplv1.2_rdma
if anyone is interested in testing before I submit the CMR to bring into 
1.2.4.


Main goal of CMR: Improve uDAPL BTL performance by adding rdma 
capabilities to the 1.2 branch.


-DON




Re: [OMPI devel] [OMPI svn] svn:open-mpi r14768

2007-06-07 Thread Don Kerr
It would be difficult for me to attend this afternoon.  Tomorrow is much 
better for me.


-DON

George Bosilca wrote:


I'm available this afternoon.

   george.

On Jun 7, 2007, at 2:35 PM, Galen Shipman wrote:



Are people available today to discuss this over the phone?

- Galen



On Jun 7, 2007, at 11:28 AM, Gleb Natapov wrote:


On Thu, Jun 07, 2007 at 11:11:12AM -0400, George Bosilca wrote:


) I expect you to revise the patch in order to propose a generic
solution or I'll trigger a vote against the patch. I vote to be
backed out of the trunk as it export way to much knowledge from the
Open IB BTL into the PML layer.


The patch solves real problem. If we want to back it out we need to
find
another solution. I also didn't like this change too much, but I
thought
about other solutions and haven't found something better that what
Galen did. If you have something in mind lets discuss it.

As a general comment this kind of discussion is why I prefer to send
significant changes as a patch to the list for discussion before
committing.



  george.

PS: With Gleb changes the problem is the same. The following snippet
reflect exactly the same behavior as the original patch.


I didn't try to change the semantic. Just make the code to match the
semantic that Galen described.

--
Gleb.
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel





___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel