Re: [OMPI devel] Infiniband memory usage with XRC

2010-05-23 Thread Pavel Shamis (Pasha)

~2300 KB - is that the difference per machine or per MPI process?
In OMPI XRC mode we allocate some additional resources that may consume 
some memory (the hash table), but even so ~2 MB sounds like too much to me. 
When I have time I will try to calculate the "reasonable" difference.


Pasha

Sylvain Jeaugey wrote:

On Mon, 17 May 2010, Pavel Shamis (Pasha) wrote:


Sylvain Jeaugey wrote:
The XRC protocol seems to create shared receive queues, which is a 
good thing. However, comparing memory used by an "X" queue versus 
an "S" queue, we can see a large difference. Digging a bit into 
the code, we found some

So, do you see that X consumes more than S? This is really odd.

Yes, but that's what we see. At least after MPI_Init.

What is the difference (in KB)?
At 32 nodes x 32 cores (1024 MPI processes), I get a difference of 
~2300 KB in favor of "S,65536,16,4,1" versus "X,65536,16,4,1".


The proposed patch doesn't seem to solve the problem, however; there's 
still something taking more memory than expected.


Sylvain





Re: [OMPI devel] Infiniband memory usage with XRC

2010-05-17 Thread Pavel Shamis (Pasha)

Sylvain Jeaugey wrote:
The XRC protocol seems to create shared receive queues, which is a 
good thing. However, comparing memory used by an "X" queue versus 
an "S" queue, we can see a large difference. Digging a bit into the 
code, we found some

So, do you see that X consumes more than S? This is really odd.

Yes, but that's what we see. At least after MPI_Init.

What is the difference (in KB)?


strange things, like the completion queue size not being the same as 
for "S" queues (the patch below would fix it, but the root of the 
problem may be elsewhere).


Is anyone able to comment on this ?

The fix looks ok, please submit it to trunk.
I don't have an account to do this, so I'll let maintainers push it 
into SVN.

Ok, I will push it.


BTW do you want to prepare the patch for send queue size factor ? It 
should be quite simple.
Maybe we can do this. However, we are playing a little with parameters 
and code without really knowing the deep consequences of what we do. 
Therefore, I would feel more comfortable if someone who knows the 
openib BTL well confirms it's not breaking everything.
Well, please feel free to submit a patch for review. Also if you see any 
other issues with XRC, MLNX will be happy to help.


Regards,
Pasha.


Re: [OMPI devel] Infiniband memory usage with XRC

2010-05-17 Thread Pavel Shamis (Pasha)

Please see below.



When using XRC queues, Open MPI is indeed creating only one XRC queue 
per node (instead of one per process). The problem is that the number of send 
elements in this queue is multiplied by the number of processes on the 
remote host.


So, what are we getting from this? Not much, except that we can 
reduce the sd_max parameter to 1 element and still have 8 elements in 
the send queue (on 8-core machines), which may still be OK on the 
performance side.
Don't forget that the QP object itself consumes some memory at the 
driver/verbs level.
But I agree that we need to provide more flexibility: it would be nice 
if the default multiplier coefficient were smaller, and I also think we 
need to make it a user-tunable parameter (yep, one more parameter).
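
For illustration, the queue layouts discussed in this thread are passed 
through the btl_openib_receive_queues MCA parameter; a sketch of comparing 
the two specs quoted above (the process count and application name here are 
placeholders, not taken from the thread):

  mpirun -np 1024 -mca btl openib,self,sm \
         -mca btl_openib_receive_queues "S,65536,16,4,1" ./app
  mpirun -np 1024 -mca btl openib,self,sm \
         -mca btl_openib_receive_queues "X,65536,16,4,1" ./app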


Send queues are created lazily, so having a lot of memory for send 
queues is not necessarily blocking. What's


blocking is the receive queues, because they are created during 
MPI_Init, so in a way, they are the "basic fare" of MPI.
BTW, SRQ resources are also allocated on demand. We start with a very small 
SRQ and it is grown on the SRQ limit event.
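
As a rough sketch of that mechanism (illustrative libibverbs code, not the 
Open MPI implementation; the function names and the doubling policy here are 
invented):

  #include <infiniband/verbs.h>
  #include <stdint.h>
  #include <string.h>

  /* Arm the SRQ limit: an IBV_EVENT_SRQ_LIMIT_REACHED async event fires
   * when the number of posted receive WQEs drops below 'limit'. */
  static int arm_srq_limit(struct ibv_srq *srq, uint32_t limit)
  {
      struct ibv_srq_attr attr;
      memset(&attr, 0, sizeof(attr));
      attr.srq_limit = limit;
      return ibv_modify_srq(srq, &attr, IBV_SRQ_LIMIT);
  }

  /* On the limit event, grow the number of receives we keep posted
   * (up to the configured rd_num) and re-arm the limit. */
  static void handle_srq_limit_event(struct ibv_context *ctx,
                                     struct ibv_srq *srq,
                                     int32_t *rd_curr_num, int32_t rd_num)
  {
      struct ibv_async_event ev;
      if (ibv_get_async_event(ctx, &ev))
          return;
      if (IBV_EVENT_SRQ_LIMIT_REACHED == ev.event_type) {
          *rd_curr_num = (*rd_curr_num * 2 > rd_num) ? rd_num
                                                     : *rd_curr_num * 2;
          /* ... post the additional receive WQEs here ... */
          arm_srq_limit(srq, *rd_curr_num - (*rd_curr_num >> 2));
      }
      ibv_ack_async_event(&ev);
  }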




The XRC protocol seems to create shared receive queues, which is a 
good thing. However, comparing memory used by an "X" queue versus an 
"S" queue, we can see a large difference. Digging a bit into the code, 
we found some

So, do you see that X consumes more than S? This is really odd.
strange things, like the completion queue size not being the same as 
for "S" queues (the patch below would fix it, but the root of the problem 
may be elsewhere).


Is anyone able to comment on this ?

The fix looks ok, please submit it to trunk.
BTW do you want to prepare the patch for send queue size factor ? It 
should be quite simple.


Regards,
Pasha


Re: [OMPI devel] v1.4 broken

2010-02-17 Thread Pavel Shamis (Pasha)

I'm checking this issue.

Pasha

Joshua Hursey wrote:

I just noticed that the nightly tarball of v1.4 failed to build in the OpenIB 
BTL last night. The error was:

-
btl_openib_component.c: In function 'init_one_device':
btl_openib_component.c:2089: error: 'mca_btl_openib_component_t' has no member 
named 'default_recv_qps'
-

It looks like CMR #2251 is the problem.

-- Josh
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

  




Re: [OMPI devel] #2163: 1.5 blocker

2010-01-14 Thread Pavel Shamis (Pasha)

Jeff Squyres wrote:

Note that I just moved #2163 into "blocker" state because the bug breaks any 
non-SRQ-capable OpenFabrics device (e.g., both Chelsio and Neteffect RNICs).  The bug came in 
recently with the "resize SRQ" patches.

Mellanox / IBM: we are branching for 1.5 tomorrow.  Can this bug be fixed 
before then?

https://svn.open-mpi.org/trac/ompi/ticket/2163
  
Vasily will submit the patch in a few minutes. Can you please test it on 
iWARP devices?

Pasha.



Re: [OMPI devel] Question about ompi_proc_t

2009-12-08 Thread Pavel Shamis (Pasha)



Both of these types (mca_pml_endpoint_t and mca_pml_base_endpoint_t) are 
meaningless; they can safely be replaced by void*. We have them clearly typed 
just for the sake of understanding, so one can easily figure out what is 
supposed to be stored in this specific field. As such, we can remove one of 
them (mca_pml_base_endpoint_t) and use the other one (mca_pml_endpoint_t) 
everywhere.
  
George, thank you for the clarification! It sounds to me like a good idea to 
leave only one of them.

I wonder what exactly is the reason driving your questions?
  

The question was raised during internal top-down code review.

Regards,
Pasha

  

George,
Actually, my original question was correct.

In the OMPI code base I found ONLY two places where we "use" the structure.
Actually, we only assign a value to the pointer in the DR and CM PMLs:

ompi/mca/pml/cm/pml_cm.c:145:procs[i]->proc_pml = (struct 
mca_pml_base_endpoint_t*) endpoints[i];
ompi/mca/pml/dr/pml_dr.c:264:procs[i]->proc_pml = (struct 
mca_pml_base_endpoint_t*) endpoint;

I do not see any definition/declaration of struct mca_pml_base_endpoint_t in the 
OMPI code at all.

But I do see the "struct mca_pml_endpoint_t;" declaration in pml.h, as well as a comment 
that says: "A pointer to an mca_pml_endpoint_t is maintained on each ompi_proc_t". So it 
looks like the idea was to use mca_pml_endpoint_t on the ompi_proc_t and not 
mca_pml_base_endpoint_t, isn't it?

Thanks !

Pasha

George Bosilca wrote:


Actually your answer is correct. The endpoint is defined down below in the PML. 
In addition, I think only the MTL and the DR PML use it; all OB1 derivatives 
completely ignore it.

 george.

On Dec 7, 2009, at 08:30 , Timothy Hayes wrote:

 
  

Sorry, I think I read your question too quickly. Ignore me. :-)

2009/12/7 Timothy Hayes <haye...@tcd.ie>
Is it not a forward definition and then defined in the PML components 
individually based on their own requirements?

2009/12/7 Pavel Shamis (Pasha) <pash...@gmail.com>

In the ompi_proc_t structure (ompi/proc/proc.h:54) we keep a pointer to proc_pml - "struct 
mca_pml_base_endpoint_t* proc_pml". I tried to find the definition of "struct 
mca_pml_base_endpoint_t", but I failed. Does somebody know where it is defined?

Regards,
Pasha
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
   

 
  



  




Re: [OMPI devel] Question about ompi_proc_t

2009-12-08 Thread Pavel Shamis (Pasha)

George,
Actually, my original question was correct.

In the OMPI code base I found ONLY two places where we "use" the structure.
Actually, we only assign a value to the pointer in the DR and CM PMLs:

ompi/mca/pml/cm/pml_cm.c:145:procs[i]->proc_pml = (struct 
mca_pml_base_endpoint_t*) endpoints[i];
ompi/mca/pml/dr/pml_dr.c:264:procs[i]->proc_pml = (struct 
mca_pml_base_endpoint_t*) endpoint;


I do not see any definition/declaration of struct mca_pml_base_endpoint_t in 
the OMPI code at all.


But I do see the "struct mca_pml_endpoint_t;" declaration in pml.h, as 
well as a comment that says: "A pointer to an mca_pml_endpoint_t is 
maintained on each ompi_proc_t". So it looks like the idea was to use 
mca_pml_endpoint_t on the ompi_proc_t and not 
mca_pml_base_endpoint_t, isn't it?


Thanks !

Pasha

George Bosilca wrote:

Actually your answer is correct. The endpoint is defined down below in the PML. 
In addition, I think only the MTL and the DR PML use it; all OB1 derivatives 
completely ignore it.

  george.

On Dec 7, 2009, at 08:30 , Timothy Hayes wrote:

  

Sorry, I think I read your question too quickly. Ignore me. :-)

2009/12/7 Timothy Hayes <haye...@tcd.ie>
Is it not a forward definition and then defined in the PML components 
individually based on their own requirements?

2009/12/7 Pavel Shamis (Pasha) <pash...@gmail.com>

In the ompi_proc_t structure (ompi/proc/proc.h:54) we keep a pointer to proc_pml - "struct 
mca_pml_base_endpoint_t* proc_pml". I tried to find the definition of "struct 
mca_pml_base_endpoint_t", but I failed. Does somebody know where it is defined?

Regards,
Pasha
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




  




[OMPI devel] Question about ompi_proc_t

2009-12-07 Thread Pavel Shamis (Pasha)
In the ompi_proc_t structure (ompi/proc/proc.h:54) we keep a pointer to 
proc_pml - "struct mca_pml_base_endpoint_t* proc_pml". I tried to find the 
definition of "struct mca_pml_base_endpoint_t", but I failed. Does 
somebody know where it is defined?


Regards,
Pasha


Re: [OMPI devel] [PATCH] Not optimal SRQ resource allocation

2009-12-06 Thread Pavel Shamis (Pasha)

Jeff,
During the original code review we found that, by default, we allocate 
the SRQ with size "rd_num + sd_max", but
we post only rd_num receive entries on the SRQ. This means that we do not 
fill the queue completely. Looks like a bug.
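
As an illustration of the mismatch (a simplified libibverbs sketch, not the 
actual Open MPI code; the function and variable names are invented):

  #include <infiniband/verbs.h>
  #include <stdint.h>
  #include <string.h>

  /* The SRQ is sized for rd_num + sd_max work requests, but only rd_num
   * receives are pre-posted, so sd_max slots are never filled. */
  static struct ibv_srq *create_and_fill_srq(struct ibv_pd *pd,
                                             int rd_num, int sd_max,
                                             struct ibv_sge *sges)
  {
      struct ibv_srq_init_attr init_attr;
      memset(&init_attr, 0, sizeof(init_attr));
      init_attr.attr.max_wr  = rd_num + sd_max;   /* allocated size */
      init_attr.attr.max_sge = 1;

      struct ibv_srq *srq = ibv_create_srq(pd, &init_attr);
      if (NULL == srq) {
          return NULL;
      }

      for (int i = 0; i < rd_num; i++) {          /* only rd_num posted */
          struct ibv_recv_wr wr, *bad_wr;
          memset(&wr, 0, sizeof(wr));
          wr.wr_id   = (uint64_t) i;
          wr.sg_list = &sges[i];
          wr.num_sge = 1;
          if (ibv_post_srq_recv(srq, &wr, &bad_wr)) {
              break;
          }
      }
      return srq;
  }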


Pasha

Jeff Squyres wrote:

SRQ hardware vendors -- please review and reply...

More below.


On Dec 2, 2009, at 10:20 AM, Vasily Philipov wrote:

  

diff -r a5938d9dcada ompi/mca/btl/openib/btl_openib.c
--- a/ompi/mca/btl/openib/btl_openib.c  Mon Nov 23 19:00:16 2009 -0800
+++ b/ompi/mca/btl/openib/btl_openib.c  Wed Dec 02 16:24:55 2009 +0200
@@ -214,6 +214,7 @@
static int create_srq(mca_btl_openib_module_t *openib_btl)
{
int qp;
+int32_t rd_num, rd_curr_num; 


/* create the SRQ's */
for(qp = 0; qp < mca_btl_openib_component.num_qps; qp++) {
@@ -242,6 +243,24 @@
   
ibv_get_device_name(openib_btl->device->ib_dev));
return OMPI_ERROR;
}
+
+rd_num = mca_btl_openib_component.qp_infos[qp].rd_num;
+rd_curr_num = openib_btl->qps[qp].u.srq_qp.rd_curr_num = 
mca_btl_openib_component.qp_infos[qp].u.srq_qp.rd_init;
+
+if(true == mca_btl_openib_component.enable_srq_resize) {
+if(0 == rd_curr_num) {
+openib_btl->qps[qp].u.srq_qp.rd_curr_num = 1;
+}
+
+openib_btl->qps[qp].u.srq_qp.rd_low_local = rd_curr_num - 
(rd_curr_num >> 2);
+openib_btl->qps[qp].u.srq_qp.srq_limit_event_flag = true;
+} else {
+openib_btl->qps[qp].u.srq_qp.rd_curr_num = rd_num;
+openib_btl->qps[qp].u.srq_qp.rd_low_local = 
mca_btl_openib_component.qp_infos[qp].rd_low;
+/* Not used in this case, but we don't need a garbage */
+mca_btl_openib_component.qp_infos[qp].u.srq_qp.srq_limit = 0;
+openib_btl->qps[qp].u.srq_qp.srq_limit_event_flag = false;
+}
}
}

diff -r a5938d9dcada ompi/mca/btl/openib/btl_openib.h
--- a/ompi/mca/btl/openib/btl_openib.h  Mon Nov 23 19:00:16 2009 -0800
+++ b/ompi/mca/btl/openib/btl_openib.h  Wed Dec 02 16:24:55 2009 +0200
@@ -87,6 +87,12 @@

struct mca_btl_openib_srq_qp_info_t {
int32_t sd_max;
+/* The init value for rd_curr_num variables of all SRQs */
+int32_t rd_init;
+/* The watermark, threshold - if the number of WQEs in SRQ is less then this 
value =>
+   the SRQ limit event (IBV_EVENT_SRQ_LIMIT_REACHED) will be generated on 
corresponding SRQ.
+   As result the maximal number of pre-posted WQEs on the SRQ will be 
increased */
+int32_t srq_limit;
}; typedef struct mca_btl_openib_srq_qp_info_t mca_btl_openib_srq_qp_info_t;

struct mca_btl_openib_qp_info_t {
@@ -254,6 +260,8 @@
ompi_free_list_t recv_user_free;
/**< frags for coalesced massages */
ompi_free_list_t send_free_coalesced;
+/**< Whether we want a dynamically resizing srq, enabled by default */
+bool enable_srq_resize;



/**< means that the comment refers to the field above.  I think you mean /* or /** 
here (although we haven't used doxygen for a long, long time).  I see that other 
fields are incorrectly marked /**<, but please don't propagate the badness. ;-)


  

}; typedef struct mca_btl_openib_component_t mca_btl_openib_component_t;

OMPI_MODULE_DECLSPEC extern mca_btl_openib_component_t mca_btl_openib_component;
@@ -348,6 +356,16 @@
int32_t sd_credits;  /* the max number of outstanding sends on a QP when 
using SRQ */
 /*  i.e. the number of frags that  can be outstanding 
(down counter) */
opal_list_t pending_frags[2];/**< list of high/low prio frags */
+/**< The number of max rd that we can post in the current time.
+ The value may be increased in the IBV_EVENT_SRQ_LIMIT_REACHED
+ event handler. The value starts from (rd_num / 4) and increased up to 
rd_num */
+int32_t rd_curr_num;



The comment says "max", but the field name is "curr"[ent]?  Seems a little odd.

  

+/**< We post additional WQEs only if a number of WQEs (in specific SRQ) is 
less of this value.
+ The value increased together with rd_curr_num. The value is unique 
for every SRQ. */
+int32_t rd_low_local;
+/**< The flag points if we want to get the 
+ IBV_EVENT_SRQ_LIMIT_REACHED events for dynamically resizing SRQ */

+bool srq_limit_event_flag;



Can you explain how this is different than enable_srq_resize?

  

}; typedef struct mca_btl_openib_module_srq_qp_t mca_btl_openib_module_srq_qp_t;

struct mca_btl_openib_module_qp_t {
diff -r a5938d9dcada ompi/mca/btl/openib/btl_openib_async.c
--- a/ompi/mca/btl/openib/btl_openib_async.cMon Nov 23 19:00:16 2009 -0800
+++ b/ompi/mca/btl/openib/btl_openib_async.cWed Dec 02 16:24:55 2009 +0200
@@ -1,5 +1,5 @@
/*
- * Copyright (c) 2008 Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2008-2009 Mellanox Technologies. All 

Re: [OMPI devel] [Fwd: strong ordering for data registered memory]

2009-11-13 Thread Pavel Shamis (Pasha)

Jeff Squyres wrote:

On Nov 11, 2009, at 8:13 AM, Terry Dontje wrote:


Sun's IB group has asked me to forward the following email to see if
anyone has any comments on this email.



Tastes great / less filling. :-)

I think (assume) we'll be happy to implement changes like this that 
come from the upstream OpenFabrics verbs API (I see that this mail was 
first directed to the linux-rdma list, which is where the OF verbs API 
is discussed).


Sounds good. Now all that is left is to convince the OFA community to make 
these changes.


Pasha


Re: [OMPI devel] Adding support for RDMAoE devices.

2009-11-01 Thread Pavel Shamis (Pasha)

Jeff,
Can you please check that we do not break iWARP device functionality?

Pasha

Vasily Philipov wrote:
The attached patch adds support for RDMAoE (RDMA over Ethernet) 
devices to the openib BTL. The code changes are very minimal; actually, we 
only modified the RDMACM code to provide better support for IB and 
RDMAoE devices. Please let me know if you have any comments.


Regards,Vasily.



___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] Trunk is brokem ?

2009-10-21 Thread Pavel Shamis (Pasha)

It was broken :-(
I fixed it - r22119

Pasha

Pavel Shamis (Pasha) wrote:

On my systems I see the following error:

gcc -DHAVE_CONFIG_H -I. -I../../../../opal/include 
-I../../../../orte/include -I../../../../ompi/include 
-I../../../../opal/mca/paffinity/linux/plpa/src/libplpa -I../../../.. 
-O3 -DNDEBUG -Wall -Wundef -Wno-long-long -Wsign-compare 
-Wmissing-prototypes -Wstrict-prototypes -Wcomment -pedantic 
-Werror-implicit-function-declaration -finline-functions 
-fno-strict-aliasing -pthread -fvisibility=hidden -MT sensor_pru.lo 
-MD -MP -MF .deps/sensor_pru.Tpo -c sensor_pru.c -fPIC -DPIC -o 
.libs/sensor_pru.o

sensor_pru_component.c: In function 'orte_sensor_pru_open':
sensor_pru_component.c:77: error: implicit declaration of function 
'opal_output'


Looks like the sensor code is broken.

Thanks,
Pasha
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel





[OMPI devel] Trunk is brokem ?

2009-10-21 Thread Pavel Shamis (Pasha)

On my systems I see the following error:

gcc -DHAVE_CONFIG_H -I. -I../../../../opal/include 
-I../../../../orte/include -I../../../../ompi/include 
-I../../../../opal/mca/paffinity/linux/plpa/src/libplpa -I../../../.. 
-O3 -DNDEBUG -Wall -Wundef -Wno-long-long -Wsign-compare 
-Wmissing-prototypes -Wstrict-prototypes -Wcomment -pedantic 
-Werror-implicit-function-declaration -finline-functions 
-fno-strict-aliasing -pthread -fvisibility=hidden -MT sensor_pru.lo -MD 
-MP -MF .deps/sensor_pru.Tpo -c sensor_pru.c  -fPIC -DPIC -o 
.libs/sensor_pru.o

sensor_pru_component.c: In function 'orte_sensor_pru_open':
sensor_pru_component.c:77: error: implicit declaration of function 
'opal_output'


Looks like the sensor code is broken.

Thanks,
Pasha


Re: [OMPI devel] Device failover on ob1

2009-08-03 Thread Pavel Shamis (Pasha)

Rolf,
Did you compare latency/bandwidth for the failover-enabled code vs. the trunk?

Pasha.

Rolf Vandevaart wrote:

Hi folks:

As some of you know, I have also been looking into implementing 
failover as well.  I took a different approach as I am solving the 
problem within the openib BTL itself.  This of course means that this 
only works for failing from one openib BTL to another but that was our 
area of interest.  This also means that we do not need to keep track 
of fragments as we get them back from the completion queue upon 
failure. We then extract the relevant information and repost on the 
other working endpoint.


My work has been progressing at http://bitbucket.org/rolfv/ompi-failover.

This only currently works for send semantics so you have to run with 
-mca btl_openib_flags 1.


Rolf

On 07/31/09 05:49, Mouhamed Gueye wrote:

Hi list,

Here is an update on our work concerning device failover.

As many of you suggested, we reoriented our work on ob1 rather than 
dr and we now have a working prototype on top of ob1. The approach is 
to store btl descriptors sent to peers and delete them when we 
receive proof of delivery. So far, we rely on completion callback 
functions, assuming that the message is delivered when the completion 
function is called, that is the case of openib. When a btl module 
fails, it is removed from the endpoint's btl list and the next one is 
used to retransmit stored descriptors. No extra-message is 
transmitted, it only consists in additions to the header. It has been 
mainly tested with two IB modules, in both multi-rail (two separate 
networks) and multi-path (a big unique network).


You can grab and test the patch here (applies on top of the trunk) :
http://bitbucket.org/gueyem/ob1-failover/

To compile with failover support, just define 
--enable-device-failover at configure. You can then run a benchmark, 
disconnect a port and see the failover operate.


A little latency increase (~ 2%) is induced by the failover layer 
when no failover occurs. To accelerate the failover process on 
openib, you can try to lower the btl_openib_ib_timeout openib 
parameter to 15 for example instead of 20 (default value).


Mouhamed
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel







Re: [OMPI devel] selectively bind MPI to one HCA out of available ones

2009-07-16 Thread Pavel Shamis (Pasha)

Hi,
You can select the IB device used by the openib BTL with the following parameters:
MCA btl: parameter "btl_openib_if_include" (current value: , data 
source: default value)
 Comma-delimited list of devices/ports to be 
used (e.g. "mthca0,mthca1:2"; empty value means to
 use all ports found).  Mutually exclusive with 
btl_openib_if_exclude.
MCA btl: parameter "btl_openib_if_exclude" (current value: , data 
source: default value)
 Comma-delimited list of device/ports to be 
excluded (empty value means to not exclude any
 ports).  Mutually exclusive with 
btl_openib_if_include.


For example, if you want to use the first port on mthca0, your command line 
will look like:


mpirun -np <nprocs> --mca btl_openib_if_include mthca0:1 ...

Pasha

nee...@crlindia.com wrote:


Hi all,

        I have a cluster where both HCAs of each blade are active, but 
connected to different subnets. Is there an option in MPI to select one 
HCA out of the available ones? I know it can be done by making changes in 
the Open MPI code, but I need a clean interface, like an option at MPI 
launch time, to select mthca0 or mthca1.


        Any help is appreciated. By the way, I just checked MVAPICH and the 
feature is there.


Regards

Neeraj Chourasia (MTS)
Computational Research Laboratories Ltd.
(A wholly Owned Subsidiary of TATA SONS Ltd)
B-101, ICC Trade Towers, Senapati Bapat Road
Pune 411016 (Mah) INDIA
(O) +91-20-6620 9863  (Fax) +91-20-6620 9862
M: +91.9225520634





___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] Multi-rail on openib

2009-06-14 Thread Pavel Shamis (Pasha)

Nifty Tom Mitchell wrote:

On Tue, Jun 09, 2009 at 04:33:51PM +0300, Pavel Shamis (Pasha) wrote:
  
Open MPI currently needs to have connected fabrics, but maybe that's 
something we would like to change in the future, having two separate 
rails. (BTW Pasha, will your current work enable this?)
  
I do not completely understand what you mean here by two separate 
rails ...
Already today you may connect each port to a different subnet, and ports 
in the same

subnet may talk to each other.




Subnet?  (subnet .vs. fabric)
  

You may read about the subnet ID definition here:
http://www.open-mpi.org/faq/?category=openfabrics#ofa-set-subnet-id


Does this imply tcp/ip?
What IB protocols are involved, and
is there any agent that notices the disconnect and will trigger the switch?

  
In Open MPI we use the RC (reliable connected) protocol. On connection 
failure we get an error
and handle it. If APM is enabled, Open MPI will try to migrate to the other 
path; otherwise we will fail.


Pasha.


Re: [OMPI devel] Multi-rail on openib

2009-06-09 Thread Pavel Shamis (Pasha)


Open MPI currently needs to have connected fabrics, but maybe that's 
something we would like to change in the future, having two separate 
rails. (BTW Pasha, will your current work enable this?)
I do not completely understand what you mean here by two separate 
rails ...
Already today you may connect each port to a different subnet, and ports 
in the same

subnet may talk to each other.

Pasha.


Re: [OMPI devel] Open MPI mirrors list

2009-04-26 Thread Pavel Shamis (Pasha)

Thanks Jeff !

Jeff Squyres wrote:
We just added a mirror web site in Israel.  That one was fun because 
they requested to have their web tagline be in Hebrew.


Fun fact: with this addition, Open MPI now has 14 mirror web sites 
around the world.


http://www.open-mpi.org/community/mirrors/

Interested in becoming an Open MPI mirror?  See this web page:

http://www.open-mpi.org/community/mirrors/become-a-mirror.php






Re: [OMPI devel] ***SPAM*** Re: [ewg] Seg fault running OpenMPI-1.3.1rc4

2009-03-30 Thread Pavel Shamis (Pasha)

Steve,
If you compile the OMPI code with CFLAGS="-g", generate a segfault 
core file, and send the core + IMB-MPI1 binary to me, I will be able to 
understand the problem better.


Regards,
Pasha

Steve Wise wrote:


Hey Pasha,


I just applied r20872 and retested, and I still hit this seg fault.  
So I think this is a new bug.


Lemme pull the trunk and try that.



Pavel Shamis (Pasha) wrote:
I think your problem is related to this bug: 
https://svn.open-mpi.org/trac/ompi/ticket/1823


And it is resolved on the ompi-trunk.

Pasha.

Steve Wise wrote:
When this happens, that node logs this type of message also in 
/var/log/messages:


IMB-MPI1[8859]: segfault at 0018 rip 2b7bfc880800 
rsp 7fffb1021330 error 4


Steve Wise wrote:

Hey Jeff,

Have you seen this?  I'm hitting this regularly running on 
ofed-1.4.1-rc2.


Test:
[ompi@vic12 ~]$ cat doit-ompi
#!/bin/sh
while : ; do
   mpirun -np 16 --host vic12-10g,vic20-10g,vic9-10g,vic21-10g 
--mca btl openib,self,sm  --mca btl_openib_max_btls 1 
/usr/mpi/gcc/openmpi-1.3.1rc4/tests/IMB-3.1/IMB-MPI1 -npmin 16  
bcast scatter sendrecv exchange 
done


Seg Fault output:

[vic21:04047] *** Process received signal ***
[vic21:04047] Signal: Segmentation fault (11)
[vic21:04047] Signal code: Address not mapped (1)
[vic21:04047] Failing at address: 0x18
[vic21:04047] [ 0] /lib64/libpthread.so.0 [0x3dde20e4c0]
[vic21:04047] [ 1] 
/usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_btl_openib.so 
[0x2b911bc33800]
[vic21:04047] [ 2] 
/usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_btl_openib.so 
[0x2b911bc38c2d]
[vic21:04047] [ 3] 
/usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_btl_openib.so 
[0x2b911bc33fcb]
[vic21:04047] [ 4] 
/usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_btl_openib.so 
[0x2b911bc22af8]
[vic21:04047] [ 5] 
/usr/mpi/gcc/openmpi-1.3.1rc4/lib64/libopen-pal.so.0(mca_base_components_close+0x83) 
[0x2b911933da33]
[vic21:04047] [ 6] 
/usr/mpi/gcc/openmpi-1.3.1rc4/lib64/libmpi.so.0(mca_btl_base_close+0xe0) 
[0x2b9118ea3fb0]
[vic21:04047] [ 7] 
/usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_bml_r2.so 
[0x2b911ba1938f]
[vic21:04047] [ 8] 
/usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_pml_ob1.so 
[0x2b911b601cde]
[vic21:04047] [ 9] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/libmpi.so.0 
[0x2b9118e7241b]
[vic21:04047] [10] 
/usr/mpi/gcc/openmpi-1.3.1rc4/tests/IMB-3.1/IMB-MPI1(main+0x178) 
[0x403498]
[vic21:04047] [11] /lib64/libc.so.6(__libc_start_main+0xf4) 
[0x3ddd61d974]
[vic21:04047] [12] 
/usr/mpi/gcc/openmpi-1.3.1rc4/tests/IMB-3.1/IMB-MPI1 [0x403269]

[vic21:04047] *** End of error message ***

___
ewg mailing list
e...@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel








Re: [OMPI devel] [ewg] Seg fault running OpenMPI-1.3.1rc4

2009-03-30 Thread Pavel Shamis (Pasha)
I think your problem is related to this bug: 
https://svn.open-mpi.org/trac/ompi/ticket/1823


And it is resolved on the ompi-trunk.

Pasha.

Steve Wise wrote:
When this happens, that node logs this type of message also in 
/var/log/messages:


IMB-MPI1[8859]: segfault at 0018 rip 2b7bfc880800 rsp 
7fffb1021330 error 4


Steve Wise wrote:

Hey Jeff,

Have you seen this?  I'm hitting this regularly running on 
ofed-1.4.1-rc2.


Test:
[ompi@vic12 ~]$ cat doit-ompi
#!/bin/sh
while : ; do
   mpirun -np 16 --host vic12-10g,vic20-10g,vic9-10g,vic21-10g 
--mca btl openib,self,sm  --mca btl_openib_max_btls 1 
/usr/mpi/gcc/openmpi-1.3.1rc4/tests/IMB-3.1/IMB-MPI1 -npmin 16  bcast 
scatter sendrecv exchange 
done


Seg Fault output:

[vic21:04047] *** Process received signal ***
[vic21:04047] Signal: Segmentation fault (11)
[vic21:04047] Signal code: Address not mapped (1)
[vic21:04047] Failing at address: 0x18
[vic21:04047] [ 0] /lib64/libpthread.so.0 [0x3dde20e4c0]
[vic21:04047] [ 1] 
/usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_btl_openib.so 
[0x2b911bc33800]
[vic21:04047] [ 2] 
/usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_btl_openib.so 
[0x2b911bc38c2d]
[vic21:04047] [ 3] 
/usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_btl_openib.so 
[0x2b911bc33fcb]
[vic21:04047] [ 4] 
/usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_btl_openib.so 
[0x2b911bc22af8]
[vic21:04047] [ 5] 
/usr/mpi/gcc/openmpi-1.3.1rc4/lib64/libopen-pal.so.0(mca_base_components_close+0x83) 
[0x2b911933da33]
[vic21:04047] [ 6] 
/usr/mpi/gcc/openmpi-1.3.1rc4/lib64/libmpi.so.0(mca_btl_base_close+0xe0) 
[0x2b9118ea3fb0]
[vic21:04047] [ 7] 
/usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_bml_r2.so 
[0x2b911ba1938f]
[vic21:04047] [ 8] 
/usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_pml_ob1.so 
[0x2b911b601cde]
[vic21:04047] [ 9] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/libmpi.so.0 
[0x2b9118e7241b]
[vic21:04047] [10] 
/usr/mpi/gcc/openmpi-1.3.1rc4/tests/IMB-3.1/IMB-MPI1(main+0x178) 
[0x403498]
[vic21:04047] [11] /lib64/libc.so.6(__libc_start_main+0xf4) 
[0x3ddd61d974]
[vic21:04047] [12] 
/usr/mpi/gcc/openmpi-1.3.1rc4/tests/IMB-3.1/IMB-MPI1 [0x403269]

[vic21:04047] *** End of error message ***

___
ewg mailing list
e...@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel





Re: [OMPI devel] Meta Question -- Open MPI: Is it a dessert topping or is it a floor wax?

2009-03-13 Thread Pavel Shamis (Pasha)

I would like to add my $0.02 to this discussion.
"Code stability" is not something that should stop the project from
development and progress. During the MPI project's life we have already seen
some pretty critical changes (pml/btl/etc.), and as a result we now have a
more stable and more optimized MPI. OMPI already has a defined workflow that
allows us to handle big code changes without losing code stability for a long
period of time. As I understand it, the BTL movement will make the code more
modular and will open a window for new features and optimizations.
So, as long as the BTL movement does not negatively affect MPI performance
and code stability for a long period of time (temporary instability may be
reasonable), I do not see any constructive reason to block this modification.
If I understand correctly, the code will go to the feature_oriented branch
and we will have enough time to run QA cycles before it is moved to
super_stable mode.
After all this, I hope that we will get a more stable, modular, and
feature-rich MPI implementation.

Thanks,
Pasha


Brian W. Barrett wrote:
I'm going to stay out of the debate about whether Andy correctly 
characterized the two points you brought up as a distributed OS or not.


Sandia's position on these two points remains the same as I previously 
stated when the question was distributed OS or not.  The primary goal 
of the Open MPI project was and should remain to be the best MPI 
project available.  Low-cost items to support different run-times or 
different non-MPI communication contexts are worth the work.  But 
high-cost items should be avoided, as they degrade our ability to 
provide the best MPI project available (of course, others, including 
OMPI developers, can take the source and do what they wish outside the 
primary development tree).


High performance is a concern, but so is code maintainability.  If it 
takes twice as long to implement feature A because I have to worry 
about its impact not only on MPI, but also on projects X, Y, Z, as an 
MPI developer, I've lost something important.


Brian

On Thu, 12 Mar 2009, Richard Graham wrote:

I am assuming that by distributed OS you are referring to the changes 
that

we (not just ORNL) are trying to do.  If this is the case, this is a
mischaracterization of our intentions.  We have two goals:

 - To be able to use a different run-time than ORTE to drive Open MPI
 - To use the communication primitives outside the context of MPI 
(with or

without ORTE)

High performance is critical, and at NO time have we ever said anything
about sacrificing performance - these have been concerns that others
(rightfully) have expressed.

Rich


On 3/12/09 8:24 AM, "Jeff Squyres"  wrote:


I think I have to agree with Terry.

I love to inspire and see new, original, and unintended uses for Open
MPI.  But our primary focus must remain to create, maintain, and
continue to deliver a high performance MPI implementation.

We have a long history of adding "small" things to Open MPI that are
useful to 3rd parties because it helps them, helps further Open MPI
adoption/usefulness, and wasn't difficult for us to do ("small" can
have varying definitions).  I'm in favor of such things, as long as we
maintain a policy of "in cases of conflict, OMPI/high performance MPI
wins".


On Mar 12, 2009, at 9:01 AM, Terry Dontje wrote:


Sun's participation in this community was to obtain a stable and
performant MPI implementation that had some research work done on the
side to improve those goals and the introduction of new features.   We
don't have problems with others using and improving on the OMPI code
base but we need to make sure such usage doesn't detract from our
primary goal of performant MPI implementation.

However, changes to the OMPI code base to allow it to morph or even
support a distributed OS does cause for some concern.  That is are we
opening the door to having more interfaces to support?  If so is this
wise in the fact that it seems to me we have a hard enough time trying
to focus on the MPI items?  Not to mention this definitely starts
detracting from the original goals.

--td

Andrew Lumsdaine wrote:

Hi all -- There is a meta question that I think is underlying some of
the discussion about what to do with BTLs etc.  Namely, is Open MPI an
MPI implementation with a portable run time system -- or is it a
distributed OS with an MPI interface?  It seems like some of the
changes being asked for (e.g., with the BTLs) reflect the latter --
but perhaps not everyone shares that view and hence the impedance
mismatch.

I doubt this is the last time that tensions will come up because of
differing views on this question.

I suggest that we come to some kind of common understanding of the
question (and answer) and structure development and administration
accordingly.

Best Regards,
Andrew Lumsdaine

___
devel mailing list
de...@open-mpi.org

Re: [OMPI devel] VT compile error: Fwd: [ofa-general] OFED 1.4.1 (rc1) is available

2009-03-05 Thread Pavel Shamis (Pasha)
Adding a pointer to the OFED bugzilla ticket for more information: 
https://bugs.openfabrics.org/show_bug.cgi?id=1537


Jeff Squyres wrote:

VT guys --

It looks like we still have a compile bug in OMPI 1.3.1rc4...  See below.

Do you think you can get a fix ASAP for OMPI 1.3.1final?


Begin forwarded message:


*From: *"PN" >
*Date: *March 5, 2009 12:51:28 AM EST
*To: *"Tziporet Koren" >
*Cc: *>, 
>

*Subject:** Re: [ofa-general] OFED 1.4.1 is available*

HI,

I have a build error of OFED-1.4.1-rc1 under CentOS 5.2:

.
Build openmpi_gcc RPM
Running  rpmbuild --rebuild  --define '_topdir /var/tmp/OFED_topdir' 
--define 'dist %{nil}' --target x86_64 --define '_name openmpi_gcc' 
--define 'mpi_selector /usr/bin/mpi-selector' --define 
'use_mpi_selector 1' --define 'install_shell_scripts 1' --define 
'shell_scripts_basename mpivars' --define '_usr /usr' --define 'ofed 
0' --define '_prefix /usr/mpi/gcc/openmpi-1.3.1rc4' --define 
'_defaultdocdir /usr/mpi/gcc/openmpi-1.3.1rc4' --define '_mandir 
%{_prefix}/share/man' --define 'mflags -j 4' --define 
'configure_options   --with-openib=/usr 
--with-openib-libdir=/usr/lib64  CC=gcc CXX=g++ F77=gfortran 
FC=gfortran --enable-mpirun-prefix-by-default' --define 
'use_default_rpm_opt_flags 1' 
/opt/software/packages/ofed/OFED-1.4.1-rc1/OFED-1.4.1-rc1/SRPMS/openmpi-1.3.1rc4-1.src.rpm

Failed to build openmpi RPM
See /tmp/OFED.28377.logs/openmpi.rpmbuild.log


In /tmp/OFED.28377.logs/openmpi.rpmbuild.log:
.
gcc -DHAVE_CONFIG_H -I. -I.. -I../tools/opari/lib 
-I../extlib/otf/otflib -I../extlib/otf/otflib -D_GNU_SOURCE 
-DBINDIR=\"/usr/mpi/gcc/openmpi-1.3.1rc4/bin\" 
-DDATADIR=\"/usr/mpi/gcc/openmpi-1.3.1rc4/share\" -DRFG  -DVT_MEMHOOK 
-DVT_IOWRAP  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions 
-fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -MT 
vt_iowrap_helper.o -MD -MP -MF .deps/vt_iowrap_helper.Tpo -c -o 
vt_iowrap_helper.o vt_iowrap_helper.c

mv -f .deps/vt_memhook.Tpo .deps/vt_memhook.Po
gcc -DHAVE_CONFIG_H -I. -I.. -I../tools/opari/lib 
-I../extlib/otf/otflib -I../extlib/otf/otflib -D_GNU_SOURCE 
-DBINDIR=\"/usr/mpi/gcc/openmpi-1.3.1rc4/bin\" 
-DDATADIR=\"/usr/mpi/gcc/openmpi-1.3.1rc4/share\" -DRFG  -DVT_MEMHOOK 
-DVT_IOWRAP  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions 
-fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -MT 
rfg_regions.o -MD -MP -MF .deps/rfg_regions.Tpo -c -o rfg_regions.o 
rfg_regions.c
vt_iowrap.c:1242: error: expected declaration specifiers or '...' 
before numeric constant

vt_iowrap.c:1243: error: conflicting types for '__fprintf_chk'
mv -f .deps/vt_iowrap_helper.Tpo .deps/vt_iowrap_helper.Po
make[5]: *** [vt_iowrap.o] Error 1
make[5]: *** Waiting for unfinished jobs
mv -f .deps/vt_comp_gnu.Tpo .deps/vt_comp_gnu.Po
mv -f .deps/rfg_regions.Tpo .deps/rfg_regions.Po
make[5]: Leaving directory 
`/var/tmp/OFED_topdir/BUILD/openmpi-1.3.1rc4/ompi/contrib/vt/vt/vtlib'

make[4]: *** [all-recursive] Error 1
make[4]: Leaving directory 
`/var/tmp/OFED_topdir/BUILD/openmpi-1.3.1rc4/ompi/contrib/vt/vt'

make[3]: *** [all] Error 2
make[3]: Leaving directory 
`/var/tmp/OFED_topdir/BUILD/openmpi-1.3.1rc4/ompi/contrib/vt/vt'

make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory 
`/var/tmp/OFED_topdir/BUILD/openmpi-1.3.1rc4/ompi/contrib/vt'

make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory 
`/var/tmp/OFED_topdir/BUILD/openmpi-1.3.1rc4/ompi'

make: *** [all-recursive] Error 1
error: Bad exit status from /var/tmp/rpm-tmp.40739 (%build)


RPM build errors:
user jsquyres does not exist - using root
group eng10 does not exist - using root
user jsquyres does not exist - using root
group eng10 does not exist - using root
Bad exit status from /var/tmp/rpm-tmp.40739 (%build)


The error seems similar to 
http://www.open-mpi.org/community/lists/devel/2009/01/5230.php


Regards,
PN


2009/3/5 Tziporet Koren


Hi,

OFED-1.4.1-rc1 release is available on

_http://www.openfabrics.org/downloads/OFED/ofed-1.4.1/OFED-1.4.1-rc1.tgz_


To get BUILD_ID run ofed_info

Please report any issues in bugzilla
_https://bugs.openfabrics.org/_ for OFED 1.4.1

Vladimir & Tziporet




Release information:

--

Linux Operating Systems:

   - RedHat EL4 up4:  2.6.9-42.ELsmp  *

   - RedHat EL4 up5:  2.6.9-55.ELsmp

   - RedHat EL4 up6:  2.6.9-67.ELsmp

   - RedHat EL4 up7:2.6.9-78.ELsmp

   - RedHat EL5:2.6.18-8.el5

   - RedHat EL5 up1:  2.6.18-53.el5

   - RedHat EL5 up2:  

Re: [OMPI devel] 1.3.1rc3 was borked; 1.3.1rc4 is out

2009-03-05 Thread Pavel Shamis (Pasha)
I tried to build the latest OFED with the new OMPI rc4, but it looks like the 
VT code is broken again?


gcc -DHAVE_CONFIG_H -I. -I.. -I../tools/opari/lib -I../extlib/otf/otflib 
-I../extlib/otf/otflib -D_GNU_SOURCE 
-DBINDIR=\"/usr/local/mpi/gcc/openmpi-1.3.1rc4/bin\" 
-DDATADIR=\"/usr/local/mpi/gcc/openmpi-1.3.1rc4/share\" -DRFG  
-DVT_MEMHOOK -DVT_IOWRAP  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
-fexceptions -fstack-protector --param=ssp-buffer-size=4 -m32 
-march=i386 -mtune=generic -fasynchronous-unwind-tables -MT 
vt_iowrap_helper.o -MD -MP -MF .deps/vt_iowrap_helper.Tpo -c -o 
vt_iowrap_helper.o vt_iowrap_helper.c
gcc -DHAVE_CONFIG_H -I. -I.. -I../tools/opari/lib -I../extlib/otf/otflib 
-I../extlib/otf/otflib -D_GNU_SOURCE 
-DBINDIR=\"/usr/local/mpi/gcc/openmpi-1.3.1rc4/bin\" 
-DDATADIR=\"/usr/local/mpi/gcc/openmpi-1.3.1rc4/share\" -DRFG  
-DVT_MEMHOOK -DVT_IOWRAP  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
-fexceptions -fstack-protector --param=ssp-buffer-size=4 -m32 
-march=i386 -mtune=generic -fasynchronous-unwind-tables -MT 
rfg_regions.o -MD -MP -MF .deps/rfg_regions.Tpo -c -o rfg_regions.o 
rfg_regions.c
*vt_iowrap.c:1242: error: expected declaration specifiers or '...' 
before numeric constant

vt_iowrap.c:1243: error: conflicting types for '__fprintf_chk'
*gcc -DHAVE_CONFIG_H -I. -I.. -I../tools/opari/lib 
-I../extlib/otf/otflib -I../extlib/otf/otflib -D_GNU_SOURCE 
-DBINDIR=\"/usr/local/mpi/gcc/openmpi-1.3.1rc4/bin\" 
-DDATADIR=\"/usr/local/mpi/gcc/openmpi-1.3.1rc4/share\" -DRFG  
-DVT_MEMHOOK -DVT_IOWRAP  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
-fexceptions -fstack-protector --param=ssp-buffer-size=4 -m32 
-march=i386 -mtune=generic -fasynchronous-unwind-tables -MT rfg_filter.o 
-MD -MP -MF .deps/rfg_filter.Tpo -c -o rfg_filter.o rfg_filter.c

make[5]: *** [vt_iowrap.o] Error 1
make[5]: *** Waiting for unfinished jobs
mv -f .deps/vt_iowrap_helper.Tpo .deps/vt_iowrap_helper.Po
mv -f .deps/rfg_filter.Tpo .deps/rfg_filter.Po
mv -f .deps/rfg_regions.Tpo .deps/rfg_regions.Po
make[5]: Leaving directory 
`/var/tmp/OFED_topdir/BUILD/openmpi-1.3.1rc4/ompi/contrib/vt/vt/vtlib'

make[4]: *** [all-recursive] Error 1
make[4]: Leaving directory 
`/var/tmp/OFED_topdir/BUILD/openmpi-1.3.1rc4/ompi/contrib/vt/vt'

make[3]: *** [all] Error 2
make[3]: Leaving directory 
`/var/tmp/OFED_topdir/BUILD/openmpi-1.3.1rc4/ompi/contrib/vt/vt'

make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory 
`/var/tmp/OFED_topdir/BUILD/openmpi-1.3.1rc4/ompi/contrib/vt'

make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory 
`/var/tmp/OFED_topdir/BUILD/openmpi-1.3.1rc4/ompi'

make: *** [all-recursive] Error 1
error: Bad exit status from /var/tmp/rpm-tmp.43206 (%build)



Ralph H. Castain wrote:

Looks okay to me Brian - I went ahead and filed the CMR and sent it on to
Brad for approval.

Ralph


  

On Tue, 3 Mar 2009, Brian W. Barrett wrote:



On Tue, 3 Mar 2009, Jeff Squyres wrote:

  

1.3.1rc3 had a race condition in the ORTE shutdown sequence.  The only
difference between rc3 and rc4 was a fix for that race condition.
Please
test ASAP:

  http://www.open-mpi.org/software/ompi/v1.3/


I'm sorry, I've failed to test rc1 & rc2 on Catamount.  I'm getting a
compile
failure in the ORTE code.  I'll do a bit more testing and send Ralph an
e-mail this afternoon.
  

Attached is a patch against v1.3 branch that makes it work on Red Storm.
I'm not sure it's right, so I'm just e-mailing it rather than committing..
Sorry Ralph, but can you take a look? :(

Brian___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

  




Re: [OMPI devel] Fwd: [OMPI users] OpenMPI with openib partitions

2008-10-07 Thread Pavel Shamis (Pasha)

Matt,
For all 1.2.x versions you should use btl_openib_ib_pkey_val.
In the upcoming 1.3 version the parameter was renamed to btl_openib_of_pkey_val.

BTW, we plan to release version 1.2.8 very soon and it will include the 
partition bug fix.
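
For illustration, the two spellings of the parameter would be used like this 
(the hosts, pkey value, and benchmark path are the ones from this thread; 
treat these lines as a sketch rather than exact commands):

  # Open MPI 1.2.x
  mpirun -np 2 -H d2-ib,d3-ib -mca btl openib,self \
         -mca btl_openib_ib_pkey_val 0x8109 /cluster/pallas/x86_64-ib/IMB-MPI1
  # Open MPI 1.3.x (renamed parameter)
  mpirun -np 2 -H d2-ib,d3-ib -mca btl openib,self \
         -mca btl_openib_of_pkey_val 0x8109 /cluster/pallas/x86_64-ib/IMB-MPI1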


Regards,
Pasha

Matt Burgess wrote:

Pasha,

With your patch and parameter suggestion, it works! So to be clear 
btl_openib_ib_pkey_val is for 1.2.6 and btl_openib_of_pkey_val is for 
1.2.7?


Thanks again,
Matt

On Tue, Oct 7, 2008 at 12:24 PM, Pavel Shamis (Pasha) 
<pa...@dev.mellanox.co.il <mailto:pa...@dev.mellanox.co.il>> wrote:


Matt,
Can you please run " cat
/sys/class/infiniband/mlx4_0/ports/1/pkeys/* " on your d2-ib,d3-ib.
I would like to check the partition configuration.

Oh, BTW, I see that the command line in the previous email was wrong.
Please use the following command line (the parameter name should be
"btl_openib_ib_pkey_val" for ompi-1.2.6, and my patch accepts
HEX/DEC values):
/opt/openmpi-ib/1.2.6/bin/mpirun -np 2 -H d2-ib,d3-ib -mca btl
openib,self -mca btl_openib_ib_pkey_val 0x8109
/cluster/pallas/x86_64-ib/IMB-MPI1

Ompi 1.2.6 version should work ok with this patch.


Thanks,
Pasha

Matt Burgess wrote:

Pasha,

Thanks for the patch. Unfortunately, it doesn't seem like that
fixed the problem. I realized earlier I didn't mention what
version of OpenMPI I was trying - it's 1.2.6.
Should I be trying 1.2.7 with this patch?

Thanks,
    Matt

2008/10/7 Pavel Shamis (Pasha) <pa...@dev.mellanox.co.il
<mailto:pa...@dev.mellanox.co.il>
<mailto:pa...@dev.mellanox.co.il
<mailto:pa...@dev.mellanox.co.il>>>


   Matt,
   Can you please try attached patch ? I guess it will resolve
this
   issue.

   Thanks,
   Pasha

   Matt Burgess wrote:

   Lenny,

Thanks for the info. It doesn't seem to be working
still.
   My command line is:

   /opt/openmpi-ib/1.2.6/bin/mpirun -np 2 -H d2-ib,d3-ib
-mca btl
   openib,self -mca btl_openib_of_pkey_val 33033
   /cluster/pallas/x86_64-ib/IMB-MPI1

   I don't have a
"/sys/class/infiniband/mthca0/ports/1/pkeys/"
   but I do have
"/sys/class/infiniband/mlx4_0/ports/1/pkeys/".
   It's contents are:

   0106  114  122  16   24   32   40   49   57   65  
73   81

 998
   1107  115  123  17   25   33   41   558   66  
74   82

 90   99
   10   108  116  124  18   26   34   42   50   59   67  
75   83
 91  100  109  117  125  19   27   35   43   51   6  
 68  76   84   92  101  11   118  126  228   36  
44   52   60
 69   77   85   93  102  110  119  127  20   29   37  
45  53   61   778   86   94  103  111  12   13  
21   338
 46   54   62   70   79   87   95  104  112  120  14  
22  30   39   47   55   63   71   888   96  105

 113  121  15
 23   31   448   56   64   72   80   89   97
   We aren't using the opensm, but voltaire's SM on a 2012
switch.

   Thanks again,
   Matt


   On Tue, Oct 7, 2008 at 9:37 AM, Lenny Verkhovsky
   <lenny.verkhov...@gmail.com
<mailto:lenny.verkhov...@gmail.com>
   <mailto:lenny.verkhov...@gmail.com
<mailto:lenny.verkhov...@gmail.com>>
   <mailto:lenny.verkhov...@gmail.com
<mailto:lenny.verkhov...@gmail.com>
   <mailto:lenny.verkhov...@gmail.com
<mailto:lenny.verkhov...@gmail.com>>>> wrote:

  Hi Matt,

   It seems that the right way to do it is the following:

  -mca btl openib,self -mca btl_openib_ib_pkey_val 33033

  when the value is a decimal number of the pkey, in
your case
  0x8109 = 33033, and no need for
btl_openib_ib_pkey_ix value.

  ex.
  mpirun -np 2 -H witch2,witch3 -mca btl openib,self -mca
  btl_openib_ib_pkey_val 32769 ./mpi_p1_4_1_2 -t lt
  LT (2) (size min max avg) 1 3.511429 3.511429 3.511429

   if it's not working, check cat
   /sys/class/infiniband/mthca0/ports/1/pkeys/* for
pkeys and the SM;
   maybe it's a setup issue.

  Pasha is currently checking this issue.

  Best regards,

  Lenny.





  On 10/7/08, *Je

[OMPI devel] OpenIB BTL - removing btl_openib_ib_pkey_ix parameter

2008-10-07 Thread Pavel Shamis (Pasha)

Hi,

I would like to remove the btl_openib_ib_pkey_ix parameter.
The partition key index (btl_openib_ib_pkey_ix) is defined locally by 
each HCA, so in most cases each host will have a different pkey index, and
the user has no control over this value. Direct pkey_ix specification is 
therefore not possible, because each HCA will have a different value.
If the user wants to use a specific partition, he should specify only the 
btl_openib_ib_pkey_val parameter, and Open MPI translates it to the 
corresponding pkey_ix.


I think btl_openib_ib_pkey_ix is useless and I would like to remove it.

Comments ?
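
For reference, the translation from pkey value to the local pkey index is a
simple lookup over the port's pkey table; a minimal sketch (mirroring the
loop in the openib BTL patch quoted elsewhere in this archive; the function
name here is invented):

  #include <infiniband/verbs.h>
  #include <arpa/inet.h>   /* ntohs */
  #include <stdint.h>

  /* Return the local index of 'wanted_val' on the given port, or -1. */
  static int pkey_val_to_index(struct ibv_context *ctx, uint8_t port,
                               int max_pkeys, uint16_t wanted_val)
  {
      for (int ix = 0; ix < max_pkeys; ix++) {
          uint16_t pkey;
          if (ibv_query_pkey(ctx, port, ix, &pkey)) {
              continue;
          }
          /* mask the membership bit; the index differs per HCA,
             the value does not */
          if ((ntohs(pkey) & 0x7fff) == (wanted_val & 0x7fff)) {
              return ix;
          }
      }
      return -1;
  }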

--
Pavel Shamis (Pasha)
Mellanox Technologies LTD.



Re: [OMPI devel] Fwd: [OMPI users] OpenMPI with openib partitions

2008-10-07 Thread Pavel Shamis (Pasha)

Matt,
Can you please run " cat /sys/class/infiniband/mlx4_0/ports/1/pkeys/* " 
on your d2-ib,d3-ib.

I would like to check the partition configuration.

Oh, BTW, I see that the command line in the previous email was wrong.
Please use the following command line (the parameter name should be 
"btl_openib_ib_pkey_val" for ompi-1.2.6, and my patch accepts HEX/DEC 
values):
/opt/openmpi-ib/1.2.6/bin/mpirun -np 2 -H d2-ib,d3-ib -mca btl 
openib,self -mca btl_openib_ib_pkey_val 0x8109 
/cluster/pallas/x86_64-ib/IMB-MPI1


Ompi 1.2.6 version should work ok with this patch.

Thanks,
Pasha

Matt Burgess wrote:

Pasha,

Thanks for the patch. Unfortunately, it doesn't seem like that fixed 
the problem. I realized earlier I didn't mention what version of 
OpenMPI I was trying - it's 1.2.6. Should I be trying 
1.2.7 with this patch?


Thanks,
Matt

2008/10/7 Pavel Shamis (Pasha) <pa...@dev.mellanox.co.il 
<mailto:pa...@dev.mellanox.co.il>>


Matt,
Can you please try attached patch ? I guess it will resolve this
issue.

Thanks,
Pasha

Matt Burgess wrote:

Lenny,

Thanks for the info. It doesn't seem to be working still.
My command line is:

/opt/openmpi-ib/1.2.6/bin/mpirun -np 2 -H d2-ib,d3-ib -mca btl
openib,self -mca btl_openib_of_pkey_val 33033
/cluster/pallas/x86_64-ib/IMB-MPI1

I don't have a "/sys/class/infiniband/mthca0/ports/1/pkeys/"
but I do have "/sys/class/infiniband/mlx4_0/ports/1/pkeys/".
It's contents are:

0106  114  122  16   24   32   40   49   57   65   73   81
  998
1107  115  123  17   25   33   41   558   66   74   82
  90   99
10   108  116  124  18   26   34   42   50   59   67   75   83
  91  100  109  117  125  19   27   35   43   51   668  
76   84   92  101  11   118  126  228   36   44   52   60
  69   77   85   93  102  110  119  127  20   29   37   45  
53   61   778   86   94  103  111  12   13   21   338
  46   54   62   70   79   87   95  104  112  120  14   22  
30   39   47   55   63   71   888   96  105  113  121  15

  23   31   448   56   64   72   80   89   97
We aren't using the opensm, but voltaire's SM on a 2012 switch.

Thanks again,
Matt


On Tue, Oct 7, 2008 at 9:37 AM, Lenny Verkhovsky
<lenny.verkhov...@gmail.com
<mailto:lenny.verkhov...@gmail.com>
<mailto:lenny.verkhov...@gmail.com
<mailto:lenny.verkhov...@gmail.com>>> wrote:

   Hi Matt,

   It seems that the right way to do it is the following:

   -mca btl openib,self -mca btl_openib_ib_pkey_val 33033

   when the value is a decimal number of the pkey, in your case
   0x8109 = 33033, and no need for btl_openib_ib_pkey_ix value.

   ex.
   mpirun -np 2 -H witch2,witch3 -mca btl openib,self -mca
   btl_openib_ib_pkey_val 32769 ./mpi_p1_4_1_2 -t lt
   LT (2) (size min max avg) 1 3.511429 3.511429 3.511429

   if it's not working, check cat
   /sys/class/infiniband/mthca0/ports/1/pkeys/* for pkeys and the SM;
   maybe it's a setup issue.

   Pasha is currently checking this issue.

   Best regards,

   Lenny.





   On 10/7/08, *Jeff Squyres* <jsquy...@cisco.com
<mailto:jsquy...@cisco.com>
   <mailto:jsquy...@cisco.com <mailto:jsquy...@cisco.com>>> wrote:

   FWIW, if this configuration is for all of your users, you
   might want to specify these MCA params in the default MCA
   param file, or the environment, ...etc.  Just so that you
   don't have to specify it on every mpirun command line.

   See
 
 http://www.open-mpi.org/faq/?category=tuning#setting-mca-params.




   On Oct 7, 2008, at 5:43 AM, Lenny Verkhovsky wrote:

   Sorry, misunderstood the question,

   thanks for Pasha the right command line will be

   -mca btl openib,self -mca btl_openib_of_pkey_val 0x8109
   -mca btl_openib_of_pkey_ix 1

   ex.

   #mpirun -np 2 -H witch2,witch3 -mca btl openib,self
-mca
   btl_openib_of_pkey_val 0x8001 -mca
btl_openib_of_pkey_ix 1
   ./mpi_p1_4_TRUNK -t lt
   LT (2) (size min max avg) 1 3.443480 3.443480 3.443480


   Best regards

   Lenny.


   On 10/6/08, Jeff Squyres <jsquy...@cisco.com
<mailto:jsquy...@cisco.com>
   <mailto:jsquy...@cisco.com
<mailto:jsquy...@cisco.com>>> wrote: On Oct 5, 2008, at

   1:22 PM, Lenny V

Re: [OMPI devel] Fwd: [OMPI users] OpenMPI with openib partitions

2008-10-07 Thread Pavel Shamis (Pasha)
 /cluster/pallas/x86_64-ib/IMB-MPI1

but I just get a RETRY EXCEEDED ERROR. Is there a MCA
parameter I am missing?

I was successful using tcp only:

/opt/openmpi-ib/1.2.6/bin/mpirun -np 2 -machinefile
machinefile -mca btl tcp,self -mca btl_openib_max_btls 1
-mca btl_openib_ib_pkey_val 0x8109
/cluster/pallas/x86_64-ib/IMB-MPI1



Thanks,
Matt Burgess

___
users mailing list
us...@open-mpi.org <mailto:us...@open-mpi.org>
http://www.open-mpi.org/mailman/listinfo.cgi/users



-- 
Jeff Squyres

Cisco Systems


___
users mailing list
us...@open-mpi.org <mailto:us...@open-mpi.org>
http://www.open-mpi.org/mailman/listinfo.cgi/users



-- 
Jeff Squyres

Cisco Systems






___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



--
--
Pavel Shamis (Pasha)
Mellanox Technologies LTD.

Index: ompi/mca/btl/openib/btl_openib_component.c
===
--- ompi/mca/btl/openib/btl_openib_component.c  (revision 19490)
+++ ompi/mca/btl/openib/btl_openib_component.c  (working copy)
@@ -558,7 +558,7 @@ static int init_one_hca(opal_list_t *btl
  goto dealloc_pd;
 }

-ret = OMPI_SUCCESS; 
+ret = OMPI_SUCCESS;
 /* Note ports are 1 based hence j = 1 */
 for(i = 1; i <= hca->ib_dev_attr.phys_port_cnt; i++){
 struct ibv_port_attr ib_port_attr;
@@ -580,7 +580,7 @@ static int init_one_hca(opal_list_t *btl
 uint16_t pkey,j;
 for (j=0; j < hca->ib_dev_attr.max_pkeys; j++) {
ibv_query_pkey(hca->ib_dev_context, i, j, &pkey);
-pkey=ntohs(pkey);
+pkey=ntohs(pkey) & 0x7fff;
 if(pkey == mca_btl_openib_component.ib_pkey_val){
ret = init_one_port(btl_list, hca, i, j, &ib_port_attr);
 break;
Index: ompi/mca/btl/openib/btl_openib_ini.c
===
--- ompi/mca/btl/openib/btl_openib_ini.c(revision 19490)
+++ ompi/mca/btl/openib/btl_openib_ini.c(working copy)
@@ -90,8 +90,6 @@ static int parse_line(parsed_section_val
 static void reset_section(bool had_previous_value, parsed_section_values_t *s);
 static void reset_values(ompi_btl_openib_ini_values_t *v);
 static int save_section(parsed_section_values_t *s);
-static int intify(char *string);
-static int intify_list(char *str, uint32_t **values, int *len);
 static inline void show_help(const char *topic);


@@ -364,14 +362,14 @@ static int parse_line(parsed_section_val
all whitespace at the beginning and ending of the value. */

 if (0 == strcasecmp(key_buffer, "vendor_id")) {
-if (OMPI_SUCCESS != (ret = intify_list(value, &sv->vendor_ids, 
+if (OMPI_SUCCESS != (ret = ompi_btl_openib_ini_intify_list(value, 
&sv->vendor_ids, 
&sv->vendor_ids_len))) {
 return ret;
 }
 }

 else if (0 == strcasecmp(key_buffer, "vendor_part_id")) {
-if (OMPI_SUCCESS != (ret = intify_list(value, &sv->vendor_part_ids,
+if (OMPI_SUCCESS != (ret = ompi_btl_openib_ini_intify_list(value, 
&sv->vendor_part_ids,
&sv->vendor_part_ids_len))) {
 return ret;
 }
@@ -379,13 +377,13 @@ static int parse_line(parsed_section_val

 else if (0 == strcasecmp(key_buffer, "mtu")) {
 /* Single value */
-sv->values.mtu = (uint32_t) intify(value);
+sv->values.mtu = (uint32_t) ompi_btl_openib_ini_intify(value);
 sv->values.mtu_set = true;
 }

 else if (0 == strcasecmp(key_buffer, "use_eager_rdma")) {
 /* Single value */
-sv->values.use_eager_rdma = (uint32_t) intify(value);
+sv->values.use_eager_rdma = (uint32_t) 
ompi_btl_openib_ini_intify(value);
 sv->values.use_eager_rdma_set = true;
 }

@@ -547,7 +545,7 @@ static int save_section(parsed_section_v
 /*
  * Do string-to-integer conversion, for both hex and decimal numbers
  */
-static int intify(char *str)
+int ompi_btl_openib_ini_intify(char *str)
 {
 while (isspace(*str)) {
 ++str;
@@ -568,7 +566,7 @@ static int inti

Re: [OMPI devel] openib component lock

2008-08-06 Thread Pavel Shamis (Pasha)

Jeff Squyres wrote:
In working on https://svn.open-mpi.org/trac/ompi/ticket/1434, I see 
fairly inconsistent use of the mca_openib_btl_component.ib_lock, such 
as within btl_openib_proc.c.


In fixing #1434, do I need to be mindful doing all the locking 
properly, or is it already so broken that it doesn't really matter, 
and all I need to do is ensure that I don't put in any bozo deadlocks?


Hmm, good question... I have never tested a threaded build of the openib 
BTL, so I don't know if it really works. But we try to keep it thread safe.

(All the threading code in the openib BTL requires serious review.)


Re: [OMPI devel] IBCM error

2008-08-03 Thread Pavel Shamis (Pasha)

Thanks for update.

Sean Hefty wrote:

I've committed a patch to my libibcm git tree with the values

IB_CM_ASSIGN_SERVICE_ID
IB_CM_ASSIGN_SERVICE_ID_MASK

these will be in libibcm release 1.0.3, which will shortly...

- Sean


  




Re: [OMPI devel] trunk hangs since r19010

2008-07-29 Thread Pavel Shamis (Pasha)

Jeff Squyres wrote:


This used to be true, but I think we changed it a while ago (Pasha: do 
you remember?) because Mellanox HCAs are capable of send-to-self 
(process) and there were no code changes necessary to enable it.  So 
it allowed a slightly simpler command line.  This was quite a while 
ago, IIRC.

Yep, Correct.

FYI. In my MTT testing I also see a lot of killed tests.


Re: [OMPI devel] IBCM error

2008-07-17 Thread Pavel Shamis (Pasha)

Sean Hefty wrote:

It is not zero, it should be:
#define IB_CM_ASSIGN_SERVICE_ID __cpu_to_be64(0x0200ULL)

Unfortunately the value is defined in the kernel-level IBCM and is not
exposed to user level.
Can you please expose it to user level (infiniband/cm.h)?



Oops - good catch.  I will add the assign ID and mask values to the header file
for the next release.  Until then, can you try using the values given in the
kernel header file and let me know if it solves the problem?
  

I already prepared a patch for OMPI that defines the value.
A few people have already reported that the patch is OK
(https://svn.open-mpi.org/trac/ompi/ticket/1388).


Pasha


Re: [OMPI devel] IBCM error

2008-07-17 Thread Pavel Shamis (Pasha)

Jeff Squyres wrote:

On Jul 16, 2008, at 11:07 AM, Don Kerr wrote:


Pasha added configure switches for this about a week ago:

   --en|disable-openib-ibcm
   --en|disable-openib-rdmacm
I like these flags but I thought there was going to be a run time 
check for cases where Open MPI is built on a system that has ibcm 
support but is later run on a system without ibcm support.



Yes, there are.

- if the /dev/infiniband/ucm* files aren't there, we silently return 
"not supported" and skip ibcm
- if ib_cm_open_device() (the first function call) fails, we assume 
that IBCM simply isn't supported on this platform and silently return 
"not supported" and skip ibcm


Right now we are skipping IBCM all the time. IBCM will be used only if the
user specifies it explicitly via the include/exclude interface.


Pasha


Re: [OMPI devel] IBCM error

2008-07-17 Thread Pavel Shamis (Pasha)

Sean Hefty wrote:

If you don't care what the service ID is, you can specify 0, and the kernel will
assign one.  The assigned value can be retrieved by calling ib_cm_attr_id().
(I'm assuming that you communicate the IDs out of band somehow.)
  

It is not zero, it should be:
#define IB_CM_ASSIGN_SERVICE_ID __cpu_to_be64(0x0200ULL)

Unfortunately the value is defined in the kernel-level IBCM and is not
exposed to user level.

Can you please expose it to user level (infiniband/cm.h)?

Regards,
Pasha
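
(Editorial aside, not part of the original thread: a rough C sketch of the
usage Sean describes above, i.e. letting the kernel assign the service ID and
reading it back with ib_cm_attr_id(). The three-argument ib_cm_listen()
signature, the zero mask, and the ib_cm_attr_param field name are assumptions
about libibcm, so treat this as illustrative only.)

#include <stdint.h>
#include <infiniband/cm.h>

/* Ask the kernel to assign a service ID, then query what it picked so the
 * ID can be exchanged out of band (e.g. via the modex). */
static int listen_with_assigned_id(struct ib_cm_id *cm_id, uint64_t *sid_out)
{
    struct ib_cm_attr_param param;

    /* IB_CM_ASSIGN_SERVICE_ID is the value discussed above; the 0 mask is
     * an assumption for this sketch */
    if (ib_cm_listen(cm_id, IB_CM_ASSIGN_SERVICE_ID, 0))
        return -1;
    if (ib_cm_attr_id(cm_id, &param))      /* read back the assigned ID */
        return -1;
    *sid_out = param.service_id;           /* assumed field name */
    return 0;
}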



Re: [OMPI devel] Segfault in 1.3 branch

2008-07-15 Thread Pavel Shamis (Pasha)

I opened ticket for the bug:
https://svn.open-mpi.org/trac/ompi/ticket/1389

Ralph Castain wrote:

It looks like a new issue to me, Pasha. Possibly a side consequence of the
IOF change made by Jeff and I the other day. From what I can see, it looks
like your app was a simple "hello" - correct?

If you look at the error, the problem occurs when mpirun is trying to route
a message. Since the app is clearly running at this time, the problem is
probably in the IOF. The error message shows that mpirun is attempting to
route a message to a jobid that doesn't exist. We have a test in the RML
that forces an "abort" if that occurs.

I would guess that there is either a race condition or memory corruption
occurring somewhere, but I have no idea where.

This may be the "new hole in the dyke" I cautioned about in earlier notes
regarding the IOF... :-)

Still, given that this hits rarely, it probably is a more acceptable bug to
leave in the code than the one we just fixed (duplicated stdin)...

Ralph



On 7/14/08 1:11 AM, "Pavel Shamis (Pasha)" <pa...@dev.mellanox.co.il> wrote:

  

Please see http://www.open-mpi.org/mtt/index.php?do_redir=764

The error is not consistent. It takes a lot of iterations to reproduce it.
In my MTT testing I have seen it a few times.

Is it a known issue?

Regards,
Pasha

  




Re: [OMPI devel] IBCM error

2008-07-15 Thread Pavel Shamis (Pasha)



Guess what - we don't always put them out there because - tada - we don't
use them! What goes out on the backend is a stripped down version of
libraries we require. Given the huge number of libraries people provide
(looking at the bigger, beyond OMPI picture), it consumes a lot of limited
disk space to install every library on every node. So sometimes we build our
own rpm's to pick up only what we need.

As long as --without-rdmacm --without-ibcm are present, then we are happy.

  

FYI, I recently added options that allow enabling/disabling all the *cm stuff:

 --enable-openib-ibcm    Enable Open Fabrics IBCM support in openib BTL
 (default: enabled)
 --enable-openib-rdmacm  Enable Open Fabrics RDMACM support in openib BTL
 (default: enabled)
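
(Editorial aside: based on the flags listed above, a site whose compute nodes
do not ship libibcm/librdmacm could configure like this; the prefix is just an
example.)

./configure --prefix=/opt/openmpi-1.3 \
            --disable-openib-ibcm \
            --disable-openib-rdmacm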




Re: [OMPI devel] IBCM error

2008-07-15 Thread Pavel Shamis (Pasha)


I need to check on this.  You may want to look at section A3.2.3 of 
the spec.
If you set the first byte (network order) to 0x00, and the 2nd byte 
to 0x01,
then you hit a 'reserved' range that probably isn't being used 
currently.


If you don't care what the service ID is, you can specify 0, and the 
kernel will
assign one.  The assigned value can be retrieved by calling 
ib_cm_attr_id().

(I'm assuming that you communicate the IDs out of band somehow.)



Ok; we'll need to check into this.  I don't remember the ordering -- 
we might actually be communicating the IDs before calling 
ib_cm_listen() (since we were simply using the PIDs, we could do that).


Thanks for the tip!  Pasha -- can you look into this?
It looks like the modex message is prepared during the query stage, so
the order looks OK.
Unfortunately, on my machines the ibcm module does not create
"/dev/infiniband/ucm*", so I cannot test the functionality.


Regards,
Pasha.



Re: [OMPI devel] Segfault in 1.3 branch

2008-07-15 Thread Pavel Shamis (Pasha)



It looks like a new issue to me, Pasha. Possibly a side consequence of the
IOF change made by Jeff and I the other day. From what I can see, it looks
like your app was a simple "hello" - correct?
  

Yep, it is simple hello application.

If you look at the error, the problem occurs when mpirun is trying to route
a message. Since the app is clearly running at this time, the problem is
probably in the IOF. The error message shows that mpirun is attempting to
route a message to a jobid that doesn't exist. We have a test in the RML
that forces an "abort" if that occurs.

I would guess that there is either a race condition or memory corruption
occurring somewhere, but I have no idea where.

This may be the "new hole in the dyke" I cautioned about in earlier notes
regarding the IOF... :-)

Still, given that this hits rarely, it probably is a more acceptable bug to
leave in the code than the one we just fixed (duplicated stdin)...
  
It is not such a rare issue: 19 failures in my MTT run 
(http://www.open-mpi.org/mtt/index.php?do_redir=765).


Pasha

Ralph



On 7/14/08 1:11 AM, "Pavel Shamis (Pasha)" <pa...@dev.mellanox.co.il> wrote:

  

Please see http://www.open-mpi.org/mtt/index.php?do_redir=764

The error is not consistent. It takes a lot of iterations to reproduce it.
In my MTT testing I have seen it a few times.

Is it a known issue?

Regards,
Pasha

  




Re: [OMPI devel] IBCM error

2008-07-14 Thread Pavel Shamis (Pasha)




Should we not even build support for it?
I think the IBCM CPC build should be enabled by default. IBCM is 
supplied with OFED, so there should not be any problem during install.


PRO: don't even allow the possibility of running with it, because we 
know that there are issues with the ibcm userspace library (i.e., 
reduce problem reports from users)


PRO: users don't have to have libibcm installed on compute nodes 
(we've actually gotten some complaints about this)
We got complaints only for the case where OMPI was built on a platform with 
IBCM and was then run on a platform without IBCM.  Also, we did not have an 
option to disable ibcm during compilation, so there was actually no way to 
install OMPI on the compute node. We added the option and the problem was 
resolved. In most cases the OFED install is the same on all nodes, so it 
should not be any problem to build IBCM support by default.


Pasha




Re: [OMPI devel] IBCM error

2008-07-14 Thread Pavel Shamis (Pasha)

I can add at the head of the query function something like:

if (!mca_btl_openib_component.cpc_explicitly_defined)
   return OMPI_ERR_NOT_SUPPORTED;


Jeff Squyres wrote:

On Jul 14, 2008, at 3:59 AM, Lenny Verkhovsky wrote:


Seems to be fixed.


Well, it's "fixed" in that Pasha turned off the error message.  But 
the same issue is undoubtedly happening.


I was asking for something a little stronger: perhaps we should 
actually have IBCM not try to be used unless it's specifically asked 
for.  Or maybe it shouldn't even build itself unless specifically 
asked for (which would obviously take care of the run-time issues as 
well).


The whole point of doing IBCM was to have a simple and fast mechanism 
for IB wireup.  But with these two problems (IBCM not properly 
installed on all systems, and ib_cm_listen() fails periodically), it 
more or less makes it unusable.  Therefore we shouldn't recommend it 
to production customers, and per precedent elsewhere in the code base, 
we should either not build it by default and/or not use it unless 
specifically asked for.






[OMPI devel] Segfault in 1.3 branch

2008-07-14 Thread Pavel Shamis (Pasha)

Please see http://www.open-mpi.org/mtt/index.php?do_redir=764

The error is not consistent. It takes a lot of iterations to reproduce it.
In my MTT testing I have seen it a few times.

Is it a known issue?

Regards,
Pasha


Re: [OMPI devel] IBCM error

2008-07-13 Thread Pavel Shamis (Pasha)

Fixed in https://svn.open-mpi.org/trac/ompi/changeset/18897

Are there any other known IBCM issues?

Regards,
Pasha

Jeff Squyres wrote:
I think you said opposite things: Lenny's command line did not 
specifically ask for ibcm, but it was used anyway.  Lenny -- did you 
explicitly request it somewhere else (e.g., env var or MCA param file)?


I suspect that you did not; I suspect (without looking at the code 
again) that ibcm tried to select itself and failed on the 
ibcm_listen() call, so it fell back to oob.  This might have to be 
another workaround in OMPI, perhaps something like this:


if (ibcm_listen() fails)
   if (ibcm explicitly requested)
   print_warning()
   fail to use ibcm
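
(Editorial aside: a minimal, self-contained C sketch of the workaround
sketched above. The "explicitly requested" flag and the error handling are
illustrative stand-ins, not the actual openib CPC code.)

#include <stdio.h>
#include <stdint.h>
#include <infiniband/cm.h>

extern int ibcm_explicitly_requested;   /* e.g. set from the CPC include list */

static int try_ibcm_listen(struct ib_cm_id *cm_id, uint64_t service_id)
{
    if (0 != ib_cm_listen(cm_id, service_id, 0)) {
        if (ibcm_explicitly_requested) {
            /* the user asked for ibcm by name, so complain loudly */
            fprintf(stderr, "ibcm: ib_cm_listen() failed, falling back\n");
        }
        return -1;   /* caller treats this as "not supported" and tries oob */
    }
    return 0;
}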

Has this been filed as a bug at openfabrics.org?  I don't think that I 
filed it when Brad and I were testing on RoadRunner -- it would 
probably be good if someone filed it.




On Jul 13, 2008, at 8:56 AM, Lenny Verkhovsky wrote:


Pasha is right, I didn't disable it.

On 7/13/08, Pavel Shamis (Pasha) <pa...@dev.mellanox.co.il> wrote: 
Jeff Squyres wrote:
Brad and I did some scale testing of IBCM and saw this error 
sometimes.  It seemed to happen with higher frequency when you 
increased the number of processes on a single node.


I talked to Sean Hefty about it, but we never figured out a 
definitive cause or solution.  My best guess is that there is 
something wonky about multiple processes simultaneously interacting 
with the IBCM kernel driver from userspace; but I don't know jack 
about kernel stuff, so that's a total SWAG.


Thanks for reminding me of this issue; I admit that I had forgotten 
about it.  :-(  Pasha -- should IBCM not be the default?

It is not the default. I guess Lenny configured it explicitly, didn't he?

Pasha.





On Jul 13, 2008, at 7:08 AM, Lenny Verkhovsky wrote:

Hi,

I am getting this error sometimes.

/home/USERS/lenny/OMPI_COMP_PATH/bin/mpirun -np 100 -hostfile 
/home/USERS/lenny/TESTS/COMPILERS/hostfile 
/home/USERS/lenny/TESTS/COMPILERS/hello
[witch24][[32428,1],96][../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_ibcm.c:769:ibcm_component_query] 
failed to ib_cm_listen 10 times: rc=-1, errno=22

Hello world! I'm 0 of 100 on witch2


Best Regards

Lenny.









Re: [OMPI devel] IBCM error

2008-07-13 Thread Pavel Shamis (Pasha)

Jeff Squyres wrote:
Brad and I did some scale testing of IBCM and saw this error 
sometimes.  It seemed to happen with higher frequency when you 
increased the number of processes on a single node.


I talked to Sean Hefty about it, but we never figured out a definitive 
cause or solution.  My best guess is that there is something wonky 
about multiple processes simultaneously interacting with the IBCM 
kernel driver from userspace; but I don't know jack about kernel 
stuff, so that's a total SWAG.


Thanks for reminding me of this issue; I admit that I had forgotten 
about it.  :-(  Pasha -- should IBCM not be the default?

It is not the default. I guess Lenny configured it explicitly, didn't he?

Pasha.





On Jul 13, 2008, at 7:08 AM, Lenny Verkhovsky wrote:


Hi,

I am getting this error sometimes.

/home/USERS/lenny/OMPI_COMP_PATH/bin/mpirun -np 100 -hostfile 
/home/USERS/lenny/TESTS/COMPILERS/hostfile 
/home/USERS/lenny/TESTS/COMPILERS/hello
[witch24][[32428,1],96][../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_ibcm.c:769:ibcm_component_query] 
failed to ib_cm_listen 10 times: rc=-1, errno=22

Hello world! I'm 0 of 100 on witch2


Best Regards

Lenny.









Re: [OMPI devel] open ib dependency question

2008-07-10 Thread Pavel Shamis (Pasha)

FYI the issue was resolved - https://svn.open-mpi.org/trac/ompi/ticket/1376
Regards,
Pasha

Bogdan Costescu wrote:

On Thu, 3 Jul 2008, Pavel Shamis (Pasha) wrote:

I had a similar issue recently. It would be nice to have an option to 
enable/disable *CM via config flags.


Not sure if this is related... I am looking at using a nightly 1.3 
snapshot and I get this type of error messages when running:


[n020205][[36506,1],0][connect/btl_openib_connect_ibcm.c:723:ibcm_component_query] 
failed to open IB CM device: /dev/infiniband/ucm0


which is actually right, as /dev/infiniband on the nodes doesn't 
contain ucm0. On the same cluster, OpenMPI 1.2.7rc2 works fine; the 
configure options for building them are the same.


The output of ldd shows for the binary linked with 1.3a:

libibcm.so.1 => /opt/ofed/1.2.5.4/lib64/libibcm.so.1

while this is missing from the binary linked with 1.2. Even after 
printing these messages, the binary linked with 1.3a works; it works 
even when I specify "--mca btl openib,self" so I think that the IB 
stack is still being used (there is also a TCP/GigE network which 
could be chosen otherwise).


I don't know whether this is caused by a somehow inconsistent setup of 
the system, but I would welcome an option to make 1.3a behave like 1.2.






Re: [OMPI devel] open ib dependency question

2008-07-08 Thread Pavel Shamis (Pasha)




Pasha -- do you want to open a ticket?


Done. https://svn.open-mpi.org/trac/ompi/ticket/1376
Pasha.


Re: [OMPI devel] open ib dependency question

2008-07-06 Thread Pavel Shamis (Pasha)

I added a patch to the ticket; please review.

Pasha.
Jeff Squyres wrote:

Ok:

https://svn.open-mpi.org/trac/ompi/ticket/1375

I think any of us could do this -- it's pretty straightforward.  No 
guarantees on when I can get to it; my 1.3 list is already pretty long...



On Jul 3, 2008, at 6:20 AM, Pavel Shamis (Pasha) wrote:


Jeff Squyres wrote:

Do you need configury to disable building ibcm / rdmacm support?

The more I think about it, the more I think that these would be good 
features to have for v1.3...
I had a similar issue recently. It would be nice to have an option to 
enable/disable *CM via config flags.



On Jul 3, 2008, at 2:52 AM, Don Kerr wrote:

I did not think it was required but it hung me up when I built ompi 
on one system which had the ibcm libraries and then ran on a system 
without the ibcm libs. I had another issue on the system without 
ibcm libs which prevented my building there but I will go down that 
path again. Thanks.


Jeff Squyres wrote:
That is the IBCM library for the IBCM CPC -- IB connection manager 
stuff.


It's not *necessary*; you could use the OOB CPC if you want to.

That being said, OMPI currently builds support for it (and links 
it in) if it finds the right headers and library files. We don't 
currently have configury to disable this behavior (and *not* build 
RDMACM and/or IBCM support).


Do you have a problem / need to disable building support for IBCM?



On Jul 2, 2008, at 12:02 PM, Don Kerr wrote:



It appears that the mca_btl_openib.so has a dependency on 
libibcm.so. Is this necessary?










Re: [OMPI devel] open ib dependency question

2008-07-06 Thread Pavel Shamis (Pasha)
I see the same issue on my Mellanox OFED 1.3: the IBCM module is loaded but 
there is no such device in the system.

Jeff, this looks like some bug in the IBCM stuff... (not OMPI).
I think we should print the error only if ibcm was explicitly selected by 
the user, but from the CPC level there is no way to know about explicit 
selection... Maybe just hide the print?

Bogdan Costescu wrote:

On Thu, 3 Jul 2008, Pavel Shamis (Pasha) wrote:

I had a similar issue recently. It would be nice to have an option to 
enable/disable *CM via config flags.


Not sure if this is related... I am looking at using a nightly 1.3 
snapshot and I get this type of error messages when running:


[n020205][[36506,1],0][connect/btl_openib_connect_ibcm.c:723:ibcm_component_query] 
failed to open IB CM device: /dev/infiniband/ucm0


which is actually right, as /dev/infiniband on the nodes doesn't 
contain ucm0. On the same cluster, OpenMPI 1.2.7rc2 works fine; the 
configure options for building them are the same.


The output of ldd shows for the binary linked with 1.3a:

libibcm.so.1 => /opt/ofed/1.2.5.4/lib64/libibcm.so.1

while this is missing from the binary linked with 1.2. Even after 
printing these messages, the binary linked with 1.3a works; it works 
even when I specify "--mca btl openib,self" so I think that the IB 
stack is still being used (there is also a TCP/GigE network which 
could be chosen otherwise).


I don't know whether this is caused by a somehow inconsistent setup of 
the system, but I would welcome an option to make 1.3a behave like 1.2.






Re: [OMPI devel] open ib dependency question

2008-07-03 Thread Pavel Shamis (Pasha)



Ok:

https://svn.open-mpi.org/trac/ompi/ticket/1375

I think any of us could do this -- it's pretty straightforward.  No 
guarantees on when I can get to it; my 1.3 list is already pretty long...

No problem. I will take this one.
Pasha.



On Jul 3, 2008, at 6:20 AM, Pavel Shamis (Pasha) wrote:


Jeff Squyres wrote:

Do you need configury to disable building ibcm / rdmacm support?

The more I think about it, the more I think that these would be good 
features to have for v1.3...
I had a similar issue recently. It would be nice to have an option to 
enable/disable *CM via config flags.



On Jul 3, 2008, at 2:52 AM, Don Kerr wrote:

I did not think it was required but it hung me up when I built ompi 
on one system which had the ibcm libraries and then ran on a system 
without the ibcm libs. I had another issue on the system without 
ibcm libs which prevented my building there but I will go down that 
path again. Thanks.


Jeff Squyres wrote:
That is the IBCM library for the IBCM CPC -- IB connection manager 
stuff.


It's not *necessary*; you could use the OOB CPC if you want to.

That being said, OMPI currently builds support for it (and links 
it in) if it finds the right headers and library files. We don't 
currently have configury to disable this behavior (and *not* build 
RDMACM and/or IBCM support).


Do you have a problem / need to disable building support for IBCM?



On Jul 2, 2008, at 12:02 PM, Don Kerr wrote:



It appears that the mca_btl_openib.so has a dependency on 
libibcm.so. Is this necessary?










Re: [OMPI devel] open ib dependency question

2008-07-03 Thread Pavel Shamis (Pasha)

Jeff Squyres wrote:

Do you need configury to disable building ibcm / rdmacm support?

The more I think about it, the more I think that these would be good 
features to have for v1.3...
I had a similar issue recently. It would be nice to have an option to 
enable/disable *CM via config flags.



On Jul 3, 2008, at 2:52 AM, Don Kerr wrote:

I did not think it was required but it hung me up when I built ompi 
on one system which had the ibcm libraries and then ran on a system 
without the ibcm libs. I had another issue on the system without ibcm 
libs which prevented my building there but I will go down that path 
again. Thanks.


Jeff Squyres wrote:
That is the IBCM library for the IBCM CPC -- IB connection manager 
stuff.


It's not *necessary*; you could use the OOB CPC if you want to.

That being said, OMPI currently builds support for it (and links it 
in) if it finds the right headers and library files. We don't 
currently have configury to disable this behavior (and *not* build 
RDMACM and/or IBCM support).


Do you have a problem / need to disable building support for IBCM?



On Jul 2, 2008, at 12:02 PM, Don Kerr wrote:



It appears that the mca_btl_openib.so has a dependency on 
libibcm.so. Is this necessary?










Re: [OMPI devel] Is trunk broken ?

2008-06-19 Thread Pavel Shamis (Pasha)

I did a fresh checkout and everything works well, so it looks like some
'svn up' screwed up my checkout.
Ralph, thanks for the help!

Ralph H Castain wrote:

Hmmm...something isn't right, Pasha. There is simply no way you should be
encountering this error. You are picking up the wrong grpcomm module.

I went ahead and fixed the grpcomm/basic module, but as I note in the commit
message, that is now an experimental area. The grpcomm/bad module is the
default for that reason.

Check to ensure you have the orte/mca/grpcomm/bad directory, and that it is
getting built. My guess is that you have a corrupted checkout or build and
that the component is either missing or not getting built.


On 6/19/08 1:37 PM, "Pavel Shamis (Pasha)" <pa...@dev.mellanox.co.il> wrote:

  

Ralph H Castain wrote:


I can't find anything wrong so far. I'm waiting in a queue on Odin to try
there since Jeff indicated you are using rsh as a launcher, and that's the
only access I have to such an environment. Guess Odin is being pounded
because the queue isn't going anywhere.
  
  

I use ssh; here is the command line:
./bin/mpirun -np 2 -H sw214,sw214 -mca btl openib,sm,self
./osu_benchmarks-3.0/osu_latency


Meantime, I'm building on RoadRunner and will test there (TM enviro).


On 6/19/08 1:18 PM, "Pavel Shamis (Pasha)" <pa...@dev.mellanox.co.il> wrote:

  
  

You'll have to tell us something more than that, Pasha. What kind of
environment, what rev level were you at, etc.
  
  
  

Ahh, sorry :) I run on Linux x86_64 Sles10 sp1. (Open MPI) 1.3a1r18682M
, OFED 1.3.1
Pasha.



So far as I know, the trunk is fine.


On 6/19/08 12:01 PM, "Pavel Shamis (Pasha)" <pa...@dev.mellanox.co.il>
wrote:

  
  
  

I tried to run the trunk on my machines and I got the following error:

[sw214:04367] [[16563,1],1] ORTE_ERROR_LOG: Data unpack would read past
end of buffer in file base/grpcomm_base_modex.c at line 451
[sw214:04367] [[16563,1],1] ORTE_ERROR_LOG: Data unpack would read past
end of buffer in file grpcomm_basic_module.c at line 560
[sw214:04365]
--
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or
environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  orte_grpcomm_modex failed
  --> Returned "Data unpack would read past end of buffer" (-26) instead
of "Success" (0)


  




Re: [OMPI devel] Is trunk broken ?

2008-06-19 Thread Pavel Shamis (Pasha)

Ralph H Castain wrote:

Ha! I found it - you left out one very important detail. You are specifying
the use of the grpcomm basic module instead of the default "bad" one.
  

Hmm, I did not specify any "grpcomm" module.

I just checked and that module is indeed showing a problem. I'll see what I
can do.

For now, though, just use the default grpcomm and it will work fine.


On 6/19/08 1:18 PM, "Pavel Shamis (Pasha)" <pa...@dev.mellanox.co.il> wrote:

  

You'll have to tell us something more than that, Pasha. What kind of
environment, what rev level were you at, etc.
  
  

Ahh, sorry :) I run on Linux x86_64 Sles10 sp1. (Open MPI) 1.3a1r18682M
, OFED 1.3.1
Pasha.


So far as I know, the trunk is fine.


On 6/19/08 12:01 PM, "Pavel Shamis (Pasha)" <pa...@dev.mellanox.co.il>
wrote:

  
  

I tried to run the trunk on my machines and I got the following error:

[sw214:04367] [[16563,1],1] ORTE_ERROR_LOG: Data unpack would read past
end of buffer in file base/grpcomm_base_modex.c at line 451
[sw214:04367] [[16563,1],1] ORTE_ERROR_LOG: Data unpack would read past
end of buffer in file grpcomm_basic_module.c at line 560
[sw214:04365] 
--

It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  orte_grpcomm_modex failed
  --> Returned "Data unpack would read past end of buffer" (-26) instead
of "Success" (0)


  




Re: [OMPI devel] Is trunk broken ?

2008-06-19 Thread Pavel Shamis (Pasha)

Ralph H Castain wrote:

I can't find anything wrong so far. I'm waiting in a queue on Odin to try
there since Jeff indicated you are using rsh as a launcher, and that's the
only access I have to such an environment. Guess Odin is being pounded
because the queue isn't going anywhere.
  

I use ssh; here is the command line:
./bin/mpirun -np 2 -H sw214,sw214 -mca btl openib,sm,self   
./osu_benchmarks-3.0/osu_latency

Meantime, I'm building on RoadRunner and will test there (TM enviro).


On 6/19/08 1:18 PM, "Pavel Shamis (Pasha)" <pa...@dev.mellanox.co.il> wrote:

  

You'll have to tell us something more than that, Pasha. What kind of
environment, what rev level were you at, etc.
  
  

Ahh, sorry :) I run on Linux x86_64 Sles10 sp1. (Open MPI) 1.3a1r18682M
, OFED 1.3.1
Pasha.


So far as I know, the trunk is fine.


On 6/19/08 12:01 PM, "Pavel Shamis (Pasha)" <pa...@dev.mellanox.co.il>
wrote:

  
  

I tried to run the trunk on my machines and I got the following error:

[sw214:04367] [[16563,1],1] ORTE_ERROR_LOG: Data unpack would read past
end of buffer in file base/grpcomm_base_modex.c at line 451
[sw214:04367] [[16563,1],1] ORTE_ERROR_LOG: Data unpack would read past
end of buffer in file grpcomm_basic_module.c at line 560
[sw214:04365] 
--

It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  orte_grpcomm_modex failed
  --> Returned "Data unpack would read past end of buffer" (-26) instead
of "Success" (0)


  




Re: [OMPI devel] Is trunk broken ?

2008-06-19 Thread Pavel Shamis (Pasha)



You'll have to tell us something more than that, Pasha. What kind of
environment, what rev level were you at, etc.
  
Ahh, sorry :) I run on Linux x86_64 Sles10 sp1. (Open MPI) 1.3a1r18682M 
, OFED 1.3.1

Pasha.

So far as I know, the trunk is fine.


On 6/19/08 12:01 PM, "Pavel Shamis (Pasha)" <pa...@dev.mellanox.co.il>
wrote:

  

I tried to run the trunk on my machines and I got the following error:

[sw214:04367] [[16563,1],1] ORTE_ERROR_LOG: Data unpack would read past
end of buffer in file base/grpcomm_base_modex.c at line 451
[sw214:04367] [[16563,1],1] ORTE_ERROR_LOG: Data unpack would read past
end of buffer in file grpcomm_basic_module.c at line 560
[sw214:04365] 
--

It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  orte_grpcomm_modex failed
  --> Returned "Data unpack would read past end of buffer" (-26) instead
of "Success" (0)


  




[OMPI devel] Is trunk broken ?

2008-06-19 Thread Pavel Shamis (Pasha)

I tried to run the trunk on my machines and I got the following error:

[sw214:04367] [[16563,1],1] ORTE_ERROR_LOG: Data unpack would read past 
end of buffer in file base/grpcomm_base_modex.c at line 451
[sw214:04367] [[16563,1],1] ORTE_ERROR_LOG: Data unpack would read past 
end of buffer in file grpcomm_basic_module.c at line 560
[sw214:04365] 
--

It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

 orte_grpcomm_modex failed
 --> Returned "Data unpack would read past end of buffer" (-26) instead 
of "Success" (0)




Re: [OMPI devel] Memory hooks change testing

2008-06-12 Thread Pavel Shamis (Pasha)

In my MTT testing it looks ok, too.

Brad Benton wrote:

Brian,

This is working smoothly now:  both the configuration/build and 
execution.  So,

from my standpoint, it looks good for inclusion into the trunk.

--brad



On Wed, Jun 11, 2008 at 4:50 PM, Brian W. Barrett 
> wrote:


Brad unfortunately figured out I had done something to annoy the
gods of
mercurial and the repository below didn't contain all the changes
advertised (and in fact didn't work).  I've since rebuilt the
repository
and verified it works now.  I'd recommend deleting your existing
clones of
the repository below and starting over.

Sorry about that,

Brian


On Wed, 11 Jun 2008, Brian Barrett wrote:

> Did anyone get a chance to test (or think about testing) this?
 I'd like to
> commit the changes on Friday evening, if I haven't heard any
negative
> feedback.
>
> Brian
>
> On Jun 9, 2008, at 8:56 PM, Brian Barrett wrote:
>
>> Hi all -
>>
>> Per the RFC I sent out last week, I've implemented a revised
behavior of
>> the memory hooks for high speed networks.  It's a bit different
than the
>> RFC proposed, but still very minor and fairly straight foward.
>>
>> The default is to build ptmalloc2 support, but as an almost
complete
>> standalone library.  If the user wants to use ptmalloc2, he
only has to add
>> -lopenmpi-malloc to his link line.  Even when standalone and
openmpi-malloc
>> is not linked in, we'll still intercept munmap as it's needed
for mallopt
>> (below) and we've never had any trouble with that part of
ptmalloc2 (it's
>> easy to intercept).
>>
>> As a *CHANGE* in behavior, if leave_pinned support is turned on
and there's
>> no ptmalloc2 support, we will automatically enable mallopt.  As
a *CHANGE*
>> in behavior, if the user disables mallopt or mallopt is not
available and
>> leave pinned is requested, we'll abort.  I think these both
make sense and
>> are closest to expected behavior, but wanted to point them out.
 It is
>> possible for the user to disable mallopt and enable
leave_pinned, but that
>> will *only* work if there is some other mechanism for
intercepting free
>> (basically, it allows a way to ensure your using ptmalloc2
instead of
>> mallopt).
>>
>> There is also a new memory component, mallopt, which only
intercepts munmap
>> and exists only to allow users to enable mallopt while not
building the
>> ptmalloc2 component at all.  Previously, our mallopt support
was lacking in
>> that it didn't cover the case where users explicitly called
munmap in their
>> applications.  Now, it does.
>>
>> The changes are fairly small and can be seen/tested in the HG
repository
>> bwb/mem-hooks, URL below.  I think this would be a good thing
to push to
>> 1.3, as it will solve an ongoing problem on Linux (basically,
users getting
>> screwed by our ptmalloc2 implementation).
>>
>>   http://www.open-mpi.org/hg/hgwebdir.cgi/bwb/mem-hooks/
>>
>> Brian
>>
>> --
>> Brian Barrett
>> Open MPI developer
>> http://www.open-mpi.org/
>>
>>
>
>




Re: [OMPI devel] Pallas fails

2008-06-12 Thread Pavel Shamis (Pasha)

With 1.3a1r18643 the Pallas tests pass on my machine.
But I now see new failures (an assertion) in the Intel tests:
http://www.open-mpi.org/mtt/index.php?do_redir=733


PI_Type_struct_types_c: btl_sm.c:684: mca_btl_sm_sendi: Assertion `max_data == 
payload_size'
failed.
[sw216:32013] *** Process received signal ***
[sw216:32013] Signal: Aborted (6)
[sw216:32013] Signal code:  (-6)
[sw216:32013] [ 0] /lib64/libpthread.so.0 [0x2aba5e51ec10]
[sw216:32013] [ 1] /lib64/libc.so.6(gsignal+0x35) [0x2aba5e657b95]
[sw216:32013] [ 2] /lib64/libc.so.6(abort+0x110) [0x2aba5e658f90]
[sw216:32013] [ 3] /lib64/libc.so.6(__assert_fail+0xf6) [0x2aba5e651256]
[sw216:32013] [ 4]




Pavel Shamis (Pasha) wrote:

On the last conf call Jeff mentioned that he sees some collective failures.
In my MTT testing I also see that the Pallas collectives failed:
http://www.open-mpi.org/mtt/index.php?do_redir=682


 Alltoall

#
# Benchmarking Alltoall 
# #processes = 20 
#

   #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
0 1000 0.03 0.05 0.04
1 1000   179.15   179.22   179.18
2 1000   155.96   156.02   155.98
4 1000   156.93   156.98   156.95
8 1000   163.63   163.67   163.65
   16 1000   115.04   115.08   115.07
   32 1000   123.57   123.62   123.59
   64 1000   129.78   129.82   129.80
  128 1000   141.45   141.49   141.48
  256 1000   960.11   960.24   960.20
  512 1000   900.95   901.11   901.04
 1024 1000   921.95   922.05   922.00
 2048 1000   862.50   862.72   862.60
 4096 1000  1044.90  1044.95  1044.92
 8192 1000  1458.59  1458.77  1458.69
*** An error occurred in MPI_Alltoall
*** on communicator MPI COMMUNICATOR 4 SPLIT FROM 0
*** An error occurred in MPI_Alltoall
*** on communicator MPI COMMUNICATOR 4 SPLIT FROM 0


  




Re: [OMPI devel] Memory hooks change testing

2008-06-11 Thread Pavel Shamis (Pasha)

Brian Barrett wrote:
Did anyone get a chance to test (or think about testing) this?  I'd  
like to commit the changes on Friday evening, if I haven't heard any  
negative feedback.
  

I will run it tomorrow on my cluster.

Brian

On Jun 9, 2008, at 8:56 PM, Brian Barrett wrote:

  

Hi all -

Per the RFC I sent out last week, I've implemented a revised  
behavior of the memory hooks for high speed networks.  It's a bit  
different than the RFC proposed, but still very minor and fairly  
straight foward.


The default is to build ptmalloc2 support, but as an almost complete  
standalone library.  If the user wants to use ptmalloc2, he only has  
to add -lopenmpi-malloc to his link line.  Even when standalone and  
openmpi-malloc is not linked in, we'll still intercept munmap as  
it's needed for mallopt (below) and we've never had any trouble with  
that part of ptmalloc2 (it's easy to intercept).


As a *CHANGE* in behavior, if leave_pinned support is turned on and  
there's no ptmalloc2 support, we will automatically enable mallopt.   
As a *CHANGE* in behavior, if the user disables mallopt or mallopt  
is not available and leave pinned is requested, we'll abort.  I  
think these both make sense and are closest to expected behavior,  
but wanted to point them out.  It is possible for the user to  
disable mallopt and enable leave_pinned, but that will *only* work  
if there is some other mechanism for intercepting free (basically,  
it allows a way to ensure your using ptmalloc2 instead of mallopt).
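
(Editorial aside: for readers unfamiliar with the mallopt fallback being
discussed, this is roughly what "enabling mallopt" amounts to in glibc; it is
an illustration, not the Open MPI implementation.)

#include <malloc.h>

/* Keep the heap from ever being handed back to the kernel, so that
 * registered ("leave_pinned") buffers keep stable virtual addresses. */
static int disable_memory_return(void)
{
    if (1 != mallopt(M_TRIM_THRESHOLD, -1)) return -1;  /* never trim the heap */
    if (1 != mallopt(M_MMAP_MAX, 0))        return -1;  /* never allocate via mmap */
    return 0;
}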


There is also a new memory component, mallopt, which only intercepts  
munmap and exists only to allow users to enable mallopt while not  
building the ptmalloc2 component at all.  Previously, our mallopt  
support was lacking in that it didn't cover the case where users  
explicitly called munmap in their applications.  Now, it does.


The changes are fairly small and can be seen/tested in the HG  
repository bwb/mem-hooks, URL below.  I think this would be a good  
thing to push to 1.3, as it will solve an ongoing problem on Linux  
(basically, users getting screwed by our ptmalloc2 implementation).


   http://www.open-mpi.org/hg/hgwebdir.cgi/bwb/mem-hooks/

Brian

--
 Brian Barrett
 Open MPI developer
 http://www.open-mpi.org/





  




[OMPI devel] Pallas fails

2008-06-04 Thread Pavel Shamis (Pasha)

On the last conf call Jeff mentioned that he sees some collective failures.
In my MTT testing I also see that the Pallas collectives failed:
http://www.open-mpi.org/mtt/index.php?do_redir=682


Alltoall

#
# Benchmarking Alltoall 
# #processes = 20 
#

  #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
   0 1000 0.03 0.05 0.04
   1 1000   179.15   179.22   179.18
   2 1000   155.96   156.02   155.98
   4 1000   156.93   156.98   156.95
   8 1000   163.63   163.67   163.65
  16 1000   115.04   115.08   115.07
  32 1000   123.57   123.62   123.59
  64 1000   129.78   129.82   129.80
 128 1000   141.45   141.49   141.48
 256 1000   960.11   960.24   960.20
 512 1000   900.95   901.11   901.04
1024 1000   921.95   922.05   922.00
2048 1000   862.50   862.72   862.60
4096 1000  1044.90  1044.95  1044.92
8192 1000  1458.59  1458.77  1458.69
*** An error occurred in MPI_Alltoall
*** on communicator MPI COMMUNICATOR 4 SPLIT FROM 0
*** An error occurred in MPI_Alltoall
*** on communicator MPI COMMUNICATOR 4 SPLIT FROM 0



Re: [OMPI devel] r18551 - brakes ob1 compilation on SLES10

2008-06-03 Thread Pavel Shamis (Pasha)

The compilation passed on SLES 10 SP1, so SP1 resolves the gcc/binutils issue.
We need to add a notice somewhere that "OMPI 1.3 cannot be compiled on 
SLES10; please update ... "


Regards,
Pasha

Pavel Shamis (Pasha) wrote:

Ralf Wildenhues wrote:

* Pavel Shamis (Pasha) wrote on Mon, Jun 02, 2008 at 02:25:13PM CEST:
 

r18551 breaks OMPI compilation on SLES10 with gcc 4.1.0.

I got the following error on my systems
(http://www.open-mpi.org/mtt/index.php?do_redir=672):



[...]
 
/usr/lib64/gcc/x86_64-suse-linux/4.1.0/../../../../x86_64-suse-linux/bin/ld: 
.libs/pml_ob1_sendreq.o: relocation R_X86_64_PC32 against  
`mca_pml_ob1_rndv_completion' can not be used when making a shared  
object; recompile with -fPIC



The build log shows that your clock isn't set properly, so I'd first fix
that and do a complete rebuild afterwards.  

I fixed the clock and the problem still was there.

The log also shows that
.libs/pml_ob1_sendreq.o is compiled with -fPIC, so second I'd assume a
compiler or binutils bug.  The GCC bugzilla lists
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30153
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28781
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21382
as possible starting points for further investigation.  Maybe your
distributor has fixed or newer binutils packages for you.
  

I will check for SLES 10 updates.

Cheers,
Ralf

  







Re: [OMPI devel] r18551 - brakes ob1 compilation on SLES10

2008-06-02 Thread Pavel Shamis (Pasha)

Ralf Wildenhues wrote:

* Pavel Shamis (Pasha) wrote on Mon, Jun 02, 2008 at 02:25:13PM CEST:
  

r18551 breaks OMPI compilation on SLES10 with gcc 4.1.0.

I got the following error on my systems
(http://www.open-mpi.org/mtt/index.php?do_redir=672):



[...]
  
/usr/lib64/gcc/x86_64-suse-linux/4.1.0/../../../../x86_64-suse-linux/bin/ld: 
.libs/pml_ob1_sendreq.o: relocation R_X86_64_PC32 against  
`mca_pml_ob1_rndv_completion' can not be used when making a shared  
object; recompile with -fPIC



The build log shows that your clock isn't set properly, so I'd first fix
that and do a complete rebuild afterwards.  

I fixed the clock and the problem still was there.

The log also shows that
.libs/pml_ob1_sendreq.o is compiled with -fPIC, so second I'd assume a
compiler or binutils bug.  The GCC bugzilla lists
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30153
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28781
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21382
as possible starting points for further investigation.  Maybe your
distributor has fixed or newer binutils packages for you.
  

I will check for SLES 10 updates.

Cheers,
Ralf

  




[OMPI devel] r18551 - brakes ob1 compilation on Sles10

2008-06-02 Thread Pavel Shamis (Pasha)

r18551 breaks OMPI compilation on SLES10 with gcc 4.1.0.

I got the following error on my systems
(http://www.open-mpi.org/mtt/index.php?do_redir=672):
make[2]: Entering directory 
`/.autodirect/hpc/work/pasha/tmp/mtt-8/installs/5VHm/src/openmpi-1.3a1r18553/ompi/mca/pml/ob1'
/bin/sh ../../../../libtool --tag=CC   --mode=link gcc  -g -pipe -Wall 
-Wundef -Wno-long-long -Wsign-compare -Wmissing-prototypes 
-Wstrict-prototypes -Wcomment -pedantic 
-Werror-implicit-function-declaration -finline-functions 
-fno-strict-aliasing -pthread -fvisibility=hidden -module -avoid-version 
-export-dynamic   -o mca_pml_ob1.la -rpath 
/.autodirect/hpc/work/pasha/tmp/mtt-8/installs/5VHm/install/lib/openmpi 
pml_ob1.lo pml_ob1_comm.lo pml_ob1_component.lo pml_ob1_iprobe.lo 
pml_ob1_irecv.lo pml_ob1_isend.lo pml_ob1_progress.lo pml_ob1_rdma.lo 
pml_ob1_rdmafrag.lo pml_ob1_recvfrag.lo pml_ob1_recvreq.lo 
pml_ob1_sendreq.lo pml_ob1_start.lo  -lnsl -lutil  -lm
libtool: link: gcc -shared  .libs/pml_ob1.o .libs/pml_ob1_comm.o 
.libs/pml_ob1_component.o .libs/pml_ob1_iprobe.o .libs/pml_ob1_irecv.o 
.libs/pml_ob1_isend.o .libs/pml_ob1_progress.o .libs/pml_ob1_rdma.o 
.libs/pml_ob1_rdmafrag.o .libs/pml_ob1_recvfrag.o 
.libs/pml_ob1_recvreq.o .libs/pml_ob1_sendreq.o .libs/pml_ob1_start.o   
-lnsl -lutil -lm  -pthread   -pthread -Wl,-soname -Wl,mca_pml_ob1.so -o 
.libs/mca_pml_ob1.so
/usr/lib64/gcc/x86_64-suse-linux/4.1.0/../../../../x86_64-suse-linux/bin/ld: 
.libs/pml_ob1_sendreq.o: relocation R_X86_64_PC32 against 
`mca_pml_ob1_rndv_completion' can not be used when making a shared 
object; recompile with -fPIC
/usr/lib64/gcc/x86_64-suse-linux/4.1.0/../../../../x86_64-suse-linux/bin/ld: 
final link failed: Bad value

collect2: ld returned 1 exit status

Removing inline from some of the functions (see attached file) resolves the 
problem.


Thanks,
Pasha

--- pml_ob1_sendreq.c   2008-06-01 10:59:51.094063000 +0300
+++ pml_ob1_sendreq.c.new   2008-06-02 15:07:02.612983000 +0300
@@ -192,7 +192,7 @@
 MCA_PML_OB1_PROGRESS_PENDING(bml_btl);
 }

-static inline void
+static void
 mca_pml_ob1_match_completion_free( struct mca_btl_base_module_t* btl,  
struct mca_btl_base_endpoint_t* ep,
struct mca_btl_base_descriptor_t* des,
@@ -235,7 +235,7 @@
  *  Completion of the first fragment of a long message that 
  *  requires an acknowledgement
  */
-static inline void
+static void
 mca_pml_ob1_rndv_completion( mca_btl_base_module_t* btl,
  struct mca_btl_base_endpoint_t* ep,
  struct mca_btl_base_descriptor_t* des,
@@ -269,7 +269,7 @@
  * Completion of a get request.
  */

-static inline void
+static void
 mca_pml_ob1_rget_completion( mca_btl_base_module_t* btl,
  struct mca_btl_base_endpoint_t* ep,
  struct mca_btl_base_descriptor_t* des,
@@ -295,7 +295,7 @@
  * Completion of a control message - return resources.
  */

-static inline void
+static void
 mca_pml_ob1_send_ctl_completion( mca_btl_base_module_t* btl,
  struct mca_btl_base_endpoint_t* ep,
  struct mca_btl_base_descriptor_t* des,
@@ -312,7 +312,7 @@
  * to schedule additional fragments.
  */

-static inline void
+static void
 mca_pml_ob1_frag_completion( mca_btl_base_module_t* btl,
  struct mca_btl_base_endpoint_t* ep,
  struct mca_btl_base_descriptor_t* des,


Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-29 Thread Pavel Shamis (Pasha)


I got some more feedback from Roland off-list explaining that if /sys/ 
class/infiniband does exist and is non-empty and /sys/class/ 
infiniband_verbs/abi_version does not exist, then this is definitely a  
case where we want to warn because it implies that config is screwed  
up -- RDMA devices are present but not usable.
  
Is it possible that the /sys/class/infiniband directory exists but is 
empty? In which cases?


Re: [OMPI devel] New HCA vendor part ID

2008-05-26 Thread Pavel Shamis (Pasha)


it seems that Open MPI doesn't have HCA parameters for the following 
InfiniBand adapter, which the compute nodes of our SGI Altix ICE 8200EX use:

Mellanox Technologies MT26418 ConnectX

I assume that the vendor part ID 26418 should be listed in the section 
"Mellanox Hermon"

of the HCA parameter file. Is that correct?
Thanks for pointing it out. I fixed it on the trunk: 
https://svn.open-mpi.org/trac/ompi/changeset/18495

Regards,
Pasha
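
(Editorial aside: the changeset above adds the new part ID to the HCA
parameter ini file whose keys appear in the parser patch earlier in this
archive. A hedged illustration of the kind of entry meant; apart from 26418,
the values below are made up for the example.)

[Mellanox Hermon]
vendor_id = 0x2c9
vendor_part_id = 26418
use_eager_rdma = 1
mtu = 2048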



Matthias

--
Matthias Jurenz,
Center for Information Services and
High Performance Computing (ZIH), TU Dresden,
Willersbau A106, Zellescher Weg 12, 01062 Dresden
phone +49-351-463-31945, fax +49-351-463-37773







Re: [OMPI devel] Memory hooks stuff

2008-05-25 Thread Pavel Shamis (Pasha)

Jeff Squyres wrote:
Who would be interested in discussing this stuff?  (me, Brian, ? 
someone from Sun?, ...?)
  

Me.



Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-22 Thread Pavel Shamis (Pasha)


Brian and I chatted a bit about this off-list, and I think we're in  
agreement now:


- do not change the default value or meaning of  
btl_base_want_component_unsed.


- major point of confusion: the openib BTL is actually fairly unique  
in that it can (and does) tell the difference between "there are no  
devices present" and "there are devices, but something went wrong".   
Other BTL's have network interfaces that can't tell the difference and  
can *only* call the no_nics function, regardless of whether there are  
no relevant network interfaces or some error occurred during  
initialization.


- so a reasonable solution would be an openib-BTL-specific mechanism  
that doesn't call the no_nics function (to display that  
btl_base_want_component_unused) if there are no verbs-capable devices  
found because of the fact that mainline Linuxes are starting to ship  
libibverbs.  Specific mechanism TBD; likely to be an openib MCA param.
  
OK, we will have our own warning mechanism. But there is still an open 
question: will we show (by default) an error message in the case where 
libibverbs exists but there is no HCA in the hca_list?
I think we should show the error. The problem of a default libibverbs 
install is relevant only for binary distributions that install all OMPI 
dependencies along with the OMPI package. In that case the distribution 
will have an openib MCA parameter that allows the warning message to be 
disabled by default during the OMPI package install (or build).
I guess that most people still install OMPI from source, and in that case 
it sounds reasonable to me to print this "no HCA" warning if the openib 
BTL was built.

Pasha



On May 21, 2008, at 9:56 PM, Jeff Squyres wrote:

  

On May 21, 2008, at 5:02 PM, Brian W. Barrett wrote:



If this is true (for some reason I thought it wasn't), then I think
we'd
actually be ok with your proposal, but you're right, you'd need
something
new in the IB btl.  I'm not concerned about the dual rail issue -- if
you're smart enough to configure dual rail IB, you're smart enough to
figure out OMPI mca params.  I'm not sure the same is true for a
simple
delivered from the white box vendor IB setup that barely works on a
good
day (and unfortunately, there seems to be evidence that these exist).
  

I'm not sure I understand what you're saying -- you agree, but what
"new" do you think we need in the openib BTL?  The MCA params saying
which ports you expect to be ACTIVE?

--
Jeff Squyres
Cisco Systems





  




Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-21 Thread Pavel Shamis (Pasha)
As far as I know, only the OpenIB kernel drivers are installed by default by 
the distribution. The user-level parts, libibverbs and the other OpenIB 
pieces, are not installed by default; the user needs to go to the package 
manager and explicitly select libibverbs. So if a user decided to install 
libibverbs, he had reasons for it, and I think it is OK to show this warning.


Pasha.

Jeff Squyres wrote:

On May 21, 2008, at 11:14 AM, Brian W. Barrett wrote:

  
I think having a parameter to turn off the warning is a great idea.   
So
great in fact, that it already exists in the trunk and v1.2 :)!   
Setting
the default value for the btl_base_warn_component_unused flag from 0  
to 1

will have the desired effect.



Ah, ok.  I either didn't know about this flag or forgot about it.  :-)

I just tested this myself and see that there are actually *two* error  
messages (on a machine where I installed libibverbs, but with no  
OpenFabrics hardware, with OMPI 1.2.6):


% mpirun -np 1 hello
libibverbs: Fatal: couldn't read uverbs ABI version.
--
[0,1,0]: OpenIB on host eddie.osl.iu.edu was unable to find any HCAs.
Another transport will be used instead, although this may result in
lower performance.
--

So the MCA param takes care of the OMPI message; I'll contact the  
libibverbs authors about their message.


  
I'm not sure I agree with setting the default to 0, however.  The  
warning
has proven extremely useful for diagnosing that IB (or less often GM  
or
MX) isn't properly configured on a compute node due to some random  
error.

It's trivially easy for any packaging group to have the line

  btl_base_warn_component_unused = 0

added to $prefix/etc/openmpi-mca-params.conf during the install  
phase of

the package build (indeed, our simple build scripts at LANL used to do
this on a regular basis due to our need to tweak the OOB to keep IPoIB
happier at scale).
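
(Editorial aside: the same parameter can also be set per run on the command
line; a hedged example, with the application name made up.)

mpirun -mca btl_base_warn_component_unused 0 -np 4 ./a.out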

I think keeping the Debian guys happy is a good thing.  Giving them an
easy way to turn off silly warnings is a good thing.  Removing a known
useful warning to help them doesn't seem like a good thing.



I guess that this is what I am torn about.  Yes, it's a useful message  
-- in some cases.  But now that libibverbs is shipping in Debain and  
other Linuxes, the number of machines out there with verbs-capable  
hardware is far, far smaller than the number of machines without verbs- 
capable hardware.  Specifically:


1. The number of cases where seeing the message by default is *not*  
useful is now potentially [much] larger than the number of cases where  
the default message is useful.


2. An out-of-the-box "mpirun a.out" will print warning messages in  
perfectly valid/good configurations (no verbs-capable hardware, but  
just happen to have libibverbs installed).  This is a Big Deal.


3. Problems with HCA hardware and/or verbs stack are uncommon  
(nowadays).  I'd be ok asking someone to enable a debug flag to get  
more information on configuration problems or hardware faults.


Shouldn't we be optimizing for the common case?

In short: I think it's no longer safe to assume that machines with  
libibverbs installed must also have verbs-capable hardware.


  




Re: [OMPI devel] Threaded progress for CPCs

2008-05-20 Thread Pavel Shamis (Pasha)


Is it possible to have sane SRQ implementation without HW flow  
control?



It seems pretty unlikely if the only available HW flow control is to  
terminate the connection.  ;-)


  

Even if we can get the iWARP semantics to work, this feels kinda
icky.  Perhaps I'm overreacting and this isn't a problem that needs  
to

be fixed -- after all, this situation is no different than what
happens after the initial connection, but it still feels icky.
  
What is so icky about it? Sender is faster than a receiver so flow  
control

kicks in.



My point is that we have no real flow control for SRQ.

  

2. The CM progress thread posts its own receive buffers when creating
a QP (which is a necessary step in both CMs).  However, this is
problematic in two cases:

  

[skip]

I don't like 1,2 and 3. :(



4. Have a separate mpool for drawing initial receive buffers for the
CM-posted RQs.  We'd probably want this mpool to be always empty (or
close to empty) -- it's ok to be slow to allocate / register more
memory when a new connection request arrives.  The memory obtained
from this mpool should be able to be returned to the "main" mpool
after it is consumed.
  

This is slightly better, but still...



Agreed; my reactions were pretty much the same as yours.

  

5. ...?
  

What about moving posting of receive buffers into main thread. With
SRQ it is easy: don't post anything in CPC thread. Main thread will
prepost buffers automatically after first fragment received on the
endpoint (in btl_openib_handle_incoming()). With PPRQ it's more
complicated. What if we'll prepost dummy buffers (not from free list)
during IBCM connection stage and will run another three way handshake
protocol using those buffers, but from the main thread. We will need  
to
prepost one buffer on the active side and two buffers on the passive  
side.




This is probably the most viable alternative -- it would be easiest if  
we did this for all CPC's, not just for IBCM:


- for PPRQ: CPCs only post a small number of receive buffers, suitable  
for another handshake that will run in the upper-level openib BTL
- for SRQ: CPCs don't post anything (because the SRQ already "belongs"  
to the upper level openib BTL)
  
Currently iWARP does not have SRQ at all, and IMHO SRQ is not possible
without HW flow control.

So let's resolve the problem only for PPRQ?

Do we have a BSRQ restriction that there *must* be at least one PPRQ?   
  

No, there is no such restriction.
If so, we could always run the upper-level openib BTL
really-post-the-buffers handshake over the smallest buffer size BSRQ RC
PPRQ (i.e., have the CPC post a single receive on this QP -- see below),
which would make things much easier.  If we don't already have this
restriction, would we mind adding it?  We have one PPRQ in our default
receive_queues value, anyway.
  

I don't see a reason to add such a restriction, at least for IB.
We may add it for iWARP only (actually we already have it for iWARP).
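
For illustration, here is a minimal sketch (plain libibverbs, not the actual
OMPI code) of the kind of pre-posting discussed above: the CPC pre-posts a
couple of receive buffers on a per-peer RC QP during wireup so that the
upper-level openib BTL can run its own handshake later without hitting RNR.
The buffer count and size constants are invented for the example.

/* Hedged sketch only -- names and sizes are made up for illustration. */
#include <stdint.h>
#include <stdlib.h>
#include <infiniband/verbs.h>

#define NUM_WIREUP_RECVS 2      /* passive side: 2 buffers, active side: 1 */
#define WIREUP_BUF_SIZE  256

static int prepost_wireup_recvs(struct ibv_pd *pd, struct ibv_qp *qp)
{
    for (int i = 0; i < NUM_WIREUP_RECVS; ++i) {
        void *buf = malloc(WIREUP_BUF_SIZE);
        if (NULL == buf) return -1;

        /* register the buffer so the HCA may write into it */
        struct ibv_mr *mr = ibv_reg_mr(pd, buf, WIREUP_BUF_SIZE,
                                       IBV_ACCESS_LOCAL_WRITE);
        if (NULL == mr) return -1;

        struct ibv_sge sge = { .addr   = (uintptr_t) buf,
                               .length = WIREUP_BUF_SIZE,
                               .lkey   = mr->lkey };
        struct ibv_recv_wr wr = { .wr_id   = (uintptr_t) buf,
                                  .sg_list = &sge,
                                  .num_sge = 1 };
        struct ibv_recv_wr *bad_wr;
        if (ibv_post_recv(qp, &wr, &bad_wr)) return -1;
    }
    return 0;
}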



Re: [OMPI devel] Threaded progress for CPCs

2008-05-19 Thread Pavel Shamis (Pasha)




What about moving posting of receive buffers into main thread. With
SRQ it is easy: don't post anything in CPC thread. Main thread will
prepost buffers automatically after first fragment received on the
endpoint (in btl_openib_handle_incoming()).   
It still doesn't guarantee that we will not see RNR (as I understand
we are trying to resolve this problem for iWARP?!)




I don't think that iwarp has SRQ at all. And if it has then it should
have HW flow control for it too. I don't see what advantage SRQ without
flow control can provide over PPRQ.   

I agree that without HW flow control there is no reason for SRQ.
 
So this solution will cost 1 buffer on each SRQ ... sounds acceptable
to me. But I don't see much difference compared to #1; as I understand
it, we will need the pipe for communication with the main thread anyway.

So why not use #1?


What communication? No communication at all. Just don't prepost buffers
to SRQ during connection establishment. Problem solved (only for SRQ of
course).
As far as I know, Jeff uses the pipe for some status updates (Jeff,
please correct me if I am wrong).

If we still need the pipe for communication, I prefer #1.
If we don't have the pipe, I prefer your solution.

Pasha



Re: [OMPI devel] Threaded progress for CPCs

2008-05-19 Thread Pavel Shamis (Pasha)


1. When CM progress thread completes an incoming connection, it sends  
a command down a pipe to the main thread indicating that a new  
endpoint is ready to use.  The pipe message will be noticed by  
opal_progress() in the main thread and will run a function to do all  
necessary housekeeping (sets the endpoint state to CONNECTED, etc.).   
But it is possible that the receiver process won't dip into the MPI  
layer for a long time (and therefore not call opal_progress and the  
housekeeping function).  Therefore, it is possible that with an active  
sender and a slow receiver, the sender can overwhelm an SRQ.  On IB,  
this will just generate RNRs and be ok (we configure SRQs to have  
infinite RNRs), but I don't understand the semantics of what will  
happen on iWARP (it may terminate?  I sent an off-list question to  
Steve Wise to ask for detail -- we may have other issues with SRQ on  
iWARP if this is the case, but let's skip that discussion for now).




Is it possible to have sane SRQ implementation without HW flow control?
Anyway the described problem exists with SRQ right now too. If receiver
doesn't enter progress for a long time sender can overwhelm an SRQ.
I don't see how this can be fixed without progress thread (and I am not
even sure that this is the problem that has to be fixed).
  
It may be partially resolved by the SRQ limit event (this event is
generated when the number of posted receive buffers drops below a
predefined watermark).
But I'm not sure that we want to move the RNR problem from the sender
side to the receiver.


The full solution will be progress thread + srq_limit_event.
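
For reference, a minimal sketch of how the SRQ limit event could be armed and
consumed with plain libibverbs (the watermark value is arbitrary and this is
not the OMPI implementation):

#include <stdint.h>
#include <infiniband/verbs.h>

/* Arm the SRQ so the HCA raises IBV_EVENT_SRQ_LIMIT_REACHED once the
 * number of posted receive buffers drops below the watermark. */
static int arm_srq_watermark(struct ibv_srq *srq, uint32_t watermark)
{
    struct ibv_srq_attr attr = { .srq_limit = watermark };
    return ibv_modify_srq(srq, &attr, IBV_SRQ_LIMIT);
}

/* A progress thread would block here, repost buffers, and re-arm. */
static void drain_async_events(struct ibv_context *ctx)
{
    struct ibv_async_event ev;
    while (0 == ibv_get_async_event(ctx, &ev)) {
        if (IBV_EVENT_SRQ_LIMIT_REACHED == ev.event_type) {
            /* repost receive buffers to ev.element.srq here,
             * then re-arm the limit with arm_srq_watermark() */
        }
        ibv_ack_async_event(&ev);
    }
}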

  
Even if we can get the iWARP semantics to work, this feels kinda  
icky.  Perhaps I'm overreacting and this isn't a problem that needs to  
be fixed -- after all, this situation is no different than what  
happens after the initial connection, but it still feels icky.


What is so icky about it? Sender is faster than a receiver so flow control
kicks in.

  
2. The CM progress thread posts its own receive buffers when creating  
a QP (which is a necessary step in both CMs).  However, this is  
problematic in two cases:




[skip]
 
I don't like 1,2 and 3. :(
  

If iWARP can handle RNR, #1 sounds OK to me, at least for 1.3.
  

4. Have a separate mpool for drawing initial receive buffers for the
CM-posted RQs.  We'd probably want this mpool to be always empty (or
close to empty) -- it's ok to be slow to allocate / register more
memory when a new connection request arrives.  The memory obtained
from this mpool should be able to be returned to the "main" mpool
after it is consumed.



This is slightly better, but still...

  

5. ...?


What about moving posting of receive buffers into main thread. With
SRQ it is easy: don't post anything in CPC thread. Main thread will
prepost buffers automatically after first fragment received on the
endpoint (in btl_openib_handle_incoming()). 
It still doesn't guarantee that we will not see RNR (as I understand we
are trying to resolve this problem for iWARP?!)


So this solution will cost 1 buffer on each SRQ ... sounds acceptable
to me. But I don't see much difference compared to #1; as I understand
it, we will need the pipe for communication with the main thread anyway.

So why not use #1?

With PPRQ it's more
complicated. What if we'll prepost dummy buffers (not from free list)
during IBCM connection stage and will run another three way handshake
protocol using those buffers, but from the main thread. We will need to
prepost one buffer on the active side and two buffers on the passive side.

--
Gleb.

  




Re: [OMPI devel] heterogeneous OpenFabrics adapters

2008-05-13 Thread Pavel Shamis (Pasha)

Jeff,
Your proposal for 1.3 sounds OK to me.

For 1.4 we need to review this point again. The QP information is split
over 3 different structs:
mca_btl_openib_module_qp_t (used by the module), mca_btl_openib_qp_t (used
by the component) and mca_btl_openib_endpoint_qp_t (used by the endpoint).
We need to see how we will resolve the issue for each of them. Let's put
it on the 1.4 todo list.


Pasha.


Jeff Squyres wrote:

Short version:
--

I propose that we should disallow multiple different  
mca_btl_openib_receive_queues values (or receive_queues values from  
the INI file) to be used in a single MPI job for the v1.3 series.


More details:
-

The reason I'm looking into this heterogeneity stuff is to help  
Chelsio support their iWARP NIC in OMPI.  Their NIC needs a specific  
value for mca_btl_openib_receive_queues (specifically: it does not  
support SRQ and it has the wireup race condition that we discussed  
before).


The major problem is that all the BSRQ information is currently stored
on the openib component -- it is *not* maintained on a per-HCA (or
per-port) basis.  We *could* move all the BSRQ info to live on the
hca_t struct (or even the openib module struct), but it has at least 3
big consequences:


1. It would touch a lot of code.  But touching all this code is  
relatively low risk; it will be easy to check for correctness because  
the changes will either compile or not.


2. There are functions (some of which are static inline) that read the  
BSRQ data.  These functions would have to take an additional (hca_t*)  
(or (btl_openib_module_t*)) parameter.


3. Getting to the BSRQ info will take at least 1 or 2 more  
dereferences (e.g., module->hca->bsrq_info or module->bsrq_info...).


I'm not too concerned about #1 (it's grunt work), but I am a bit  
concerned about #2 and #3 since at least some of these places are in  
the critical performance path.
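
To make consequence #3 concrete, a hedged illustration follows; the struct and
field names are invented, not the real OMPI types, and only show where the
extra dereference would come from:

/* Today: BSRQ info lives on the (global) component, one load away. */
struct bsrq_info           { int num_qps; /* ... */ };
struct hypothetical_hca    { struct bsrq_info bsrq; /* ... */ };
struct hypothetical_module { struct hypothetical_hca *hca; /* ... */ };

static inline int num_qps_from_component(const struct bsrq_info *comp_bsrq)
{
    return comp_bsrq->num_qps;        /* component-global: a single load */
}

/* Proposed: per-HCA BSRQ info, reached through module -> hca -> bsrq. */
static inline int num_qps_from_module(const struct hypothetical_module *m)
{
    return m->hca->bsrq.num_qps;      /* one or two extra dereferences */
}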


Given these concerns, I propose the following v1.3:

- Add a "receive_queues" field to the INI file so that the Chelsio  
adapter can run out of the box (i.e., "mpirun -np 4 a.out" with hosts  
containing Chelsio NICs will get a value for btl_openib_receive_queues  
that will work).


- NetEffect NICs will also require overriding  
btl_openib_receive_queues, but will likely have a different value than  
Chelsio NICs (they don't have the wireup race condition that Chelsio  
does).


- Because the BSRQ info is on the component (i.e., global), we should  
detect when multiple different receive_queues values are specified and  
gracefully abort.


I think it'll be quite uncommon to have a need for two different  
receive_queues values, and that this proposal will be fine for v1.3
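
A hedged sketch of the "detect and gracefully abort" check proposed above; the
function name is invented, and the real code would compare the strings
exchanged via the modex:

#include <stdio.h>
#include <string.h>

/* Compare the locally configured receive_queues string against the one a
 * peer advertised; refuse to continue if they differ. */
static int check_receive_queues(const char *local, const char *remote)
{
    if (0 != strcmp(local, remote)) {
        fprintf(stderr,
                "openib BTL: this process uses receive_queues \"%s\" but a "
                "peer uses \"%s\"; mixed values are not supported\n",
                local, remote);
        return -1;    /* caller would gracefully abort the job */
    }
    return 0;
}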


Comments?



On May 12, 2008, at 6:44 PM, Jeff Squyres wrote:

  

After looking at the code a bit, I realized that I completely forgot
that the INI file was invented to solve at least the heterogeneous-
adapters-in-a-host problem.

So I amended the ticket to reflect that that problem is already
solved.  The other part is not, though -- consider two MPI procs on
different hosts, each with an iWARP NIC, but one NIC supports SRQs and
one does not.


On May 12, 2008, at 5:36 PM, Jeff Squyres wrote:



I think that this issue has come up before, but I filed a ticket
about it because at least one developer (Jon) has a system with both
IB and iWARP adapters:

  https://svn.open-mpi.org/trac/ompi/ticket/1282

My question: do we care about the heterogeneous adapter scenarios?
For v1.3?  For v1.4?  For ...some version in the future?

I think the first issue I identified in the ticket is grunt work to
fix (annoying and tedious, but not difficult), but the second one
will be a little dicey -- it has scalability issues (e.g., sending
around all info in the modex, etc.).

--
Jeff Squyres
Cisco Systems

  

--
Jeff Squyres
Cisco Systems

    



  



--
Pavel Shamis (Pasha)
Mellanox Technologies



Re: [OMPI devel] Merging in the CPC work

2008-04-24 Thread Pavel Shamis (Pasha)

The patch below resolves the segfault :

--- btl_openib_connect_ibcm.c.orig  2008-04-24 15:14:28.500676000 +0300
+++ btl_openib_connect_ibcm.c   2008-04-24 15:15:08.961168000 +0300
@@ -328,7 +328,7 @@
{
int rc;
modex_msg_t *msg;
-ibcm_module_t *m;
+ibcm_module_t *m = NULL;
opal_list_item_t *item;
ibcm_listen_cm_id_t *cmh;
ibcm_module_list_item_t *imli;


Jeff Squyres wrote:
I had a linker error with the rdmacm library yesterday that I fixed 
later, sorry.


Could you try it again?  You'll need to svn up, re-autogen, etc.  It 
should be obvious whether I fixed it -- even trivial apps will work or 
not work.


Thanks.


On Apr 24, 2008, at 6:24 AM, Gleb Natapov wrote:


On Thu, Apr 24, 2008 at 11:50:10AM +0300, Pavel Shamis (Pasha) wrote:

Jeff,
All my tests fail.
XRC disabled tests failed with:
mtt/installs/Zq_9/install/lib/openmpi/mca_btl_openib.so: undefined
symbol: rdma_create_event_channel
XRC enabled failed with segfault , I will take a look later today.

Well it is a little bit better for me. I compiled only OOB connection
manager and ompi passes simple testing.



Pasha

Jeff Squyres wrote:

As we discussed yesterday, I have started the merge from the /tmp-
public/openib-cpc2 branch.  "oob" is currently the default.

Unfortunately, it caused quite a few conflicts when I merged with the
trunk, so I created a new temp branch and put all the work there: 
/tmp-

public/openib-cpc3.

Could all the IB and iWARP vendors and any other interested parties
please try this branch before we bring it back to the trunk?  Please
test all functionality that you care about -- XRC, etc.  I'd like to
bring it back to the trunk COB Thursday.  Please let me know if this
is too soon.

You can force the selection of a different CPC with the
btl_openib_cpc_include MCA param:

mpirun --mca btl_openib_cpc_include oob ...
mpirun --mca btl_openib_cpc_include xoob ...
mpirun --mca btl_openib_cpc_include rdma_cm ...
mpirun --mca btl_openib_cpc_include ibcm ...

You might want to concentrate on testing oob and xoob to ensure that
we didn't cause any regressions.  The ibcm and rdma_cm CPCs probably
still have some rough edges (and the IBCM package in OFED itself may
not be 100% -- that's one of the things we're evaluating.  It's known
to not install properly on RHEL4U4, for example -- you have to
manually mknod and chmod a device in /dev/infiniband for every HCA in
the host).

Thanks.





--
Pavel Shamis (Pasha)
Mellanox Technologies



--
Gleb.






--
Pavel Shamis (Pasha)
Mellanox Technologies



Re: [OMPI devel] Merging in the CPC work

2008-04-24 Thread Pavel Shamis (Pasha)

Jeff,
All my tests fail.
XRC disabled tests failed with:
mtt/installs/Zq_9/install/lib/openmpi/mca_btl_openib.so: undefined 
symbol: rdma_create_event_channel

XRC enabled failed with segfault , I will take a look later today.

Pasha

Jeff Squyres wrote:
As we discussed yesterday, I have started the merge from the /tmp- 
public/openib-cpc2 branch.  "oob" is currently the default.


Unfortunately, it caused quite a few conflicts when I merged with the  
trunk, so I created a new temp branch and put all the work there: /tmp- 
public/openib-cpc3.


Could all the IB and iWARP vendors and any other interested parties  
please try this branch before we bring it back to the trunk?  Please  
test all functionality that you care about -- XRC, etc.  I'd like to  
bring it back to the trunk COB Thursday.  Please let me know if this  
is too soon.


You can force the selection of a different CPC with the  
btl_openib_cpc_include MCA param:


 mpirun --mca btl_openib_cpc_include oob ...
 mpirun --mca btl_openib_cpc_include xoob ...
 mpirun --mca btl_openib_cpc_include rdma_cm ...
 mpirun --mca btl_openib_cpc_include ibcm ...

You might want to concentrate on testing oob and xoob to ensure that  
we didn't cause any regressions.  The ibcm and rdma_cm CPCs probably  
still have some rough edges (and the IBCM package in OFED itself may  
not be 100% -- that's one of the things we're evaluating.  It's known  
to not install properly on RHEL4U4, for example -- you have to  
manually mknod and chmod a device in /dev/infiniband for every HCA in  
the host).


Thanks.

  



--
Pavel Shamis (Pasha)
Mellanox Technologies



Re: [OMPI devel] Setting CQ depth

2008-02-26 Thread Pavel Shamis (Pasha)

Please apply. It should be cqe.
Pasha.

Jon Mason wrote:

A quick sanity check.

When setting the cq depth in the openib btl, it checks the calculated
depth against the maximum cq depth allowed and sets the minimum of those
two.  However, I think it is checking the wrong variable.  If I
understand correctly, ib_dev_attr.max_cq represents the maximum number
of cqs while ib_dev_attr.max_cqe represents the max depth allowed in
each individual cq.  Is this correct?

If the above is true, then I'll apply the patch below.

Thanks,
Jon

Index: ompi/mca/btl/openib/btl_openib.c
===
--- ompi/mca/btl/openib/btl_openib.c(revision 17472)
+++ ompi/mca/btl/openib/btl_openib.c(working copy)
@@ -140,8 +140,8 @@
  if(cq_size < mca_btl_openib_component.ib_cq_size[cq])
 cq_size = mca_btl_openib_component.ib_cq_size[cq];

-if(cq_size > (uint32_t)hca->ib_dev_attr.max_cq)
-cq_size = hca->ib_dev_attr.max_cq;
+if(cq_size > (uint32_t)hca->ib_dev_attr.max_cqe)
+cq_size = hca->ib_dev_attr.max_cqe;

 if(NULL == hca->ib_cq[cq]) {
 hca->ib_cq[cq] = ibv_create_cq_compat(hca->ib_dev_context, cq_size,
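
For illustration, a standalone sketch of the same clamp against plain
libibverbs (not the OMPI code path): max_cq is the number of CQs the device
supports, while max_cqe is the maximum depth of a single CQ, so the depth must
be clamped against max_cqe.

#include <infiniband/verbs.h>

static struct ibv_cq *create_clamped_cq(struct ibv_context *ctx, int wanted)
{
    struct ibv_device_attr attr;
    if (ibv_query_device(ctx, &attr)) return NULL;

    int depth = wanted;
    if (depth > attr.max_cqe)      /* per-CQ depth limit, not the CQ count */
        depth = attr.max_cqe;

    return ibv_create_cq(ctx, depth, NULL, NULL, 0);
}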

  



--
Pavel Shamis (Pasha)
Mellanox Technologies



Re: [OMPI devel] 1.3 Release schedule and contents

2008-02-21 Thread Pavel Shamis (Pasha)

Brad,
APM code was committed to trunk.
So you may mark it as done.

Thanks,
Pasha.

Brad Benton wrote:

All:

The latest scrub of the 1.3 release schedule and contents is ready for 
review and comment.  Please use the following links:
  1.3 milestones:  
https://svn.open-mpi.org/trac/ompi/milestone/Open%20MPI%201.3
  1.3.1 milestones: 
https://svn.open-mpi.org/trac/ompi/milestone/Open%20MPI%201.3.1


In order to try and keep the dates for 1.3 in, I've pushed a bunch of 
stuff (particularly ORTE things) to 1.3.1.  Even though there will be 
new functionality slated for 1.3.1, the goal is to not have any 
interface changes between the phases.


Please look over the list and schedules and let me or my fellow 1.3 
co-release manager George Bosilca (bosi...@eecs.utk.edu 
<mailto:bosi...@eecs.utk.edu>) know of any issues, errors, 
suggestions, omissions, heartburn, etc.


Thanks,
--Brad

Brad Benton
IBM





--
Pavel Shamis (Pasha)
Mellanox Technologies



Re: [OMPI devel] open ib btl and xrc

2008-01-20 Thread Pavel Shamis (Pasha)


It is easy to see the benefit of fewer QPs (per node instead of per peer)
and less resource consumption, but I am curious about the actual
percentage of memory footprint decrease. I am thinking that the largest
portion of the footprint comes from the fragments.
BTW, here is a link to another paper
http://www.cs.sandia.gov/~rbbrigh/papers/ompi-ib-pvmmpi07.pdf
that talks about more efficient usage of receive buffers.
Do you have any numbers showing the actual memory footprint savings
when using XRC?

I don't have.

Pasha.


-DON

Pavel Shamis (Pasha) wrote:
Here is paper from openib 
http://www.openib.org/archives/nov2007sc/XRC.pdf
and here is mvapich presentation 
http://mvapich.cse.ohio-state.edu/publications/ofa_nov07-mvapich-xrc.pdf


Bottom line: XRC decreases the number of QPs that OMPI opens and as a
result decreases OMPI's memory footprint.
In the openib paper you may see more details about XRC. If you need
more details about the XRC implementation in the openib BTL, please let
me know.


Don Kerr wrote:
 

Hi,

After searching, about the only thing I can find on xrc is what it 
stands for, can someone explain the benefits of open mpi's use of 
xrc, maybe point me to a paper, or both?


TIA
-DON


  



  





--
Pavel Shamis (Pasha)
Mellanox Technologies



Re: [OMPI devel] open ib btl and xrc

2008-01-17 Thread Pavel Shamis (Pasha)

Here is a paper from OpenIB: http://www.openib.org/archives/nov2007sc/XRC.pdf
and here is an MVAPICH presentation:
http://mvapich.cse.ohio-state.edu/publications/ofa_nov07-mvapich-xrc.pdf

Bottom line: XRC decreases the number of QPs that OMPI opens and as a
result decreases OMPI's memory footprint.
In the openib paper you may see more details about XRC. If you need more
details about the XRC implementation in the openib BTL, please let me know.
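
As a hedged back-of-the-envelope illustration only (the numbers are invented
and the exact accounting depends on the receive_queues configuration), the
scaling difference looks roughly like this:

#include <stdio.h>

int main(void)
{
    int nodes = 32, ppn = 8;                /* hypothetical cluster         */
    int remote_procs = nodes * ppn - 1;     /* peers seen by one process    */

    int rc_qps  = remote_procs;             /* classic RC: one QP per peer  */
    int xrc_qps = nodes - 1;                /* XRC: roughly one per node    */

    printf("RC QPs per process: %d, XRC QPs per process: %d\n",
           rc_qps, xrc_qps);
    return 0;
}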


Don Kerr wrote:

Hi,

After searching, about the only thing I can find on xrc is what it 
stands for, can someone explain the benefits of open mpi's use of xrc, 
maybe point me to a paper, or both?


TIA
-DON


  



--
Pavel Shamis (Pasha)
Mellanox Technologies



Re: [OMPI devel] Open IB BTL development question

2008-01-17 Thread Pavel Shamis (Pasha)

I plan to add IB APM support  (not something specific to OFED)

Don Kerr wrote:
Looking at the list of new features for OFED 1.3 and seeing that support
for XRC went into the trunk, I am curious whether support for additional
OFED 1.3 features will be, or is planned to be, included in Open MPI?

I am looking at the list of features here: 
http://64.233.167.104/search?q=cache:RXXOrY36QHcJ:www.openib.org/archives/nov2007sc/OFED%25201.3%2520status.ppt+ofed+1.3+feature=en=clnk=3=us=firefox-a
but I do not have any specific feature in mind, just wanted to get an 
idea what others are planning.


Thanks
-DON

  



--
Pavel Shamis (Pasha)
Mellanox Technologies



Re: [OMPI devel] [PATCH] openib btl: extensable cpc selection enablement

2008-01-14 Thread Pavel Shamis (Pasha)

Jon Mason wrote:
  

I have few machines with connectX and i will try to run MTT on Sunday.



Awesome!  I appreciate it.

  
After fixing the compilation problem in the XRC part of the code I was
able to run MTT. Most of the tests pass and one test failed:
mpi2c++_dynamics_test. The test passes without XRC. But I also see the
test failing on trunk. The last version where it works is 1.3a1r17085.

Strange.
Pasha.


Re: [OMPI devel] [PATCH] openib btl: extensable cpc selection enablement

2008-01-10 Thread Pavel Shamis (Pasha)

Jon Mason wrote:

The new cpc selection framework is now in place.  The patch below allows
for dynamic selection of cpc methods based on what is available.  It
also allows for inclusion/exclusion of methods.  It even further allows
for modifying the priorities of certain cpc methods to better determine
the optimal cpc method.

This patch also contains XRC compile time disablement (per Jeff's
patch).
  
We need to make sure that the CM stuff will be disabled at compile time
if it is not installed on the machine.

At a high level, the cpc selection works by walking through each cpc
and allowing it to test whether it is permissible to run on this
mpirun.  It returns a priority if it is permissible or a -1 if not.  All
of the cpc names and priorities are rolled into a string.  This string
is then encapsulated in a message and passed around all the ompi
processes.  Once received and unpacked, the list received is compared
to a local copy of the list.  The connection method is chosen by
comparing the lists passed around to all nodes via modex with the list
generated locally.  Any non-negative number is a potentially valid
connection method.  The method below of determining the optimal
connection method is to take the cross-section of the two lists.  The
highest single value (and the other side being non-negative) is selected
as the cpc method.
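
A hedged sketch of that cross-section rule follows; the types and names are
invented, and the real code works on the packed modex string rather than
structs like these:

#include <stddef.h>
#include <string.h>

typedef struct { const char *name; int priority; } cpc_entry_t;

/* Pick the usable CPC with the highest local priority that the remote
 * side also reports as usable (priority >= 0 on both sides). */
static const char *select_cpc(const cpc_entry_t *local,  int nlocal,
                              const cpc_entry_t *remote, int nremote)
{
    const char *best = NULL;
    int best_prio = -1;

    for (int i = 0; i < nlocal; ++i) {
        if (local[i].priority < 0) continue;          /* unusable locally */
        for (int j = 0; j < nremote; ++j) {
            if (remote[j].priority >= 0 &&            /* usable remotely  */
                0 == strcmp(local[i].name, remote[j].name) &&
                local[i].priority > best_prio) {
                best      = local[i].name;
                best_prio = local[i].priority;
            }
        }
    }
    return best;    /* NULL: no common CPC, the peers cannot connect */
}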

Please test it out.  The tree can be found at
https://svn.open-mpi.org/svn/ompi/tmp-public/openib-cpc/

This patch has been tested with IB and iWARP adapters on a 2 node system
(with it correctly choosing to use oob and happily ignoring iWARP
adapters).  It needs XRC testing and testing of larger node systems.
  

Did you run MTT over all these changes?
I have few machines with connectX and i will try to run MTT on Sunday.


--
Pavel Shamis (Pasha)
Mellanox Technologies



Re: [OMPI devel] openib xrc CPC minor nit

2007-12-23 Thread Pavel Shamis (Pasha)

Jeff Squyres wrote:

Pasha --

I notice in the port info struct that you have a member for the lid,  
but only #if HAVE_XRC.  Per a comment in the code, this is supposed to  
save bytes when we're using OOB (because we don't need this value in  
the OOB CPC).


I think we should remove this #if and always have this struct member.   
~4 extra bytes (because it's DSS packed) is no big deal.  It's packed  
in with all the other modex info, so the message is already large.  4  
more bytes per port won't make a difference (IMHO).


And keep in mind that #if HAVE_XRC is true if XRC is supported -- we  
still send the extra bytes if XRC is supported and not used (which is  
the default when compiling for OFED 1.3, no?).
  
Yep, in OFED 1.3 XRC is already supported and the #if will always be
true.
So I think we should remove those #if's and just always have that data  
member there.  It's up to the CPC's if they want to use that info or  
not.


Any objections to me removing this #if on the openib-cpc branch?  (and  
eventual merge back up to the trunk)
  

I have no objections. We should remove it.


--
Pavel Shamis (Pasha)
Mellanox Technologies



Re: [OMPI devel] Fwd: [OMPI svn-full] svn:open-mpi r16959

2007-12-16 Thread Pavel Shamis (Pasha)

Tested.
XRC works.

Pasha

Pavel Shamis (Pasha) wrote:

I will try it.

Jeff Squyres wrote:
  
This commit does what we previously discussed: it only compiles the  
XOOB openib CPC if XRC support is actually present (vs. having a stub  
XOOB when XRC is not present).  This is on the /tmp-public/openib-cpc  
branch.


I have some hermon hca's, but due to dumb issues, I don't have XRC- 
capable OFED on those nodes yet.  It'll probably take me a few more  
days before I have that ready.


Could someone try the openib-cpc tmp branch and ensure I didn't break  
the case where XRC support is available?  It is easy to tell if the  
XOOB CPC compiled in -- run this command:


ompi_info --param btl openib --parsable | grep xoob

If the output is empty, then XOOB was not compiled in.  If you see  
output, then XOOB was compiled in.


Thanks!



Begin forwarded message:

  


From: jsquy...@osl.iu.edu
Date: December 14, 2007 12:10:24 PM EST
To: svn-f...@open-mpi.org
Subject: [OMPI svn-full] svn:open-mpi r16959
Reply-To: de...@open-mpi.org

Author: jsquyres
Date: 2007-12-14 12:10:23 EST (Fri, 14 Dec 2007)
New Revision: 16959
URL: https://svn.open-mpi.org/trac/ompi/changeset/16959

Log:
Only compile in the XOOB CPC if a) configure found that we have XRC
support available and b) the user didn't disable connectx support.

Text files modified:
  tmp-public/openib-cpc/config/ 
ompi_check_openib.m4   | 3 ++-
  tmp-public/openib-cpc/ompi/mca/btl/openib/ 
Makefile.am   | 8 ++--
  tmp-public/openib-cpc/ompi/mca/btl/openib/ 
configure.m4  | 8 
  tmp-public/openib-cpc/ompi/mca/btl/openib/connect/ 
btl_openib_connect_base.c | 2 ++
  tmp-public/openib-cpc/ompi/mca/btl/openib/connect/ 
btl_openib_connect_xoob.c |23 ---

  5 files changed, 18 insertions(+), 26 deletions(-)

Modified: tmp-public/openib-cpc/config/ompi_check_openib.m4
==============================================================================

--- tmp-public/openib-cpc/config/ompi_check_openib.m4   (original)
+++ tmp-public/openib-cpc/config/ompi_check_openib.m4	2007-12-14  
12:10:23 EST (Fri, 14 Dec 2007)

@@ -102,7 +102,8 @@
AS_IF([test "$ompi_check_openib_happy" = "yes"],
  [AC_CHECK_DECLS([IBV_EVENT_CLIENT_REREGISTER], [], [],
  [#include ])
-   AC_CHECK_FUNCS([ibv_get_device_list ibv_resize_cq  
ibv_open_xrc_domain])])

+   AC_CHECK_FUNCS([ibv_get_device_list ibv_resize_cq])
+   AC_CHECK_FUNCS([ibv_open_xrc_domain], [$1_have_xrc=1])])

CPPFLAGS="$ompi_check_openib_$1_save_CPPFLAGS"
LDFLAGS="$ompi_check_openib_$1_save_LDFLAGS"

Modified: tmp-public/openib-cpc/ompi/mca/btl/openib/Makefile.am
==============================================================================

--- tmp-public/openib-cpc/ompi/mca/btl/openib/Makefile.am   (original)
+++ tmp-public/openib-cpc/ompi/mca/btl/openib/Makefile.am	2007-12-14  
12:10:23 EST (Fri, 14 Dec 2007)

@@ -55,14 +55,18 @@
connect/btl_openib_connect_base.c \
connect/btl_openib_connect_oob.c \
connect/btl_openib_connect_oob.h \
-connect/btl_openib_connect_xoob.c \
-connect/btl_openib_connect_xoob.h \
connect/btl_openib_connect_rdma_cm.c \
connect/btl_openib_connect_rdma_cm.h \
connect/btl_openib_connect_ibcm.c \
connect/btl_openib_connect_ibcm.h \
connect/connect.h

+if MCA_btl_openib_have_xrc
+sources += \
+connect/btl_openib_connect_xoob.c \
+connect/btl_openib_connect_xoob.h
+endif
+
# Make the output library in this directory, and name it either
# mca__.la (for DSO builds) or libmca__.la
# (for static builds).

Modified: tmp-public/openib-cpc/ompi/mca/btl/openib/configure.m4
==============================================================================

--- tmp-public/openib-cpc/ompi/mca/btl/openib/configure.m4  (original)
+++ tmp-public/openib-cpc/ompi/mca/btl/openib/configure.m4	 
2007-12-14 12:10:23 EST (Fri, 14 Dec 2007)

@@ -18,6 +18,14 @@
# $HEADER$
#

+# MCA_btl_openib_POST_CONFIG([should_build])
+# --
+AC_DEFUN([MCA_btl_openib_POST_CONFIG], [
+AS_IF([test $1 -eq 0 -a "$enable_dist" = "yes"],
+  [AC_MSG_ERROR([BTL openib is disabled but --enable-dist  
specifed.  This will result in a bad tarball.  Aborting configure.])])
+AM_CONDITIONAL([MCA_btl_openib_have_xrc], [test $1 -eq 1 -a "x$btl_openib_have_xrc" = "x1" -a "x$ompi_want_connectx_xrc" = "x1"])

+])
+

# MCA_btl_openib_CONFIG([action-if-can-compile],
#  [action-if-cant-compile])

Modified: tmp-public/openib-cpc/ompi/mca/btl/openib/connect/ 
btl_openib_connect_base.c
==============================================================================
--- tmp-public/openib-cpc/ompi/

Re: [OMPI devel] Fwd: [OMPI svn-full] svn:open-mpi r16959

2007-12-16 Thread Pavel Shamis (Pasha)

I will try it.

Jeff Squyres wrote:
This commit does what we previously discussed: it only compiles the  
XOOB openib CPC if XRC support is actually present (vs. having a stub  
XOOB when XRC is not present).  This is on the /tmp-public/openib-cpc  
branch.


I have some hermon hca's, but due to dumb issues, I don't have XRC- 
capable OFED on those nodes yet.  It'll probably take me a few more  
days before I have that ready.


Could someone try the openib-cpc tmp branch and ensure I didn't break  
the case where XRC support is available?  It is easy to tell if the  
XOOB CPC compiled in -- run this command:


ompi_info --param btl openib --parsable | grep xoob

If the output is empty, then XOOB was not compiled in.  If you see  
output, then XOOB was compiled in.


Thanks!



Begin forwarded message:

  

From: jsquy...@osl.iu.edu
Date: December 14, 2007 12:10:24 PM EST
To: svn-f...@open-mpi.org
Subject: [OMPI svn-full] svn:open-mpi r16959
Reply-To: de...@open-mpi.org

Author: jsquyres
Date: 2007-12-14 12:10:23 EST (Fri, 14 Dec 2007)
New Revision: 16959
URL: https://svn.open-mpi.org/trac/ompi/changeset/16959

Log:
Only compile in the XOOB CPC if a) configure found that we have XRC
support available and b) the user didn't disable connectx support.

Text files modified:
  tmp-public/openib-cpc/config/ 
ompi_check_openib.m4   | 3 ++-
  tmp-public/openib-cpc/ompi/mca/btl/openib/ 
Makefile.am   | 8 ++--
  tmp-public/openib-cpc/ompi/mca/btl/openib/ 
configure.m4  | 8 
  tmp-public/openib-cpc/ompi/mca/btl/openib/connect/ 
btl_openib_connect_base.c | 2 ++
  tmp-public/openib-cpc/ompi/mca/btl/openib/connect/ 
btl_openib_connect_xoob.c |23 ---

  5 files changed, 18 insertions(+), 26 deletions(-)

Modified: tmp-public/openib-cpc/config/ompi_check_openib.m4
==============================================================================

--- tmp-public/openib-cpc/config/ompi_check_openib.m4   (original)
+++ tmp-public/openib-cpc/config/ompi_check_openib.m4	2007-12-14  
12:10:23 EST (Fri, 14 Dec 2007)

@@ -102,7 +102,8 @@
AS_IF([test "$ompi_check_openib_happy" = "yes"],
  [AC_CHECK_DECLS([IBV_EVENT_CLIENT_REREGISTER], [], [],
  [#include ])
-   AC_CHECK_FUNCS([ibv_get_device_list ibv_resize_cq  
ibv_open_xrc_domain])])

+   AC_CHECK_FUNCS([ibv_get_device_list ibv_resize_cq])
+   AC_CHECK_FUNCS([ibv_open_xrc_domain], [$1_have_xrc=1])])

CPPFLAGS="$ompi_check_openib_$1_save_CPPFLAGS"
LDFLAGS="$ompi_check_openib_$1_save_LDFLAGS"

Modified: tmp-public/openib-cpc/ompi/mca/btl/openib/Makefile.am
==============================================================================

--- tmp-public/openib-cpc/ompi/mca/btl/openib/Makefile.am   (original)
+++ tmp-public/openib-cpc/ompi/mca/btl/openib/Makefile.am	2007-12-14  
12:10:23 EST (Fri, 14 Dec 2007)

@@ -55,14 +55,18 @@
connect/btl_openib_connect_base.c \
connect/btl_openib_connect_oob.c \
connect/btl_openib_connect_oob.h \
-connect/btl_openib_connect_xoob.c \
-connect/btl_openib_connect_xoob.h \
connect/btl_openib_connect_rdma_cm.c \
connect/btl_openib_connect_rdma_cm.h \
connect/btl_openib_connect_ibcm.c \
connect/btl_openib_connect_ibcm.h \
connect/connect.h

+if MCA_btl_openib_have_xrc
+sources += \
+connect/btl_openib_connect_xoob.c \
+connect/btl_openib_connect_xoob.h
+endif
+
# Make the output library in this directory, and name it either
# mca__.la (for DSO builds) or libmca__.la
# (for static builds).

Modified: tmp-public/openib-cpc/ompi/mca/btl/openib/configure.m4
==============================================================================

--- tmp-public/openib-cpc/ompi/mca/btl/openib/configure.m4  (original)
+++ tmp-public/openib-cpc/ompi/mca/btl/openib/configure.m4	 
2007-12-14 12:10:23 EST (Fri, 14 Dec 2007)

@@ -18,6 +18,14 @@
# $HEADER$
#

+# MCA_btl_openib_POST_CONFIG([should_build])
+# --
+AC_DEFUN([MCA_btl_openib_POST_CONFIG], [
+AS_IF([test $1 -eq 0 -a "$enable_dist" = "yes"],
+  [AC_MSG_ERROR([BTL openib is disabled but --enable-dist  
specifed.  This will result in a bad tarball.  Aborting configure.])])
+AM_CONDITIONAL([MCA_btl_openib_have_xrc], [test $1 -eq 1 -a "x$btl_openib_have_xrc" = "x1" -a "x$ompi_want_connectx_xrc" = "x1"])

+])
+

# MCA_btl_openib_CONFIG([action-if-can-compile],
#  [action-if-cant-compile])

Modified: tmp-public/openib-cpc/ompi/mca/btl/openib/connect/ 
btl_openib_connect_base.c
==============================================================================
--- tmp-public/openib-cpc/ompi/mca/btl/openib/connect/ 
btl_openib_connect_base.c	(original)
+++ tmp-public/openib-cpc/ompi/mca/btl/openib/connect/ 
btl_openib_connect_base.c	2007-12-14 12:10:23 EST 

Re: [OMPI devel] [PATCH] openib: clean-up connect to allow for new cm's

2007-12-12 Thread Pavel Shamis (Pasha)

Gleb Natapov wrote:

On Wed, Dec 12, 2007 at 03:37:26PM +0200, Pavel Shamis (Pasha) wrote:
  

Gleb Natapov wrote:


On Tue, Dec 11, 2007 at 08:16:07PM -0500, Jeff Squyres wrote:
  
  
Isn't there a better way somehow?  Perhaps we should have "select"  
call *all* the functions and accept back a priority.  The one with the  
highest priority then wins.  This is quite similar to much of the  
other selection logic in OMPI.


Sidenote: Keep in mind that there are some changes coming to select  
CPCs on a per-endpoint basis (I can't look up the trac ticket right  
now...).  This makes things a little complicated -- do we need  
btl_openib_cpc_include and btl_openib_cpc_exclude MCA params to  
include/exclude CPCs (because you might need more than one CPC in a  
single job)?  That wouldn't be hard to do.


But then what to do about if someone sets to use some XRC QPs and  
selects to use OOB or RDMA CM?  How do we catch this and print an  
error?  It doesn't seem right to put the "if num_xrc_qps>0" check in  
every CPC.  What happens if you try to make an XRC QP when not using  
xoob?  Where is the error detected and what kind of error message do  
we print?





In my opinion the "X" notation for QP specification should be removed. I
didn't want this to prevent XRC merging so I haven't raised this point.
It is enough to have two types of QPs: "P" - SW credit management, "S" -
HW credit management.
  

How will you decide which QP type to use? (SRQ or XRC)



If both sides support XOOB and the priority of XOOB is higher than all
other CPCs, then create XRC; otherwise use regular RC.
  

What if somebody has a ConnectX HCA but wants to use SRQ and not XRC?
I guess we will need some additional parameter to enable/disable XRC
anyway, correct? (So why not just leave the X QP type?)




Re: [OMPI devel] [PATCH] openib btl: remove excess ompi_btl_openib_connect_base_open call

2007-12-06 Thread Pavel Shamis (Pasha)

:-)
Nice catch. Please commit the fix.

Pasha.

Jeff Squyres wrote:

Hah!  Sweet; good catch -- feel free to delete that extra call.


On Dec 5, 2007, at 6:42 PM, Jon Mason wrote:

  

There is a double call to ompi_btl_openib_connect_base_open in
mca_btl_openib_mca_setup_qps().  It looks like someone just forgot to
clean-up the previous call when they added the check for the return
code.

I ran a quick IMB test over IB to verify everything is still working.

Thanks,
Jon


Index: ompi/mca/btl/openib/btl_openib_mca.c
===
--- ompi/mca/btl/openib/btl_openib_mca.c(revision 16855)
+++ ompi/mca/btl/openib/btl_openib_mca.c(working copy)
@@ -672,10 +672,7 @@
mca_btl_openib_component.credits_qp = smallest_pp_qp;

/* Register any MCA params for the connect pseudo-components */
-
-ompi_btl_openib_connect_base_open();
-
-if ( OMPI_SUCCESS != ompi_btl_openib_connect_base_open())
+if (OMPI_SUCCESS != ompi_btl_openib_connect_base_open())
goto error;

ret = OMPI_SUCCESS;




  




Re: [OMPI devel] tmp XRC branches

2007-12-02 Thread Pavel Shamis (Pasha)

Jeff Squyres wrote:
Are any of the XRC tmp SVN branches still relevant?  Or have they now  
been integrated into the trunk?


I ask because I see 4 XRC-related branches out there under /tmp and / 
tmp-public.


  


I removed my XRC stuff from tmp repositories (3 of 4)
Pasha





Re: [OMPI devel] openib btl header caching

2007-08-13 Thread Pavel Shamis (Pasha)

Brian Barrett wrote:

On Aug 13, 2007, at 9:33 AM, George Bosilca wrote:


On Aug 13, 2007, at 11:28 AM, Pavel Shamis (Pasha) wrote:


Jeff Squyres wrote:

I guess reading the graph that Pasha sent is difficult; Pasha -- can
you send the actual numbers?


OK, here are the numbers on my machines:
0 bytes
mvapich with header caching: 1.56
mvapich without header caching: 1.79
ompi 1.2: 1.59

So at zero bytes OMPI is not so bad. Also we can see that header caching
decreases the MVAPICH latency by 0.23.

1 byte
mvapich with header caching: 1.58
mvapich without header caching: 1.83
ompi 1.2: 1.73

And here OMPI makes a latency jump.

In MVAPICH the header caching decreases the header size from 56 bytes to
12 bytes.
What is the header size (pml + btl) in OMPI?


The match header size is 16 bytes, so it looks like ours is already
optimized ...


Pasha -- Is your build of Open MPI built with 
--disable-heterogeneous?  If not, our headers all grow slightly to 
support heterogeneous operations.  For the heterogeneous case, a 1 
byte message includes:
I didn't build with "--disable-heterogeneous". So the heterogeneous 
support was enabled in the build


  16 bytes for the match header
  4 bytes for the Open IB header
  1 byte for the payload
 
  21 bytes total

If you are using eager RDMA, there's an extra 4 bytes for the RDMA 
length in the footer.  Without heterogeneous support, 2 bytes get 
knocked off the size of the match header, so the whole thing will be 
19 bytes (+ 4 for the eager RDMA footer).
I used eager RDMA - it is faster than send.  So the message size on the
wire for 1 byte in my case was 25 bytes vs 13 bytes in MVAPICH. And if I
use --disable-heterogeneous it will decrease by 2 bytes. So it sounds
like we are pretty optimized.




There are also considerably more ifs in the code if heterogeneous is 
used, especially on x86 machines.


Brian





Re: [OMPI devel] openib btl header caching

2007-08-13 Thread Pavel Shamis (Pasha)

George Bosilca wrote:


On Aug 13, 2007, at 11:28 AM, Pavel Shamis (Pasha) wrote:


Jeff Squyres wrote:

I guess reading the graph that Pasha sent is difficult; Pasha -- can
you send the actual numbers?


OK, here are the numbers on my machines:
0 bytes
mvapich with header caching: 1.56
mvapich without header caching: 1.79
ompi 1.2: 1.59

So at zero bytes OMPI is not so bad. Also we can see that header caching
decreases the MVAPICH latency by 0.23.

1 byte
mvapich with header caching: 1.58
mvapich without header caching: 1.83
ompi 1.2: 1.73

And here OMPI makes a latency jump.

In MVAPICH the header caching decreases the header size from 56 bytes to
12 bytes.
What is the header size (pml + btl) in OMPI?


The match header size is 16 bytes, so it looks like ours is already 
optimized ...
So for a 0 byte message we are sending only 16 bytes on the wire, is
that correct?



Pasha.


  george.



Pasha







Re: [OMPI devel] problems with openib finalize

2007-07-23 Thread Pavel Shamis (Pasha)
Just committed r15557, which adds a finalize flow to the mpool. So now
openib should be able to release all resources in the normal way.
Pasha


Pavel Shamis (Pasha) wrote:

Jeff Squyres wrote:
  
Background: Pasha added a call in the openib BTL finalize function  
that will only succeed if all registered memory has been released  
(ibv_dealloc_pd()).  Since the test app didn't call MPI_FREE_MEM,  
there was some memory that was still registered, and therefore the  
call in finalize failed.  We treated this as a fatal error.  Last  
night's MTT runs turned up several apps that exhibited this fatal error.


While we're examining this problem, Pasha has removed the call to  
ibv_dealloc_pd() in the trunk openib BTL finalize.


I examined 1 of the tests that was failing last night in MTT:  
onesided/t.f90.  This test has an MPI_ALLOC_MEM with no corresponding  
MPI_FREE_MEM.  To investigate this problem, I restored the call to  
ibv_dealloc_pd() and re-ran the t.f90 test -- the problem still  
occurs.  Good.


However, once I got the right MPI_FREE_MEM call in t.f90, the test  
started passing.  I.e., ibv_dealloc_pd(hca->ib_pd) succeeds because  
all registered memory has been released.  Hence, the test itself was  
faulty.


However, I don't think we should *error* if we fail to ibv_dealloc_pd 
(hca->ib_pd); it's a user error, but it's not catastrophic unless  
we're trying to do an HCA restart scenario.  Specifically: during a  
normal MPI_FINALIZE, who cares?


I think we should do the following:

1. If we're not doing an HCA restart/checkpoint and we fail to  
ibv_dealloc_pd(), just move on (i.e., it's not a warning/error unless  
we *want* a warning, such as if an MCA parameter  
btl_openib_warn_if_finalize_fail is enabled, or somesuch).


2. If we *are* doing an HCA restart/checkpoint and ibv_dealloc_pd()  
fails, then we have to gracefully fail to notify upper layers that  
Bad Things happened (I suspect that we need mpool finalize  
implemented to properly implement checkpointing for RDMA networks).


3. Add a new MCA parameter named mpi_show_mpi_alloc_mem_leaks that,  
when enabled, shows a warning in ompi_mpi_finalize() if there is  
still memory allocated by MPI_ALLOC_MEM that was not freed by  
MPI_FREE_MEM (this MCA parameter will parallel the already-existing  
mpi_show_handle_leaks MCA param which displays warnings if the app  
creates MPI objects but does not free them).


My points:
- leaked MPI_ALLOC_MEM memory should be reported by the MPI layer,  
not a BTL or mpool
- failing to ibv_dealloc_pd() during MPI_FINALIZE should only trigger  
a warning if the user wants to see it
- failing to ibv_dealloc_pd() during an HCA restart or checkpoint  
should gracefully fail upwards


Comments?
  


Agreed.

In addition I will add code that will flush all user data from the mpool
and allow normal IB finalization.
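
For reference, a minimal (hedged) example of the pairing the failing test was
missing: memory obtained with MPI_Alloc_mem is registered with the HCA, so it
must be returned with MPI_Free_mem before MPI_Finalize, otherwise the openib
BTL cannot release its protection domain cleanly.

#include <mpi.h>

int main(int argc, char **argv)
{
    void *buf;
    MPI_Init(&argc, &argv);

    MPI_Alloc_mem(1 << 20, MPI_INFO_NULL, &buf);  /* registered memory     */
    /* ... use buf for communication ...                                   */
    MPI_Free_mem(buf);                            /* pair before finalize  */

    MPI_Finalize();
    return 0;
}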




  




Re: [OMPI devel] problems with openib finalize

2007-07-19 Thread Pavel Shamis (Pasha)

Jeff Squyres wrote:
Background: Pasha added a call in the openib BTL finalize function  
that will only succeed if all registered memory has been released  
(ibv_dealloc_pd()).  Since the test app didn't call MPI_FREE_MEM,  
there was some memory that was still registered, and therefore the  
call in finalize failed.  We treated this as a fatal error.  Last  
night's MTT runs turned up several apps that exhibited this fatal error.


While we're examining this problem, Pasha has removed the call to  
ibv_dealloc_pd() in the trunk openib BTL finalize.


I examined 1 of the tests that was failing last night in MTT:  
onesided/t.f90.  This test has an MPI_ALLOC_MEM with no corresponding  
MPI_FREE_MEM.  To investigate this problem, I restored the call to  
ibv_dealloc_pd() and re-ran the t.f90 test -- the problem still  
occurs.  Good.


However, once I got the right MPI_FREE_MEM call in t.f90, the test  
started passing.  I.e., ibv_dealloc_pd(hca->ib_pd) succeeds because  
all registered memory has been released.  Hence, the test itself was  
faulty.


However, I don't think we should *error* if we fail to ibv_dealloc_pd 
(hca->ib_pd); it's a user error, but it's not catastrophic unless  
we're trying to do an HCA restart scenario.  Specifically: during a  
normal MPI_FINALIZE, who cares?


I think we should do the following:

1. If we're not doing an HCA restart/checkpoint and we fail to  
ibv_dealloc_pd(), just move on (i.e., it's not a warning/error unless  
we *want* a warning, such as if an MCA parameter  
btl_openib_warn_if_finalize_fail is enabled, or somesuch).


2. If we *are* doing an HCA restart/checkpoint and ibv_dealloc_pd()  
fails, then we have to gracefully fail to notify upper layers that  
Bad Things happened (I suspect that we need mpool finalize  
implemented to properly implement checkpointing for RDMA networks).


3. Add a new MCA parameter named mpi_show_mpi_alloc_mem_leaks that,  
when enabled, shows a warning in ompi_mpi_finalize() if there is  
still memory allocated by MPI_ALLOC_MEM that was not freed by  
MPI_FREE_MEM (this MCA parameter will parallel the already-existing  
mpi_show_handle_leaks MCA param which displays warnings if the app  
creates MPI objects but does not free them).


My points:
- leaked MPI_ALLOC_MEM memory should be reported by the MPI layer,  
not a BTL or mpool
- failing to ibv_dealloc_pd() during MPI_FINALIZE should only trigger  
a warning if the user wants to see it
- failing to ibv_dealloc_pd() during an HCA restart or checkpoint  
should gracefully fail upwards


Comments?
  

Agreed.

In addition I will add code that will flush all user data from the mpool
and allow normal IB finalization.





Re: [OMPI devel] openib coord teleconf

2007-06-13 Thread Pavel Shamis (Pasha)

Jeff Squyres wrote:

On Jun 13, 2007, at 2:41 PM, Gleb Natapov wrote:

  

Pasha tells me that the best times for Ishai and him are:

- 2000-2030 Israel time
- 1300-1330 US Eastern
- 1100-1130 US Mountain
- 2230-2300 India (Bangalore)

Although they could also do the preceding half hour as well.

  

Depends on the date. The closest I can do at 20:00 is June 19.



Oops!  I left out the date -- sorry.  I meant to say Monday, June  
18th.  And I got the US eastern time wrong; that should have been  
noon, not 1300.


20:00 Israel June 19th is right after the weekly OMPI teleconf; want  
to do it then?
  

OK for me and Ishai.