Re: [OMPI devel] RDMA pipeline

2008-02-21 Thread Gleb Natapov
On Wed, Feb 20, 2008 at 04:08:46PM -0500, George Bosilca wrote:
> So I tracked down this issue, and it seems that the new behavior was
> introduced one year ago by commit 12433. Starting from this commit,
Except that the log message of this commit says:

   Fix regression from v1.1.
   1) make the code do what comment says
   2) if memory is prepinned don't send multiple PUT messages.

And to be absolutely sure, I checked v1.1, and of course there is no
pipeline for TCP BTLs there either.

> there was no pipeline in the RDMA protocol. That makes sense, as we don't
> routinely use NetPipe to check the performance of message
> logging (we use real applications). However, last week we ran NetPipe,
> and that's how we realized that pipelining was missing in the RDMA case.
Perhaps at the time you wrote message logging you relied on buggy
behaviour that was later fixed.

>
> I would be in favor of having consistent behavior everywhere. In other
> words, don't require the user to know whether or not there is an mpool
> associated with a particular device in order to figure out which protocol
> we use internally. Actually, it's not only for users; it might help us as well.
>
The user indeed shouldn't care what protocol we use as long as performance
is good. The pipeline is needed to improve the performance of some "insane"
interconnects that need memory pinning. The heuristic of OB1 is very simple:
if the send and receive message buffers are pinned, do not use the pipeline (no
matter what interconnect is in use); otherwise use the pipeline protocol to hide
the pinning cost. The only assumption OB1 makes is that if a BTL has no mpool,
then all memory is always pinned. Think of the pipeline as the slow path and no
pipeline as the fast path.
For InfiniBand we use every dirty trick in the book (registration cache +
ptmalloc) to take the fast path, and you want TCP/MX/ELAN to always take the slow
path! This doesn't make sense to me.
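The OB1 heuristic described above can be sketched in a few lines of C. This is a minimal model under stated assumptions: `btl_model`, `buffer_is_pinned()` and `use_pipeline()` are hypothetical names, not real OB1 types or functions. The sketch encodes only the two rules from the paragraph: a BTL without an mpool is treated as having all memory pinned, and the fast path is taken only when both buffers are pinned.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of a BTL; not mca_btl_base_module_t. */
struct btl_model {
    bool has_mpool;   /* does the BTL provide a memory pool? */
};

/* OB1's assumption: without an mpool, all memory counts as pinned.
 * With an mpool, it depends on whether the buffer is already registered
 * (e.g. found in the registration cache). */
static bool buffer_is_pinned(const struct btl_model *btl, bool registered)
{
    return !btl->has_mpool || registered;
}

/* Fast path (one RDMA operation) only if both sides are pinned;
 * otherwise use the pipeline to hide the pinning cost. */
static bool use_pipeline(const struct btl_model *btl,
                         bool send_registered, bool recv_registered)
{
    bool send_pinned = buffer_is_pinned(btl, send_registered);
    bool recv_pinned = buffer_is_pinned(btl, recv_registered);
    return !(send_pinned && recv_pinned);
}
```

In this model a TCP-like BTL (no mpool) never pipelines. An InfiniBand-like BTL (with an mpool) pipelines unless both buffers are already registered.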

If you need the pipeline in OB1 to hide the message logging cost, we could add
another config parameter that always enables the pipeline. We might not even
expose it to users, but instead set it automatically if message logging is enabled.
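The proposed parameter might look roughly like the sketch below. Everything here is assumed: `force_pipeline` and `message_logging_enabled` are hypothetical names, not existing MCA parameters. The flag is derived from message logging rather than set by the user, and it simply overrides the pinned-buffer fast path.

```c
#include <stdbool.h>

/* Hypothetical OB1 configuration; neither field is an existing MCA parameter. */
struct ob1_config {
    bool force_pipeline;          /* always use the pipelined RDMA protocol */
    bool message_logging_enabled; /* set when message logging is active */
};

/* Not exposed to users: the flag is derived from message logging. */
static void ob1_config_finalize(struct ob1_config *cfg)
{
    if (cfg->message_logging_enabled) {
        cfg->force_pipeline = true;
    }
}

/* Pipeline if forced, or if either buffer still needs pinning. */
static bool choose_pipeline(const struct ob1_config *cfg,
                            bool send_pinned, bool recv_pinned)
{
    return cfg->force_pipeline || !(send_pinned && recv_pinned);
}
```

With message logging off, the default fast path is unchanged. With it on, every rendezvous transfer goes through the pipeline.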


>   Thanks,
> george.
>
> On Feb 20, 2008, at 4:29 AM, Gleb Natapov wrote:
>
>> On Tue, Feb 19, 2008 at 10:40:46PM -0500, George Bosilca wrote:
>>> Actually, it restores the original behavior. The RDMA operations were
>>> pipelined before the r15247 commit, regardless of whether they
>>> had an mpool or not. We were actively using this behavior in the message
>>> logging framework to hide the cost of the local storage of the payload,
>>> and we were quite surprised when we realized that it had disappeared.
>> I checked v1.2 with the tcp BTL (I can't test mx or elan, but tcp also
>> supports RDMA and has no mpool), and no matter what btl_tcp_max_rdma_size
>> I provide, the whole buffer is sent in one RDMA operation. Here is the
>> explanation of why this happens:
>> 1. If a BTL is RDMA capable but does not provide an mpool,
>> mca_pml_ob1_rdma_btls() assumes that memory is always registered. In our
>> case, this function will always return a non-zero value for any buffer it
>> is called with.
>>
>> 2. When mca_pml_ob1_send_request_start_btl() chooses which function to
>> use for a rendezvous send, it checks whether the buffer is contiguous. If it
>> is, it then checks whether the buffer is already registered by looking for a
>> non-zero value returned by mca_pml_ob1_rdma_btls(). So for BTLs without an
>> mpool, mca_pml_ob1_send_request_start_rdma() is always chosen.
>>
>> 3. The receiver checks whether the local buffer is registered by calling
>> mca_pml_ob1_rdma_btls() on it (see pml_ob1_recvreq.c:259):
>>
>>     recvreq->req_rdma_cnt = mca_pml_ob1_rdma_btls(
>>         bml_endpoint,
>>         (unsigned char*) base,
>>         recvreq->req_recv.req_bytes_packed,
>>         recvreq->req_rdma);
>> So recvreq->req_rdma_cnt is set to a non-zero value (if the receive buffer
>> is contiguous, of course).
>>
>> 4. The receiver sends PUT messages to the sender in
>> mca_pml_ob1_recv_request_schedule_exclusive(). Here is the code snippet
>> from that function (see pml_ob1_recvreq.c:684):
>>
>>     /* makes sure that we don't exceed BTL max rdma size
>>      * if memory is not pinned already */
>>     if(0 == recvreq->req_rdma_cnt &&
>>        bml_btl->btl_max_rdma_size != 0 &&
>>        size > bml_btl->btl_max_rdma_size)
>>     {
>>         size = bml_btl->btl_max_rdma_size;
>>     }
>> Pay special attention to the comment. If recvreq->req_rdma_cnt is not
>> zero, btl_max_rdma_size is ignored and the message is sent in one big RDMA
>> operation.
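The four steps above boil down to how the receiver picks the size of each PUT. The C sketch below models that decision. Its assumptions: `rdma_btls_model()` and `put_size()` are hypothetical stand-ins for `mca_pml_ob1_rdma_btls()` and for the clamp in `mca_pml_ob1_recv_request_schedule_exclusive()`, and their signatures are simplified, not the real ones. The clamp condition itself mirrors the quoted snippet.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified BTL description. */
struct btl_model {
    bool   has_mpool;
    size_t max_rdma_size;   /* 0 means "no limit" */
};

/* Steps 1 and 3: stand-in for mca_pml_ob1_rdma_btls(). Without an mpool
 * the buffer is assumed registered, so a non-zero count is returned. */
static int rdma_btls_model(const struct btl_model *btl, bool registered)
{
    return (!btl->has_mpool || registered) ? 1 : 0;
}

/* Step 4: size of the next PUT. max_rdma_size is honoured only when
 * req_rdma_cnt is zero, i.e. when memory is not already pinned. */
static size_t put_size(const struct btl_model *btl, int req_rdma_cnt,
                       size_t remaining)
{
    size_t size = remaining;
    if (0 == req_rdma_cnt && btl->max_rdma_size != 0 &&
        size > btl->max_rdma_size) {
        size = btl->max_rdma_size;
    }
    return size;
}
```

Take a TCP-like BTL (no mpool) with max_rdma_size set to 64 KiB and a 1 MiB contiguous receive buffer. In this model req_rdma_cnt is 1, so put_size() returns the full 1 MiB: a single RDMA operation and no pipeline. A BTL with an mpool and an unregistered buffer gets clamped to 64 KiB pieces.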
>>
>> So what I have shown here is that there was no pipeline for the TCP BTL in
>> v1.2, and that the code was specifically written to behave this way.
>> If you still think that there is a difference in behaviour between v1.2
>> and the trunk, can you explain what code path is executed in v1.2 for
>> your test case and how the trunk behaves differently?
>>
>>>
>>> If a BTL doesn't want to use the pipeline for RDMA operations, it can

Re: [OMPI devel] 1.3 Release schedule and contents

2008-02-21 Thread Pavel Shamis (Pasha)

Brad,
APM code was committed to trunk.
So you may mark it as done.

Thanks,
Pasha.

Brad Benton wrote:

All:

The latest scrub of the 1.3 release schedule and contents is ready for 
review and comment.  Please use the following links:
  1.3 milestones:  
https://svn.open-mpi.org/trac/ompi/milestone/Open%20MPI%201.3
  1.3.1 milestones: 
https://svn.open-mpi.org/trac/ompi/milestone/Open%20MPI%201.3.1


In order to keep the 1.3 dates, I've pushed a bunch of
stuff (particularly ORTE things) to 1.3.1.  Even though new
functionality will be slated for 1.3.1, the goal is to have no
interface changes between the phases.


Please look over the list and schedules and let me or my fellow 1.3
co-release manager George Bosilca (bosi...@eecs.utk.edu) know of any
issues, errors, suggestions, omissions, heartburn, etc.


Thanks,
--Brad

Brad Benton
IBM


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



--
Pavel Shamis (Pasha)
Mellanox Technologies



Re: [OMPI devel] PLPA ready?

2008-02-21 Thread Jeff Squyres

On Feb 20, 2008, at 7:53 AM, Sharon Melamed wrote:


>> I guess I was torn between reporting num_processors/sockets and
>> max_socket|core_id.  Really, you need both, right?  It is possible
>> that the processor and/or socket numbers are not contiguous.
>
> I need both *because* the processors are not numbered contiguously. In my
> case, I have a machine with two sockets. The socket numbers are 0 and
> 3, so num_sockets is 2 and max_socket_id is 3, and I need
> both values.


Ok, so it sounds like a paffinity API change is in order.  When/if the  
Solaris plugin comes into effect, I know that they have similar issues  
(processors may not be numbered contiguously).


Do you want to change the API to include both parameters when querying
sockets and cores?  The Solaris API has these functions, but they're
no-ops (returning NOT_SUPPORTED); we'll still need to make their
prototypes match.
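The sketch below shows what the changed query might look like, assuming a function-pointer table of the kind paffinity modules fill in. Every name here is hypothetical and not the real paffinity.h API: `paffinity_module_sketch`, `get_socket_info_fn`, the `SKETCH_*` codes, and both implementations. The point it illustrates is that once the prototype gains a second output parameter, the Solaris no-op must adopt the same signature, or the module table no longer type-checks.

```c
/* Hypothetical return codes; the real OPAL constants differ. */
#define SKETCH_SUCCESS            0
#define SKETCH_ERR_NOT_SUPPORTED (-8)

/* Hypothetical query: report both the socket count and the largest
 * socket ID, since IDs may not be contiguous. */
typedef int (*get_socket_info_fn)(int *num_sockets, int *max_socket_id);

struct paffinity_module_sketch {
    get_socket_info_fn get_socket_info;
};

/* Linux-style implementation; hard-coded values stand in for a real
 * topology query (the 0/3 socket example). */
static int linux_get_socket_info(int *num_sockets, int *max_socket_id)
{
    *num_sockets = 2;
    *max_socket_id = 3;
    return SKETCH_SUCCESS;
}

/* Solaris-style no-op: it does nothing, but its prototype must match
 * get_socket_info_fn for the initializer below to type-check. */
static int solaris_get_socket_info(int *num_sockets, int *max_socket_id)
{
    (void)num_sockets;
    (void)max_socket_id;
    return SKETCH_ERR_NOT_SUPPORTED;
}

static const struct paffinity_module_sketch linux_module = {
    linux_get_socket_info
};
static const struct paffinity_module_sketch solaris_module = {
    solaris_get_socket_info
};
```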


I think PLPA otherwise passes criteria for release.  I'll release PLPA  
v1.1 today and try to get it integrated into the trunk.  Sorry it's  
taken a while...


--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] PLPA ready?

2008-02-21 Thread Sharon Melamed
Jeff,

1. Yes, I need both parameters when querying sockets and cores.
2. I don't think Sun will be concerned if we change
get_processor/socket/core_info because, as Pak Lui from Sun said in one of
his earlier emails, "I am guessing it will not messing us up because these are
the functions that Solaris doesn't really implement yet, right?"


Sharon.  




[OMPI devel] rch-step2 branch errors

2008-02-21 Thread Lenny Verkhovsky
Hi,

To create a /tmp/rank_file branch with a new RMAPS component, I need to base
it on the /tmp/rhc-step2b branch.

I tried to download and compile it, but it failed
(many defines, .h files, new directories, etc. are missing).



gcc -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -pthread -o
.libs/opal_wrapper opal_wrapper.o -Wl,--export-dynamic
../../../opal/.libs/libopen-pal.so -ldl -lnsl -lutil -lm  -Wl,--rpath
-Wl,/home/USERS/lenny/OMPI_ORTE_17540/lib
../../../opal/.libs/libopen-pal.so: undefined reference to `OPAL_OUTPUT'
collect2: ld returned 1 exit status


Best Regards,
Lenny.




Re: [OMPI devel] rch-step2 branch errors

2008-02-21 Thread Tim Prins

Hi,

I have been doing some work on this branch, and may have caused that
problem. But I really cannot help without the full error output.
If you run 'make >/dev/null' and send the output, I may be able to help.


Thanks,

Tim





Re: [OMPI devel] rch-step2 branch errors

2008-02-21 Thread Ralph H Castain
Interesting - it is building for me, but Tim P has noted a couple of bugs (I
haven't tested it in the last few days).

I'll take a look...






Re: [OMPI devel] PLPA ready?

2008-02-21 Thread Jeff Squyres

On Feb 21, 2008, at 7:01 AM, Sharon Melamed wrote:


> 1. Yes, I need both parameters when querying sockets and cores.
> 2. I don't think Sun will be concerned if we change
> get_processor/socket/core_info because, as Pak Lui from Sun said in one of
> his earlier emails, "I am guessing it will not messing us up because these are
> the functions that Solaris doesn't really implement yet, right?"


Right, but the plpa_solaris_module.c file will need to be updated with  
the new function signatures so that it will still compile (i.e., if  
you're going to be changing the function signatures in paffinity.h).


--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] PLPA ready?

2008-02-21 Thread Jeff Squyres

On Feb 21, 2008, at 7:13 AM, Jeff Squyres wrote:

> Right, but the plpa_solaris_module.c file will need to be updated with
> the new function signatures so that it will still compile (i.e., if
> you're going to be changing the function signatures in paffinity.h).



Hah -- I meant paffinity_solaris_module.c.  :-)

--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] rch-step2 branch errors

2008-02-21 Thread Tim Prins
I have just made a change in r17540 which may fix your problem. If not, 
please send the requested output.


Tim








Re: [OMPI devel] PLPA ready?

2008-02-21 Thread Sharon Melamed
Yes, I think we should change paffinity.h, paffinity_solaris_module.c, and
paffinity_windows_module.c.

I added those APIs some time ago, based on the PLPA APIs. Now the PLPA API
has changed, and no one uses those APIs (except me and, maybe in the future,
the Sun guys). So I don't see why we shouldn't change those APIs, including
their signatures in paffinity.h.

Sharon. 




Re: [OMPI devel] rch-step2 branch errors

2008-02-21 Thread Pak Lui
I didn't run into any problems building either rhc-step2b or
tmp-public/rank_file yesterday or today. But I am building with Sun
Studio 12 on Solaris 10 SPARC.





--

- Pak Lui
pak@sun.com


Re: [OMPI devel] PLPA ready?

2008-02-21 Thread Jeff Squyres

Sounds perfect.

How about this -- since your changes and mine are interdependent, can
you send me a patch for the paffinity change?  I'll apply it at the
same time that I apply the new PLPA (later today).


Thanks!





--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] PLPA ready?

2008-02-21 Thread Sharon Melamed
Sure. 



linux-paffinity.patch
Description: Binary data


Re: [OMPI devel] 1.3 Release schedule and contents

2008-02-21 Thread George Bosilca

Pasha,

Thanks for the info. I updated the milestone page.

  Thanks,
george.




