Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r24536

2011-03-17 Thread Tim Mattox
Ralph,
I love how this bugfix results in a net reduction of almost 200 lines
of code!  Very nice.
(I now return to deep lurking mode...)

On Wed, Mar 16, 2011 at 10:22 PM,   wrote:
> Author: rhc
> Date: 2011-03-16 22:22:23 EDT (Wed, 16 Mar 2011)
> New Revision: 24536
> URL: https://svn.open-mpi.org/trac/ompi/changeset/24536
>
> Log:
> Fix the hier grpcomm module so modex results in correct data. The prior 
> implementation stored the modex data as node-based attributes. This worked 
> fine for BTL's such as openib where the interfaces were associated with the 
> node. However, BTL's such as TCP have interfaces associated with a specific 
> process, not a node. Thus, store the data in the modex database so it is 
> correctly indexed.
>
>
> Text files modified:
>   trunk/orte/mca/grpcomm/hier/grpcomm_hier_module.c |   209 +--
>   1 files changed, 7 insertions(+), 202 deletions(-)

-- 
Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
 timat...@open-mpi.org || tmat...@gmail.com
    I'm a bright... http://www.the-brights.net/



Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r24536

2011-03-17 Thread Ralph Castain
Actually, I could have fixed it without removing anything. All that was 
required was to change the flag passed to a function from false to true.

But I figured I would go ahead and remove some code (support for ompi-profiler) 
that nobody is using. Will probably remove it from the rest of the codebase 
over time.

:-)
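
For anyone reading along, the node-versus-process indexing problem described in the commit log can be sketched with a toy model. This is plain Python, not the ORTE code: the names unpack_modex, lookup, per_process and node_of, and the sample endpoint values, are all made up for illustration. The only thing taken from the thread is the idea that a single boolean decides whether modex values are stored per node or per process.

```python
# Toy model of the indexing bug; none of these names exist in Open MPI.

def unpack_modex(entries, node_of, per_process):
    """Build a lookup table from (proc, key, value) modex entries.

    per_process=False mimics the old behaviour: values are stored as
    attributes of the node, so processes on one node share one slot.
    per_process=True mimics the fix: values are indexed by process.
    """
    table = {}
    for proc, key, value in entries:
        owner = proc if per_process else node_of[proc]
        table[(owner, key)] = value  # node-keyed: last writer wins
    return table


def lookup(table, proc, key, node_of, per_process):
    """Fetch the value that was published for `proc` under `key`."""
    owner = proc if per_process else node_of[proc]
    return table[(owner, key)]


# Two processes (ranks 0 and 1) on the same hypothetical node "n0".
node_of = {0: "n0", 1: "n0"}

# IB-like data: identical for every process on the node.
ib_entries = [(0, "ib", "port-A"), (1, "ib", "port-A")]
# TCP-like data: each process publishes its own endpoint.
tcp_entries = [(0, "tcp", "10.0.0.1:5000"), (1, "tcp", "10.0.0.1:5001")]

entries = ib_entries + tcp_entries
node_table = unpack_modex(entries, node_of, per_process=False)
proc_table = unpack_modex(entries, node_of, per_process=True)

if __name__ == "__main__":
    for label, table, flag in (("node-keyed", node_table, False),
                               ("proc-keyed", proc_table, True)):
        for rank in (0, 1):
            print(label, rank,
                  lookup(table, rank, "ib", node_of, flag),
                  lookup(table, rank, "tcp", node_of, flag))
```

In this model the node-keyed table still gives the right answer for the IB-like key, because every process on the node published the same value, but rank 0's TCP lookup returns rank 1's endpoint. That matches why the problem stayed hidden on IB-only systems.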


On Mar 17, 2011, at 6:49 AM, Tim Mattox wrote:

> Ralph,
> I love how this bugfix results in a net reduction of almost 200 lines
> of code!  Very nice.
> (I now return to deep lurking mode...)
> 
> On Wed, Mar 16, 2011 at 10:22 PM,   wrote:
>> Author: rhc
>> Date: 2011-03-16 22:22:23 EDT (Wed, 16 Mar 2011)
>> New Revision: 24536
>> URL: https://svn.open-mpi.org/trac/ompi/changeset/24536
>> 
>> Log:
>> Fix the hier grpcomm module so modex results in correct data. The prior 
>> implementation stored the modex data as node-based attributes. This worked 
>> fine for BTL's such as openib where the interfaces were associated with the 
>> node. However, BTL's such as TCP have interfaces associated with a specific 
>> process, not a node. Thus, store the data in the modex database so it is 
>> correctly indexed.
>> 
>> 
>> Text files modified:
>>   trunk/orte/mca/grpcomm/hier/grpcomm_hier_module.c |   209 +--
>>   1 files changed, 7 insertions(+), 202 deletions(-)
> 
> -- 
> Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
>  timat...@open-mpi.org || tmat...@gmail.com
> I'm a bright... http://www.the-brights.net/
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] Bug btl:tcp with grpcomm:hier

2011-03-17 Thread Jeff Squyres
Does this need to be CMR'ed to 1.4 and/or 1.5?


On Mar 16, 2011, at 10:27 PM, Ralph Castain wrote:

> Okay, I fixed this in r24536.
> 
> Sorry for the problem, Damien - thanks for catching it! Went unnoticed 
> because the folks at the Labs always use IB.
> 
> 
> On Mar 16, 2011, at 7:20 PM, Ralph Castain wrote:
> 
>> I believe I see the problem - and why it wouldn't show up for IB. It looks 
>> like the hier module passes an incorrect flag to the modex unpack function, 
>> which causes that function to place the modex values as attributes assigned 
>> to the node instead of a process, rather than placing the values into the 
>> modex database. So when you look up a value, you get a single value for the 
>> entire node.
>> 
>> Works for IB because the interface info is at the node level. Doesn't work 
>> for TCP because the "interface" info is at the proc level.
>> 
>> Since it was only tested on IB before, this didn't show up. Should be easy 
>> to fix.
>> 
>> On Mar 16, 2011, at 6:15 PM, Jeff Squyres wrote:
>> 
>>> On Mar 16, 2011, at 5:37 PM, George Bosilca wrote:
>>> 
>>>> I just checked and IB does work correctly. But then I remembered that IB
>>>> is different: the connections are peer based, so they don't happen during
>>>> the modex exchange. The data is exchanged over RML messages, but outside
>>>> the modex.
>>> 
>>> Not quite.  The openib BTL does use the modex to send around connection 
>>> information.  The actual connections are made lazily -- just like the TCP 
>>> BTL -- but the OOB CPC (i.e., the default connection mode in the openib 
>>> BTL) uses RML to do the 2/3 way handshake.  That's all.
>>> 
>>> But the point here is: the openib BTL does rely on the modex.
>>> 
>>> -- 
>>> Jeff Squyres
>>> jsquy...@cisco.com
>>> For corporate legal information go to:
>>> http://www.cisco.com/web/about/doing_business/legal/cri/


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI devel] Bug btl:tcp with grpcomm:hier

2011-03-17 Thread Damien Guinier

You are welcome. I'm glad you found this fix so quickly.

Thanks to all

Damien

On 17/03/2011 03:27, Ralph Castain wrote:
> Okay, I fixed this in r24536.
>
> Sorry for the problem, Damien - thanks for catching it! Went unnoticed because
> the folks at the Labs always use IB.
>
>
> On Mar 16, 2011, at 7:20 PM, Ralph Castain wrote:
>
>> I believe I see the problem - and why it wouldn't show up for IB. It looks like
>> the hier module passes an incorrect flag to the modex unpack function, which
>> causes that function to place the modex values as attributes assigned to the
>> node instead of a process, rather than placing the values into the modex
>> database. So when you look up a value, you get a single value for the entire
>> node.
>>
>> Works for IB because the interface info is at the node level. Doesn't work for TCP
>> because the "interface" info is at the proc level.
>>
>> Since it was only tested on IB before, this didn't show up. Should be easy to
>> fix.
>>
>> On Mar 16, 2011, at 6:15 PM, Jeff Squyres wrote:
>>
>>> On Mar 16, 2011, at 5:37 PM, George Bosilca wrote:
>>>
>>>> I just checked and IB does work correctly. But then I remembered that IB is
>>>> different: the connections are peer based, so they don't happen during the
>>>> modex exchange. The data is exchanged over RML messages, but outside the modex.
>>>
>>> Not quite.  The openib BTL does use the modex to send around connection
>>> information.  The actual connections are made lazily -- just like the TCP BTL
>>> -- but the OOB CPC (i.e., the default connection mode in the openib BTL) uses
>>> RML to do the 2/3 way handshake.  That's all.
>>>
>>> But the point here is: the openib BTL does rely on the modex.
>>>
>>> --
>>> Jeff Squyres
>>> jsquy...@cisco.com
>>> For corporate legal information go to:
>>> http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI devel] Bug btl:tcp with grpcomm:hier

2011-03-17 Thread Damien Guinier

Yes please, Bull customers are asking for this fix.

damien

On 17/03/2011 15:44, Jeff Squyres wrote:
> Does this need to be CMR'ed to 1.4 and/or 1.5?
>
>
> On Mar 16, 2011, at 10:27 PM, Ralph Castain wrote:
>
>> Okay, I fixed this in r24536.
>>
>> Sorry for the problem, Damien - thanks for catching it! Went unnoticed because
>> the folks at the Labs always use IB.
>>
>>
>> On Mar 16, 2011, at 7:20 PM, Ralph Castain wrote:
>>
>>> I believe I see the problem - and why it wouldn't show up for IB. It looks like
>>> the hier module passes an incorrect flag to the modex unpack function, which
>>> causes that function to place the modex values as attributes assigned to the
>>> node instead of a process, rather than placing the values into the modex
>>> database. So when you look up a value, you get a single value for the entire
>>> node.
>>>
>>> Works for IB because the interface info is at the node level. Doesn't work for TCP
>>> because the "interface" info is at the proc level.
>>>
>>> Since it was only tested on IB before, this didn't show up. Should be easy to
>>> fix.
>>>
>>> On Mar 16, 2011, at 6:15 PM, Jeff Squyres wrote:
>>>
>>>> On Mar 16, 2011, at 5:37 PM, George Bosilca wrote:
>>>>
>>>>> I just checked and IB does work correctly. But then I remembered that IB is
>>>>> different: the connections are peer based, so they don't happen during the
>>>>> modex exchange. The data is exchanged over RML messages, but outside the modex.
>>>>
>>>> Not quite.  The openib BTL does use the modex to send around connection
>>>> information.  The actual connections are made lazily -- just like the TCP BTL
>>>> -- but the OOB CPC (i.e., the default connection mode in the openib BTL) uses
>>>> RML to do the 2/3 way handshake.  That's all.
>>>>
>>>> But the point here is: the openib BTL does rely on the modex.
>>>>
>>>> --
>>>> Jeff Squyres
>>>> jsquy...@cisco.com
>>>> For corporate legal information go to:
>>>> http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI devel] Bug btl:tcp with grpcomm:hier

2011-03-17 Thread Ralph Castain
Would you mind filing these? I suspect you'll have to create patches - it might 
apply cleanly to 1.5, but I'm far less confident about 1.4. You might check to 
see if this even exists in 1.4 as I honestly don't remember.

Thanks
Ralph

On Mar 17, 2011, at 8:57 AM, Damien Guinier wrote:

> Yes please, Bull customers are asking for this fix.
> 
> damien
> 
> On 17/03/2011 15:44, Jeff Squyres wrote:
>> Does this need to be CMR'ed to 1.4 and/or 1.5?
>> 
>> 
>> On Mar 16, 2011, at 10:27 PM, Ralph Castain wrote:
>> 
>>   
>>> Okay, I fixed this in r24536.
>>> 
>>> Sorry for the problem, Damien - thanks for catching it! Went unnoticed 
>>> because the folks at the Labs always use IB.
>>> 
>>> 
>>> On Mar 16, 2011, at 7:20 PM, Ralph Castain wrote:
>>> 
>>> 
>>>> I believe I see the problem - and why it wouldn't show up for IB. It looks
>>>> like the hier module passes an incorrect flag to the modex unpack
>>>> function, which causes that function to place the modex values as
>>>> attributes assigned to the node instead of a process, rather than placing
>>>> the values into the modex database. So when you look up a value, you get a
>>>> single value for the entire node.
>>>>
>>>> Works for IB because the interface info is at the node level. Doesn't work
>>>> for TCP because the "interface" info is at the proc level.
>>>>
>>>> Since it was only tested on IB before, this didn't show up. Should be easy
>>>> to fix.
>>>>
>>>> On Mar 16, 2011, at 6:15 PM, Jeff Squyres wrote:
>>>>
>>>>> On Mar 16, 2011, at 5:37 PM, George Bosilca wrote:
>>>>>
>>>>>> I just checked and IB does work correctly. But then I remembered that IB
>>>>>> is different: the connections are peer based, so they don't happen
>>>>>> during the modex exchange. The data is exchanged over RML messages, but
>>>>>> outside the modex.
>>>>>
>>>>> Not quite.  The openib BTL does use the modex to send around connection
>>>>> information.  The actual connections are made lazily -- just like the TCP
>>>>> BTL -- but the OOB CPC (i.e., the default connection mode in the openib
>>>>> BTL) uses RML to do the 2/3 way handshake.  That's all.
>>>>>
>>>>> But the point here is: the openib BTL does rely on the modex.
>>>>>
>>>>> --
>>>>> Jeff Squyres
>>>>> jsquy...@cisco.com
>>>>> For corporate legal information go to:
>>>>> http://www.cisco.com/web/about/doing_business/legal/cri/