Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r24536
Ralph,
I love how this bugfix results in a net reduction of almost 200 lines
of code! Very nice.
(I now return to deep lurking mode...)

On Wed, Mar 16, 2011 at 10:22 PM, wrote:
> Author: rhc
> Date: 2011-03-16 22:22:23 EDT (Wed, 16 Mar 2011)
> New Revision: 24536
> URL: https://svn.open-mpi.org/trac/ompi/changeset/24536
>
> Log:
> Fix the hier grpcomm module so modex results in correct data. The prior
> implementation stored the modex data as node-based attributes. This worked
> fine for BTL's such as openib where the interfaces were associated with the
> node. However, BTL's such as TCP have interfaces associated with a specific
> process, not a node. Thus, store the data in the modex database so it is
> correctly indexed.
>
> Text files modified:
>    trunk/orte/mca/grpcomm/hier/grpcomm_hier_module.c |   209 +--
> 1 files changed, 7 insertions(+), 202 deletions(-)

--
Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
timat...@open-mpi.org || tmat...@gmail.com
I'm a bright... http://www.the-brights.net/
Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r24536
Actually, I could have fixed it without removing anything. All that was
required was to change the flag passed to a function from false to true.

But I figured I would go ahead and remove some code (support for
ompi-profiler) that nobody is using. Will probably remove it from the rest
of the codebase over time. :-)

On Mar 17, 2011, at 6:49 AM, Tim Mattox wrote:

> Ralph,
> I love how this bugfix results in a net reduction of almost 200 lines
> of code! Very nice.
> (I now return to deep lurking mode...)
>
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
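A minimal sketch of the storage difference described in the commit log, not the actual grpcomm code. The names store_modex_entry, lookup_modex_entry, the per_proc flag, and the fixed-size tables are all hypothetical; they only illustrate why node-keyed storage hands back one value for every process on a node, while process-keyed storage keeps each process's entry separate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_NODES 4
#define MAX_PROCS 8
#define ADDR_LEN  32

/* Hypothetical stand-ins for the two storage locations discussed above. */
static char node_attr[MAX_NODES][ADDR_LEN]; /* one slot per node    */
static char proc_db[MAX_PROCS][ADDR_LEN];   /* one slot per process */

/* Hypothetical unpack step. `per_proc` plays the role of the flag:
 * false stores the value on the node, true stores it per process. */
void store_modex_entry(int node, int rank, const char *addr, bool per_proc)
{
    assert(node >= 0 && node < MAX_NODES);
    assert(rank >= 0 && rank < MAX_PROCS);
    char *slot = per_proc ? proc_db[rank] : node_attr[node];
    snprintf(slot, ADDR_LEN, "%s", addr);
}

/* Look up the value that was stored for a given process. */
const char *lookup_modex_entry(int node, int rank, bool per_proc)
{
    assert(node >= 0 && node < MAX_NODES);
    assert(rank >= 0 && rank < MAX_PROCS);
    return per_proc ? proc_db[rank] : node_attr[node];
}
```

With per_proc set to false, two ranks on the same node that publish different TCP addresses overwrite each other, and a lookup for either rank returns whichever address was stored last. With per_proc set to true, each rank gets its own address back.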
Re: [OMPI devel] Bug btl:tcp with grpcomm:hier
Does this need to be CMR'ed to 1.4 and/or 1.5?

On Mar 16, 2011, at 10:27 PM, Ralph Castain wrote:

> Okay, I fixed this in r24536.
>
> Sorry for the problem, Damien - thanks for catching it! Went unnoticed
> because the folks at the Labs always use IB.
>
> On Mar 16, 2011, at 7:20 PM, Ralph Castain wrote:
>
>> I believe I see the problem - and why it wouldn't show up for IB. It looks
>> like the hier module passes an incorrect flag to the modex unpack function,
>> which causes that function to place the modex values as attributes assigned
>> to the node instead of a process, rather than placing the values into the
>> modex database. So when you look up a value, you get a single value for the
>> entire node.
>>
>> Works for IB because the interface info is at the node level. Doesn't work
>> for TCP because the "interface" info is at the proc level.
>>
>> Since it was only tested on IB before, this didn't show up. Should be easy
>> to fix.
>>
>> On Mar 16, 2011, at 6:15 PM, Jeff Squyres wrote:
>>
>>> On Mar 16, 2011, at 5:37 PM, George Bosilca wrote:
>>>
>>>> I just checked and IB does work correctly. But then I remembered that
>>>> IB is different: the connections are peer based, so they don't happen
>>>> during the modex exchange. The data is exchanged over RML messages,
>>>> but outside the modex.
>>>
>>> Not quite. The openib BTL does use the modex to send around connection
>>> information. The actual connections are made lazily -- just like the TCP
>>> BTL -- but the OOB CPC (i.e., the default connection mode in the openib
>>> BTL) uses RML to do the 2/3 way handshake. That's all.
>>>
>>> But the point here is: the openib BTL does rely on the modex.

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
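A rough sketch of the "modex first, connect lazily" pattern described in the quoted exchange, under stated assumptions. endpoint_t, endpoint_init, endpoint_connect, and endpoint_send are hypothetical names, not Open MPI APIs, and the handshake (done over RML in the OOB CPC case) is reduced here to flipping a state flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define ADDR_LEN 32

/* Hypothetical endpoint: the peer address is known up front (from the
 * modex), but the connection is not set up until it is first needed. */
typedef struct {
    char addr[ADDR_LEN]; /* peer address published during the modex */
    bool connected;      /* false until the first send              */
    int  connect_count;  /* number of times a connection was set up */
    int  sent;           /* number of messages handed to the peer   */
} endpoint_t;

/* Record the address obtained from the modex; do not connect yet. */
void endpoint_init(endpoint_t *ep, const char *modex_addr)
{
    snprintf(ep->addr, ADDR_LEN, "%s", modex_addr);
    ep->connected = false;
    ep->connect_count = 0;
    ep->sent = 0;
}

/* Stand-in for the real handshake; it only updates the state. */
static void endpoint_connect(endpoint_t *ep)
{
    assert(ep->addr[0] != '\0'); /* an address from the modex is required */
    ep->connected = true;
    ep->connect_count++;
}

/* Connect on first use, then count the message as sent. */
void endpoint_send(endpoint_t *ep, const char *msg)
{
    (void)msg; /* the payload is ignored in this sketch */
    if (!ep->connected) {
        endpoint_connect(ep);
    }
    ep->sent++;
}
```

The point of the sketch is the dependency: the connection step needs a correct per-peer address to already be available, so a wrong modex lookup would only surface later, at the first send.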
Re: [OMPI devel] Bug btl:tcp with grpcomm:hier
You are welcome. I'm happy you found this fix so quickly.

Thanks to all,
Damien

On 17/03/2011 03:27, Ralph Castain wrote:
> Okay, I fixed this in r24536.
>
> Sorry for the problem, Damien - thanks for catching it! Went unnoticed
> because the folks at the Labs always use IB.
Re: [OMPI devel] Bug btl:tcp with grpcomm:hier
Yes please, Bull clients are asking for this fix.

damien

On 17/03/2011 15:44, Jeff Squyres wrote:
> Does this need to be CMR'ed to 1.4 and/or 1.5?
Re: [OMPI devel] Bug btl:tcp with grpcomm:hier
Would you mind filing these? I suspect you'll have to create patches - it
might apply cleanly to 1.5, but I'm far less confident about 1.4. You might
check to see if this even exists in 1.4 as I honestly don't remember.

Thanks
Ralph

On Mar 17, 2011, at 8:57 AM, Damien Guinier wrote:

> Yes please, Bull clients are asking for this fix.
>
> damien
>
> On 17/03/2011 15:44, Jeff Squyres wrote:
>> Does this need to be CMR'ed to 1.4 and/or 1.5?