Re: [OMPI devel] matching code rewrite in OB1
Tarballs available at: http://www.open-mpi.org/~jsquyres/unofficial/

On Dec 12, 2007, at 4:08 PM, Jeff Squyres (jsquyres) wrote:

Heh, ok. I'll make a tarball against your patch later. It's against the trunk?

-jms
Sent from my PDA

-Original Message-
From: Gleb Natapov [mailto:gl...@voltaire.com]
Sent: Wednesday, December 12, 2007 03:54 PM Eastern Standard Time
To: Open MPI Developers
Subject: Re: [OMPI devel] matching code rewrite in OB1

On Wed, Dec 12, 2007 at 03:52:17PM -0500, Jeff Squyres wrote:
> On Dec 12, 2007, at 3:20 PM, Gleb Natapov wrote:
>
>>> How about making a tarball with this patch in it that can be thrown at everyone's MTT? (we can put the tarball on www.open-mpi.org somewhere)
>> I don't have access to www.open-mpi.org, but I can send you the patch. I can send you a tarball too, but I prefer to not abuse email.
>
> Do you have access to staging.openfabrics.org? I could download it from there and put it on www.open-mpi.org.

No. I don't :(

--
Gleb.
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

--
Jeff Squyres
Cisco Systems
Re: [OMPI devel] SCTP noisy failure
On Dec 12, 2007, at 8:58 PM, Brad Penoff wrote:

>> That's not really the issue: I don't *want* SCTP support. :)
>>
>> I have a default RHEL4U4 install and now Open MPI is complaining on a default mpirun. Open MPI should work out of the box -- warning free -- on all supported operating systems.
>
> Haha, I caught that part as well (about the exclusivity "fix"). I was just curious why the error is there in the first place because, after all, everyone should want SCTP support, right ;-) ?

;-)

> I didn't know that any Linux distro had lksctp-tools installed by default, but the module not loaded... learn something new every day though.

Gotta love those screwy software authors! (I'm sure lots of people say that about us, too :-) )

> So there's two issues (exclusivity not working as expected and then the SCTP failure if you actually wanted SCTP support) and I'm concerned about the one that most of you are not, I'm guessing ;-).

I think exclusivity *is* working -- this is before that comes into play, IIRC. The _init function is querying your BTL to see if it wants to run.

--
Jeff Squyres
Cisco Systems
Re: [OMPI devel] SCTP noisy failure
On Dec 12, 2007 5:44 PM, Jeff Squyres wrote:
> On Dec 12, 2007, at 7:16 PM, Brad Penoff wrote:
>
>> Does your system have sctp in the kernel as a module? This is the default for most Linux systems so you may have to "modprobe sctp" to get rid of the ESOCKTNOSUPPORT...
>
> That's not really the issue: I don't *want* SCTP support. :)
>
> I have a default RHEL4U4 install and now Open MPI is complaining on a default mpirun. Open MPI should work out of the box -- warning free -- on all supported operating systems.

Haha, I caught that part as well (about the exclusivity "fix"). I was just curious why the error is there in the first place because, after all, everyone should want SCTP support, right ;-) ?

I didn't know that any Linux distro had lksctp-tools installed by default, but the module not loaded... learn something new every day though.

So there's two issues (exclusivity not working as expected and then the SCTP failure if you actually wanted SCTP support) and I'm concerned about the one that most of you are not, I'm guessing ;-). I'll try to look at the other problem too though...

brad

> --
> Jeff Squyres
> Cisco Systems
Re: [OMPI devel] SCTP noisy failure
On Dec 12, 2007, at 7:16 PM, Brad Penoff wrote:

> Does your system have sctp in the kernel as a module? This is the default for most Linux systems so you may have to "modprobe sctp" to get rid of the ESOCKTNOSUPPORT...

That's not really the issue: I don't *want* SCTP support. :)

I have a default RHEL4U4 install and now Open MPI is complaining on a default mpirun. Open MPI should work out of the box -- warning free -- on all supported operating systems.

--
Jeff Squyres
Cisco Systems
Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r16909 (f77_hello compiler error)
The logic was wrong; I only got half of it. Commit 16950 solves the problem. Sorry for this.

Thanks,
george.

On Dec 12, 2007, at 2:44 PM, Jeff Squyres wrote:

> Yes -- something changed; I tested all 4 languages extensively before I committed (but not on mac). This fails for me on Linux as well; I'll check into it...
>
> On Dec 12, 2007, at 2:15 PM, Ethan Mallove wrote:
>
>> Hello,
>>
>> Is this change (or r16908) causing the below error in the MTT trivial test (f77_hello)? The error occurs on Solaris and Linux.
>>
>> ...
>> NOTICE: Invoking /ws/ompi-tools/SUNWspro/SOS11/bin/f90 -f77 -ftrap=%none -I/installs/cGmK/install/include/v9 -xarch=amd64 hello.f -o f77_hello -R/installs/cGmK/install/lib/amd64 -R/opt/mx/lib -L/installs/cGmK/install/lib/amd64 -lmpi_f77 -lmpi -lopen-rte -lopen-pal -lsocket -lnsl -lrt -lm
>> hello.f:
>>  MAIN main:
>> Undefined                       first referenced
>>  symbol                             in file
>> intercept_extra_state_t_class   /installs/cGmK/install/lib/amd64/libmpi_f77.so
>> ld: fatal: Symbol referencing errors. No output written to f77_hello
>>
>> See also http://www.open-mpi.org/mtt/index.php?do_redir=475.
>>
>> Didn't look that closely here, just noted the line change involving "intercept_extra_state".
>>
>> -Ethan
>>
>> On Sun, Dec/09/2007 07:19:59PM, bosi...@osl.iu.edu wrote:
>>> Author: bosilca
>>> Date: 2007-12-09 19:19:58 EST (Sun, 09 Dec 2007)
>>> New Revision: 16909
>>> URL: https://svn.open-mpi.org/trac/ompi/changeset/16909
>>>
>>> Log:
>>> Avoid a compiler warning about the function being defined but not used when we compile the profiling layer.
>>>
>>> Text files modified:
>>>    trunk/ompi/mpi/f77/register_datarep_f.c | 6 +++---
>>>    1 files changed, 3 insertions(+), 3 deletions(-)
>>>
>>> Modified: trunk/ompi/mpi/f77/register_datarep_f.c
>>> ==============================================================================
>>> --- trunk/ompi/mpi/f77/register_datarep_f.c (original)
>>> +++ trunk/ompi/mpi/f77/register_datarep_f.c 2007-12-09 19:19:58 EST (Sun, 09 Dec 2007)
>>> @@ -90,6 +90,9 @@
>>>      MPI_Aint *extra_state_f77;
>>>  } intercept_extra_state_t;
>>>
>>> +OBJ_CLASS_DECLARATION(intercept_extra_state_t);
>>> +
>>> +#if !OMPI_PROFILE_LAYER
>>>  static void intercept_extra_state_constructor(intercept_extra_state_t *obj)
>>>  {
>>>      obj->read_fn_f77 = NULL;
>>> @@ -98,9 +101,6 @@
>>>      obj->extra_state_f77 = NULL;
>>>  }
>>>
>>> -OBJ_CLASS_DECLARATION(intercept_extra_state_t);
>>> -
>>> -#if !OMPI_PROFILE_LAYER
>>>  OBJ_CLASS_INSTANCE(intercept_extra_state_t,
>>>                     opal_list_item_t,
>>>                     intercept_extra_state_constructor, NULL);
>>> ___
>>> svn-full mailing list
>>> svn-f...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/svn-full
>
> --
> Jeff Squyres
> Cisco Systems
Re: [OMPI devel] SCTP noisy failure
hey Jeff,

Does your system have sctp in the kernel as a module? This is the default for most Linux systems so you may have to "modprobe sctp" to get rid of the ESOCKTNOSUPPORT...

brad

On Dec 12, 2007 3:57 PM, Jeff Squyres wrote:
> After the exclusivity change today, I notice that I am getting warnings for *every* mpirun from the SCTP BTL on RHEL4:
>
> [15:52] svbu-mpi:~/mpi % mpirun -np 2 hello
> [svbu-mpi.cisco.com][1,0][btl_sctp_component.c:615:mca_btl_sctp_component_create_listen] socket() failed with errno=94
> [svbu-mpi.cisco.com][1,1][btl_sctp_component.c:615:mca_btl_sctp_component_create_listen] socket() failed with errno=94
> Hello, world! I am 0 of 2 (svbu-mpi.cisco.com)
> Hello, world! I am 1 of 2 (svbu-mpi.cisco.com)
> [15:52] svbu-mpi:~/mpi %
>
> Can these be turned off? I have a default RHEL4 system -- I haven't done anything special to enable / disable SCTP. Is there a less noisy way to tell that SCTP is not enabled on a system?
>
> --
> Jeff Squyres
> Cisco Systems
[OMPI devel] SCTP noisy failure
After the exclusivity change today, I notice that I am getting warnings for *every* mpirun from the SCTP BTL on RHEL4:

[15:52] svbu-mpi:~/mpi % mpirun -np 2 hello
[svbu-mpi.cisco.com][1,0][btl_sctp_component.c:615:mca_btl_sctp_component_create_listen] socket() failed with errno=94
[svbu-mpi.cisco.com][1,1][btl_sctp_component.c:615:mca_btl_sctp_component_create_listen] socket() failed with errno=94
Hello, world! I am 0 of 2 (svbu-mpi.cisco.com)
Hello, world! I am 1 of 2 (svbu-mpi.cisco.com)
[15:52] svbu-mpi:~/mpi %

Can these be turned off? I have a default RHEL4 system -- I haven't done anything special to enable / disable SCTP. Is there a less noisy way to tell that SCTP is not enabled on a system?

--
Jeff Squyres
Cisco Systems
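One possible shape for a "less noisy" check, sketched under assumptions: on Linux, errno 94 is ESOCKTNOSUPPORT, so a component could treat that value (and its close relatives) as "SCTP is simply not available here" and stay silent, reporting only unexpected failures. The function names `errno_means_unsupported` and `sctp_probe` are hypothetical and are not taken from btl_sctp_component.c.

```c
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if the errno from socket() just means "this protocol or
 * socket type is not available on this host", 0 for anything else.
 * The exact set of errno values is an assumption for illustration. */
static int errno_means_unsupported(int err)
{
    return err == ESOCKTNOSUPPORT || err == EPROTONOSUPPORT ||
           err == EAFNOSUPPORT;
}

/* Hypothetical probe: 1 if an SCTP socket can be created, 0 if SCTP is
 * quietly unavailable, -1 on an unexpected error worth reporting. */
static int sctp_probe(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, IPPROTO_SCTP);
    if (fd >= 0) {
        close(fd);
        return 1;
    }
    return errno_means_unsupported(errno) ? 0 : -1;
}
```

A caller would print a warning only when `sctp_probe()` returns -1, and would disable the component without output when it returns 0.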
Re: [OMPI devel] matching code rewrite in OB1
Was Rich referring to ensuring that the test codes checked that their payloads were correct (and not re-assembled in the wrong order)?

On Dec 12, 2007, at 4:10 PM, Brian W. Barrett wrote:

> On Wed, 12 Dec 2007, Gleb Natapov wrote:
>
>> On Wed, Dec 12, 2007 at 03:46:10PM -0500, Richard Graham wrote:
>>> This is better than nothing, but really not very helpful for looking at the specific issues that can arise with this, unless these systems have several parallel networks, with tests that will generate a lot of parallel network traffic, and be able to self check for out-of-order received - i.e. this needs to be encoded into the payload for verification purposes. There are some out-of-order scenarios that need to be generated and checked. I think that George may have a system that will be good for this sort of testing.
>> I am running various tests with multiple networks right now. I use several IB BTLs and TCP BTL simultaneously. I see many reordered messages and all tests were OK till now, but they don't encode message sequence in a payload as far as I know. I'll change one of them to do so.
>
> Other than Rich's comment that we need sequence numbers, why add them? We haven't had them for non-matching packets for the last 3 years in Open MPI (ie, forever), and I can't see why we would need them. Yes, we need sequence numbers for match headers to make sure MPI ordering is correct. But for the rest of the payload, there's no need with OMPI's datatype engine. It's just more payload for no gain.
>
> Brian

--
Jeff Squyres
Cisco Systems
Re: [OMPI devel] New BTL parameter
Gleb Natapov wrote:
> On Wed, Dec 12, 2007 at 02:03:02PM -0500, Jeff Squyres wrote:
>> On Dec 9, 2007, at 10:34 AM, Gleb Natapov wrote:
>>> Currently BTL has parameter btl_min_send_size that is no longer used. I want to change it to be btl_rndv_eager_limit. This new parameter will determine a size of a first fragment of rendezvous protocol. Now we use btl_eager_limit to set its size. btl_rndv_eager_limit will have to be smaller or equal to btl_eager_limit. By default it will be equal to btl_eager_limit so no behavior change will be observed if default is used.
>> Can you describe why it would be better to have the value less than the eager limit?
> It is just one more knob to tune OB1 algorithm. I sometimes don't want to send any data by copy in/out at all. This is not possible right now. With this new param I will be able to control this.

From my experience tuning RDMA-rendezvous for the GASNet communications library, I know that it was beneficial to piggyback some portion of the payload on the rendezvous request. However, the best [insert your favorite performance metric here] was not always achieved by piggybacking the maximum that could be buffered at the receiver (equivalent of btl_eager_limit). If I understand correctly, Gleb's btl_rndv_eager_limit parameter would allow tuning for this behavior in OMPI.

An artificial/simplified example would be if the eager limit is 32K and you have a 64K xfer. Is it better to send 32K copy in/out plus 32K by RDMA, or to send 8K copy in/out plus 56K by RDMA? If the memcpy() overhead for 32K of eager payload exceeds what can be overlapped with the rendezvous setup then the second may be the better choice (higher bandwidth, lower latency, and lower CPU overheads on both sender and receiver).

-Paul

--
Paul H. Hargrove                          phhargr...@lbl.gov
Future Technologies Group
HPC Research Department                   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
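Paul's 32K/64K arithmetic can be written down as a small helper. This is only a sketch of the split being discussed, not OB1 code: `rndv_split` and `split_rendezvous` are hypothetical names, and the clamp reflects the proposal that btl_rndv_eager_limit must not exceed btl_eager_limit.

```c
#include <stddef.h>

/* How a rendezvous message of a given length would be divided.
 * Illustrative type, not part of Open MPI. */
struct rndv_split {
    size_t copied; /* bytes sent copy in/out with the rendezvous request */
    size_t rdma;   /* bytes left to move by RDMA */
};

/* Split `len` bytes given the two limits. rndv_eager_limit is clamped
 * to eager_limit, and the first fragment never exceeds the message. */
static struct rndv_split split_rendezvous(size_t len, size_t eager_limit,
                                          size_t rndv_eager_limit)
{
    struct rndv_split s;
    size_t first = rndv_eager_limit < eager_limit ? rndv_eager_limit
                                                  : eager_limit;
    if (first > len) {
        first = len;
    }
    s.copied = first;
    s.rdma = len - first;
    return s;
}
```

With a 64K message and a 32K eager limit, a rendezvous limit of 32K gives the 32K + 32K case, 8K gives 8K + 56K, and 0 gives the pure-RDMA case Gleb says is not possible today.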
Re: [OMPI devel] matching code rewrite in OB1
On Wed, 12 Dec 2007, Gleb Natapov wrote:

> On Wed, Dec 12, 2007 at 03:46:10PM -0500, Richard Graham wrote:
>> This is better than nothing, but really not very helpful for looking at the specific issues that can arise with this, unless these systems have several parallel networks, with tests that will generate a lot of parallel network traffic, and be able to self check for out-of-order received - i.e. this needs to be encoded into the payload for verification purposes. There are some out-of-order scenarios that need to be generated and checked. I think that George may have a system that will be good for this sort of testing.
> I am running various tests with multiple networks right now. I use several IB BTLs and TCP BTL simultaneously. I see many reordered messages and all tests were OK till now, but they don't encode message sequence in a payload as far as I know. I'll change one of them to do so.

Other than Rich's comment that we need sequence numbers, why add them? We haven't had them for non-matching packets for the last 3 years in Open MPI (ie, forever), and I can't see why we would need them. Yes, we need sequence numbers for match headers to make sure MPI ordering is correct. But for the rest of the payload, there's no need with OMPI's datatype engine. It's just more payload for no gain.

Brian
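For the test-side check that Rich asks for and Gleb offers to add, a self-verifying payload could look roughly like the following. It is a sketch for a test program only, not a change to the wire format; the fragment layout, the byte pattern, and the names `stamp_fragment`, `check_fragment` and `check_message` are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Fill one fragment so that every byte depends on both the fragment's
 * sequence number and the byte offset within the fragment. */
static void stamp_fragment(uint8_t *buf, size_t len, uint32_t seq)
{
    for (size_t i = 0; i < len; ++i) {
        buf[i] = (uint8_t)((seq * 31u + i) & 0xffu);
    }
}

/* Returns 1 if buf holds exactly the pattern expected for `seq`. */
static int check_fragment(const uint8_t *buf, size_t len, uint32_t seq)
{
    for (size_t i = 0; i < len; ++i) {
        if (buf[i] != (uint8_t)((seq * 31u + i) & 0xffu)) {
            return 0;
        }
    }
    return 1;
}

/* Verify a received message assumed to consist of nfrags fragments of
 * frag_len bytes each, laid out in sequence order. Returns 1 if every
 * fragment is in its expected position, 0 otherwise. */
static int check_message(const uint8_t *msg, size_t nfrags, size_t frag_len)
{
    for (size_t f = 0; f < nfrags; ++f) {
        if (!check_fragment(msg + f * frag_len, frag_len, (uint32_t)f)) {
            return 0;
        }
    }
    return 1;
}
```

The sender stamps each fragment before sending and the receiver calls `check_message` on the completed buffer, so a payload assembled in the wrong order fails the check even when the match itself succeeded.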
Re: [OMPI devel] matching code rewrite in OB1
Heh, ok. I'll make a tarball against your patch later. It's against the trunk?

-jms
Sent from my PDA

-Original Message-
From: Gleb Natapov [mailto:gl...@voltaire.com]
Sent: Wednesday, December 12, 2007 03:54 PM Eastern Standard Time
To: Open MPI Developers
Subject: Re: [OMPI devel] matching code rewrite in OB1

On Wed, Dec 12, 2007 at 03:52:17PM -0500, Jeff Squyres wrote:
> On Dec 12, 2007, at 3:20 PM, Gleb Natapov wrote:
>
>>> How about making a tarball with this patch in it that can be thrown at everyone's MTT? (we can put the tarball on www.open-mpi.org somewhere)
>> I don't have access to www.open-mpi.org, but I can send you the patch. I can send you a tarball too, but I prefer to not abuse email.
>
> Do you have access to staging.openfabrics.org? I could download it from there and put it on www.open-mpi.org.

No. I don't :(

--
Gleb.
Re: [OMPI devel] matching code rewrite in OB1
On Wed, Dec 12, 2007 at 03:52:17PM -0500, Jeff Squyres wrote:
> On Dec 12, 2007, at 3:20 PM, Gleb Natapov wrote:
>
>>> How about making a tarball with this patch in it that can be thrown at everyone's MTT? (we can put the tarball on www.open-mpi.org somewhere)
>> I don't have access to www.open-mpi.org, but I can send you the patch. I can send you a tarball too, but I prefer to not abuse email.
>
> Do you have access to staging.openfabrics.org? I could download it from there and put it on www.open-mpi.org.

No. I don't :(

--
Gleb.
Re: [OMPI devel] matching code rewrite in OB1
On Dec 12, 2007, at 3:20 PM, Gleb Natapov wrote:

>> How about making a tarball with this patch in it that can be thrown at everyone's MTT? (we can put the tarball on www.open-mpi.org somewhere)
> I don't have access to www.open-mpi.org, but I can send you the patch. I can send you a tarball too, but I prefer to not abuse email.

Do you have access to staging.openfabrics.org? I could download it from there and put it on www.open-mpi.org.

--
Jeff Squyres
Cisco Systems
Re: [OMPI devel] matching code rewrite in OB1
On Wed, Dec 12, 2007 at 11:57:11AM -0500, Jeff Squyres wrote:
> Gleb --
>
> How about making a tarball with this patch in it that can be thrown at everyone's MTT? (we can put the tarball on www.open-mpi.org somewhere)

I don't have access to www.open-mpi.org, but I can send you the patch. I can send you a tarball too, but I prefer to not abuse email.

> On Dec 11, 2007, at 4:14 PM, Richard Graham wrote:
>
>> I will re-iterate my concern. The code that is there now is mostly nine years old (with some mods made when it was brought over to Open MPI). It took about 2 months of testing on systems with 5-13 way network parallelism to track down all KNOWN race conditions. This code is at the center of MPI correctness, so I am VERY concerned about changing it w/o some very strong reasons. Not opposed, just very cautious.
>>
>> Rich
>>
>> On 12/11/07 11:47 AM, "Gleb Natapov" wrote:
>>
>>> On Tue, Dec 11, 2007 at 08:36:42AM -0800, Andrew Friedley wrote:
>>>> Possibly, though I have results from a benchmark I've written indicating the reordering happens at the sender. I believe I found it was due to the QP striping trick I use to get more bandwidth -- if you back down to one QP (there's a define in the code you can change), the reordering rate drops.
>>> Ah, OK. My assumption was just from looking into code, so I may be wrong.
>>>
>>>> Also I do not make any recursive calls to progress -- at least not directly in the BTL; I can't speak for the upper layers. The reason I do many completions at once is that it is a big help in turning around receive buffers, making it harder to run out of buffers and drop frags. I want to say there was some performance benefit as well but I can't say for sure.
>>> Currently upper layers of Open MPI may call BTL progress function recursively. I hope this will change some day.
>>>
>>>> Andrew
>>>>
>>>> Gleb Natapov wrote:
>>>>> On Tue, Dec 11, 2007 at 08:03:52AM -0800, Andrew Friedley wrote:
>>>>>> Try UD, frags are reordered at a very high rate so should be a good test.
>>>>> Good idea, I'll try this. BTW I think the reason for such a high rate of reordering in UD is that it polls for MCA_BTL_UD_NUM_WC completions (500) and processes them one by one, and if the progress function is called recursively the next 500 completions will be reordered versus previous completions (reordering happens on a receiver, not sender).
>>>>>
>>>>>> Andrew
>>>>>>
>>>>>> Richard Graham wrote:
>>>>>>> Gleb,
>>>>>>> I would suggest that before this is checked in this be tested on a system that has N-way network parallelism, where N is as large as you can find. This is a key bit of code for MPI correctness, and out-of-order operations will break it, so you want to maximize the chance for such operations.
>>>>>>>
>>>>>>> Rich
>>>>>>>
>>>>>>> On 12/11/07 10:54 AM, "Gleb Natapov" wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I did a rewrite of matching code in OB1. I made it much simpler and 2 times smaller (which is good, less code - less bugs). I also got rid of huge macros - very helpful if you need to debug something. There is no performance degradation, actually I even see very small performance improvement. I ran MTT with this patch and the result is the same as on trunk. I would like to commit this to the trunk. The patch is attached for everybody to try.
>>>>>>>>
>>>>>>>> --
>>>>>>>> Gleb.
>>>>>
>>>>> --
>>>>> Gleb.
>>>
>>> --
>>> Gleb.
Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r16909 (f77_hello compiler error)
Yes -- something changed; I tested all 4 languages extensively before I committed (but not on mac). This fails for me on Linux as well; I'll check into it...

On Dec 12, 2007, at 2:15 PM, Ethan Mallove wrote:

> Hello,
>
> Is this change (or r16908) causing the below error in the MTT trivial test (f77_hello)? The error occurs on Solaris and Linux.
>
> ...
> NOTICE: Invoking /ws/ompi-tools/SUNWspro/SOS11/bin/f90 -f77 -ftrap=%none -I/installs/cGmK/install/include/v9 -xarch=amd64 hello.f -o f77_hello -R/installs/cGmK/install/lib/amd64 -R/opt/mx/lib -L/installs/cGmK/install/lib/amd64 -lmpi_f77 -lmpi -lopen-rte -lopen-pal -lsocket -lnsl -lrt -lm
> hello.f:
>  MAIN main:
> Undefined                       first referenced
>  symbol                             in file
> intercept_extra_state_t_class   /installs/cGmK/install/lib/amd64/libmpi_f77.so
> ld: fatal: Symbol referencing errors. No output written to f77_hello
>
> See also http://www.open-mpi.org/mtt/index.php?do_redir=475.
>
> Didn't look that closely here, just noted the line change involving "intercept_extra_state".
>
> -Ethan
>
> On Sun, Dec/09/2007 07:19:59PM, bosi...@osl.iu.edu wrote:
>> Author: bosilca
>> Date: 2007-12-09 19:19:58 EST (Sun, 09 Dec 2007)
>> New Revision: 16909
>> URL: https://svn.open-mpi.org/trac/ompi/changeset/16909
>>
>> Log:
>> Avoid a compiler warning about the function being defined but not used when we compile the profiling layer.
>>
>> Text files modified:
>>    trunk/ompi/mpi/f77/register_datarep_f.c | 6 +++---
>>    1 files changed, 3 insertions(+), 3 deletions(-)
>>
>> Modified: trunk/ompi/mpi/f77/register_datarep_f.c
>> ==============================================================================
>> --- trunk/ompi/mpi/f77/register_datarep_f.c (original)
>> +++ trunk/ompi/mpi/f77/register_datarep_f.c 2007-12-09 19:19:58 EST (Sun, 09 Dec 2007)
>> @@ -90,6 +90,9 @@
>>      MPI_Aint *extra_state_f77;
>>  } intercept_extra_state_t;
>>
>> +OBJ_CLASS_DECLARATION(intercept_extra_state_t);
>> +
>> +#if !OMPI_PROFILE_LAYER
>>  static void intercept_extra_state_constructor(intercept_extra_state_t *obj)
>>  {
>>      obj->read_fn_f77 = NULL;
>> @@ -98,9 +101,6 @@
>>      obj->extra_state_f77 = NULL;
>>  }
>>
>> -OBJ_CLASS_DECLARATION(intercept_extra_state_t);
>> -
>> -#if !OMPI_PROFILE_LAYER
>>  OBJ_CLASS_INSTANCE(intercept_extra_state_t,
>>                     opal_list_item_t,
>>                     intercept_extra_state_constructor, NULL);

--
Jeff Squyres
Cisco Systems
Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r16909 (f77_hello compiler error)
Hello,

Is this change (or r16908) causing the below error in the MTT trivial test (f77_hello)? The error occurs on Solaris and Linux.

...
NOTICE: Invoking /ws/ompi-tools/SUNWspro/SOS11/bin/f90 -f77 -ftrap=%none -I/installs/cGmK/install/include/v9 -xarch=amd64 hello.f -o f77_hello -R/installs/cGmK/install/lib/amd64 -R/opt/mx/lib -L/installs/cGmK/install/lib/amd64 -lmpi_f77 -lmpi -lopen-rte -lopen-pal -lsocket -lnsl -lrt -lm
hello.f:
 MAIN main:
Undefined                       first referenced
 symbol                             in file
intercept_extra_state_t_class   /installs/cGmK/install/lib/amd64/libmpi_f77.so
ld: fatal: Symbol referencing errors. No output written to f77_hello

See also http://www.open-mpi.org/mtt/index.php?do_redir=475.

Didn't look that closely here, just noted the line change involving "intercept_extra_state".

-Ethan

On Sun, Dec/09/2007 07:19:59PM, bosi...@osl.iu.edu wrote:
> Author: bosilca
> Date: 2007-12-09 19:19:58 EST (Sun, 09 Dec 2007)
> New Revision: 16909
> URL: https://svn.open-mpi.org/trac/ompi/changeset/16909
>
> Log:
> Avoid a compiler warning about the function being defined but not used when we compile the profiling layer.
>
> Text files modified:
>    trunk/ompi/mpi/f77/register_datarep_f.c | 6 +++---
>    1 files changed, 3 insertions(+), 3 deletions(-)
>
> Modified: trunk/ompi/mpi/f77/register_datarep_f.c
> ==============================================================================
> --- trunk/ompi/mpi/f77/register_datarep_f.c (original)
> +++ trunk/ompi/mpi/f77/register_datarep_f.c 2007-12-09 19:19:58 EST (Sun, 09 Dec 2007)
> @@ -90,6 +90,9 @@
>      MPI_Aint *extra_state_f77;
>  } intercept_extra_state_t;
>
> +OBJ_CLASS_DECLARATION(intercept_extra_state_t);
> +
> +#if !OMPI_PROFILE_LAYER
>  static void intercept_extra_state_constructor(intercept_extra_state_t *obj)
>  {
>      obj->read_fn_f77 = NULL;
> @@ -98,9 +101,6 @@
>      obj->extra_state_f77 = NULL;
>  }
>
> -OBJ_CLASS_DECLARATION(intercept_extra_state_t);
> -
> -#if !OMPI_PROFILE_LAYER
>  OBJ_CLASS_INSTANCE(intercept_extra_state_t,
>                     opal_list_item_t,
>                     intercept_extra_state_constructor, NULL);
Re: [OMPI devel] New BTL parameter
On Dec 9, 2007, at 10:34 AM, Gleb Natapov wrote:

> Currently BTL has parameter btl_min_send_size that is no longer used. I want to change it to be btl_rndv_eager_limit. This new parameter will determine a size of a first fragment of rendezvous protocol. Now we use btl_eager_limit to set its size. btl_rndv_eager_limit will have to be smaller or equal to btl_eager_limit. By default it will be equal to btl_eager_limit so no behavior change will be observed if default is used.

Can you describe why it would be better to have the value less than the eager limit?

--
Jeff Squyres
Cisco Systems
Re: [OMPI devel] [PATCH] openib: clean-up connect to allow for new cm's
Ok, glad I got this conversation started :)

So, we need a slight redesign to determine the cm method (unless it is forced via a command-line argument). This can be determined by calling all the individual open routines and having each return a priority based on its ability to function. For example, the xoob open function will check mca_btl_openib_component.num_xrc_qps for a non-zero value and return its priority based on that. Of course, if a method is forced, then only that specific open function will be called, and it will throw any relevant errors as necessary.

If this sounds sane, then let me know and I'll start coding it up.

Thanks,
Jon
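The selection scheme Jon proposes could be sketched as below. Everything here is hypothetical: the `cm_method_t` type, the `select_cm` function, the "oob" entry and the priority numbers are invented for illustration, and `num_xrc_qps` is a local stand-in for mca_btl_openib_component.num_xrc_qps rather than the real field.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor for one connection-manager method. */
typedef struct {
    const char *name;
    int (*open)(void); /* priority >= 0 if usable, negative if not */
} cm_method_t;

/* Stand-in for mca_btl_openib_component.num_xrc_qps. */
static int num_xrc_qps = 0;

/* Example open routines; the priority values are arbitrary. */
static int oob_open(void)  { return 10; }
static int xoob_open(void) { return num_xrc_qps > 0 ? 20 : -1; }

/* Pick the usable method with the highest priority. If `forced` is
 * non-NULL, call only the open routine with that name and return NULL
 * if it is unknown or reports that it cannot function. */
static const cm_method_t *select_cm(const cm_method_t *methods, size_t n,
                                    const char *forced)
{
    const cm_method_t *best = NULL;
    int best_prio = -1;

    for (size_t i = 0; i < n; ++i) {
        if (forced != NULL) {
            if (strcmp(methods[i].name, forced) != 0) {
                continue;
            }
            return methods[i].open() >= 0 ? &methods[i] : NULL;
        }
        int prio = methods[i].open();
        if (prio > best_prio) {
            best_prio = prio;
            best = &methods[i];
        }
    }
    return best;
}
```

With no XRC QPs configured the sketch falls back to the lower-priority method, and forcing "xoob" in that situation yields NULL, which is where the caller would raise the error Jon mentions.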
Re: [OMPI devel] matching code rewrite in OB1
Gleb --

How about making a tarball with this patch in it that can be thrown at everyone's MTT? (we can put the tarball on www.open-mpi.org somewhere)

On Dec 11, 2007, at 4:14 PM, Richard Graham wrote:

> I will re-iterate my concern. The code that is there now is mostly nine years old (with some mods made when it was brought over to Open MPI). It took about 2 months of testing on systems with 5-13 way network parallelism to track down all KNOWN race conditions. This code is at the center of MPI correctness, so I am VERY concerned about changing it w/o some very strong reasons. Not opposed, just very cautious.
>
> Rich
>
> On 12/11/07 11:47 AM, "Gleb Natapov" wrote:
>
>> On Tue, Dec 11, 2007 at 08:36:42AM -0800, Andrew Friedley wrote:
>>> Possibly, though I have results from a benchmark I've written indicating the reordering happens at the sender. I believe I found it was due to the QP striping trick I use to get more bandwidth -- if you back down to one QP (there's a define in the code you can change), the reordering rate drops.
>> Ah, OK. My assumption was just from looking into code, so I may be wrong.
>>
>>> Also I do not make any recursive calls to progress -- at least not directly in the BTL; I can't speak for the upper layers. The reason I do many completions at once is that it is a big help in turning around receive buffers, making it harder to run out of buffers and drop frags. I want to say there was some performance benefit as well but I can't say for sure.
>> Currently upper layers of Open MPI may call BTL progress function recursively. I hope this will change some day.
>>
>>> Andrew
>>>
>>> Gleb Natapov wrote:
>>>> On Tue, Dec 11, 2007 at 08:03:52AM -0800, Andrew Friedley wrote:
>>>>> Try UD, frags are reordered at a very high rate so should be a good test.
>>>> Good idea, I'll try this. BTW I think the reason for such a high rate of reordering in UD is that it polls for MCA_BTL_UD_NUM_WC completions (500) and processes them one by one, and if the progress function is called recursively the next 500 completions will be reordered versus previous completions (reordering happens on a receiver, not sender).
>>>>
>>>>> Andrew
>>>>>
>>>>> Richard Graham wrote:
>>>>>> Gleb,
>>>>>> I would suggest that before this is checked in this be tested on a system that has N-way network parallelism, where N is as large as you can find. This is a key bit of code for MPI correctness, and out-of-order operations will break it, so you want to maximize the chance for such operations.
>>>>>>
>>>>>> Rich
>>>>>>
>>>>>> On 12/11/07 10:54 AM, "Gleb Natapov" wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I did a rewrite of matching code in OB1. I made it much simpler and 2 times smaller (which is good, less code - less bugs). I also got rid of huge macros - very helpful if you need to debug something. There is no performance degradation, actually I even see very small performance improvement. I ran MTT with this patch and the result is the same as on trunk. I would like to commit this to the trunk. The patch is attached for everybody to try.
>>>>>>>
>>>>>>> --
>>>>>>> Gleb.
>>>>
>>>> --
>>>> Gleb.
>>
>> --
>> Gleb.

--
Jeff Squyres
Cisco Systems
Re: [OMPI devel] SCTP BTL exclusivity value problem
I just read this thread... many thanks for applying the fix.

Jeff Squyres wrote:
> Done in r16942.
>
> On Dec 12, 2007, at 10:45 AM, Gleb Natapov wrote:
>
>> On Wed, Dec 12, 2007 at 10:31:37AM -0500, Jeff Squyres wrote:
>>> I'd be in favor of setting the TCP exclusivity to LOW+100 and setting SCTP exclusivity to LOW.
>> Fine with me.
>>
>>> On Dec 12, 2007, at 10:07 AM, Gleb Natapov wrote:
>>>> On Wed, Dec 12, 2007 at 10:02:07AM -0500, Jeff Squyres wrote:
>>>>> Yes -- this came up in a prior thread. See what I proposed:
>>>>>
>>>>> http://www.open-mpi.org/community/lists/devel/2007/12/2698.php
>>>>>
>>>>> (no one replied, so no action was taken)
>>>>>
>>>>> Are you on a system where the SCTP BTL is being built? What kind of environment is it?
>>>> Red Hat Enterprise Linux AS release 4 (Nahant Update 5)
>>>>
>>>> # rpm -qa | grep sctp
>>>> lksctp-tools-devel-1.0.2-6.4E.1
>>>> lksctp-tools-doc-1.0.2-6.4E.1
>>>> lksctp-tools-1.0.2-6.4E.1
>>>>
>>>>> On Dec 12, 2007, at 9:38 AM, Gleb Natapov wrote:
>>>>>> Hi,
>>>>>>
>>>>>> SCTP BTL sets its exclusivity value to MCA_BTL_EXCLUSIVITY_LOW - 1 but MCA_BTL_EXCLUSIVITY_LOW is zero so actually it is set to max exclusivity possible. Can somebody fix this please? Maybe we should not define MCA_BTL_EXCLUSIVITY_LOW to zero?
>>>>>>
>>>>>> --
>>>>>> Gleb.
>>>>>
>>>>> --
>>>>> Jeff Squyres
>>>>> Cisco Systems
>>>>
>>>> --
>>>> Gleb.
>>>
>>> --
>>> Jeff Squyres
>>> Cisco Systems
>>
>> --
>> Gleb.

--
Karol Mroz
km...@cs.ubc.ca
Re: [OMPI devel] initial SCTP BTL commit comments?
Jeff Squyres wrote:
> Alternatively, you could do what the ofud BTL does (a currently experimental BTL): look for the string "ofud" in the "btl" MCA parameter -- i.e., see if the user explicitly asked for the ofud BTL. If not found (doing the Right Things with the "^" operator, of course), then disable the ofud BTL by returning NULL from the component_init() function.
>
> Either seems fine to me; the ofud method seems a little less elegant -- was there a reason not to use exclusivity here? Was it just the fact that TCP's exclusivity is already the lowest possible value (0)?

Sorry.. try putting my name in the email or something so I know you're asking me.

I think there was but I don't remember right now. If a low exclusivity for the UD BTL means it won't get used with the RC BTL, then that's fine. I don't like that string parsing code anyway. Suggestions on what to set the exclusivity to?

Andrew
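The string check Jeff describes could be sketched as follows. This is not the ofud BTL's actual parsing code: `explicitly_requested` is a hypothetical helper, and it assumes the "btl" parameter value is a comma-separated list in which a leading "^" turns the whole list into an exclusion list, so nothing in it counts as an explicit request.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if `name` appears as a whole entry in the comma-separated
 * list `value`, 0 otherwise. A NULL or empty value, or a value that
 * starts with '^' (an exclusion list), never counts as a request. */
static int explicitly_requested(const char *value, const char *name)
{
    if (value == NULL || value[0] == '\0' || value[0] == '^') {
        return 0;
    }

    size_t nlen = strlen(name);
    const char *p = value;
    while (*p != '\0') {
        size_t tok = strcspn(p, ","); /* length of the current entry */
        if (tok == nlen && strncmp(p, name, nlen) == 0) {
            return 1;
        }
        p += tok;
        if (*p == ',') {
            ++p;
        }
    }
    return 0;
}
```

A component following this approach would return NULL from its init function whenever the helper returns 0; the exclusivity route discussed in the thread avoids the parsing altogether.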
Re: [OMPI devel] SCTP BTL exclusivity value problem
Done in r16942.

On Dec 12, 2007, at 10:45 AM, Gleb Natapov wrote:

> On Wed, Dec 12, 2007 at 10:31:37AM -0500, Jeff Squyres wrote:
>> I'd be in favor of setting the TCP exclusivity to LOW+100 and setting
>> SCTP exclusivity to LOW.
> Fine with me.

--
Jeff Squyres
Cisco Systems
Re: [OMPI devel] SCTP BTL exclusivity value problem
On Wed, Dec 12, 2007 at 10:31:37AM -0500, Jeff Squyres wrote:
> I'd be in favor of setting the TCP exclusivity to LOW+100 and setting
> SCTP exclusivity to LOW.

Fine with me.

--
Gleb.
Re: [OMPI devel] SCTP BTL exclusivity value problem
I'd be in favor of setting the TCP exclusivity to LOW+100 and setting SCTP exclusivity to LOW.

On Dec 12, 2007, at 10:07 AM, Gleb Natapov wrote:

> On Wed, Dec 12, 2007 at 10:02:07AM -0500, Jeff Squyres wrote:
>> Yes -- this came up in a prior thread. See what I proposed:
>>
>> http://www.open-mpi.org/community/lists/devel/2007/12/2698.php
>>
>> (no one replied, so no action was taken)
>>
>> Are you on a system where the SCTP BTL is being built? What kind of
>> environment is it?
> Red Hat Enterprise Linux AS release 4 (Nahant Update 5)
>
> # rpm -qa | grep sctp
> lksctp-tools-devel-1.0.2-6.4E.1
> lksctp-tools-doc-1.0.2-6.4E.1
> lksctp-tools-1.0.2-6.4E.1

--
Jeff Squyres
Cisco Systems
Re: [OMPI devel] SCTP BTL exclusivity value problem
On Wed, Dec 12, 2007 at 10:02:07AM -0500, Jeff Squyres wrote:
> Yes -- this came up in a prior thread. See what I proposed:
>
> http://www.open-mpi.org/community/lists/devel/2007/12/2698.php
>
> (no one replied, so no action was taken)
>
> Are you on a system where the SCTP BTL is being built? What kind of
> environment is it?

Red Hat Enterprise Linux AS release 4 (Nahant Update 5)

# rpm -qa | grep sctp
lksctp-tools-devel-1.0.2-6.4E.1
lksctp-tools-doc-1.0.2-6.4E.1
lksctp-tools-1.0.2-6.4E.1

--
Gleb.
Re: [OMPI devel] SCTP BTL exclusivity value problem
Yes -- this came up in a prior thread. See what I proposed:

http://www.open-mpi.org/community/lists/devel/2007/12/2698.php

(no one replied, so no action was taken)

Are you on a system where the SCTP BTL is being built? What kind of environment is it?

On Dec 12, 2007, at 9:38 AM, Gleb Natapov wrote:

> Hi,
>
> SCTP BTL sets its exclusivity value to MCA_BTL_EXCLUSIVITY_LOW - 1,
> but MCA_BTL_EXCLUSIVITY_LOW is zero, so it is actually set to the
> maximum exclusivity possible. Can somebody fix this please? Maybe we
> should not define MCA_BTL_EXCLUSIVITY_LOW to zero?
>
> --
> Gleb.

--
Jeff Squyres
Cisco Systems
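The wraparound reported in this thread can be reproduced with a few lines of C. This is a sketch under stated assumptions, not the Open MPI source: `SKETCH_EXCLUSIVITY_LOW` and `sketch_exclusivity` are hypothetical names, and the exclusivity value is assumed to be stored in an unsigned 32-bit field, which is what "set to the maximum exclusivity possible" suggests.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in; the thread only states that LOW is zero. */
enum { SKETCH_EXCLUSIVITY_LOW = 0 };

/*
 * Assumption: exclusivity is stored in an unsigned 32-bit field.
 * Converting a negative result to uint32_t wraps modulo 2^32, so
 * LOW - 1 becomes the largest representable value.
 */
static uint32_t sketch_exclusivity(int offset_from_low)
{
    /* Keep the sketch within a small range that cannot overflow int. */
    assert(offset_from_low >= -1000 && offset_from_low <= 1000);
    return (uint32_t)(SKETCH_EXCLUSIVITY_LOW + offset_from_low);
}
```

With these definitions, `sketch_exclusivity(-1)` equals `UINT32_MAX`, while the values proposed in the thread, LOW for SCTP and LOW+100 for TCP, stay ordered as intended (0 and 100).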
Re: [OMPI devel] [PATCH] openib: clean-up connect to allow for new cm's
On Wed, Dec 12, 2007 at 04:08:31PM +0200, Pavel Shamis (Pasha) wrote:
> Gleb Natapov wrote:
>> On Wed, Dec 12, 2007 at 03:37:26PM +0200, Pavel Shamis (Pasha) wrote:
>>> How will you decide which QP type to use? (SRQ or XRC)
>> If both sides support XOOB and the priority of XOOB is higher than
>> all other CPCs, then create XRC; otherwise use regular RC.
> If somebody has a ConnectX HCA but wants to use SRQ and not XRC?

This will be the default (the priority of OOB will be higher than that of XOOB), but users who want to use XRC can increase the XOOB priority by specifying an MCA parameter.

> I guess we will need some additional parameter anyway that allows
> enabling/disabling XRC, correct? (So why not just leave the X QP type?)

Because we want to support mixed setups and create XRC between nodes that support it and RC between all other nodes.

--
Gleb.
Re: [OMPI devel] [PATCH] openib: clean-up connect to allow for new cm's
Gleb Natapov wrote:
> On Wed, Dec 12, 2007 at 03:37:26PM +0200, Pavel Shamis (Pasha) wrote:
>> How will you decide which QP type to use? (SRQ or XRC)
> If both sides support XOOB and the priority of XOOB is higher than
> all other CPCs, then create XRC; otherwise use regular RC.

If somebody has a ConnectX HCA but wants to use SRQ and not XRC?

I guess we will need some additional parameter anyway that allows enabling/disabling XRC, correct? (So why not just leave the X QP type?)
Re: [OMPI devel] [PATCH] openib: clean-up connect to allow for new cm's
On Wed, Dec 12, 2007 at 03:37:26PM +0200, Pavel Shamis (Pasha) wrote:
> Gleb Natapov wrote:
>> In my opinion the "X" notation for QP specification should be removed.
>> I didn't want this to prevent XRC merging so I haven't raised this
>> point. It is enough to have two types of QPs: "P" - SW credit
>> management, "S" - HW credit management.
> How will you decide which QP type to use? (SRQ or XRC)

If both sides support XOOB and the priority of XOOB is higher than all other CPCs, then create XRC; otherwise use regular RC.

--
Gleb.
Re: [OMPI devel] [PATCH] openib: clean-up connect to allow for new cm's
On Tue, Dec 11, 2007 at 08:16:07PM -0500, Jeff Squyres wrote:
> Isn't there a better way somehow? Perhaps we should have "select"
> call *all* the functions and accept back a priority. The one with the
> highest priority then wins. This is quite similar to much of the
> other selection logic in OMPI.
>
> Sidenote: Keep in mind that there are some changes coming to select
> CPCs on a per-endpoint basis (I can't look up the trac ticket right
> now...). This makes things a little complicated -- do we need
> btl_openib_cpc_include and btl_openib_cpc_exclude MCA params to
> include/exclude CPCs (because you might need more than one CPC in a
> single job)? That wouldn't be hard to do.
>
> But then what to do about if someone sets to use some XRC QPs and
> selects to use OOB or RDMA CM? How do we catch this and print an
> error? It doesn't seem right to put the "if num_xrc_qps>0" check in
> every CPC. What happens if you try to make an XRC QP when not using
> xoob? Where is the error detected and what kind of error message do
> we print?

In my opinion the "X" notation for QP specification should be removed. I didn't want this to prevent XRC merging so I haven't raised this point. It is enough to have two types of QPs: "P" - SW credit management, "S" - HW credit management.

I think connection management should work like this: each BTL knows what types of CPC it can use, and it should share this info during the modex stage. During connection establishment the modex info is used to figure out the list of CPCs that both endpoints support, and the one with the highest priority is selected.

--
Gleb.
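The selection scheme proposed in this message (intersect the CPC lists exchanged in the modex, then take the highest priority) might be sketched as follows. Everything here is hypothetical: `struct sketch_cpc`, `sketch_select_cpc`, and the priority numbers are illustrative only and are not the openib BTL's actual data structures. The sketch also assumes both endpoints rank CPCs the same way, so that each side arrives at the same choice.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record for one connection method (CPC) known locally. */
struct sketch_cpc {
    const char *name;
    int priority;
};

/*
 * Returns the highest-priority local CPC whose name also appears in the
 * peer's list (as it might be published during the modex), or NULL if
 * the two endpoints have no CPC in common.
 */
static const struct sketch_cpc *
sketch_select_cpc(const struct sketch_cpc *local, size_t n_local,
                  const char *const *peer_names, size_t n_peer)
{
    assert(local != NULL || n_local == 0);
    assert(peer_names != NULL || n_peer == 0);

    const struct sketch_cpc *best = NULL;

    for (size_t i = 0; i < n_local; ++i) {
        for (size_t j = 0; j < n_peer; ++j) {
            if (strcmp(local[i].name, peer_names[j]) != 0) {
                continue;
            }
            /* Supported by both sides: keep it if it outranks the best. */
            if (best == NULL || local[i].priority > best->priority) {
                best = &local[i];
            }
            break;
        }
    }
    return best;
}
```

Under this scheme a pair of nodes that both list xoob could end up with XRC, while a pair where one side lists only oob falls back to RC, which is the mixed setup described in the follow-up messages.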
Re: [OMPI devel] [PATCH] openib: clean-up connect to allow for new cm's
On Dec 12, 2007, at 5:13 AM, Pavel Shamis (Pasha) wrote:

>> Hmm. I don't think that we want to put knowledge of XRC in the OOB
>> CPC (and vice versa). That seems like an abstraction violation.
>>
>> I didn't like that XRC knowledge was put in the connect base either,
>> but I was too busy to argue with it. :-)
>>
>> Isn't there a better way somehow? Perhaps we should have "select"
>> call *all* the functions and accept back a priority. The one with the
>> highest priority then wins. This is quite similar to much of the
>> other selection logic in OMPI.
>>
>> Sidenote: Keep in mind that there are some changes coming to select
>> CPCs on a per-endpoint basis (I can't look up the trac ticket right
>> now...). This makes things a little complicated -- do we need
>> btl_openib_cpc_include and btl_openib_cpc_exclude MCA params to
>> include/exclude CPCs (because you might need more than one CPC in a
>> single job)? That wouldn't be hard to do.
>>
>> But then what to do about if someone sets to use some XRC QPs and
>> selects to use OOB or RDMA CM?
> An error message will be reported, saying that to use XRC you _must_
> select xoob.

I understand that that is what it does today; I was asking my somewhat-rhetorical question with the above text in mind (that we remove the abstraction violations -- remove knowledge of XRC from the OOB CPC, etc.).

>> How do we catch this and print an error? It doesn't seem right to
>> put the "if num_xrc_qps>0" check in every CPC. What happens if you
>> try to make an XRC QP when not using xoob? Where is the error
>> detected and what kind of error message do we print?
> I would like to point out 2 things:
> 1. XRC changes the QP logic a little bit. We create one XRC QP for
>    send and one for recv. As a result it requires an absolutely
>    different oob mechanism.
> 2. The current implementation doesn't allow running with XRC + RC (or
>    SRQ), and I don't think that we need such mixed configuration
>    support at all.
>
> So as a result XRC may work only with XOOB. If you try to run it with
> oob, an error message will be reported. Likewise, if you try to run
> !(XRC) with XOOB, an error message will be reported too. The check is
> located in ompi_btl_openib_connect_base_open.

I understand all of that. I think the question is if there is a way to de-centralize these checks such that the XOOB CPC can be the one that figures this stuff out (for example) rather than having to put this in the base.

> The original code in the function used oob as the default connection
> method. I changed it to check in which mode we are running (XRC
> enabled/disabled) and to make xoob the default connection manager for
> XRC mode and oob the default for non-XRC mode.

Right -- this is problematic for adding IBCM and RDMA CM; that's Jon's point.

> I'm not sure that the oob CPC is the best place for the logic. Also, I
> don't think that the "select + priority" solution will resolve the
> dependency problem:
> XRC enabled -> xoob
> XRC disabled -> oob, cm.
> We may push the logic outside of the CPC and pass to
> ompi_btl_openib_connect_base_open() which connection manager we want
> to use. I guess that the change will also be useful for the future
> "CPCs on a per-endpoint basis" changes.

From an abstraction point of view, it would be nice to get all this CPC-specific information out of the base and into the CPCs that they belong to.

>> Also, I'm not sure why the #if/#else is there for xoob (i.e., having
>> empty/printf functions there when XRC support is compiled out) -- if
>> xoob was disabled during compilation, then it simply should not be
>> compiled and therefore not be there at all at run-time. If a user
>> selects the xoob CPC, then we should print a message from the base
>> that that CPC doesn't exist in the installation. Correspondingly, we
>> can make an info MCA param in the btl openib that shows which CPCs
>> are available (we already have this information -- it's easy enough
>> to put this in an info MCA param).
> Sounds reasonable to me.
>
> Pasha.
On Dec 11, 2007, at 6:59 PM, Jon Mason wrote: Currently, alternate CMs cannot be called because ompi_btl_openib_connect_base_open forces a choice of either oob or xoob (and goes into an erroneous error path if you pick something else). This patch reorganizes ompi_btl_openib_connect_base_open so that new functions can easily be added. New Open functions were added to oob and xoob for the error handling. I tested calling oob, xoob, and rdma_cm. oob happily allows connections to be established and throws no errors. xoob fails because ompi does not have it compiled in (and I have no connectx cards). rdma_cm calls the empty hooks and exits without connecting (thus throwing non-connection errors). All expected behavior. Since this patch fixes the existing behavior, and is not necessarily tied to my implementing of rdma_cm, I think it is acceptable to go in now. Thanks, Jon Index: ompi/mca/btl/openib/connect/btl_openib_connect_base.c === ---