Re: [OMPI devel] openmpi-1.7.2 fails to use the RDMACM CPC
On Aug 20, 2013, at 12:57 PM, Steve Wise wrote: > You won't let me forget. ;) I will do it. Awesome, thanks. >> Specifically: At some point iWARP support will break because we'll be >> removing >> ompi/mca/btl/openib/cpc and exclusively using ompi/mca/common/ofacm. > > When is this going to happen? Don't know yet. It's been "pending / real soon now..." for a little while, but other higher-priority things have crept in. > I can probably get to this project around the end of Sep (vacation is > pending). K. -- Jeff Squyres jsquy...@cisco.com For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI devel] openmpi-1.7.2 fails to use the RDMACM CPC
> > Don't forget that Chelsio is still on the hook for adding iWARP support into ompi/mca/common/ofacm, > however. :-) > You won't let me forget. ;) I will do it. > Specifically: At some point iWARP support will break because we'll be removing > ompi/mca/btl/openib/cpc and exclusively using ompi/mca/common/ofacm. > When is this going to happen? I can probably get to this project around the end of Sep (vacation is pending). Steve
Re: [OMPI devel] openmpi-1.7.2 fails to use the RDMACM CPC
> -Original Message- > From: Jeff Squyres (jsquyres) [mailto:jsquy...@cisco.com] > Sent: Tuesday, August 20, 2013 11:07 AM > To: Steve Wise > Cc: Open MPI Developers; Indranil Choudhury > Subject: Re: [OMPI devel] openmpi-1.7.2 fails to use the RDMACM CPC > > I think you hit the nail on the head -- we typo'ed the macro name in the C > code. Doh! > > If you can confirm that this fixes the issue for you, please commit and CMR. > Will do! > Thank you for tracking this down! > U R welcome. :) > > On Aug 20, 2013, at 11:06 AM, Steve Wise <sw...@opengridcomputing.com> wrote: > > > So is this the correct fix? > > > > [root@r9 ompi-trunk]# svn diff > > Index: ompi/mca/btl/openib/btl_openib_component.c > > === > > --- ompi/mca/btl/openib/btl_openib_component.c (revision 29050) > > +++ ompi/mca/btl/openib/btl_openib_component.c (working copy) > > @@ -716,7 +716,7 @@ > > return OMPI_ERR_NOT_FOUND; > > } > > > > -#if defined(HAVE_IBV_LINK_LAYER_ETHERNET) > > +#if defined(HAVE_DECL_IBV_LINK_LAYER_ETHERNET) > > if (IBV_LINK_LAYER_ETHERNET == ib_port_attr->link_layer) { > > subnet_id = mca_btl_openib_get_ip_subnet_id(device->ib_dev, > >port_num); > > Index: ompi/mca/btl/openib/btl_openib.c > > === > > --- ompi/mca/btl/openib/btl_openib.c(revision 29050) > > +++ ompi/mca/btl/openib/btl_openib.c(working copy) > > @@ -444,7 +444,7 @@ > > #ifdef HAVE_STRUCT_IBV_DEVICE_TRANSPORT_TYPE > > switch(openib_btl->device->ib_dev->transport_type) { > > case IBV_TRANSPORT_IB: > > -#if defined(HAVE_IBV_LINK_LAYER_ETHERNET) > > +#if defined(HAVE_DECL_IBV_LINK_LAYER_ETHERNET) > > switch(openib_btl->ib_port_attr.link_layer) { > > case IBV_LINK_LAYER_ETHERNET: > > return MCA_BTL_OPENIB_TRANSPORT_RDMAOE; > > Index: ompi/mca/btl/openib/connect/btl_openib_connect_udcm.c > > === > > --- ompi/mca/btl/openib/connect/btl_openib_connect_udcm.c (revision > > 29050) > > +++ ompi/mca/btl/openib/connect/btl_openib_connect_udcm.c (working > > copy) > > @@ -389,7 +389,7 @@ > >/* If we do not have struct 
ibv_device.transport_device, then > > we're in an old version of OFED that is IB only (i.e., no > > iWarp), so we can safely assume that we can use this CPC. */ > > -#if defined(HAVE_STRUCT_IBV_DEVICE_TRANSPORT_TYPE) && > defined(HAVE_IBV_LINK_LAYER_ETHERNET) > > +#if defined(HAVE_STRUCT_IBV_DEVICE_TRANSPORT_TYPE) && > defined(HAVE_DECL_IBV_LINK_LAYER_ETHERNET) > >if (BTL_OPENIB_CONNECT_BASE_CHECK_IF_NOT_IB(btl)) { > >BTL_VERBOSE(("UD CPC only supported on InfiniBand; skipped on > > %s:%d", > > ibv_get_device_name(btl->device->ib_dev), > > Index: ompi/mca/btl/openib/connect/btl_openib_connect_oob.c > > === > > --- ompi/mca/btl/openib/connect/btl_openib_connect_oob.c(revision > > 29050) > > +++ ompi/mca/btl/openib/connect/btl_openib_connect_oob.c(working > > copy) > > @@ -127,7 +127,7 @@ > >IB (this CPC will not work with iWarp). If we do not have the > >transport_type member, then we must be < OFED v1.2, and > >therefore we must be IB. */ > > -#if defined(HAVE_STRUCT_IBV_DEVICE_TRANSPORT_TYPE) && > defined(HAVE_IBV_LINK_LAYER_ETHERNET) > > +#if defined(HAVE_STRUCT_IBV_DEVICE_TRANSPORT_TYPE) && > defined(HAVE_DECL_IBV_LINK_LAYER_ETHERNET) > > if (BTL_OPENIB_CONNECT_BASE_CHECK_IF_NOT_IB(btl)) { > > opal_output_verbose(5, ompi_btl_base_framework.framework_output, > > "openib BTL: oob CPC only supported on > > InfiniBand; skipped on %s:%d", > > Index: ompi/mca/common/verbs/common_verbs_find_ports.c > > === > > --- ompi/mca/common/verbs/common_verbs_find_ports.c (revision 29050) > > +++ ompi/mca/common/verbs/common_verbs_find_ports.c (working copy) > > @@ -170,7 +170,7 @@ > > } > > } > > > > -#if defined(HAVE_IBV_LINK_LAYER_ETHERNET) > > +#if defined(HAVE_DECL_IBV_LINK_LAYER_ETHERNET) > > static const char *l
Re: [OMPI devel] openmpi-1.7.2 fails to use the RDMACM CPC
I think you hit the nail on the head -- we typo'ed the macro name in the C code. Doh! If you can confirm that this fixes the issue for you, please commit and CMR. Thank you for tracking this down! On Aug 20, 2013, at 11:06 AM, Steve Wise <sw...@opengridcomputing.com> wrote: > So is this the correct fix? > > [root@r9 ompi-trunk]# svn diff > Index: ompi/mca/btl/openib/btl_openib_component.c > === > --- ompi/mca/btl/openib/btl_openib_component.c (revision 29050) > +++ ompi/mca/btl/openib/btl_openib_component.c (working copy) > @@ -716,7 +716,7 @@ > return OMPI_ERR_NOT_FOUND; > } > > -#if defined(HAVE_IBV_LINK_LAYER_ETHERNET) > +#if defined(HAVE_DECL_IBV_LINK_LAYER_ETHERNET) > if (IBV_LINK_LAYER_ETHERNET == ib_port_attr->link_layer) { > subnet_id = mca_btl_openib_get_ip_subnet_id(device->ib_dev, >port_num); > Index: ompi/mca/btl/openib/btl_openib.c > === > --- ompi/mca/btl/openib/btl_openib.c(revision 29050) > +++ ompi/mca/btl/openib/btl_openib.c(working copy) > @@ -444,7 +444,7 @@ > #ifdef HAVE_STRUCT_IBV_DEVICE_TRANSPORT_TYPE > switch(openib_btl->device->ib_dev->transport_type) { > case IBV_TRANSPORT_IB: > -#if defined(HAVE_IBV_LINK_LAYER_ETHERNET) > +#if defined(HAVE_DECL_IBV_LINK_LAYER_ETHERNET) > switch(openib_btl->ib_port_attr.link_layer) { > case IBV_LINK_LAYER_ETHERNET: > return MCA_BTL_OPENIB_TRANSPORT_RDMAOE; > Index: ompi/mca/btl/openib/connect/btl_openib_connect_udcm.c > === > --- ompi/mca/btl/openib/connect/btl_openib_connect_udcm.c (revision > 29050) > +++ ompi/mca/btl/openib/connect/btl_openib_connect_udcm.c (working copy) > @@ -389,7 +389,7 @@ >/* If we do not have struct ibv_device.transport_device, then > we're in an old version of OFED that is IB only (i.e., no > iWarp), so we can safely assume that we can use this CPC. 
*/ > -#if defined(HAVE_STRUCT_IBV_DEVICE_TRANSPORT_TYPE) && > defined(HAVE_IBV_LINK_LAYER_ETHERNET) > +#if defined(HAVE_STRUCT_IBV_DEVICE_TRANSPORT_TYPE) && > defined(HAVE_DECL_IBV_LINK_LAYER_ETHERNET) >if (BTL_OPENIB_CONNECT_BASE_CHECK_IF_NOT_IB(btl)) { >BTL_VERBOSE(("UD CPC only supported on InfiniBand; skipped on > %s:%d", > ibv_get_device_name(btl->device->ib_dev), > Index: ompi/mca/btl/openib/connect/btl_openib_connect_oob.c > === > --- ompi/mca/btl/openib/connect/btl_openib_connect_oob.c(revision > 29050) > +++ ompi/mca/btl/openib/connect/btl_openib_connect_oob.c(working copy) > @@ -127,7 +127,7 @@ >IB (this CPC will not work with iWarp). If we do not have the >transport_type member, then we must be < OFED v1.2, and >therefore we must be IB. */ > -#if defined(HAVE_STRUCT_IBV_DEVICE_TRANSPORT_TYPE) && > defined(HAVE_IBV_LINK_LAYER_ETHERNET) > +#if defined(HAVE_STRUCT_IBV_DEVICE_TRANSPORT_TYPE) && > defined(HAVE_DECL_IBV_LINK_LAYER_ETHERNET) > if (BTL_OPENIB_CONNECT_BASE_CHECK_IF_NOT_IB(btl)) { > opal_output_verbose(5, ompi_btl_base_framework.framework_output, > "openib BTL: oob CPC only supported on > InfiniBand; skipped on %s:%d", > Index: ompi/mca/common/verbs/common_verbs_find_ports.c > === > --- ompi/mca/common/verbs/common_verbs_find_ports.c (revision 29050) > +++ ompi/mca/common/verbs/common_verbs_find_ports.c (working copy) > @@ -170,7 +170,7 @@ > } > } > > -#if defined(HAVE_IBV_LINK_LAYER_ETHERNET) > +#if defined(HAVE_DECL_IBV_LINK_LAYER_ETHERNET) > static const char *link_layer_to_str(int link_type) > { > switch(link_type) { > @@ -417,7 +417,7 @@ > /* If they specified neither link layer, then we want this > port */ > want = true; > } > -#if defined(HAVE_IBV_LINK_LAYER_ETHERNET) > +#if defined(HAVE_DECL_IBV_LINK_LAYER_ETHERNET) > else if (flags & OMPI_COMMON_VERBS_FLAGS_LINK_LAYER_IB) { > if (IBV_LINK_LAYER_INFINIBAND == port_attr.link_layer) { > want = true; > > > >> -Original Message- >> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of 
Steve Wise >> Sent: Tuesday, August 20, 2013 9:25 AM >&g
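The common_verbs_find_ports.c hunk in the diff above guards a link-layer port filter: if the caller asked for neither link layer, every port is wanted; otherwise a port is kept only if its link layer matches the request. Below is a minimal sketch of that kind of filter. All DEMO_* names and want_port() are hypothetical stand-ins, not the real Open MPI or libibverbs identifiers, and the Ethernet branch is an assumption (only the "neither" and IB branches are visible in the quoted diff). In the quoted code the IB branch sits inside the misspelled guard, so with the old macro name that filtering would be compiled out.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the libibverbs link layers and the
 * OMPI_COMMON_VERBS_FLAGS_LINK_LAYER_* request flags. */
enum demo_link_layer { DEMO_LINK_LAYER_INFINIBAND, DEMO_LINK_LAYER_ETHERNET };

#define DEMO_FLAG_LINK_LAYER_IB       0x1u
#define DEMO_FLAG_LINK_LAYER_ETHERNET 0x2u

/* Decide whether a port should be kept, given the caller's link-layer
 * request flags and the link layer the port reports. */
static bool want_port(unsigned flags, enum demo_link_layer ll)
{
    bool want_ib  = (flags & DEMO_FLAG_LINK_LAYER_IB) != 0;
    bool want_eth = (flags & DEMO_FLAG_LINK_LAYER_ETHERNET) != 0;

    if (!want_ib && !want_eth) {
        /* Neither link layer specified: take every port. */
        return true;
    }
    if (want_ib && ll == DEMO_LINK_LAYER_INFINIBAND) {
        return true;
    }
    /* Assumed: the Ethernet flag is handled symmetrically. */
    if (want_eth && ll == DEMO_LINK_LAYER_ETHERNET) {
        return true;
    }
    return false;
}
```

For example, want_port(DEMO_FLAG_LINK_LAYER_IB, DEMO_LINK_LAYER_ETHERNET) is false, so a RoCE port is skipped when only IB was requested.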
Re: [OMPI devel] openmpi-1.7.2 fails to use the RDMACM CPC
So is this the correct fix? [root@r9 ompi-trunk]# svn diff Index: ompi/mca/btl/openib/btl_openib_component.c === --- ompi/mca/btl/openib/btl_openib_component.c (revision 29050) +++ ompi/mca/btl/openib/btl_openib_component.c (working copy) @@ -716,7 +716,7 @@ return OMPI_ERR_NOT_FOUND; } -#if defined(HAVE_IBV_LINK_LAYER_ETHERNET) +#if defined(HAVE_DECL_IBV_LINK_LAYER_ETHERNET) if (IBV_LINK_LAYER_ETHERNET == ib_port_attr->link_layer) { subnet_id = mca_btl_openib_get_ip_subnet_id(device->ib_dev, port_num); Index: ompi/mca/btl/openib/btl_openib.c === --- ompi/mca/btl/openib/btl_openib.c(revision 29050) +++ ompi/mca/btl/openib/btl_openib.c(working copy) @@ -444,7 +444,7 @@ #ifdef HAVE_STRUCT_IBV_DEVICE_TRANSPORT_TYPE switch(openib_btl->device->ib_dev->transport_type) { case IBV_TRANSPORT_IB: -#if defined(HAVE_IBV_LINK_LAYER_ETHERNET) +#if defined(HAVE_DECL_IBV_LINK_LAYER_ETHERNET) switch(openib_btl->ib_port_attr.link_layer) { case IBV_LINK_LAYER_ETHERNET: return MCA_BTL_OPENIB_TRANSPORT_RDMAOE; Index: ompi/mca/btl/openib/connect/btl_openib_connect_udcm.c === --- ompi/mca/btl/openib/connect/btl_openib_connect_udcm.c (revision 29050) +++ ompi/mca/btl/openib/connect/btl_openib_connect_udcm.c (working copy) @@ -389,7 +389,7 @@ /* If we do not have struct ibv_device.transport_device, then we're in an old version of OFED that is IB only (i.e., no iWarp), so we can safely assume that we can use this CPC. 
*/ -#if defined(HAVE_STRUCT_IBV_DEVICE_TRANSPORT_TYPE) && defined(HAVE_IBV_LINK_LAYER_ETHERNET) +#if defined(HAVE_STRUCT_IBV_DEVICE_TRANSPORT_TYPE) && defined(HAVE_DECL_IBV_LINK_LAYER_ETHERNET) if (BTL_OPENIB_CONNECT_BASE_CHECK_IF_NOT_IB(btl)) { BTL_VERBOSE(("UD CPC only supported on InfiniBand; skipped on %s:%d", ibv_get_device_name(btl->device->ib_dev), Index: ompi/mca/btl/openib/connect/btl_openib_connect_oob.c === --- ompi/mca/btl/openib/connect/btl_openib_connect_oob.c(revision 29050) +++ ompi/mca/btl/openib/connect/btl_openib_connect_oob.c(working copy) @@ -127,7 +127,7 @@ IB (this CPC will not work with iWarp). If we do not have the transport_type member, then we must be < OFED v1.2, and therefore we must be IB. */ -#if defined(HAVE_STRUCT_IBV_DEVICE_TRANSPORT_TYPE) && defined(HAVE_IBV_LINK_LAYER_ETHERNET) +#if defined(HAVE_STRUCT_IBV_DEVICE_TRANSPORT_TYPE) && defined(HAVE_DECL_IBV_LINK_LAYER_ETHERNET) if (BTL_OPENIB_CONNECT_BASE_CHECK_IF_NOT_IB(btl)) { opal_output_verbose(5, ompi_btl_base_framework.framework_output, "openib BTL: oob CPC only supported on InfiniBand; skipped on %s:%d", Index: ompi/mca/common/verbs/common_verbs_find_ports.c === --- ompi/mca/common/verbs/common_verbs_find_ports.c (revision 29050) +++ ompi/mca/common/verbs/common_verbs_find_ports.c (working copy) @@ -170,7 +170,7 @@ } } -#if defined(HAVE_IBV_LINK_LAYER_ETHERNET) +#if defined(HAVE_DECL_IBV_LINK_LAYER_ETHERNET) static const char *link_layer_to_str(int link_type) { switch(link_type) { @@ -417,7 +417,7 @@ /* If they specified neither link layer, then we want this port */ want = true; } -#if defined(HAVE_IBV_LINK_LAYER_ETHERNET) +#if defined(HAVE_DECL_IBV_LINK_LAYER_ETHERNET) else if (flags & OMPI_COMMON_VERBS_FLAGS_LINK_LAYER_IB) { if (IBV_LINK_LAYER_INFINIBAND == port_attr.link_layer) { want = true; > -Original Message- > From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Steve Wise > Sent: Tuesday, August 20, 2013 9:25 AM > To: 'Open MPI Developers'; 'Jeff Squyres 
(jsquyres)' > Cc: 'Indranil Choudhury' > Subject: Re: [OMPI devel] openmpi-1.7.2 fails to use the RDMACM CPC > > > Ah: > > Here's the config.log: > > configure:133950: checking whether IBV_LINK_LAYER_ETHERNET is declared > configure:133950: gcc -std=gnu99 -c -g -Wall -Wundef -Wno-long-long > -Wsign-compare > -Wmissing-prototypes -Wstrict-prototypes -Wcomment -pedantic > -Werror-implicit-function-declaration > -finline-functions -fno-strict-aliasing -pthread > -I/usr/local/src/ompi-trunk/opal/mca/hwloc/hwloc152/hwloc/include > -I/usr/local/src/ompi-trunk/opal/mca/event/libevent2021/libevent > -I/usr/local/src/ompi-t
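The btl_openib.c hunk above relies on the fact that RoCE (RDMAoE) devices report the IB transport type; only the port's link layer separates them from native InfiniBand. A sketch of that two-level classification follows. The DEMO_* enums and classify() are hypothetical stand-ins for the libibverbs values and the MCA_BTL_OPENIB_TRANSPORT_* results, and the iWARP and default branches are assumptions, since the quoted hunk only shows the IB case.

```c
/* Hypothetical stand-ins for the libibverbs transport and link-layer enums. */
enum demo_transport  { DEMO_TRANSPORT_IB, DEMO_TRANSPORT_IWARP };
enum demo_link_layer { DEMO_LINK_LAYER_INFINIBAND, DEMO_LINK_LAYER_ETHERNET };

/* Hypothetical result type, modeled on MCA_BTL_OPENIB_TRANSPORT_*. */
enum demo_btl_transport { DEMO_BTL_IB, DEMO_BTL_RDMAOE, DEMO_BTL_IWARP, DEMO_BTL_UNKNOWN };

static enum demo_btl_transport classify(enum demo_transport t, enum demo_link_layer ll)
{
    switch (t) {
    case DEMO_TRANSPORT_IB:
        /* The IB transport covers both native InfiniBand and RoCE;
         * the port's link layer tells them apart. */
        return (ll == DEMO_LINK_LAYER_ETHERNET) ? DEMO_BTL_RDMAOE : DEMO_BTL_IB;
    case DEMO_TRANSPORT_IWARP:
        /* Assumed branch: iWARP is its own transport type. */
        return DEMO_BTL_IWARP;
    default:
        return DEMO_BTL_UNKNOWN;
    }
}
```

If the link-layer switch is compiled out (as it was while the guard named a macro that configure never defines), every IB-transport port would be treated as native InfiniBand, RoCE included.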
Re: [OMPI devel] openmpi-1.7.2 fails to use the RDMACM CPC
Ah: Here's the config.log: configure:133950: checking whether IBV_LINK_LAYER_ETHERNET is declared configure:133950: gcc -std=gnu99 -c -g -Wall -Wundef -Wno-long-long -Wsign-compare -Wmissing-prototypes -Wstrict-prototypes -Wcomment -pedantic -Werror-implicit-function-declaration -finline-functions -fno-strict-aliasing -pthread -I/usr/local/src/ompi-trunk/opal/mca/hwloc/hwloc152/hwloc/include -I/usr/local/src/ompi-trunk/opal/mca/event/libevent2021/libevent -I/usr/local/src/ompi-trunk/opal/mca/event/libevent2021/libevent/include conftest.c >&5 conftest.c:611: warning: function declaration isn't a prototype configure:133950: $? = 0 configure:133950: result: yes And I see it in opal_config.h: /* Define to 1 if you have the declaration of `IBV_LINK_LAYER_ETHERNET', and to 0 if you don't. */ #define HAVE_DECL_IBV_LINK_LAYER_ETHERNET 1 Note the #define is HAVE_DECL_IBV_LINK_LAYER_ETHERNET but the code is checking for HAVE_IBV_LINK_LAYER_ETHERNET! No _DECL_... > -Original Message- > From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Steve Wise > Sent: Tuesday, August 20, 2013 9:07 AM > To: 'Jeff Squyres (jsquyres)' > Cc: 'Open MPI Developers'; 'Indranil Choudhury' > Subject: Re: [OMPI devel] openmpi-1.7.2 fails to use the RDMACM CPC > > > > > -Original Message- > > From: Jeff Squyres (jsquyres) [mailto:jsquy...@cisco.com] > > Sent: Tuesday, August 20, 2013 8:59 AM > > To: Steve Wise > > Cc: Open MPI Developers; Indranil Choudhury > > Subject: Re: [OMPI devel] openmpi-1.7.2 fails to use the RDMACM CPC > > > > On Aug 20, 2013, at 9:51 AM, Steve Wise <sw...@opengridcomputing.com> wrote: > > > > > I checked in the correct fix, > > > > Er, no. Please re-read my email -- your fix was incorrect (you're > > overriding the output of an AC > macro). > > :-) > > > > What is the correct fix then? I've never worked with any of this AC stuff... > > With the existing code (prior to my broken fix), HAVE_IBV_LINK_LAYER_ETHERNET > does not get defined. 
> Yet the enum and the link_type field are in verbs.h... > > Thanks. > > > ___ > devel mailing list > de...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/devel
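One related detail is worth checking when choosing the guard spelling. As the opal_config.h comment quoted above says, AC_CHECK_DECLS defines HAVE_DECL_IBV_LINK_LAYER_ETHERNET to 1 when the declaration is found *and to 0 when it is not*; the macro is defined in both cases. So `#if defined(HAVE_DECL_...)` is true even on a libibverbs that lacks the enum, while `#if HAVE_DECL_...` tests the value. The sketch below simulates the "not found" case by defining the macro to 0 by hand; the two function names are hypothetical.

```c
#include <stdbool.h>

/* Simulate what configure writes when the declaration is NOT found:
 * AC_CHECK_DECLS still defines the macro, just to 0. */
#define HAVE_DECL_IBV_LINK_LAYER_ETHERNET 0

/* Guard written with defined(): true whenever the macro exists. */
static bool guard_with_defined(void)
{
#if defined(HAVE_DECL_IBV_LINK_LAYER_ETHERNET)
    return true;    /* taken even though the value is 0 */
#else
    return false;
#endif
}

/* Guard that tests the macro's value instead. */
static bool guard_with_value(void)
{
#if HAVE_DECL_IBV_LINK_LAYER_ETHERNET
    return true;
#else
    return false;   /* taken: the value is 0 */
#endif
}
```

A side benefit of the value form: the compile line in the config.log excerpt already uses -Wundef, and GCC's -Wundef warns when an undefined identifier is evaluated in an `#if`. A misspelled name in `#if NAME` would therefore produce a warning, whereas `defined(NAME)` silently evaluates to false, which is how the original typo went unnoticed.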
Re: [OMPI devel] openmpi-1.7.2 fails to use the RDMACM CPC
On Aug 20, 2013, at 10:06 AM, Steve Wise wrote: > What is the correct fix then? I've never worked with any of this AC stuff... > > With the existing code (prior to my broken fix), HAVE_IBV_LINK_LAYER_ETHERNET > does not get defined. > Yet the enum and the link_type field are in verbs.h... What's the result of the IBV_LINK_LAYER_ETHERNET test in your configure? Is it failing for some reason? Look in config.log to see exactly what that test tried and what its result was. -- Jeff Squyres jsquy...@cisco.com
Re: [OMPI devel] openmpi-1.7.2 fails to use the RDMACM CPC
> -Original Message- > From: Jeff Squyres (jsquyres) [mailto:jsquy...@cisco.com] > Sent: Tuesday, August 20, 2013 8:59 AM > To: Steve Wise > Cc: Open MPI Developers; Indranil Choudhury > Subject: Re: [OMPI devel] openmpi-1.7.2 fails to use the RDMACM CPC > > On Aug 20, 2013, at 9:51 AM, Steve Wise <sw...@opengridcomputing.com> wrote: > > > I checked in the correct fix, > > Er, no. Please re-read my email -- your fix was incorrect (you're overriding > the output of an AC macro). > :-) > What is the correct fix then? I've never worked with any of this AC stuff... With the existing code (prior to my broken fix), HAVE_IBV_LINK_LAYER_ETHERNET does not get defined. Yet the enum and the link_type field are in verbs.h... Thanks.
Re: [OMPI devel] openmpi-1.7.2 fails to use the RDMACM CPC
On Aug 20, 2013, at 9:51 AM, Steve Wise wrote: > I checked in the correct fix, Er, no. Please re-read my email -- your fix was incorrect (you're overriding the output of an AC macro). :-) -- Jeff Squyres jsquy...@cisco.com
Re: [OMPI devel] openmpi-1.7.2 fails to use the RDMACM CPC
> Thanks for finding r27212. It was about a year ago, and had clearly fallen > out of my cache (I have very > little to do with the openib BTL these days). > > Your solution isn't correct, because HAVE_IBV_LINK_LAYER_ETHERNET is defined > (or not) via this m4 > macro in config/ompi_check_openfabrics.m4: > >AC_CHECK_DECLS([IBV_LINK_LAYER_ETHERNET], > [$1_have_rdmaoe=1], [], > [#include ]) > > This m4 macro will #define HAVE_IBV_LINK_LAYER_ETHERNET if it exists, or > #undef that name if it > doesn't. I checked in the correct fix. Just below the code snippet you cited, in ompi_check_openfabrics.m4, we see this snippet, which is incorrect: AC_DEFINE_UNQUOTED([OMPI_HAVE_RDMAOE], [$$1_have_rdmaoe], [Enable RDMAoE support]) It should be adding HAVE_IBV_LINK_LAYER_ETHERNET, not OMPI_HAVE_RDMAOE. Stevo
Re: [OMPI devel] openmpi-1.7.2 fails to use the RDMACM CPC
Thanks for finding r27212. It was about a year ago, and had clearly fallen out of my cache (I have very little to do with the openib BTL these days). Your solution isn't correct, because HAVE_IBV_LINK_LAYER_ETHERNET is defined (or not) via this m4 macro in config/ompi_check_openfabrics.m4: AC_CHECK_DECLS([IBV_LINK_LAYER_ETHERNET], [$1_have_rdmaoe=1], [], [#include ]) This m4 macro will #define HAVE_IBV_LINK_LAYER_ETHERNET if it exists, or #undef that name if it doesn't. Do you not see the check for IBV_LINK_LAYER_ETHERNET in your configure stdout? The code in the oob CPC in question is: - /* If we have the transport_type member, check to ensure we're on IB (this CPC will not work with iWarp). If we do not have the transport_type member, then we must be < OFED v1.2, and therefore we must be IB. */ #if defined(HAVE_STRUCT_IBV_DEVICE_TRANSPORT_TYPE) && defined(HAVE_IBV_LINK_LAYER_ETHERNET) if (BTL_OPENIB_CONNECT_BASE_CHECK_IF_NOT_IB(btl)) { opal_output_verbose(5, ompi_btl_base_framework.framework_output, "openib BTL: oob CPC only supported on InfiniBand; skipped on %s:%d", ibv_get_device_name(btl->device->ib_dev), btl->port_num); return OMPI_ERR_NOT_SUPPORTED; } #endif So are you saying you have a libibverbs that does not have IBV_LINK_LAYER_ETHERNET, but it *does* support iWARP? If so, as the comment clearly states, that would violate the assumption of that logic... But I'm not sure how that could happen. On Aug 19, 2013, at 5:38 PM, Steve Wise <sw...@opengridcomputing.com> wrote: > > >> -Original Message- >> From: Steve Wise [mailto:sw...@opengridcomputing.com] >> Sent: Monday, August 19, 2013 4:02 PM >> To: 'Open MPI Developers'; 'Jeff Squyres (jsquyres)' >> Cc: 'Indranil Choudhury' >> Subject: RE: [OMPI devel] openmpi-1.7.2 fails to use the RDMACM CPC >> >> I guess HAVE_IBV_LINK_LAYER_ETHERNET is guarding against a libibverbs that >> doesn't have >> IBV_LINK_LAYER_ETHERNET defined.
So the proper fix, I think, is to enhance >> configure to check > this and >> #define HAVE_IBV_LINK_LAYER_ETHERNET if it exists. Or have it check >> existence of a link_layer > field in >> the ibv_port_attr structure. >> >> > > Maybe something like this? > > Index: ompi_check_openfabrics.m4 > === > --- ompi_check_openfabrics.m4 (revision 29048) > +++ ompi_check_openfabrics.m4 (working copy) > @@ -198,7 +198,7 @@ > [#include ]) > >AC_MSG_CHECKING([if RDMAoE support is enabled]) > - AC_DEFINE_UNQUOTED([OMPI_HAVE_RDMAOE], [$$1_have_rdmaoe], [Enable > RDMAoE support]) > + AC_DEFINE_UNQUOTED([HAVE_IBV_LINK_LAYER_ETHERNET], > [$$1_have_rdmaoe], [Enable RDMAoE > support]) >if test "1" = "$$1_have_rdmaoe"; then > AC_MSG_RESULT([yes]) >else > -- Jeff Squyres jsquy...@cisco.com
Re: [OMPI devel] openmpi-1.7.2 fails to use the RDMACM CPC
> -Original Message- > From: Steve Wise [mailto:sw...@opengridcomputing.com] > Sent: Monday, August 19, 2013 4:02 PM > To: 'Open MPI Developers'; 'Jeff Squyres (jsquyres)' > Cc: 'Indranil Choudhury' > Subject: RE: [OMPI devel] openmpi-1.7.2 fails to use the RDMACM CPC > > I guess HAVE_IBV_LINK_LAYER_ETHERNET is guarding against a libibverbs that > doesn't have > IBV_LINK_LAYER_ETHERNET defined. So the proper fix, I think, is to enhance > configure to check this and > #define HAVE_IBV_LINK_LAYER_ETHERNET if it exists. Or have it check > existence of a link_layer field in > the ibv_port_attr structure. > > Maybe something like this? Index: ompi_check_openfabrics.m4 === --- ompi_check_openfabrics.m4 (revision 29048) +++ ompi_check_openfabrics.m4 (working copy) @@ -198,7 +198,7 @@ [#include ]) AC_MSG_CHECKING([if RDMAoE support is enabled]) - AC_DEFINE_UNQUOTED([OMPI_HAVE_RDMAOE], [$$1_have_rdmaoe], [Enable RDMAoE support]) + AC_DEFINE_UNQUOTED([HAVE_IBV_LINK_LAYER_ETHERNET], [$$1_have_rdmaoe], [Enable RDMAoE support]) if test "1" = "$$1_have_rdmaoe"; then AC_MSG_RESULT([yes]) else
Re: [OMPI devel] openmpi-1.7.2 fails to use the RDMACM CPC
I guess HAVE_IBV_LINK_LAYER_ETHERNET is guarding against a libibverbs that doesn't have IBV_LINK_LAYER_ETHERNET defined. So the proper fix, I think, is to enhance configure to check this and #define HAVE_IBV_LINK_LAYER_ETHERNET if it exists. Or have it check existence of a link_layer field in the ibv_port_attr structure. > -Original Message- > From: Steve Wise [mailto:sw...@opengridcomputing.com] > Sent: Monday, August 19, 2013 3:53 PM > To: 'Open MPI Developers'; 'Jeff Squyres (jsquyres)' > Cc: 'Indranil Choudhury' > Subject: RE: [OMPI devel] openmpi-1.7.2 fails to use the RDMACM CPC > > > This patch fixes iwarp. dunno if it breaks RoCE though :) > > > [root@r9 ompi-trunk]# svn diff > Index: ompi/mca/btl/openib/connect/btl_openib_connect_oob.c > === > --- ompi/mca/btl/openib/connect/btl_openib_connect_oob.c(revision > 29048) > +++ ompi/mca/btl/openib/connect/btl_openib_connect_oob.c(working copy) > @@ -127,7 +127,7 @@ > IB (this CPC will not work with iWarp). If we do not have the > transport_type member, then we must be < OFED v1.2, and > therefore we must be IB. */ > -#if defined(HAVE_STRUCT_IBV_DEVICE_TRANSPORT_TYPE) && > defined(HAVE_IBV_LINK_LAYER_ETHERNET) > +#if defined(HAVE_STRUCT_IBV_DEVICE_TRANSPORT_TYPE) > if (BTL_OPENIB_CONNECT_BASE_CHECK_IF_NOT_IB(btl)) { > opal_output_verbose(5, ompi_btl_base_framework.framework_output, > "openib BTL: oob CPC only supported on > InfiniBand; skipped on %s:%d",
Re: [OMPI devel] openmpi-1.7.2 fails to use the RDMACM CPC
This patch fixes iwarp. dunno if it breaks RoCE though :) [root@r9 ompi-trunk]# svn diff Index: ompi/mca/btl/openib/connect/btl_openib_connect_oob.c === --- ompi/mca/btl/openib/connect/btl_openib_connect_oob.c(revision 29048) +++ ompi/mca/btl/openib/connect/btl_openib_connect_oob.c(working copy) @@ -127,7 +127,7 @@ IB (this CPC will not work with iWarp). If we do not have the transport_type member, then we must be < OFED v1.2, and therefore we must be IB. */ -#if defined(HAVE_STRUCT_IBV_DEVICE_TRANSPORT_TYPE) && defined(HAVE_IBV_LINK_LAYER_ETHERNET) +#if defined(HAVE_STRUCT_IBV_DEVICE_TRANSPORT_TYPE) if (BTL_OPENIB_CONNECT_BASE_CHECK_IF_NOT_IB(btl)) { opal_output_verbose(5, ompi_btl_base_framework.framework_output, "openib BTL: oob CPC only supported on InfiniBand; skipped on %s:%d",
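Why restoring the transport check fixes iWARP can be seen with a deliberately simplified model of CPC selection: CPCs are tried in a fixed priority order and the first whose query succeeds is used. This is an assumption for illustration only, not the openib BTL's actual selection code; the function names and the ordering (oob tried before rdmacm) are hypothetical. When the oob query wrongly accepts an iWARP device, rdmacm is never reached, which matches the "fails to use the RDMACM CPC" symptom.

```c
#include <stddef.h>

/* Hypothetical stand-ins; a simplified model, not openib's real code. */
enum demo_transport { DEMO_TRANSPORT_IB, DEMO_TRANSPORT_IWARP };

typedef int (*demo_query_fn)(enum demo_transport t);

/* oob query with the transport check intact (1.7.1-like behaviour). */
static int oob_query_fixed(enum demo_transport t)
{
    return (t == DEMO_TRANSPORT_IB) ? 0 : -1;
}

/* oob query with the check compiled out (1.7.2-like behaviour). */
static int oob_query_broken(enum demo_transport t)
{
    (void)t;
    return 0;   /* claims support for every transport */
}

/* In this model rdmacm accepts both transports. */
static int rdmacm_query(enum demo_transport t)
{
    (void)t;
    return 0;
}

/* Index of the first CPC (in priority order) whose query succeeds, or -1. */
static int pick_cpc(const demo_query_fn *cpcs, size_t n, enum demo_transport t)
{
    for (size_t i = 0; i < n; i++) {
        if (cpcs[i](t) == 0) {
            return (int)i;
        }
    }
    return -1;
}
```

With the list { oob_query_broken, rdmacm_query }, an iWARP device selects index 0 (oob); with { oob_query_fixed, rdmacm_query } it falls through to index 1 (rdmacm), while an IB device still gets oob.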
Re: [OMPI devel] openmpi-1.7.2 fails to use the RDMACM CPC
> > I could if I had a patch/fix. :) I don't (yet) understand why > > HAVE_IBV_LINK_LAYER_ETHERNET was > added. > > Can the developer who made these changes explain the intent? I think it > > might have to do with RoCE > > support. > > > > Seems like there should be some change to configure for adding this #define... This commit added the new #define: r27212 | jsquyres | 2012-08-31 18:42:37 -0700 (Fri, 31 Aug 2012) | 22 lines Per some discussions between LANL, Cisco, ORNAL, and Mellanox, move some new common OpenFabrics functionality to ompi/mca/common/verbs. Also move everything that was in ompi/mca/common/ofautils under ompi/mca/common/verbs. * Move ofautils -> verbs * Add new functionality in ompi/mca/common/verbs (see doxygen * comments in ompi/mca/common/verbs/common_verbs.h for details): * ompi_common_verbs_find_ibv_ports() * ompi_common_verbs_port_bw() * ompi_common_verbs_mtu() * '''If you're writing verbs-based code, you should be using this common functionality''' * Adapt openib BTL to use some trivial common functionality in common/verbs * Don't use "#ifdef OMPI_HAVE_RDMAOE",use "#if defined(HAVE_IBV_LINK_LAYER_ETHERNET)" * Update the following to include/link against common/verbs * bcol/iboffload * sbgp/ibnet * btl/openib > > > > > > > On Aug 19, 2013, at 4:17 PM, Steve Wise <sw...@opengridcomputing.com> > > > wrote: > > > > > > >> -Original Message- > > > >> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Steve Wise > > > >> Sent: Monday, August 19, 2013 2:42 PM > > > >> To: 'Open MPI Developers'; 'Jeff Squyres (jsquyres)' > > > >> Cc: 'Indranil Choudhury' > > > >> Subject: Re: [OMPI devel] openmpi-1.7.2 fails to use the RDMACM CPC > > > >> > > > >> I confirmed that this is a regression from 1.7.1... > > > >> > > > >> I'll see if I can figure out what's going on... 
> > > >> > > > > > > > > > > > > Looks like this is not defined anywhere: HAVE_IBV_LINK_LAYER_ETHERNET, > > > > which causes > > > > btl_openib_connect_oob.c:oob_component_query() to falsely claim oob > > > > support for iwarp devices. > > > > > > > > In 1.7.1 we see this in oob_component_query(): > > > > > > > > #if defined(HAVE_STRUCT_IBV_DEVICE_TRANSPORT_TYPE) > > > >if (BTL_OPENIB_CONNECT_BASE_CHECK_IF_NOT_IB(btl)) { > > > >opal_output_verbose(5, mca_btl_base_output, > > > >"openib BTL: oob CPC only supported on > > > > InfiniBand; skipped on > %s:%d", > > > >ibv_get_device_name(btl->device->ib_dev), > > > >btl->port_num); > > > >return OMPI_ERR_NOT_SUPPORTED; > > > >} > > > > #endif > > > > > > > > In 1.7.2, it adds the HAVE_IBV_LINK_LAYER_ETHERNET define: > > > > > > > > #if defined(HAVE_STRUCT_IBV_DEVICE_TRANSPORT_TYPE) && > > > defined(HAVE_IBV_LINK_LAYER_ETHERNET) > > > >if (BTL_OPENIB_CONNECT_BASE_CHECK_IF_NOT_IB(btl)) { > > > >opal_output_verbose(5, mca_btl_base_output, > > > >"openib BTL: oob CPC only supported on > > > > InfiniBand; skipped on > %s:%d", > > > >ibv_get_device_name(btl->device->ib_dev), > > > >btl->port_num); > > > >return OMPI_ERR_NOT_SUPPORTED; > > > >} > > > > #endif > > > > > > > > > > > > > > > > > -- > > > Jeff Squyres > > > jsquy...@cisco.com
Re: [OMPI devel] openmpi-1.7.2 fails to use the RDMACM CPC
> -Original Message- > From: Steve Wise [mailto:sw...@opengridcomputing.com] > Sent: Monday, August 19, 2013 3:25 PM > To: 'Jeff Squyres (jsquyres)' > Cc: 'Open MPI Developers'; 'Indranil Choudhury' > Subject: RE: [OMPI devel] openmpi-1.7.2 fails to use the RDMACM CPC > > > > > -Original Message- > > From: Jeff Squyres (jsquyres) [mailto:jsquy...@cisco.com] > > Sent: Monday, August 19, 2013 3:23 PM > > To: Steve Wise > > Cc: Open MPI Developers; Indranil Choudhury > > Subject: Re: [OMPI devel] openmpi-1.7.2 fails to use the RDMACM CPC > > > > No need to both post to the ticket and to devel -- just pick one. :-) > > > > Mkay. > > > Can you send a patch/fix? > > > > I could if I had a patch/fix. :) I don't (yet) understand why > HAVE_IBV_LINK_LAYER_ETHERNET was added. > Can the developer who made these changes explain the intent? I think it might > have to do with RoCE > support. > Seems like there should be some change to configure for adding this #define... > > > > On Aug 19, 2013, at 4:17 PM, Steve Wise <sw...@opengridcomputing.com> wrote: > > > > >> -Original Message- > > >> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Steve Wise > > >> Sent: Monday, August 19, 2013 2:42 PM > > >> To: 'Open MPI Developers'; 'Jeff Squyres (jsquyres)' > > >> Cc: 'Indranil Choudhury' > > >> Subject: Re: [OMPI devel] openmpi-1.7.2 fails to use the RDMACM CPC > > >> > > >> I confirmed that this is a regression from 1.7.1... > > >> > > >> I'll see if I can figure out what's going on... > > >> > > > > > > > > > Looks like this is not defined anywhere: HAVE_IBV_LINK_LAYER_ETHERNET, > > > which causes > > > btl_openib_connect_oob.c:oob_component_query() to falsely claim oob > > > support for iwarp devices. 
> > > > > > In 1.7.1 we see this in oob_component_query(): > > > > > > #if defined(HAVE_STRUCT_IBV_DEVICE_TRANSPORT_TYPE) > > >if (BTL_OPENIB_CONNECT_BASE_CHECK_IF_NOT_IB(btl)) { > > >opal_output_verbose(5, mca_btl_base_output, > > >"openib BTL: oob CPC only supported on > > > InfiniBand; skipped on %s:%d", > > >ibv_get_device_name(btl->device->ib_dev), > > >btl->port_num); > > >return OMPI_ERR_NOT_SUPPORTED; > > >} > > > #endif > > > > > > In 1.7.2, it adds the HAVE_IBV_LINK_LAYER_ETHERNET define: > > > > > > #if defined(HAVE_STRUCT_IBV_DEVICE_TRANSPORT_TYPE) && > > defined(HAVE_IBV_LINK_LAYER_ETHERNET) > > >if (BTL_OPENIB_CONNECT_BASE_CHECK_IF_NOT_IB(btl)) { > > >opal_output_verbose(5, mca_btl_base_output, > > >"openib BTL: oob CPC only supported on > > > InfiniBand; skipped on %s:%d", > > >ibv_get_device_name(btl->device->ib_dev), > > >btl->port_num); > > >return OMPI_ERR_NOT_SUPPORTED; > > >} > > > #endif > > > > > > > > > > > > -- > > Jeff Squyres > > jsquy...@cisco.com
Re: [OMPI devel] openmpi-1.7.2 fails to use the RDMACM CPC
> -Original Message- > From: Jeff Squyres (jsquyres) [mailto:jsquy...@cisco.com] > Sent: Monday, August 19, 2013 3:23 PM > To: Steve Wise > Cc: Open MPI Developers; Indranil Choudhury > Subject: Re: [OMPI devel] openmpi-1.7.2 fails to use the RDMACM CPC > > No need to both post to the ticket and to devel -- just pick one. :-) > Mkay. > Can you send a patch/fix? > I could if I had a patch/fix. :) I don't (yet) understand why HAVE_IBV_LINK_LAYER_ETHERNET was added. Can the developer who made these changes explain the intent? I think it might have to do with RoCE support. > > On Aug 19, 2013, at 4:17 PM, Steve Wise <sw...@opengridcomputing.com> wrote: > > >> -Original Message- > >> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Steve Wise > >> Sent: Monday, August 19, 2013 2:42 PM > >> To: 'Open MPI Developers'; 'Jeff Squyres (jsquyres)' > >> Cc: 'Indranil Choudhury' > >> Subject: Re: [OMPI devel] openmpi-1.7.2 fails to use the RDMACM CPC > >> > >> I confirmed that this is a regression from 1.7.1... > >> > >> I'll see if I can figure out what's going on... > >> > > > > > > Looks like this is not defined anywhere: HAVE_IBV_LINK_LAYER_ETHERNET, > > which causes > > btl_openib_connect_oob.c:oob_component_query() to falsely claim oob support > > for iwarp devices. 
> > > > In 1.7.1 we see this in oob_component_query(): > > > > #if defined(HAVE_STRUCT_IBV_DEVICE_TRANSPORT_TYPE) > >if (BTL_OPENIB_CONNECT_BASE_CHECK_IF_NOT_IB(btl)) { > >opal_output_verbose(5, mca_btl_base_output, > >"openib BTL: oob CPC only supported on > > InfiniBand; skipped on %s:%d", > >ibv_get_device_name(btl->device->ib_dev), > >btl->port_num); > >return OMPI_ERR_NOT_SUPPORTED; > >} > > #endif > > > > In 1.7.2, it adds the HAVE_IBV_LINK_LAYER_ETHERNET define: > > > > #if defined(HAVE_STRUCT_IBV_DEVICE_TRANSPORT_TYPE) && > defined(HAVE_IBV_LINK_LAYER_ETHERNET) > >if (BTL_OPENIB_CONNECT_BASE_CHECK_IF_NOT_IB(btl)) { > >opal_output_verbose(5, mca_btl_base_output, > >"openib BTL: oob CPC only supported on > > InfiniBand; skipped on %s:%d", > >ibv_get_device_name(btl->device->ib_dev), > >btl->port_num); > >return OMPI_ERR_NOT_SUPPORTED; > >} > > #endif > > > > > > > -- > Jeff Squyres > jsquy...@cisco.com
Re: [OMPI devel] openmpi-1.7.2 fails to use the RDMACM CPC
No need to both post to the ticket and to devel -- just pick one. :-)

Can you send a patch/fix?


On Aug 19, 2013, at 4:17 PM, Steve Wise <sw...@opengridcomputing.com> wrote:

>> -----Original Message-----
>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Steve Wise
>> Sent: Monday, August 19, 2013 2:42 PM
>> To: 'Open MPI Developers'; 'Jeff Squyres (jsquyres)'
>> Cc: 'Indranil Choudhury'
>> Subject: Re: [OMPI devel] openmpi-1.7.2 fails to use the RDMACM CPC
>>
>> I confirmed that this is a regression from 1.7.1...
>>
>> I'll see if I can figure out what's going on...
>>
>
>
> Looks like this is not defined anywhere: HAVE_IBV_LINK_LAYER_ETHERNET, which causes
> btl_openib_connect_oob.c:oob_component_query() to falsely claim oob support for iwarp devices.
>
> In 1.7.1 we see this in oob_component_query():
>
> #if defined(HAVE_STRUCT_IBV_DEVICE_TRANSPORT_TYPE)
>     if (BTL_OPENIB_CONNECT_BASE_CHECK_IF_NOT_IB(btl)) {
>         opal_output_verbose(5, mca_btl_base_output,
>                             "openib BTL: oob CPC only supported on InfiniBand; skipped on %s:%d",
>                             ibv_get_device_name(btl->device->ib_dev),
>                             btl->port_num);
>         return OMPI_ERR_NOT_SUPPORTED;
>     }
> #endif
>
> In 1.7.2, it adds the HAVE_IBV_LINK_LAYER_ETHERNET define:
>
> #if defined(HAVE_STRUCT_IBV_DEVICE_TRANSPORT_TYPE) && defined(HAVE_IBV_LINK_LAYER_ETHERNET)
>     if (BTL_OPENIB_CONNECT_BASE_CHECK_IF_NOT_IB(btl)) {
>         opal_output_verbose(5, mca_btl_base_output,
>                             "openib BTL: oob CPC only supported on InfiniBand; skipped on %s:%d",
>                             ibv_get_device_name(btl->device->ib_dev),
>                             btl->port_num);
>         return OMPI_ERR_NOT_SUPPORTED;
>     }
> #endif
>
>

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI devel] openmpi-1.7.2 fails to use the RDMACM CPC
> -----Original Message-----
> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Steve Wise
> Sent: Monday, August 19, 2013 2:42 PM
> To: 'Open MPI Developers'; 'Jeff Squyres (jsquyres)'
> Cc: 'Indranil Choudhury'
> Subject: Re: [OMPI devel] openmpi-1.7.2 fails to use the RDMACM CPC
>
> I confirmed that this is a regression from 1.7.1...
>
> I'll see if I can figure out what's going on...
>

Looks like this is not defined anywhere: HAVE_IBV_LINK_LAYER_ETHERNET, which causes btl_openib_connect_oob.c:oob_component_query() to falsely claim oob support for iwarp devices.

In 1.7.1 we see this in oob_component_query():

#if defined(HAVE_STRUCT_IBV_DEVICE_TRANSPORT_TYPE)
    if (BTL_OPENIB_CONNECT_BASE_CHECK_IF_NOT_IB(btl)) {
        opal_output_verbose(5, mca_btl_base_output,
                            "openib BTL: oob CPC only supported on InfiniBand; skipped on %s:%d",
                            ibv_get_device_name(btl->device->ib_dev),
                            btl->port_num);
        return OMPI_ERR_NOT_SUPPORTED;
    }
#endif

In 1.7.2, it adds the HAVE_IBV_LINK_LAYER_ETHERNET define:

#if defined(HAVE_STRUCT_IBV_DEVICE_TRANSPORT_TYPE) && defined(HAVE_IBV_LINK_LAYER_ETHERNET)
    if (BTL_OPENIB_CONNECT_BASE_CHECK_IF_NOT_IB(btl)) {
        opal_output_verbose(5, mca_btl_base_output,
                            "openib BTL: oob CPC only supported on InfiniBand; skipped on %s:%d",
                            ibv_get_device_name(btl->device->ib_dev),
                            btl->port_num);
        return OMPI_ERR_NOT_SUPPORTED;
    }
#endif
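The effect of the 1.7.2 guard can be reproduced with the preprocessor alone. The sketch below is a simplified, hypothetical model, not Open MPI code. It assumes configure defines HAVE_STRUCT_IBV_DEVICE_TRANSPORT_TYPE and, as reported in the message, that nothing defines HAVE_IBV_LINK_LAYER_ETHERNET. Because `defined()` of a never-defined name evaluates to 0, the whole `&&` condition is false. The not-InfiniBand check is then compiled out, and the oob CPC goes on to accept an iWARP port. The helper names (`check_compiled_1_7_1`, `check_compiled_1_7_2`, `oob_query_model`) are invented for illustration, and -1 only stands in for OMPI_ERR_NOT_SUPPORTED.

```c
#include <stdbool.h>

/* Assumption: configure found the transport_type member, so this is defined. */
#define HAVE_STRUCT_IBV_DEVICE_TRANSPORT_TYPE 1
/* HAVE_IBV_LINK_LAYER_ETHERNET is intentionally left undefined, mirroring
 * the report that nothing in the build defines it. */

/* Is the "oob CPC only supported on InfiniBand" check compiled in
 * under the 1.7.1-style guard? */
static bool check_compiled_1_7_1(void)
{
#if defined(HAVE_STRUCT_IBV_DEVICE_TRANSPORT_TYPE)
    return true;
#else
    return false;
#endif
}

/* Same question for the 1.7.2-style guard: defined() of an undefined name
 * is 0, so the && makes the condition false and the check disappears. */
static bool check_compiled_1_7_2(void)
{
#if defined(HAVE_STRUCT_IBV_DEVICE_TRANSPORT_TYPE) && defined(HAVE_IBV_LINK_LAYER_ETHERNET)
    return true;
#else
    return false;
#endif
}

/* Hypothetical model of oob_component_query(): returns -1 (standing in for
 * OMPI_ERR_NOT_SUPPORTED) when the port is skipped, 0 when oob claims it. */
static int oob_query_model(bool check_compiled, bool port_is_ib)
{
    if (check_compiled && !port_is_ib) {
        return -1; /* non-InfiniBand (e.g. iWARP) port: oob CPC declines */
    }
    return 0;      /* oob CPC claims the port */
}
```

On a port where `port_is_ib` is false, `oob_query_model(check_compiled_1_7_1(), false)` returns -1, while the 1.7.2 variant returns 0. That matches the regression in this thread: oob claims the iWARP port instead of declining it so that another CPC, such as RDMACM, can be used.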
Re: [OMPI devel] openmpi-1.7.2 fails to use the RDMACM CPC
I confirmed that this is a regression from 1.7.1...

I'll see if I can figure out what's going on...

> -----Original Message-----
> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Steve Wise
> Sent: Monday, August 19, 2013 12:15 PM
> To: 'Jeff Squyres (jsquyres)'
> Cc: de...@open-mpi.org; 'Indranil Choudhury'
> Subject: Re: [OMPI devel] openmpi-1.7.2 fails to use the RDMACM CPC
>
>
>
> > -----Original Message-----
> > From: Jeff Squyres (jsquyres) [mailto:jsquy...@cisco.com]
> > Sent: Monday, August 19, 2013 12:06 PM
> > To: Steve Wise
> > Cc: <de...@open-mpi.org>
> > Subject: Re: openmpi-1.7.2 fails to use the RDMACM CPC
> >
> > Not offhand.
> >
> > Given the lack of iWARP testing in the community, it's not entirely unsurprising that this broke. Will
> > Chelsio set up some Open MPI + MTT to track this kind of stuff regularly?
> >
>
> +Indranil from Chelsio.
>
> They used to run MTT. They do regularly test Open MPI, but haven't tested 1.7.2 yet or I would have
> seen an internal bug on this. :) They have run 1.7.1.
>
> Indranil, do you all still run Open MPI + MTT?
>
> Thanks,
>
> Steve.
>
>
>
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
Re: [OMPI devel] openmpi-1.7.2 fails to use the RDMACM CPC
> -----Original Message-----
> From: Jeff Squyres (jsquyres) [mailto:jsquy...@cisco.com]
> Sent: Monday, August 19, 2013 12:06 PM
> To: Steve Wise
> Cc: <de...@open-mpi.org>
> Subject: Re: openmpi-1.7.2 fails to use the RDMACM CPC
>
> Not offhand.
>
> Given the lack of iWARP testing in the community, it's not entirely unsurprising that this broke. Will
> Chelsio set up some Open MPI + MTT to track this kind of stuff regularly?
>

+Indranil from Chelsio.

They used to run MTT. They do regularly test Open MPI, but haven't tested 1.7.2 yet or I would have seen an internal bug on this. :) They have run 1.7.1.

Indranil, do you all still run Open MPI + MTT?

Thanks,

Steve.