[ofa-general] Re: [OMPI devel] OMPI over ofed udapl - bugs opened
Although, as Boris pointed out, perhaps the hack in OMPI is no longer needed at all...

On Wed, 2007-05-09 at 08:41 -0500, Steve Wise wrote:
606 opened to track the udapl change. 607 opened to track the ompi change to remove the port number stashing hack.

Status: I have a patch from Arlin to test today. I will test with that patch and with the OMPI port hack removed. Stay tuned...

Steve.

On Tue, 2007-05-08 at 15:47 -0700, Arlin Davis wrote:
Steve Wise wrote:
I would like the group to consider including changes needed to OMPI and/or ofa udapl to get OMPI working again on udapl for ofed-1.2. This will provide OMPI support over iwarp devices via udapl until we can get rdma-cm support added to OMPI.

Steve.

Steve, can you open a bug to track this?

___ devel mailing list [EMAIL PROTECTED] http://www.open-mpi.org/mailman/listinfo.cgi/devel
___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[ofa-general] Re: [OMPI devel] OMPI over ofed udapl - bugs opened
FWIW, I would marginally prefer that this bug be tracked in the Open MPI trac ticket system, not the OFA bugzilla (Steve W. will have write access there as soon as Chelsio submits their OMPI 3rd party contribution agreement). We've traditionally [mostly] tracked OMPI bugs in the OMPI bug system and OFED-specific OMPI packaging problems in the OFA bugzilla.

It's a gray area, I admit. But since I'm not the uDAPL maintainer in Open MPI, moving the bug over there will allow the Right people to see it (some OMPI developers are cross-subscribed to the OFA general list, but not all). For example, this udapl problem is likely related to the existing OMPI trac ticket 890 (https://svn.open-mpi.org/trac/ompi/ticket/890).

On May 9, 2007, at 10:37 AM, Steve Wise wrote:
Although as Boris pointed out, perhaps the hack in OMPI is no longer needed at all...

On Wed, 2007-05-09 at 08:41 -0500, Steve Wise wrote:
606 opened to track the udapl change. 607 opened to track the ompi change to remove the port number stashing hack.

Status: I have a patch from Arlin to test today. I will test with that patch and with the OMPI port hack removed. Stay tuned...

Steve.

On Tue, 2007-05-08 at 15:47 -0700, Arlin Davis wrote:
Steve Wise wrote:
I would like the group to consider including changes needed to OMPI and/or ofa udapl to get OMPI working again on udapl for ofed-1.2. This will provide OMPI support over iwarp devices via udapl until we can get rdma-cm support added to OMPI.

Steve.

Steve, can you open a bug to track this?

--
Jeff Squyres
Cisco Systems
[ofa-general] Re: [OMPI devel] OMPI over ofed udapl - bugs opened
I agree OMPI trac ticket #890 should cover this. I will test the suggested fix, just removing that one line from btl_udapl.c, on Solaris. I am still not set up on Linux, so hopefully Steve can confirm there.

-DON

Jeff Squyres wrote:
FWIW, I would marginally prefer if this bug is tracked in the Open MPI trac ticket system, not the OFA bugzilla (Steve W. will have write access there as soon as Chelsio submits their OMPI 3rd party contribution agreement). We've traditionally [mostly] tracked OMPI bugs in the OMPI bug system and OFED-specific OMPI packaging problems in the OFA bugzilla.

It's a gray area, I admit. But since I'm not the uDAPL maintainer in Open MPI, moving the bug over there will allow the Right people to see it (some OMPI developers are cross-subscribed to the OFA general list, but not all). For example, this udapl problem is likely related to the existing OMPI trac ticket 890 (https://svn.open-mpi.org/trac/ompi/ticket/890).

On May 9, 2007, at 10:37 AM, Steve Wise wrote:
Although as Boris pointed out, perhaps the hack in OMPI is no longer needed at all...

On Wed, 2007-05-09 at 08:41 -0500, Steve Wise wrote:
606 opened to track the udapl change. 607 opened to track the ompi change to remove the port number stashing hack.

Status: I have a patch from Arlin to test today. I will test with that patch and with the OMPI port hack removed. Stay tuned...

Steve.

On Tue, 2007-05-08 at 15:47 -0700, Arlin Davis wrote:
Steve Wise wrote:
I would like the group to consider including changes needed to OMPI and/or ofa udapl to get OMPI working again on udapl for ofed-1.2. This will provide OMPI support over iwarp devices via udapl until we can get rdma-cm support added to OMPI.

Steve.

Steve, can you open a bug to track this?
Re: [ofa-general] Re: [OMPI devel] OMPI over ofed udapl - bugs opened
On Wed, 2007-05-09 at 16:20 -0400, Donald Kerr wrote:
I'm missing some context here. Where are you plugging iwarp and OMPI together?

ofed-1.2 supports iwarp and the chelsio rnic. It can be accessed directly via the ofa verbs and ofa rdma-cm _as well as_ via udapl. I'm attempting to run OMPI over udapl over chelsio's rnic.

Steve.
Re: [ofa-general] Re: [OMPI devel] OMPI over ofed udapl - bugs opened
So then I agree with Andrew: I think you are trying to impose restrictions on uDAPL which are not part of the Spec.

-DON

Steve Wise wrote:
On Wed, 2007-05-09 at 16:20 -0400, Donald Kerr wrote:
I'm missing some context here. Where are you plugging iwarp and OMPI together?

ofed-1.2 supports iwarp and the chelsio rnic. It can be accessed directly via the ofa verbs and ofa rdma-cm _as well as_ via udapl. I'm attempting to run OMPI over udapl over chelsio's rnic.

Steve.
[ofa-general] Re: [OMPI devel] OMPI over ofed udapl - bugs opened
On Wed, 2007-05-09 at 16:15 -0700, Andrew Friedley wrote:
Steve Wise wrote:
There have been a series of discussions on the ofa general list about this issue, and the conclusion to date is that it cannot be resolved in the rdma-cm or iwarp-cm code of the linux rdma stack. Mainly because sending an RDMA message involves the ULP's work queue and completion queue, so the CM cannot do this under the covers in a manner that doesn't affect the application. Thus, the applications must deal with this.

Why can't uDAPL deal with this? As a uDAPL user, I really don't care what API uDAPL is using under the hood to move data from one place to another, nor the quirks of that API. The whole point of uDAPL is to form a network-agnostic abstraction layer. AFAIK, the uDAPL spec doesn't enforce any such requirement on RDMA communication either. In my opinion, exposing such behavior above uDAPL is incorrect and is part of why uDAPL has seen limited adoption -- every single uDAPL implementation behaves in different ways, making it extremely difficult to write an application to work on any uDAPL implementation. Sorry if this sounds harsh, but this comes from many hours of banging my head on the wall due to working around these sorts of problems :)

I understand your frustration. I think the MPA protocol is deficient in this respect and should have required the necessary first FPDU to be sent under the covers by the RNICs. An RTR packet, if you will. To resolve this issue properly, in my opinion, would involve changing the IETF MPA spec and also breaking all the existing iwarp HW. We can't do that.

The reason it is hard or impossible to solve this in the DAPL layer is that any rdma operation on the QP affects the state of that QP and the associated CQs. In addition, if you use an RDMA send to enforce this, you impact the other side by consuming a RECV buffer. So it's hard if not impossible to do this under the covers without affecting the application's resources.
Also, the DAPL specification had a goal to not impose any additional protocol on the wire. If you add this under the covers, then you add such a protocol and break interoperability between a connection accessed via DAPL on one end and some other API on the other end.

Here is a possible solution: I assume in OMPI that connections are only initiated when the mpi application does a send operation. Given that, the udapl btl must ensure that if a given rank accepts a connection, it cannot send anything until the rank at the other end of the connection sends first. Since the other side initiated the connection, it will have pending data to send... I haven't looked into how painful this will be to implement. Thoughts?

Following on what I wrote above, I think Open MPI is the wrong place to be dealing with this. There's enough of these hacks as it is; I'm not interested in seeing more get added.

Unfortunately, I haven't been able to come up with a solution that works with existing iWARP HW and is interoperable.

Steve.
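As a rough illustration of the rule proposed above (a rank that accepted a connection holds its sends until the initiating peer has sent first), here is a minimal Python sketch. It is a toy model under stated assumptions, not the Open MPI udapl BTL: the `Endpoint` class, its `wire_send` callback, and `on_receive` are hypothetical names invented for this example.

```python
from collections import deque


class Endpoint:
    """Toy model of one side of a connection (hypothetical, not OMPI code)."""

    def __init__(self, accepted, wire_send):
        # accepted=True: this side accepted the connection (passive side).
        self.accepted = accepted
        # wire_send: hypothetical callable standing in for posting a send.
        self.wire_send = wire_send
        # The initiating side may send immediately; the accepting side may not.
        self.peer_has_sent = not accepted
        self.pending = deque()

    def send(self, msg):
        if self.peer_has_sent:
            self.wire_send(msg)
        else:
            # Hold the message until the initiator's first message arrives.
            self.pending.append(msg)

    def on_receive(self, msg):
        # The first inbound message lifts the restriction; flush in FIFO order.
        self.peer_has_sent = True
        while self.pending:
            self.wire_send(self.pending.popleft())
        return msg
```

The sketch only defers sends; it does not address what happens if the initiator never sends, which the assumption "the other side initiated the connection, so it will have pending data" is meant to rule out.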
[ofa-general] Re: [OMPI devel] OMPI over ofed udapl - bugs opened
Steve Wise wrote:
On Wed, 2007-05-09 at 16:15 -0700, Andrew Friedley wrote:
Steve Wise wrote:
There have been a series of discussions on the ofa general list about this issue, and the conclusion to date is that it cannot be resolved in the rdma-cm or iwarp-cm code of the linux rdma stack. Mainly because sending an RDMA message involves the ULP's work queue and completion queue, so the CM cannot do this under the covers in a manner that doesn't affect the application. Thus, the applications must deal with this.

Why can't uDAPL deal with this? As a uDAPL user, I really don't care what API uDAPL is using under the hood to move data from one place to another, nor the quirks of that API. The whole point of uDAPL is to form a network-agnostic abstraction layer. AFAIK, the uDAPL spec doesn't enforce any such requirement on RDMA communication either. In my opinion, exposing such behavior above uDAPL is incorrect and is part of why uDAPL has seen limited adoption -- every single uDAPL implementation behaves in different ways, making it extremely difficult to write an application to work on any uDAPL implementation. Sorry if this sounds harsh, but this comes from many hours of banging my head on the wall due to working around these sorts of problems :)

I understand your frustration. I think the MPA protocol is deficient in this respect and should have required the necessary first FPDU to be sent under the covers by the RNICs. An RTR packet, if you will. To resolve this issue properly, in my opinion, would involve changing the IETF MPA spec and also breaking all the existing iwarp HW. We can't do that.

Understood.

The reason it is hard or impossible to solve this in the DAPL layer is that any rdma operation on the QP affects the state of that QP and the associated CQs. In addition, if you use an RDMA send to enforce this, you impact the other side by consuming a RECV buffer. So it's hard if not impossible to do this under the covers without affecting the application's resources.
Is there no way to do this before passing connection established events to the uDAPL consumer? I need to go read up on the uDAPL API to really understand why this wouldn't work.

Also, the DAPL specification had a goal to not impose any additional protocol on the wire. If you add this under the covers, then you add such a protocol and break interoperability between a connection accessed via DAPL on one end and some other API on the other end.

So I guess there's no 'right' solution, at least at the uDAPL level. With RDMACM/OFA verbs, there's at least the argument that you can design the API/semantics however you please, while uDAPL is already standardized.

I hope you guys are documenting this in a way that makes this issue extremely clear to both uDAPL and OFA verbs (is this the right naming?) users. Maybe it's been done already, but is it possible to emit some sort of loud warning/error when the accept()'ing side tries to send before a receive?

Andrew
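The kind of loud failure asked about here might look something like the following Python sketch. Everything in it is an assumption made for illustration: `CheckedEndpoint`, `post_send`, and `SendBeforeReceiveError` are hypothetical names, and the thread does not establish whether a real uDAPL or verbs provider could track this state.

```python
class SendBeforeReceiveError(RuntimeError):
    """Raised when the accepting side sends before it has received anything."""


class CheckedEndpoint:
    """Toy endpoint that refuses early sends on the accepting side (hypothetical)."""

    def __init__(self, accepted):
        self.accepted = accepted      # True if this side accepted the connection
        self.received_any = False     # set once the first message arrives
        self.sent = []                # stands in for messages put on the wire

    def on_receive(self, msg):
        self.received_any = True
        return msg

    def post_send(self, msg):
        if self.accepted and not self.received_any:
            # Fail loudly instead of letting the connection misbehave silently.
            raise SendBeforeReceiveError(
                "accepting side tried to send before the initiator's first message"
            )
        self.sent.append(msg)
```

Raising an error rather than queueing keeps the restriction visible to the caller, which is closer to the "make this issue extremely clear" request than a silent workaround would be.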
Re: [ofa-general] Re: [OMPI devel] OMPI over ofed udapl - bugs opened
The reason it is hard or impossible to solve this in the DAPL layer is that any rdma operation on the QP affects the state of that QP and the associated CQs. In addition, if you use an RDMA send to enforce this, you impact the other side by consuming a RECV buffer. So it's hard if not impossible to do this under the covers without affecting the application's resources.

I agree that this is hard, but I don't believe that it's impossible.

Also, the DAPL specification had a goal to not impose any additional protocol on the wire. If you add this under the covers, then you add such a protocol and break interoperability between a connection accessed via DAPL on one end and some other API on the other end.

IMO, this is an unrealized dream. DAPL does generate wire protocol. For example, when running over IB, DAPL's selection of a service ID and CM protocol is visible on the wire. A DAPL that establishes connections using the RDMA CM will likely have a different wire protocol than a version of DAPL that establishes connections talking directly to the IB CM. The two DAPLs will not interoperate unless they agree on how they will map to service IDs and, in the case of using the RDMA CM, the format of the private data carried in the CM messages.

Even in the case of iWarp, DAPL's selection of a local port number affects the data visible on the wire. To communicate, a remote end point must know how this mapping occurs.

- Sean
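To make the service ID point concrete, here is a small Python sketch of two port-to-service-ID mappings that would not interoperate. The first follows the Linux RDMA CM convention as I understand it (port space `RDMA_PS_TCP`, value 0x0106, placed above the 16-bit port). The second, `other_dapl_service_id`, is a purely hypothetical mapping invented for this example and does not describe any real DAPL provider.

```python
RDMA_PS_TCP = 0x0106  # TCP port-space value used by the Linux RDMA CM (as I understand it)


def rdma_cm_service_id(port, port_space=RDMA_PS_TCP):
    """Map a 16-bit port to an IB service ID the way the RDMA CM is assumed to."""
    if not 0 <= port <= 0xFFFF:
        raise ValueError("port must fit in 16 bits")
    return (port_space << 16) | port


def other_dapl_service_id(port):
    """Hypothetical alternative mapping, invented purely for illustration."""
    if not 0 <= port <= 0xFFFF:
        raise ValueError("port must fit in 16 bits")
    return 0x1000000000000000 | port


# Same application-level port, different service IDs on the wire.
print(hex(rdma_cm_service_id(7)))
print(hex(other_dapl_service_id(7)))
```

A listener registered under one mapping would not match a connection request built with the other, which is the sense in which the mapping is part of the wire protocol.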