[ofa-general] Re: [OMPI devel] OMPI over ofed udapl - bugs opened

2007-05-09 Thread Steve Wise

Although as Boris pointed out, perhaps the hack in OMPI is no longer
needed at all...


On Wed, 2007-05-09 at 08:41 -0500, Steve Wise wrote:
 606 opened to track the udapl change.
 
 607 opened to track the ompi change to remove the port number stashing
 hack.
 
 Status: I have a patch from Arlin to test today.  I will test with that
 patch and with the OMPI port hack removed.  Stay tuned...
 
 
 
 Steve.
 
 On Tue, 2007-05-08 at 15:47 -0700, Arlin Davis wrote:
  Steve Wise wrote:
  
  I would like the group to consider including changes needed to OMPI
  and/or ofa udapl to get OMPI working again on udapl for ofed-1.2.  
  
  This will provide OMPI support over iwarp devices via udapl until we can
  get rdma-cm support added to OMPI.  
  
  
  Steve.


  
  Steve, can you open a bug to track this?
 
 ___
 devel mailing list
 [EMAIL PROTECTED]
 http://www.open-mpi.org/mailman/listinfo.cgi/devel

___
general mailing list
general@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


[ofa-general] Re: [OMPI devel] OMPI over ofed udapl - bugs opened

2007-05-09 Thread Jeff Squyres
FWIW, I would marginally prefer if this bug is tracked in the Open  
MPI trac ticket system, not the OFA bugzilla (Steve W. will have  
write access there as soon as Chelsio submits their OMPI 3rd party  
contribution agreement).  We've traditionally [mostly] tracked OMPI  
bugs in the OMPI bug system and OFED-specific OMPI packaging problems  
in the OFA bugzilla.  It's a gray area, I admit.


But since I'm not the uDAPL maintainer in Open MPI, moving the bug  
over there will allow the Right people to see it (some OMPI  
developers are cross-subscribed to the OFA general list, but not  
all).  For example, this udapl problem is likely related to the  
existing OMPI trac ticket 890 (https://svn.open-mpi.org/trac/ompi/ticket/890).




--
Jeff Squyres
Cisco Systems



[ofa-general] Re: [OMPI devel] OMPI over ofed udapl - bugs opened

2007-05-09 Thread Donald Kerr


I agree OMPI trac ticket #890 should cover this. I will test the 
suggested fix, just removing that one line from btl_udapl.c, on Solaris. 
I am still not set up on Linux so hopefully Steve can confirm there.


-DON




Re: [ofa-general] Re: [OMPI devel] OMPI over ofed udapl - bugs opened

2007-05-09 Thread Steve Wise
On Wed, 2007-05-09 at 16:20 -0400, Donald Kerr wrote:
 I'm missing some context here. Where are you plugging iwarp and OMPI 
 together? 

ofed-1.2 supports iwarp and the chelsio rnic.  It can be accessed
directly via the ofa verbs and ofa rdma-cm _as well as_ via udapl.  

I'm attempting to run OMPI over udapl over chelsio's rnic.

Steve.





Re: [ofa-general] Re: [OMPI devel] OMPI over ofed udapl - bugs opened

2007-05-09 Thread Donald Kerr
So then I agree with Andrew: I think you are trying to impose 
restrictions on uDAPL that are not part of the spec.


-DON





[ofa-general] Re: [OMPI devel] OMPI over ofed udapl - bugs opened

2007-05-09 Thread Steve Wise
On Wed, 2007-05-09 at 16:15 -0700, Andrew Friedley wrote:
 
 Steve Wise wrote:
  There have been a series of discussions on the ofa general list about
  this issue, and the conclusion to date is that it cannot be resolved in
  the rdma-cm or iwarp-cm code of the linux rdma stack, mainly because
  sending an RDMA message involves the ULP's work queue and completion
  queue, so the CM cannot do this under the covers in a manner that
  doesn't affect the application.  Thus, the applications must deal with
  this.
 
 Why can't uDAPL deal with this?  As a uDAPL user, I really don't care 
 what API uDAPL is using under the hood to move data from one place to 
 another, nor the quirks of that API.  The whole point of uDAPL is to 
 form a network-agnostic abstraction layer.  AFAIK, the uDAPL spec 
 doesn't enforce any such requirement on RDMA communication either.  In 
 my opinion, exposing such behavior above uDAPL is incorrect and is part 
 of why uDAPL has seen limited adoption -- every single uDAPL 
 implementation behaves in different ways, making it extremely difficult 
 to write an application to work on any uDAPL implementation.  Sorry if 
 this sounds harsh, but this comes from many hours of banging my head on 
 the wall due to working around these sorts of problems :)
 

I understand your frustration.  I think the MPA protocol is deficient in
this respect and should have required the necessary first FPDU to be
sent under the covers by the RNICs: an RTR (ready-to-receive) packet, if
you will.  Resolving this issue properly would, in my opinion, involve
changing the IETF MPA spec and also breaking all the existing iwarp HW.
We can't do that.

The reason it is hard or impossible to solve this in the DAPL layer is
that any rdma operation on the QP affects the state of that QP and the
associated CQs.  In addition, if you use an RDMA send to enforce this, you
impact the other side by consuming a RECV buffer.  So it's hard, if not
impossible, to do this under the covers without affecting the
application's resources.

Also, the DAPL specification had a goal to not impose any additional
protocol on the wire.  If you add this under the covers, then you add
such a protocol and break interoperability between a connection
accessed via DAPL on one end and some other API on the other end.

  
  Here is a possible solution: 
  
  I assume that in OMPI, connections are only initiated when the mpi
  application does a send operation.  Given that, the udapl btl must
  ensure that if a given rank accepts a connection, it does not send
  anything until the rank at the other end of the connection sends first.
  Since the other side initiated the connection, it will have pending data
  to send...
  
  I haven't looked into how painful this will be to implement.
  
  Thoughts?
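A minimal sketch of the proposed rule, written in Python for illustration only: the accepting side queues its outgoing messages until the first message from the initiator arrives, then flushes them in order, while the initiating side sends immediately. The `Endpoint` class, its `wire` list, and the method names are hypothetical stand-ins, not the actual udapl BTL code or any uDAPL call.

```python
from collections import deque


class Endpoint:
    """Hypothetical per-connection state for a BTL endpoint (illustration only)."""

    def __init__(self, is_acceptor: bool) -> None:
        self.is_acceptor = is_acceptor
        # The initiating side may send as soon as the connection is up; the
        # accepting side must wait until the initiator's first message arrives.
        self.may_send = not is_acceptor
        self.pending: deque = deque()  # sends held back on the accepting side
        self.wire: list = []           # stands in for sends actually posted

    def send(self, msg: str) -> None:
        if self.may_send:
            self.wire.append(msg)
        else:
            self.pending.append(msg)

    def on_receive(self, msg: str) -> str:
        # The first inbound message unblocks the acceptor; flush queued sends
        # in their original order before any later send is posted.
        if not self.may_send:
            self.may_send = True
            while self.pending:
                self.wire.append(self.pending.popleft())
        return msg
```

The simple queue is only sufficient under the assumption stated above: because connections are opened on a send, the initiator always has data pending, so the acceptor is never left waiting indefinitely.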
 
 Following on what I wrote above, I think Open MPI is the wrong place to 
 be dealing with this.  There's enough of these hacks as it is; I'm not 
 interested in seeing more get added.
 

Unfortunately, I haven't been able to come up with a solution that works
with existing iWARP HW and is interoperable. 

Steve.



[ofa-general] Re: [OMPI devel] OMPI over ofed udapl - bugs opened

2007-05-09 Thread Andrew Friedley



Steve Wise wrote:

I understand your frustration.  I think the MPA protocol is deficient in
this respect and should have required the necessary first FPDU to be
sent under the covers by the RNICs. A RTR packet if you will.  To
resolve this issue properly, in my opinion, would involve changing the
IETF MPA spec and also breaking all the existing iwarp HW.  We can't do
that.


Understood.


The reason it is hard or impossible to solve this in the DAPL layer is
that any rdma operation on the QP affects the state of that QP and the
associated CQs.  In addition, if you use an RDMA send to enforce this, you
impact the other side by consuming a RECV buffer.  So it's hard, if not
impossible, to do this under the covers without affecting the
application's resources.


Is there no way to do this before passing connection-established events 
to the uDAPL consumer?  I need to go read up on the uDAPL API to really 
understand why this wouldn't work.




Also, the DAPL specification had a goal to not impose any additional
protocol on the wire.  If you add this under the covers, then you add
such a protocol and break interoperability between a connection
accessed via DAPL on one end and some other API on the other end.


So I guess there's no 'right' solution, at least at the uDAPL level. 
With RDMACM/OFA verbs, there's at least the argument that you can design 
the API/semantics however you please, while uDAPL is already standardized.


I hope you guys are documenting this in a way that makes this issue 
extremely clear to both uDAPL and OFA verbs (is this the right naming?) 
users.  Maybe it's been done already, but is it possible to emit some 
sort of loud warning/error when the accept()'ing side tries to send 
before a receive?


Andrew


Re: [ofa-general] Re: [OMPI devel] OMPI over ofed udapl - bugs opened

2007-05-09 Thread Sean Hefty

The reason it is hard or impossible to solve this in the DAPL layer is
that any rdma operation on the QP affects the state of that QP and the
associated CQs.  In addition, if you use an RDMA send to enforce this, you
impact the other side by consuming a RECV buffer.  So it's hard, if not
impossible, to do this under the covers without affecting the
application's resources.


I agree that this is hard, but I don't believe that it's impossible.


Also, the DAPL specification had a goal to not impose any additional
protocol on the wire.  If you add this under the covers, then you add
such a protocol and break interoperability between a connection
accessed via DAPL on one end and some other API on the other end.


IMO, this is an unrealized dream.  DAPL does generate wire protocol.  For 
example, when running over IB, DAPL's selection of a service ID and CM protocol 
is visible on the wire.  A DAPL that establishes connections using the RDMA CM 
will likely have a different wire protocol than a version of DAPL that 
establishes connections talking directly to the IB CM.  The two DAPLs will not 
interoperate unless they agree on how they will map to service IDs and, in the 
case of using the RDMA CM, the format of the private data carried in the CM 
messages.


Even in the case of iWarp, DAPL's selection of a local port number affects the 
data visible on the wire.  To communicate, a remote end point must know how this 
mapping occurs.
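To make the IB service-ID point concrete, here is a small sketch comparing two mappings of the same uDAPL connection qualifier. As we understand the Linux RDMA CM, it derives the IB service ID from the port space (`RDMA_PS_TCP`, 0x0106) shifted left 16 bits plus the 16-bit port; treat that as an assumption to check against the rdma_cm headers. The "direct IB CM" mapping is purely hypothetical, chosen only to show that two DAPLs with different mappings put different service IDs on the wire.

```python
RDMA_PS_TCP = 0x0106  # assumed Linux RDMA CM port space for connected service


def rdma_cm_service_id(port: int, port_space: int = RDMA_PS_TCP) -> int:
    """Service ID an RDMA CM based DAPL would use (our understanding)."""
    if not 0 <= port <= 0xFFFF:
        raise ValueError("port must fit in 16 bits")
    return (port_space << 16) | port


def direct_ib_cm_service_id(conn_qual: int) -> int:
    """Hypothetical DAPL that talks to the IB CM directly and uses the
    connection qualifier unchanged as the service ID."""
    return conn_qual


conn_qual = 5000
a = rdma_cm_service_id(conn_qual)
b = direct_ib_cm_service_id(conn_qual)
print(f"RDMA CM DAPL: 0x{a:016x}")
print(f"direct IB CM: 0x{b:016x}")
print("same service ID on the wire:", a == b)
```

A listener registered under one mapping would not match a connect request built with the other, which is the interoperability gap described above.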


- Sean