Roland, this looks good! A few comments below...

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Roland Dreier
> Sent: Wednesday, August 24, 2005 12:07 AM
> To: [email protected]
> Subject: [openib-general] RDMA connection and address translation API
>
> At the OpenIB workshop on Monday, we had some discussion about a
> high-level transport-neutral API for connection handling. After
> giving the topic some more thought, I've come to the conclusion that
> neither the kDAPL API nor the new API that was presented are usable.
> In this email, I'll try to detail my reasoning and sketch what I
> believe is the correct API.
>
> The new API that we looked at was essentially the following (I'm
> recreating this from memory, so I apologize if I misrepresent it):
>
>     listen(local_ip_address, service_id, listen_callback)
>     connect(local_qp, remote_ip_address, qos, service_id,
>             private_data, connect_callback)
>
> We already discussed the problem with having the listen callback pass
> the consumer a remote source address -- doing this requires the
> connection handling module to do an ATS reverse lookup in the IB case,
> which the consumer might not want. I think there's agreement that the
> correct thing here is for the listen callback to pass a transport
> address to the consumer and provide a function that the consumer can
> call to perform an ATS reverse lookup if desired. This isn't a major
> problem and can be dealt with.
>
> However, there's another problem with trying to lump address
> translation and connection into a single "connect" call, and this
> problem looks fundamental and fatal to me. The connect call takes a
> QP pointer, but to create a QP the consumer needs to know which local
> device to use. However, the consumer doesn't know which device to use
> until the destination address has been resolved to a route, including
> a local interface.
>
> As far as I can tell, kDAPL punts on this and simply requires the
> consumer to handle the route lookup itself before calling
> dat_ep_connect(). It seems that current kDAPL consumers similarly
> punt on this issue: the iSER initiator and the NFS-RDMA client both
> just use a single device which is statically discovered at init time.
>

Yes, DAPL punts on this.

> It seems that the kDAPL connection model has a serious flaw, in that
> it pushes the complexity of route lookup into the consumer. Further,
> we have strong evidence that this routing code is hard to write and
> that consumers will just ignore this complexity and hard-code
> solutions that don't work under all configurations.
>

I agree!

> With this in mind, I believe that the connection API needs to be
> something more like the following:
>
>   rdma_resolve_address():
>     inputs: dest IP address, qos, npaths,
>             done callback, opaque context
>     done callback params: status, local RDMA device,
>             RDMA transport address, context
>
>     This function starts the process of resolving an IP address to
>     an RDMA device and address. When the resolution is complete,
>     the callback is called with a status. If the status is
>     "success" then the callback also gets the device pointer and
>     transport address (as well as the original context that the
>     consumer passed in).
>
>     The "RDMA transport address" type is a union containing
>     transport-dependent data. In the IB case, it's all of the
>     SGID, DGID, SLID, DLID, SL etc. that we know and love. In the
>     iWARP case, it's the source IP, destination IP and QOS.
>
>     npaths can be either 1 or 2 in the IB case; if it's 2, then
>     the resolver will try to find a primary and alternate path for
>     APM. In the iWARP case, I guess npaths will always be 1, and
>     I guess anyone who wants to use iWARP over multihomed SCTP
>     will probably have to use some lower-level API.
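
To check my reading of the "RDMA transport address" union and the done
callback, here is a rough userspace sketch in C. Every name in it (the
sketch_ prefix, the field names, the IPv4-only iWARP fields, the npaths
helper) is my own assumption for illustration and is not taken from any
existing header; it only encodes what the text above states.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical types: a sketch of the proposal, not an existing API. */
enum sketch_transport { SKETCH_TRANSPORT_IB, SKETCH_TRANSPORT_IWARP };

struct sketch_ib_addr {          /* IB case: SGID, DGID, SLID, DLID, SL */
    uint8_t  sgid[16];
    uint8_t  dgid[16];
    uint16_t slid;
    uint16_t dlid;
    uint8_t  sl;
};

struct sketch_iwarp_addr {       /* iWARP case: source IP, dest IP, QOS */
    uint32_t src_ip;             /* IPv4 only here, purely for brevity */
    uint32_t dst_ip;
    uint8_t  qos;
};

struct sketch_rdma_addr {        /* tagged union of transport-dependent data */
    enum sketch_transport transport;
    union {
        struct sketch_ib_addr    ib;
        struct sketch_iwarp_addr iwarp;
    } u;
};

struct sketch_device;            /* opaque "local RDMA device" */

/* Done callback params as listed: status, device, transport address, context. */
typedef void (*sketch_resolve_done)(int status, struct sketch_device *dev,
                                    const struct sketch_rdma_addr *addr,
                                    void *context);

/* npaths may be 1 or 2 for IB (2 asks for an alternate path for APM),
 * and only 1 for iWARP. Returns nonzero if the value is acceptable. */
static int sketch_npaths_valid(enum sketch_transport t, int npaths)
{
    if (t == SKETCH_TRANSPORT_IB)
        return npaths == 1 || npaths == 2;
    return npaths == 1;
}

/* Build an iWARP transport address; unused union bytes are zeroed. */
static struct sketch_rdma_addr sketch_iwarp(uint32_t src_ip, uint32_t dst_ip,
                                            uint8_t qos)
{
    struct sketch_rdma_addr a;

    memset(&a, 0, sizeof(a));
    a.transport = SKETCH_TRANSPORT_IWARP;
    a.u.iwarp.src_ip = src_ip;
    a.u.iwarp.dst_ip = dst_ip;
    a.u.iwarp.qos = qos;
    return a;
}
```

A consumer that does not care about the transport would just hand the
whole struct back to the connect call without looking inside it.
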
>
>     By the way, we may also have to have the option of passing in
>     a local netdev so that we can handle link-local IPv6
>     addresses. There may be other cases I haven't thought of yet.
>     I just hope we can avoid going all the way to the horror of
>     the getaddrinfo() API.
>
>     I also hope we can agree to use IPoIB ARP to resolve the
>     address in the IB case; having a flag or some other hack in
>     the API to expose the option of ATS seems unacceptably ugly.
>
>   rdma_connect():
>     inputs: local QP, RDMA transport address, destination service,
>             private data, timeout, event callback, opaque context
>
>     This function takes the resolved address and actually
>     connects.
>
>     I'm not sure how we want to abstract the IB service vs. iWARP
>     TCP port number difference. I guess it's OK to have iWARP
>     consumers stick their (16-bit) port number in a 64-bit
>     parameter, even if it's not the prettiest API.
>
> To head off the knee-jerk objection: this API does NOT require any
> transport-specific code in consumers (unless a particular consumer
> WANTS to look inside the RDMA transport address). Code to connect
> would be as simple as:
>
>     rdma_resolve_address(...);
>     /* wait for resolution */
>     ib_create_qp(...); /* use device pointer we got from
>                           rdma_resolve_address() */
>     rdma_connect(...); /* pass transport address we got from
>                           rdma_resolve_address() */
>     /* wait for connection to finish... */
>
> The listen side is even simpler:
>
>   rdma_listen():
>     inputs: local service, event callback, consumer context
>
>     Wait for connection requests and pass events to the consumer's
>     callback. I'm not sure if/how we want to support binding to
>     a particular IP address. The current IB CM in Linux doesn't
>     support binding a listen to a single device or port, and even
>     if it did it's not clear how to handle binding to one IP
>     address when a port has more than one IP.
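
For what it's worth, the connect-side sequence quoted above (resolve,
create the QP on the device handed back, then connect) can be mocked up
in plain userspace C roughly as follows. This is only a sketch under
loud assumptions: all mock_ names are hypothetical, the resolver
completes synchronously where a real one would be asynchronous, and
assigning the device to the QP stands in for ib_create_qp().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these are real kernel types. */
struct mock_device { int id; };
struct mock_addr   { uint32_t dst_ip; };
struct mock_qp     { const struct mock_device *dev; };

struct consumer {
    struct mock_qp qp;
    int have_qp;
    int connected;
    uint64_t service;            /* 64-bit destination service parameter */
};

typedef void (*resolve_done_fn)(int status, const struct mock_device *dev,
                                const struct mock_addr *addr, void *context);

/* iWARP consumers put their 16-bit TCP port in the 64-bit parameter. */
static uint64_t service_from_port(uint16_t port)
{
    return (uint64_t)port;
}

/* Mock connect: succeeds only if the QP already sits on a device. */
static int mock_rdma_connect(struct mock_qp *qp, const struct mock_addr *addr,
                             uint64_t service)
{
    (void)service;
    return (qp->dev != NULL && addr != NULL) ? 0 : -1;
}

/* Done callback: only here does the consumer learn which device to use. */
static void on_resolved(int status, const struct mock_device *dev,
                        const struct mock_addr *addr, void *context)
{
    struct consumer *c = context;

    if (status != 0)
        return;
    c->qp.dev = dev;             /* stands in for ib_create_qp() on dev */
    c->have_qp = 1;
    c->connected = (mock_rdma_connect(&c->qp, addr, c->service) == 0);
}

/* Mock resolver: always "finds" device 0 and calls back immediately. */
static int mock_rdma_resolve_address(uint32_t dst_ip, resolve_done_fn done,
                                     void *context)
{
    static const struct mock_device dev0 = { 0 };
    struct mock_addr addr = { dst_ip };

    assert(done != NULL);
    done(0, &dev0, &addr, context);
    return 0;
}

/* Drive the whole sequence once; returns 1 if the mock connect succeeded. */
static int connect_example(uint32_t dst_ip, uint16_t port)
{
    struct consumer c = { { NULL }, 0, 0, service_from_port(port) };

    mock_rdma_resolve_address(dst_ip, on_resolved, &c);
    return c.have_qp && c.connected;
}
```

Note that nothing in the consumer path looks inside the address; it is
passed straight from the resolve callback to the connect call.
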
>
>     I guess the event callback would receive a device pointer and
>     the same RDMA transport address union I talked about above
>     when discussing address resolution.
>
>     It would be possible to have another function like
>     rdma_getpeername() that takes the transport address and
>     returns a source IP address. In the IB case this would do an
>     ATS reverse lookup. However, I hate this idea. iSER already
>     uses the CM private data to pass the source IP in the IB case,
>     and I would much rather fix NFS/RDMA to do the same thing (so
>     we can just kill ATS as an address resolution method).
>

I think we should allow a ULP to listen on a specific IP address and
device; servers often do this to limit the scope of a service. I was
thinking such a ULP would simply walk the list of IB devices and issue
an rdma_listen() on each device. But the ULP should also be able to
pass down the local IP address on which the device should listen.
Consider an NFS/RDMA server ULP whose exports are limited to a single
subnet or interface, etc.

_______________________________________________
openib-general mailing list
[email protected]
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
