Isn't this getting a bit more complex than it needs to be?  Let me see if I have this right:

1. Applications want to use the existing APIs to identify remote endnodes / services.

2. Endnodes are identified by an IPv4 / v6 address, and services by a port number.

3. The existing network stacks already know how to discover routes to endnodes using ARP / ND.  These protocols can determine whether there are one or multiple IP addresses and store them in the local network stack's route table.

4. Route tables can contain any amount of layer 2 and layer 3 address information (implementation dependent), and various policies can be constructed to make an intelligent decision about which layer 2 and layer 3 addresses to return to an application.

5. iWARP can use the existing infrastructure as is, so no changes are required to make it work.

6. IB uses a different layer 2 address (not just a 48-bit MAC), so while it differs from Ethernet, it conceptually works the same way.  Both can support multiple IP addresses per layer 2 address, since that is really just a matter of replicating the information on a per IP address basis.

7. When a route lookup occurs, a set of IP addresses is returned.  Depending on the kernel interface, the layer 2 information can also be returned, either as part of this lookup or via a separate query to the route table.

8. Layer 2 information provides the necessary data to construct CM messages or to identify the path for the IP over IB ULP.
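To make points 4 through 8 concrete, here is a minimal Python sketch of a neighbor table in which several IP addresses share one layer 2 address, and in which an IPoIB layer 2 address is unpacked to get the GID the CM would need. The names (`L2Addr`, `NeighborTable`, `qpn_from_ipoib`, `gid_from_ipoib`) are hypothetical and not any kernel API; the only outside fact assumed is the 20-byte IPoIB link-layer address layout from RFC 4391 (flags, 24-bit QPN, 16-byte GID).

```python
from dataclasses import dataclass
from ipaddress import IPv4Address, IPv6Address, ip_address
from typing import Dict, List, Optional, Union

IPAddr = Union[IPv4Address, IPv6Address]

# Hypothetical: expected lengths of the two link-layer address kinds.
L2_LEN = {"ether": 6, "ipoib": 20}  # IPoIB: flags/QPN (4 bytes) + GID (16 bytes)


@dataclass(frozen=True)
class L2Addr:
    kind: str   # "ether" or "ipoib"
    raw: bytes

    def __post_init__(self) -> None:
        if L2_LEN.get(self.kind) != len(self.raw):
            raise ValueError(f"bad {self.kind} address length: {len(self.raw)}")


class NeighborTable:
    """Each IP entry holds its own copy of the L2 record, so many IPs
    can map to one L2 address on Ethernet and IB alike."""

    def __init__(self) -> None:
        self._by_ip: Dict[IPAddr, L2Addr] = {}

    def add(self, ip: str, l2: L2Addr) -> None:
        self._by_ip[ip_address(ip)] = l2

    def resolve(self, ip: str) -> Optional[L2Addr]:
        return self._by_ip.get(ip_address(ip))

    def ips_for(self, l2: L2Addr) -> List[IPAddr]:
        return [ip for ip, a in self._by_ip.items() if a == l2]


def qpn_from_ipoib(l2: L2Addr) -> int:
    # Byte 0 holds flags; bytes 1-3 are the 24-bit queue pair number.
    if l2.kind != "ipoib":
        raise ValueError("not an IPoIB address")
    return int.from_bytes(l2.raw[1:4], "big")


def gid_from_ipoib(l2: L2Addr) -> bytes:
    # Bytes 4-19 are the port GID, the piece the CM needs to reach the peer.
    if l2.kind != "ipoib":
        raise ValueError("not an IPoIB address")
    return l2.raw[4:]


# Usage: two IP addresses bound to the same IB port.
gid = bytes.fromhex("fe800000000000000002c90200001234")
ipoib = L2Addr("ipoib", bytes([0x00, 0x00, 0x04, 0x04]) + gid)
table = NeighborTable()
table.add("10.0.0.5", ipoib)
table.add("10.0.1.5", ipoib)
```

Replicating the layer 2 record per IP entry is what lets both Ethernet and IB carry multiple IP addresses per port; the GID and QPN then fall out of the stored address with no extra lookup.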

So, from the above, it seems the IP and IB worlds can operate using the same code and work just fine.  So where is the problem?  Is it really just a matter of how management assigns IP addresses to IB interfaces, and how an application should select (or be informed of) which IP address to use and thereby transparently identify the IB port?  Where is the connection establishment problem?  The application does not see any difference.  The network stack only acts as a repository for routing information (unless the application runs directly over IP over IB), so it is not impacted.  The middleware simply needs to extract the layer 2 information to obtain the requisite data to construct the CM messages when going straight to IB (no change is required here for iWARP, as this is all native to its operation).  What am I missing here?

Mike



At 10:10 AM 10/9/2005, Tom Tucker wrote:
On Sun, 2005-10-09 at 07:57 -0700, Sean Hefty wrote:
> >It is theoretically possible to support all this on an IPoIB based
> >network. Multiple subnets, multiple routes to remote peers, ICMP
> >redirect, multiple IP addresses for each physical interface, yada yada
> >yada. But IMHO, the only way to do this would be to tie directly into
> >the existing routing,  ARP, ICMP, etc... subsystems in Linux. Otherwise
> >you'll end up recreating a gigantic (and I mean GIGANTIC) amount of
>
> The current implementation ties into the standard Linux ARP tables.  If
> connections were made over TCP/IP, using IPoIB, then I don't think that there
> would be any issues.  The issues only arise because of the desire to use TCP/IP
> network addresses over a non-TCP/IP network.
>
> >code. This belief is why I've been a proponent of mapping GIDs to one
> >and only one IP address and treating it for management purposes as the
> >equivalent of an IP address. Without this, the whole mechanism for
> >determining routes, etc.. breaks down. If you treat the GID like a MAC
> >address -- it breaks, because a MAC address can have multiple IP
> >addresses -- the observation that led to the conclusion that ATS was
> >broken in the first place.
>
> We should be able to handle the case where a GID has multiple IP addresses bound
> to it.  But even if we added a 1:1 restriction, the connection over IB issue
> still exists.

I agree, except for RARP.

>
> >I know there is significant resistance to this idea, but I just don't
> >see how we get this generically resolved without binding the two
> >addressing schemes more closely. With the current binding, I just don't
> >think it works.
>
> Again, I don't think that the binding is the issue, so much as the desire to use
> an address for a protocol that isn't actually being used for communication. 

Not to be pedantic, but if binding or mapping or some such weren't an
issue, we wouldn't need AT.

> I
> don't view a GID as an IP address because we're not sending and receiving IP
> packets on the GID.  IPoIB treats GIDs as only part of a MAC address, which I
> think is the proper view.
>
> Anyway, returning to the original problem of connecting to an IB gateway when
> given a destination IP address on a different subnet...  I'm slowly convincing
> myself that either the CMA or AT should do this.  (I believe that the ib_addr
> code will do this now, but still wasn't sure that we wanted it to.)
>

IMHO, you need a service separate from the CMA to do address
translation. My (iWARP's) rationale for this is that there are two
clients of the service: the CM and IP. For the CM, you need it to select a
route and thereby a local interface. For IP, you need it because routes
change and ARP entries time out.
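A rough Python sketch of the kind of standalone address-translation service described above, with both clients in mind: the CM gets a cached route/interface election, and the cache honors expiry and route changes the way IP needs. `AddrTranslationService`, `Resolution`, and the injected `resolver` (standing in for the real route lookup plus ARP/ND) are all hypothetical, and the fixed-TTL-plus-flush policy is a simplification of real neighbor-cache aging.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Resolution:
    local_if: str       # interface elected from the route (what the CM needs)
    next_hop_l2: bytes  # link-layer address of the next hop


class AddrTranslationService:
    """Shared by the CM and IP: entries expire after `ttl` seconds and the
    whole cache is flushed when the routing table changes."""

    def __init__(self, resolver: Callable[[str], Resolution],
                 ttl: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._resolver = resolver  # stands in for route lookup + ARP/ND
        self._ttl = ttl
        self._clock = clock
        self._cache: Dict[str, Tuple[Resolution, float]] = {}

    def lookup(self, dst_ip: str) -> Resolution:
        now = self._clock()
        hit = self._cache.get(dst_ip)
        if hit is not None and now < hit[1]:
            return hit[0]             # still fresh
        res = self._resolver(dst_ip)  # miss or expired: resolve again
        self._cache[dst_ip] = (res, now + self._ttl)
        return res

    def on_route_change(self) -> None:
        # A new route may elect a different interface or next hop.
        self._cache.clear()


# Usage with a fake resolver and a controllable clock.
calls = []

def fake_resolver(ip: str) -> Resolution:
    calls.append(ip)
    return Resolution("ib0", bytes(20))

now = [0.0]
svc = AddrTranslationService(fake_resolver, ttl=30.0, clock=lambda: now[0])
svc.lookup("10.0.0.5")
svc.lookup("10.0.0.5")  # served from the cache
```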

BTW, can you educate me ... is the following what you're thinking:

On the client side...

- route is discovered by looking at the Linux routing table
- local interface is IPoIB (looks at rdma_ptr embedded in netdev struct)
- send ARP AT message over local IB interface
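The client-side steps might look roughly like this in Python. `Route`, `NetDev`, and `client_resolve` are hypothetical; `rdma_ptr` is modeled as an optional field only because the message mentions it, and the query is assumed to carry the final destination IP while being addressed to the next hop, so the gateway can do its own route lookup.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Dict, List, Optional


@dataclass
class Route:
    prefix: str             # e.g. "10.0.0.0/24"
    gateway: Optional[str]  # None means directly connected
    dev: str


@dataclass
class NetDev:
    name: str
    rdma_ptr: Optional[object] = None  # non-None models an IPoIB netdev


def lookup_route(routes: List[Route], dst: str) -> Optional[Route]:
    """Longest-prefix match (ignoring metrics and policy routing)."""
    addr = ip_address(dst)
    matches = [r for r in routes if addr in ip_network(r.prefix)]
    return max(matches, key=lambda r: ip_network(r.prefix).prefixlen, default=None)


def client_resolve(routes: List[Route], devs: Dict[str, NetDev], dst: str) -> dict:
    route = lookup_route(routes, dst)         # step 1: routing table
    if route is None:
        raise LookupError(f"no route to {dst}")
    dev = devs[route.dev]
    if dev.rdma_ptr is None:                  # step 2: is it IPoIB?
        raise ValueError(f"{dev.name} is not an IPoIB interface")
    return {                                  # step 3: ARP AT query
        "op": "ARP_AT_QUERY",
        "dev": dev.name,
        "next_hop": route.gateway or dst,     # who the query is sent to
        "query_ip": dst,                      # what is being resolved
    }


routes = [Route("0.0.0.0/0", "10.0.0.1", "ib0"), Route("10.0.0.0/24", None, "ib0")]
devs = {"ib0": NetDev("ib0", rdma_ptr=object())}
msg = client_resolve(routes, devs, "192.168.5.7")
```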

At the gateway...bridging to IP

- ARP AT query received on IB interface
- Look up route to destination IP address in gateway's route table.
- If next hop's Ethernet address is already known, it is returned
- Otherwise, local interface identified in route is IPoEthernet
- New ARP query goes out on the local interface from the route
- When response comes back, answer is returned.

At the gateway...bridging to IPoIB

- ARP AT message received on IB interface, delivered to AT
- Look up route to destination IP address in gateway's route table
- If next hop's link-layer address is already known, it is returned
- Otherwise, local interface identified in route is IPoIB
- New ARP AT query goes out on the local interface
- When response comes back, answer is returned.
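Both gateway cases differ only in which resolution goes out on the egress interface, so one hypothetical Python sketch can cover them. It is synchronous for simplicity (a real gateway would answer when the ARP reply arrives), and `Gateway`, `GatewayIf`, and the injected `send_arp` / `send_arp_at` callbacks are assumptions rather than any existing OpenIB or Linux interface.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class GatewayIf:
    name: str
    kind: str  # "ether" (bridging to IP) or "ipoib" (bridging to IPoIB)


@dataclass
class Gateway:
    route_lookup: Callable[[str], Tuple[str, GatewayIf]]  # dst -> (next hop, egress if)
    send_arp: Callable[[GatewayIf, str], bytes]      # plain ARP on Ethernet
    send_arp_at: Callable[[GatewayIf, str], bytes]   # ARP AT on IPoIB
    neigh: Dict[str, bytes] = field(default_factory=dict)

    def on_arp_at_query(self, dst_ip: str) -> bytes:
        """Handle an ARP AT query that arrived on the IB interface."""
        next_hop, out_if = self.route_lookup(dst_ip)
        known = self.neigh.get(next_hop)
        if known is not None:
            return known                                 # already resolved
        if out_if.kind == "ether":
            answer = self.send_arp(out_if, next_hop)     # new ARP query
        elif out_if.kind == "ipoib":
            answer = self.send_arp_at(out_if, next_hop)  # new ARP AT query
        else:
            raise ValueError(f"unsupported interface kind {out_if.kind!r}")
        self.neigh[next_hop] = answer
        return answer


# Usage with fake transports that record what they send.
sent = []

def fake_arp(iface: GatewayIf, ip: str) -> bytes:
    sent.append(("arp", iface.name, ip))
    return bytes.fromhex("aabbccddeeff")

def fake_arp_at(iface: GatewayIf, ip: str) -> bytes:
    sent.append(("arp_at", iface.name, ip))
    return bytes(20)

eth0, ib1 = GatewayIf("eth0", "ether"), GatewayIf("ib1", "ipoib")
routes = {"192.168.5.7": ("192.168.5.1", eth0), "10.1.0.9": ("10.1.0.9", ib1)}
gw = Gateway(routes.__getitem__, fake_arp, fake_arp_at)
```

Caching the answer under the next hop rather than the final destination means every destination behind 192.168.5.1 would reuse a single ARP result.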

Thanks,



> - Sean
>
>
_______________________________________________
openib-general mailing list
[email protected]
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general