On Fri, 2005-10-07 at 20:13 -0400, Hal Rosenstock wrote:
> On Fri, 2005-10-07 at 19:57, Sean Hefty wrote:
> > Hal Rosenstock wrote:
> > > Would an iWARP connection jump across IP subnets ? It would need to
> > > determine that it could do this (ala NHRP with ATM). Also, could there
> > > be other RDMA networks between them (like IB) ?
> >
> > if iWarp is on top of TCP, I don't think that it would care about IP
> > subnets.
>
> I think iWARP can be on top of TCP or SCTP. But why wouldn't it care ?
> Doesn't a routing decision still need to be made at the IP layer ?
> Doesn't the IP next hop need to be determined (e.g. gateway when the
> destination is off the local IP subnet) ? Is there something that
> precludes iWARP from working across IP subnets ?
>
> -- Hal
>

I've just read through this entire thread for the first time, and I sense considerable confusion about how IP routing works. I know I'm confused ;-)

With sockets, the path to the remote peer is determined *after* the connection request is submitted by the app (connect(...)). The app has no idea which local interface will ultimately handle this connection or what the path (route) is to the remote peer. It simply says connect(67.65.105.4, ...). In fact, TCP doesn't know this either! Like Hal suggests, the connect request (SYN packet) gets all the way down to IP, where the least-cost route is selected and, if not already known, the Ethernet address of the next hop is determined (ARP).

The reasons for this are varied, but they include the fact that routes and the Ethernet addresses of next hops may change within the lifetime of a connection. That is almost certain to happen if the connection lasts more than 15 minutes.

The route identifies the local interface and the next hop IP. An interface is only ever on a single subnet. The ARP broadcast is issued on this interface and stays on this one subnet. We're not broadcasting across subnets. Note that the local interface is "logical": a single Ethernet NIC may have multiple IP addresses and may in fact be on multiple subnets if using VLAN.

It is theoretically possible to support all this on an IPoIB based network. Multiple subnets, multiple routes to remote peers, ICMP redirect, multiple IP addresses for each physical interface, yada yada yada. But IMHO, the only way to do this would be to tie directly into the existing routing, ARP, ICMP, etc. subsystems in Linux. Otherwise you'll end up recreating a gigantic (and I mean GIGANTIC) amount of code.

This belief is why I've been a proponent of mapping each GID to one and only one IP address and treating it, for management purposes, as the equivalent of an IP address. Without this, the whole mechanism for determining routes, etc. breaks down. If you treat the GID like a MAC address, it breaks, because a MAC address can have multiple IP addresses. That is the observation that led to the conclusion that ATS was broken in the first place.
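
As a rough illustration of the sockets point above (the app names only the destination, and the IP layer picks the route and the local interface), here is a minimal Python sketch. The helper name local_address_for and the port number are assumptions made up for this example. It uses a UDP socket rather than TCP so that connect() only triggers the route lookup and puts nothing on the wire; the route selection at the IP layer should be the same either way.

```python
import socket


def local_address_for(remote_ip, port=9):
    """Return the local IP address the kernel picks to reach remote_ip.

    Hypothetical helper for illustration only. The caller never names
    a local interface; the IP layer chooses it from the routing table.
    """
    # UDP connect() sends no packet. It only asks the kernel to resolve
    # a route and bind a matching local address to the socket.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.connect((remote_ip, port))
        # getsockname() reveals which local address the route selected.
        return s.getsockname()[0]


if __name__ == "__main__":
    # Loopback is always routable, so this works without a network.
    print(local_address_for("127.0.0.1"))
```

On a multi-homed host, calling this with destinations on different subnets should return different local addresses, without the app ever naming an interface. A destination with no route raises OSError instead.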

I know there is significant resistance to this idea, but I just don't see how we get this generically resolved without binding the two addressing schemes more closely. With the current binding, I just don't think it works.

If I'm off in the weeds, please let me know ... and I'll cease spouting off.

_______________________________________________
openib-general mailing list
[email protected]
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
