Lixia et al,

thanks for spending your valuable time reading my draft. Some questions have been raised off-list, and I now realize that I left out perhaps the most important chapter of the draft: how this proposal will help to scale the routing architecture and solve the multi-homing issue.
I'll try to summarize the benefits here, and maybe later add a new chapter to the draft to highlight what might be achieved.

When the hIPv4 framework is fully completed, the RIB of an ISP that has created an ALOC realm will have the following entries:
- the PA-addresses of directly attached customers (e.g. residential and enterprises)
- the PI-addresses of directly attached customers (e.g. enterprises)
- the globally unique ALOC prefixes of other service providers and of enterprises using classical multi-homing (i.e. PI-addresses, AS number and BGP)

The ISP will not have any PA- or PI-addresses from other service providers; to route and forward packets between ISPs, only the ALOC information of other ISPs is needed. So the ALOC is a sort of super-aggregate that locates the ALOC realm of a service provider in the Internet and thus reduces the RIB size in the DFZ.

But this will not help much in multi-homing scenarios, which have the biggest impact on the growth of the RIB in the DFZ - replacing a /20 IPv4 prefix with a /32 ALOC prefix will do no good. With hIPv4 you could build a new type of multi-homing solution in which there is no longer a need to have an AS number and to use BGP. What is needed is a PI-address space (ELOC) that is unique within a region, e.g. 10.1.1.0/24. The enterprise installs Internet connections from two or more ISPs, and each ISP provides its ALOC information to the enterprise, e.g. 192.168.1.1 and 192.168.1.2. The enterprise needs two border routers that are capable of policy-based routing on the ALOC field of the locator header. When an endpoint of the enterprise assembles the hIPv4 header, it uses the local IP address as the source address (10.1.1.1) and one of the ALOC prefixes (192.168.1.1 or 192.168.1.2), depending on which service provider is preferred.
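
To make the header assembly and the policy routing at the border router a bit more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the names OutboundPacket, assemble, select_uplink and the uplink labels are hypothetical and do not come from the draft, and a real packet would of course carry far more than these two fields.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address, IPv4Network

# Example values taken from the text; the uplink labels are made up.
ELOC_PREFIX = IPv4Network("10.1.1.0/24")   # regionally unique PI space (ELOC)
ALOC_TO_UPLINK = {                         # ALOC borrowed from each ISP -> BR uplink
    IPv4Address("192.168.1.1"): "uplink-isp-a",
    IPv4Address("192.168.1.2"): "uplink-isp-b",
}


@dataclass(frozen=True)
class OutboundPacket:
    """Hypothetical, heavily simplified view of an outbound hIPv4 packet."""
    source: IPv4Address   # local address from the ELOC prefix (IP header)
    aloc: IPv4Address     # ALOC of the chosen ISP (locator header)


def assemble(source: str, preferred_aloc: str) -> OutboundPacket:
    """Endpoint side: pick the local source address and the preferred ALOC."""
    src = IPv4Address(source)
    aloc = IPv4Address(preferred_aloc)
    if src not in ELOC_PREFIX:
        raise ValueError(f"{src} is not inside the ELOC prefix {ELOC_PREFIX}")
    if aloc not in ALOC_TO_UPLINK:
        raise ValueError(f"{aloc} is not an ALOC provided by an upstream ISP")
    return OutboundPacket(source=src, aloc=aloc)


def select_uplink(packet: OutboundPacket) -> str:
    """Border router side: policy routing keyed on the ALOC field only."""
    return ALOC_TO_UPLINK[packet.aloc]
```

For example, select_uplink(assemble("10.1.1.1", "192.168.1.2")) picks the uplink towards the second ISP, without the BR holding any routes from the DFZ.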
If the preferred ISP is broken, the endpoint should just try to switch to the other ISP by changing the ALOC value in the locator header - the session is usually lost, but some sessions can survive. The upstream ISP can still do uRPF, since the source address is from the PI-address space (10.1.1.0/24), and, if preferred, the ISP border routers can also do uRPF on the ALOC value in the locator header.

This is a "not-so-dynamic" solution: the border routers (BRs) of the enterprise do not know whether the upstream ISP has all the routes of the Internet. If a critical link is broken at an ISP, the BR has no way of knowing that, since there is no dynamic routing protocol between the ISP and the enterprise's BR. So from the endpoint's point of view, if the primary ISP is broken the endpoint needs to try the other ISP; this becomes more or less a try-and-error multi-homing solution.

But then again, how often does an ISP lose connectivity to the Internet - how much should we care about this problem? I think that most ISP backbones are properly designed and that it is very rare for an ISP to lose so many links that it becomes partly useless. And if it does, well, I wouldn't use that ISP any longer - since I have a PI address, I would just replace that ISP with another one that can do a proper backbone design.

A more likely cause is the first mile between the enterprise BR and the ISP. If that breaks, the BR will become aware of the broken link by using e.g. BFD, and it might then somehow inform the endpoints that the preferred ALOC (ISP) has become useless, or perhaps replace the ALOC prefix in the locator header with the ALOC prefix of the secondary ISP - uh, oh, here I go, proposing a NAT solution :-)

Throw MPTCP on the try-and-error multi-homing solution and it becomes a lot more interesting :-) The MPTCP-enabled endpoint can set up subflows, e.g. the first subflow uses SA=10.1.1.1 in the IP header and ALOC=192.168.1.1 in the locator header, and the second subflow uses SA=10.1.1.1 and ALOC=192.168.1.2.
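
The try-and-error behaviour and the subflow example above can be sketched roughly as follows. Again this is only an illustration under my own assumptions: first_working_aloc, plan_subflows and the probe callback are hypothetical names, and how an endpoint actually detects that an ISP is unusable is left open here (it is simply passed in as a function).

```python
from typing import Callable, Iterable, List, Optional, Tuple


def first_working_aloc(
    alocs: Iterable[str], probe: Callable[[str], bool]
) -> Optional[str]:
    """Try the ALOCs in order of preference and return the first usable one.

    `probe` is a placeholder for whatever test the endpoint applies,
    e.g. "did the connection attempt via this ALOC succeed?".
    """
    for aloc in alocs:
        if probe(aloc):
            return aloc
    return None  # every upstream ISP appears to be broken


def plan_subflows(source: str, alocs: Iterable[str]) -> List[Tuple[str, str]]:
    """One (SA, ALOC) pair per upstream ISP, as in the MPTCP example."""
    return [(source, aloc) for aloc in alocs]
```

With alocs = ["192.168.1.1", "192.168.1.2"], plan_subflows("10.1.1.1", alocs) gives the two subflows described above, and first_working_aloc falls back to 192.168.1.2 when the probe fails for 192.168.1.1.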
By using different ALOC prefixes for the subflows, the endpoint can decide which ISPs are used and ensure that different paths are taken for the upstream traffic. So by adding MPTCP to the try-and-error multi-homing scenario you get redundant paths to the other endpoint via different ISPs and true *dynamic* load-balancing without the need to tweak any routing protocols; only a single NIC is needed on the endpoints, and if there is a network failure MPTCP takes care of it.

In summary, the try-and-error multi-homing solution has the following characteristics:
- an AS number is not needed
- PI address space is required
- no BGP configuration and tuning is required at the enterprise's BR
- no ALOC is required/allowed for the enterprise; instead several ALOC prefixes are "borrowed" from the upstream ISPs
- MPTCP provides dynamic load-balancing without tuning routing protocols; several paths can be used simultaneously and thus resilience is achieved
- zero growth of RIB entries in the DFZ
- the FIB size at the BR does not depend on the size of the FIB in the DFZ
- the enterprise's BR cannot cause BGP churn in the DFZ or at the adjacent ISP
- the cost of the BR goes down

By having the ALOC prefixes from the DFZ dynamically shared and installed at the BR - using BGP between the BR and the ISP - but without allocating an ALOC prefix for the enterprise, another scenario is created: a stub multi-homing solution. In this scenario you need an AS number and BGP, so it gets a little more complex and more expensive, but the other side of the coin is that it becomes more dynamic.
The stub multi-homing scenario has the following characteristics:
- an AS number is required
- PI address space is required
- BGP configuration and tuning is required at the enterprise's BR
- no ALOC is required/allowed for the enterprise; instead several ALOC prefixes are "borrowed" from the upstream ISPs
- MPTCP provides dynamic load-balancing; several paths can be used simultaneously and thus resilience is achieved
- zero growth of RIB entries in the DFZ
- the FIB size at the BR depends on the size of the FIB in the DFZ and at adjacent ISPs
- the enterprise's BR can cause BGP churn for the adjacent ISP but not in the DFZ
- the cost of the BR is higher than in the try-and-error multi-homing scenario

Then the question is how to keep the growth of ALOCs reasonable: if you are using PI-addresses, have an AS number and run BGP, why not ask for an ALOC prefix and play with the Big Boys in the Big League? I guess the only way to prevent this scenario is to speak the language that CIOs understand best, i.e. the allocation of an ALOC should have a yearly cost. And it is justified to charge for allocating an ALOC prefix, because when you use an ALOC you are reserving a FIB entry throughout the DFZ - and that FIB entry needs power, space, hardware and cooling on all the routers in the DFZ. IMHO, you ought to pay for that since you are really reserving a lot of resources.

I'm not sure that I have covered all corner cases; there could be issues that rule out the two scenarios, and more research work is definitely needed. So please give this approach a hard time, thanks.

-- patte

On Sat, Oct 24, 2009 at 7:20 AM, Lixia Zhang <[email protected]> wrote:
> Patrick,
>
> your msg broke the long silence of this mailing list!
> I've yet to read your new draft and comment (just finished my 3 week-long
> back-2-back trips), but will try to do so in coming days, as part of my
> efforts to get ready for Hiroshima.
>
> Talking about Hiroshima: according to the RRG plan, the Hiroshima RRG
> meeting will be focusing on the discussions of RRG recommendation to IETF on
> scalable routing solutions. There are precisely two weeks before Hiroshima
> now, let's get the discussion started on the list first. I'm going through
> all the exchanges since Stockholm by subject groups, in an attempt to make a
> summary.
>
> Lixia
>
> On Oct 20, 2009, at 7:55 AM, Patrick Frejborg wrote:
>
>> Hi all,
>>
>> during the great discussion about identifiers back in July
>> participants pointed out interesting solutions/proposals around the
>> topic - such as ILNP, how Apple is solving the mobility challenge,
>> Multipath TCP, Nimrod, etc - and now that I have studied and absorbed
>> the material I felt a need to update the hierarchical IPv4 framework.
>> I think the discussion was very useful for me - I learned a lot, so
>> thanks to all participants who took part in the discussion
>> (unfortunately I started my vacation at the time and wish I had had
>> more time to be in the discussion).
>> In a nutshell, what has been changed at
>> http://www.ietf.org/id/draft-frejborg-hipv4-03.txt
>>
>> 1. Backwards compatibility
>> MPTCP is doing a very nice job with backwards compatibility, "hiding"
>> the new features in the TCP option field. Inspired by this I stumbled
>> over RFC 1385 "The Extended Internet Protocol" and moved the ALOC&ELOC
>> field away from the IP header into the IP option field. There is no
>> longer a need to have a new protocol ID assigned - greater backwards
>> compatibility should be achieved by using the IP option field.
>>
>> 2. IPsec AH
>> ILNP is taking care of the IPsec AH challenge. In the hIPv4 framework
>> IPsec AH is a no-go, because the LSR is a middlebox swapping the IP
>> source and destination header. We could get around this and make IPsec
>> AH work also in the hIPv4 framework by first assembling a legacy IPv4
>> packet, then copying the pseudoheader checksum to the IP option field
>> (there is now a 16 bit padding field where the checksum would fit in)
>> and then inserting the ALOC information into the header, recalculating
>> the pseudoheader checksum and sending the packet to the other endpoint.
>> When the LSR is swapping the packet the padding field remains intact,
>> and when the remote endpoint receives the packet the original
>> pseudoheader checksum can be retrieved from the padding field. But I
>> think this would be an awful kludge, because it would
>> - break the IPsec AH specs
>> - not solve the NAT traversal issue, and the IPv4 world is full of NAT
>> middleboxes
>> Also, I haven't seen many IPsec AH implementations lately - most IPsec
>> installations are LAN-to-LAN solutions using ESP, and remote access
>> systems are more and more often deployed on SSL/TLS based RAS.
>> So I let darwinism take care of IPsec AH - it is not well suited for
>> the IPv4 world and SSL/TLS has been able to adapt better to the NAT
>> traversal challenge.
>>
>> 3. The identifier
>> Host or session identifier? After studying the MPTCP drafts I found
>> potential in the sender token; it might be used as a session identifier
>> to solve site and endpoint mobility issues. But the sender token can
>> not be used to improve NAT traversal; here you would prefer to have
>> HIP in place. On the other hand, if you prefer to do NAT you should
>> not expect to have all the features available without NAT - and
>> should we encourage the use of NAT? So I think the sender token is
>> good enough to create a semi-session layer protocol (as AppleTalk had)
>> that could be used to achieve better mobility - MPTCP looks promising
>> at the moment. If a host really needs to be identified - well, I would
>> use a PKI solution for that purpose.
>>
>> 4. Traffic Engineering
>> MPTCP might create subflows for a connection; how to route the
>> subflows on different links in the backbone - especially if both
>> endpoints have just one IP-address at each host? IGP tuning will not
>> be useful; MPLS TE might be used but it gets tricky since both
>> subflows use the same protocol. What if you could apply Valiant
>> Load-Balancing on the subflows and separate the subflows that way over
>> different links in the backbone?
>>
>> Suggestions, questions, feedback and/or criticism are highly appreciated.
>>
>> -- patte
>> _______________________________________________
>> rrg mailing list
>> [email protected]
>> http://www.irtf.org/mailman/listinfo/rrg
