Re: [rrg] Updated draft of hIPv4 framework

Patrick Frejborg Mon, 26 Oct 2009 06:42:48 -0700

Lixia et al,

thanks for spending your valuable time reading my draft. Some
questions have been raised off-list, and I now realize that I have
left out perhaps the most important chapter of the draft: how this
proposal will help to scale the routing architecture and solve the
multi-homing issue.

I'll try to summarize the benefits here, and maybe later add a new
chapter to the draft to highlight what might be achieved.

Once the hIPv4 framework is fully in place, the RIB of an ISP that
has created an ALOC realm will have the following entries:
- the PA-addresses of directly attached customers (e.g. residential
and enterprises)
- the PI-addresses of directly attached customers (e.g. enterprises)
- the globally unique ALOC prefixes of other service providers and of
enterprises using classical multi-homing (i.e. PI-addresses, AS-number
and BGP)
The ISP will not carry any PA- or PI-addresses from other service
providers; to route and forward packets between ISPs, only the ALOC
information of the other ISPs is needed. So the ALOC is a sort of
super-aggregate that locates the ALOC realm of a service provider in
the Internet, thus reducing the RIB size in the DFZ.
But this will not help much in multi-homing scenarios, which have the
biggest impact on the growth of the RIB in the DFZ: replacing a /20
IPv4 prefix with a /32 ALOC prefix does no good.

With hIPv4 you could do a new type of multi-homing: there is no longer
a need to have an AS-number or to use BGP in order to achieve
multi-homing. What is needed is a PI-address space (ELOC) that is
unique within a region, e.g. 10.1.1.0/24. The enterprise installs
Internet connections from two or more ISPs, and the ISPs provide the
ALOC information for the enterprise, e.g. 192.168.1.1 and 192.168.1.2.
The enterprise needs two border routers that are capable of
policy-based routing on the ALOC field of the locator header. When an
endpoint of the enterprise assembles the hIPv4 header, it uses the
local IP address as the source address (10.1.1.1) and either ALOC
prefix (192.168.1.1 or 192.168.1.2), depending on which service
provider is preferred. If the preferred ISP is broken, the endpoint
should simply try to switch to the other ISP by changing the ALOC
value in the locator header; the session is typically lost, but some
can survive. The upstream ISP can still do uRPF, since the source
address is in the PI-address space (10.1.1.0/24), and, if preferred,
the ISP border routers should also do uRPF on the ALOC value in the
locator header.
This is a "not-so-dynamic" solution: the border routers of the
enterprise do not know whether the upstream ISP has all the routes of
the Internet. If a critical link is broken at an ISP, the border
router (BR) has no way of knowing that, since there is no dynamic
routing protocol between the ISP and the enterprise's BR. So from the
endpoint's point of view, if the primary ISP is broken the endpoint
needs to try the other ISP; this becomes more or less a try-and-error
multi-homing solution. But then again, how often does an ISP lose
connectivity to the Internet, and how much should we care about this
problem? I think most ISP backbones are properly designed, and it is
very rare that an ISP loses so many links that it becomes partly
useless. And if it does, well, I wouldn't use that ISP any longer:
since I have a PI address, I would just replace that ISP with another
one that can do a proper backbone design.
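
The endpoint behaviour described above (use the preferred ALOC, and on
failure try the next one) can be sketched roughly as follows. The
LocatorHeader class, send_with_failover and the try_send callback are
hypothetical names introduced only for this illustration; how an
endpoint really detects that an ISP is broken is not specified here.

```python
from dataclasses import dataclass


@dataclass
class LocatorHeader:
    """Hypothetical, much reduced view of the hIPv4 header fields."""
    src: str    # local source address (ELOC space), e.g. 10.1.1.1
    aloc: str   # ALOC borrowed from the chosen upstream ISP


def send_with_failover(src, alocs, try_send):
    """Try each ALOC in preference order.

    try_send is an assumed callback that returns True when a packet
    carrying the given header got through. Returns the header that
    worked, or None if every ISP failed.
    """
    for aloc in alocs:
        header = LocatorHeader(src=src, aloc=aloc)
        if try_send(header):
            return header
    return None


# Example: the preferred ISP (192.168.1.1) is broken, the second works.
reachable = {"192.168.1.2"}
used = send_with_failover(
    "10.1.1.1",
    ["192.168.1.1", "192.168.1.2"],
    lambda header: header.aloc in reachable,
)
print(used)
```

Note that the source address never changes in this sketch; only the
ALOC value is swapped, which matches the idea that the enterprise
keeps its PI space regardless of the upstream ISP.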

A more likely cause is the first mile between the enterprise BR and
the ISP. If that breaks, the BR will become aware of the broken link
by using e.g. BFD, and it might then somehow inform the endpoints that
the preferred ALOC (ISP) has become useless, or perhaps replace the
ALOC prefix in the locator header with the ALOC prefix of the
secondary ISP - uh, oh, here I go - proposing a NAT solution :-)
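
As a rough sketch of that last idea, the BR could map each ALOC to an
uplink and rewrite the ALOC when the preferred uplink is down. The
function name, the interface names and the link_up table (which would
be fed by something like BFD) are assumptions for illustration only.

```python
def select_egress(aloc, links, link_up):
    """Policy routing on the ALOC field, with rewrite on link failure.

    links:   dict mapping ALOC -> uplink interface name
    link_up: dict mapping interface name -> bool (e.g. BFD state)
    Returns (aloc_to_use, interface), or None if no uplink is usable.
    """
    iface = links.get(aloc)
    if iface is not None and link_up.get(iface, False):
        return aloc, iface          # preferred ISP is reachable
    # Preferred uplink is down: fall back to any working uplink and
    # rewrite the ALOC accordingly (the NAT-like step).
    for alt_aloc, alt_iface in links.items():
        if link_up.get(alt_iface, False):
            return alt_aloc, alt_iface
    return None


links = {"192.168.1.1": "isp1", "192.168.1.2": "isp2"}
print(select_egress("192.168.1.1", links, {"isp1": False, "isp2": True}))
```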

Throw MPTCP on the try-and-error multi-homing solution and it becomes
a lot more interesting :-)

The MPTCP enabled endpoint can set up subflows, e.g. the first subflow
uses SA=10.1.1.1 in the IP header and ALOC=192.168.1.1 in the locator
header, and the second subflow uses SA=10.1.1.1 and ALOC=192.168.1.2.
By using different ALOC prefixes for the subflows the endpoint can
decide which ISPs are used and ensure that different paths are taken
for the upstream traffic.

So by adding MPTCP to the try&error multi-homing scenario you get
redundant paths to the other endpoint via different ISPs and true
*dynamic* load-balancing without the need to tweak any routing
protocols, with only a single NIC on the endpoints; and if there is a
network failure, MPTCP takes care of that.
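
A small sketch of the subflow idea, assuming one subflow per borrowed
ALOC. Subflow, make_subflows and surviving are hypothetical helpers
written for this mail; they do not model real MPTCP signalling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subflow:
    """One subflow: same source address, different ALOC per ISP."""
    src: str
    aloc: str


def make_subflows(src, alocs):
    """Create one subflow per upstream ALOC."""
    return [Subflow(src, aloc) for aloc in alocs]


def surviving(subflows, failed_alocs):
    """Subflows that still work after the given ALOCs (ISPs) failed."""
    return [s for s in subflows if s.aloc not in failed_alocs]


flows = make_subflows("10.1.1.1", ["192.168.1.1", "192.168.1.2"])
print(surviving(flows, {"192.168.1.1"}))
```

As long as at least one subflow survives, the connection as a whole
can continue, which is the resilience property claimed above.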

In summary, the try&error multi-homing solution has the following
characteristics:
- AS number is not needed
- PI address space is required
- no BGP configuration&tuning is required at the enterprise's BR
- no ALOC is required/allowed for the enterprises, instead several
ALOC prefixes are "borrowed" from the upstream ISPs
- MPTCP provides dynamic load-balancing without tuning routing
protocols, several paths can be simultaneously used and thus
resilience is achieved
- zero growth of RIB entries in the DFZ
- the FIB size at the BR does not depend on the size of the FIB in
the DFZ
- the enterprise's BR cannot cause BGP churn in the DFZ or at the
adjacent ISP
- the cost of the BR goes down

Another scenario, a stub multi-homing solution, is created by having
the ALOC prefixes from the DFZ dynamically shared and installed at the
BR, using BGP between the BR and the ISP, but without allocating an
ALOC prefix for the enterprise. In this scenario you need an AS number
and BGP, so it gets a little more complex and more expensive, but the
other side of the coin is that it becomes more dynamic. The stub
multi-homing scenario has the following characteristics:
- AS number is required
- PI address space is required
- BGP configuration&tuning is required at the enterprise's BR
- no ALOC is required/allowed for the enterprises, instead several
ALOC prefixes are "borrowed" from the upstream ISPs
- MPTCP provides dynamic load-balancing, several paths can be
simultaneously used and thus resilience is achieved
- zero growth of RIB entries in the DFZ
- the FIB size at the BR depends on the size of the FIB in the DFZ
and adjacent ISPs
- the enterprise's BR can cause BGP churn for the adjacent ISP but not
in the DFZ
- the cost of the BR is higher than in the try&error multi-homing
scenario

Then the question is how to keep the growth of ALOC prefixes
reasonable. If you are using PI-addresses, have an AS number and run
BGP, why not ask for an ALOC prefix and play with the Big Boys in the
Big League?
I guess the only way to prevent this scenario is to speak the language
that CIOs understand best, i.e. the allocation of an ALOC should have
a yearly cost. And it is justified to charge for allocating an ALOC
prefix, because when you use an ALOC you are reserving a FIB entry
throughout the DFZ, and that ALOC FIB entry needs power, space,
hardware and cooling on all the routers in the DFZ. IMHO, you ought to
pay for that since you are really reserving a lot of resources.

I'm not sure that I have covered all corner cases; there could be
issues that defeat the two scenarios, and more research work is
definitely needed. So please give this approach a hard time, thanks.

-- patte

On Sat, Oct 24, 2009 at 7:20 AM, Lixia Zhang <[email protected]> wrote:
> Patrick,
>
> your msg broke the long silence of this mailing list!
> I've yet to read your new draft and comment (just finished my 3 week-long
> back-2-back trips), but will try to do so in coming days, as part of my
> efforts to get ready for Hiroshima.
>
> Talking about Hiroshima: according to the RRG plan, the Hiroshima RRG
> meeting will be focusing on the discussions of RRG recommendation to IETF on
> scalable routing solutions. There are precisely two weeks before Hiroshima
> now, so let's get the discussion started on the list first.  I'm going through
> all the exchanges since Stockholm by subject groups, in an attempt to make a
> summary.
>
> Lixia
>
> On Oct 20, 2009, at 7:55 AM, Patrick Frejborg wrote:
>
>> Hi all,
>>
>> during the great discussion about identifiers back in July
>> participants pointed out interesting solutions/proposals around the
>> topic - such as ILNP, how Apple is solving the mobility challenge,
>> Multipath TCP, Nimrod, etc - and now when I have studied and absorbed
>> the material I felt a need to update the hierarchical IPv4 framework.
>> I think the discussion was very useful for me - I learned a lot so
>> thanks to all participants who took part in the discussion
>> (unfortunately I started my vacation at the time and wish I had had
>> more time to take part in the discussion).
>> In a nutshell what has been changed at
>> http://www.ietf.org/id/draft-frejborg-hipv4-03.txt
>>
>> 1. Backwards compatibility
>> MPTCP is doing a very nice job with backwards compatibility, "hiding"
>> the new features in the TCP option field. Inspired by this I stumbled
>> over RFC 1385 "The Extended Internet Protocol" and moved the ALOC&ELOC
>> field away from the IP header into the IP option field. No longer a
>> need to have new protocol ID assigned - greater backwards
>> compatibility should be achieved by using the IP option field.
>>
>> 2. IPsec AH
>> ILNP is taking care of the IPsec AH challenge. In the hIPv4 framework
>> IPsec AH is a no-go, because the LSR is a middlebox swapping the IP
>> source and destination header. We could get around this and make IPsec
>> AH work also in the hIPv4 framework by first assembling a legacy IPv4
>> packet, then copying the pseudoheader checksum to the IP option field
>> (there is now a 16 bit padding field where the checksum would fit),
>> then inserting the ALOC information into the header, recalculating the
>> pseudoheader checksum and sending the packet to the other endpoint. When
>> the LSR swaps the packet the padding field remains intact, and when
>> the remote endpoint receives the packet the original pseudoheader
>> checksum can be retrieved from the padding field. But I think this
>> would be an awful kludge, because it would
>> - break the IPsec AH specs
>> - not solve the NAT traversal issue, and the IPv4 world is full of NAT
>> middleboxes
>> Also, I haven't seen many IPsec AH implementations lately - most IPsec
>> installations are LAN-to-LAN solutions using ESP, and remote access
>> systems are more and more often built upon SSL/TLS based RAS.
>> So I let Darwinism take care of IPsec AH - it is not well suited for
>> the IPv4 world and SSL/TLS has been able to adapt better to the NAT
>> traversal challenge.
>>
>> 3. The identifier
>> Host or session identifier? After studying the MPTCP drafts I found
>> potential in the sender token, it might be used as session identifier
>> to solve site and endpoint mobility issues. But the sender token can
>> not be used to improve NAT traversal, here you would prefer to have
>> HIP in place. On the other hand, if you prefer to do NAT you should
>> not expect to have all features available as when not using NAT - and
>> should we encourage the use of NAT? So I think the sender token is
>> good enough to create a semi-session layer protocol (as AppleTalk had)
>> that could be used to achieve better mobility -  MPTCP looks promising
>> at the moment. If a host really needs to be identified - well, I would
>> use a PKI solution for that purpose.
>>
>> 4. Traffic Engineering
>> MPTCP might create subflows for a connection; how do we route the
>> subflows over different links in the backbone, especially if both
>> endpoints have just one IP-address each? IGP tuning will not
>> be useful; MPLS TE might be used, but it gets tricky since both
>> subflows use the same protocol. What if you could apply Valiant
>> Load-Balancing on the subflows and separate the subflows that way over
>> different links in the backbone?
>>
>> Suggestions, questions, feedback and/or criticism are highly appreciated.
>>
>> -- patte
>> _______________________________________________
>> rrg mailing list
>> [email protected]
>> http://www.irtf.org/mailman/listinfo/rrg
>
>
