Re: [rrg] RANGER and SEAL critique

Templin, Fred L Wed, 03 Feb 2010 14:31:24 -0800

Robin,

Thanks for giving this a look, and see below:


> -----Original Message-----
> From: Robin Whittle [mailto:[email protected]]
> Sent: Tuesday, February 02, 2010 4:33 AM
> To: RRG
> Cc: Templin, Fred L
> Subject: Re: [rrg] RANGER and SEAL critique
>
> Short version:   Fred provided some further information and I did
>                  my best to understand it.
>
>                  If I understood it approximately correctly, the
>                  RANGER approach is a CES system without ordinary
>                  ITRs or ETRs, and with some interesting design
>                  features.  These include a general lack of
>                  mapping, and lack of any problem with "initial
>                  packet delays".  However, the proposal as I
>                  understand it has serious problems with frequently
>                  very long paths - unless Fred is really
>                  proposing something like my "8 x VP router"
>                  suggestion below, and I just didn't
>                  understand it.
>
>                  I think the RANGER IDs were of no help in
>                  understanding what Fred has in mind.  This
>                  text he wrote below seems unrelated to
>                  the general description of RANGER - which is
>                  applicable to all sorts of things apart from
>                  being a CES solution to the routing scaling
>                  problem.
>
>                  Please see the separate thread: "SEAL critique,
>                  PMTUD, RFC4821 = vapourware" regarding the
>                  SEAL tunneling and PMTUD system, which would
>                  be used in the many tunnels of the system
>                  Fred describes.
>
>                  I will write a critique for the RRG Report once
>                  Fred responds to what follows.
>
> Hi Fred,
>
> I am replying to your response to my discussion of RANGER:
>
>   http://www.ietf.org/mail-archive/web/rrg/current/msg05796.html
>
> I am referring to:
>
>   The RFC-to-be which is linked to from:
>   http://tools.ietf.org/html/draft-templin-ranger-09
>
>   http://tools.ietf.org/html/draft-russert-rangers-01
>   http://tools.ietf.org/html/draft-templin-intarea-seal-08
>
>
> > Here is what I think about RANGER and scalable routing.
> > RANGER expects that the existing state of affairs in the
> > current Internet BGP routing system will persist, but the
> > goal of RANGER is to arrest the growth of the BGP RIB so
> > that it will level off and not continue to expand along
> > super-linear rates. In particular, RANGER expects that
> > the current BGP will continue to maintain the RLOC-based
> > RIB for the Internet, but that future growth due to
> > mobility, multihoming and PI addressing will be handled
> > out of EID space instead of RLOC space.
>
> OK - this is the same as with the other Core-Edge Separation (CES)
> architectures - including LISP, APT, Ivip, TRRP and ILNP.
>
>
> > RANGER asks that a new BGP instance that carries EID
> > prefixes be established within the DFZ, where each
> > participating EID-based BGP router is an ITR/ETR that
> > treats the DFZ as a virtual NBMA link through tunneling.
>
> OK - I hadn't recognised from reading the RANGER IDs that there was
> such a thing.
>
> I will refer to each such router as an "ITR/ETR" and to the network
> of these as the RANGER Overlay Network - RON.  This is a convenience
> for discussion - I think these routers do not really play these roles.
>
> My understanding of the above is that the RON is made of a subset of
> DFZ routers which perform both ITR and ETR functions.  (ITR and ETR are
> the terms used in LISP, Ivip etc. RANGER and SEAL use "ITE" and
> "ETE" or "iEBR" and "eEBR".)
>
> I understand that while these ITR-ETR routers may also participate in
> the DFZ via the conventional BGP instance, they also have a
> second BGP instance by which the RON is created.
>
> I understand the RON exists to convey "mapping" information between
> these ITR-ETR routers - and my guess is that it carries traffic
> packets too.  "Mapping" is the term used in other CES architectures
> for the information ITRs need to decide which ETR, or a set of
> multiple ETRs, the packet could be tunneled to.  If all the ITRs and
> ETRs participate in this RON, then I can roughly imagine this second
> BGP system functioning normally to provide routes to ETRs, which is
> comparable to "mapping" in other CES architectures.
>
> However, my interpretation of what you write below makes me think
> that the RON BGP messages don't attempt to carry a route for every
> end-user prefix of "edge" EID space.  There could be millions of
> these - say 10 million for portability, multihoming and TE for
> non-mobile networks.  (Brian Carpenter and I came up with the same
> figure independently.)  I think there is nothing in RANGER to state
> your goals regarding how many of these prefixes you want the system
> to support - so this 10 million figure is my assumption.
>
> I can imagine two ways by which these ITR-ETR routers may be linked
> for the purpose of transferring BGP messages over TCP links, between
> the 2nd BGP instances of these routers:
>
>   1 - The ITR-ETR routers use direct physical links between
>       themselves for the RON sessions where such links exist - and
>       tunnels to one or more other ITR-ETR routers if such a router
>       does not have such direct links.
>
>       See below - it is purely by tunnels, for BGP and I think
>       for traffic packets.
>
>   2 - All DFZ routers have the second instance, so the RON is a
>       second BGP control plane for all DFZ routers.
>
>       See below, this is not what you later describe.
>
>
> I assumed this RON set of DFZ routers are the BRs of ISPs - but you
> later say this need not be the case: these ITR-ETR routers may be BRs
> of ISPs, but need not be.
>
>
> To understand "NBMA" I am referring to:
>
>   http://tools.ietf.org/html/draft-templin-ranger-09#section-3.3
>
>     3.3. Virtual Enterprise Traversal (VET)
>
>       Within the enterprise-within-enterprise framework outlined
>       in Section 3.2, the RANGER architecture is based on overlay
>       networks manifested through Virtual Enterprise Traversal
>       (VET) [I-D.templin-intarea-vet] [RFC5214].  The VET approach
>       uses automatic IP-in-IP tunneling in which ITEs encapsulate
>       EID-based inner IP packets within RLOC-based outer IP
>       headers for transmission across the commons to ETEs.
>
>       For each enterprise they connect to, EBRs that use VET
>       configure a Non-Broadcast, Multiple Access (NBMA) interface
>       known as a "VET interface" that sees all other EBRs within
>       the enterprise as potential single-hop neighbors from the
>       perspective of the inner IP protocol.  This means that for
>       many enterprise scenarios standard neighbor discovery
>       mechanisms (e.g., router advertisements, redirects, etc.)
>       can be used between EBR pairs.  This gives rise to a
>       data-driven model in which neighbor relationships are
>       formed based on traffic demand in the data plane, which in
>       many cases can relax the requirement for dynamic routing
>       exchanges across the overlay in the control plane.
>
> IPv6 over NBMA (Non-Broadcast Multiple Access) networks is described
> in RFC2491, which I had a quick look at.  There is new text in this VET
> ID of 26 January which I think is relevant to the RON you are describing:
>
>   http://tools.ietf.org/html/draft-templin-intarea-vet-08#section-6.1
>
>       Routing protocol participation on non-multicast VET
>       interfaces uses the NBMA interface model, e.g., in the
>       same manner as for OSPF over NBMA interfaces [RFC5340],
>       while routing protocol participation on multicast-
>       capable VET interfaces uses the standard multicast
>       interface model.  EBRs on VET interfaces use the list
>       of EBGs in the PRL (see: Section 5.2.2) as an initial list of
>       neighbors for inter-enterprise routing protocol participation.
>
> This Potential Router List (PRL) lists, for the current network (I
> assume an ISP network) all the Enterprise Border Routers.  I guess
> the ITR-ETR routers are all of this type - so I understand this is
> how the BRs in an ISP find out about each other, at least for the
> purposes of linking to each other to establish BGP links, with the
> 2nd BGP instance, to form this ISP's section of the RON.
>
> However, you later state that these ITR-ETR routers need not be DFZ
> routers.  I assume the EBGs are all DFZ routers.  If so, then how
> would the PRL, which only contains the DFZ BRs, include those ITR-ETR
> routers which are not BRs?

Each enterprise network within a RANGER recursive hierarchy
has its own PRL, which is a list of ETRs that can advertise
"default" (called enterprise border gateways by VET). These
ETRs can be used to forward packets out of the enterprise.

In the RON, as you call it, none of the participating
routers advertise "default", but each of the participating
routers is considered a PRL router for the RON since each
will contain a full RIB of all VPs advertised in the RON.
So, a RON router can set up BGP peerings with other RON
routers by static configuration of RON router addresses,
by resolving a FQDN to get back a list of nearby RON routers,
by issuing a multicast "shout out" on the underlying network
to see if there are any other RON routers nearby, etc.

Let me know if you have a different viewpoint on how PRL
should be interpreted.

>       EBRs that connect enterprises to the global Internet DFZ
>       configure EID-based inter-enterprise routing using the BGP
>
> This is the RON system - the routing system by which RANGER's ITR
> functions in the ITR-ETR routers tunnel packets to ETR functions of
> such routers, and by which the ITR-ETR routers share their routing
> information.
>
> "Enterprise" in this context means an ISP.  "Inter-enterprise" means
> between all ISPs.

Not just ISPs, but also corporate enterprises, academic campuses,
major government agencies, etc. etc. If you want to categorize
them all as ISPs that is fine, but I choose to categorize them
as enterprises.

>       [RFC4271] over a VET interface that spans the entire DFZ.
>
> I don't have a clear understanding of the above.  In my understanding
> of the term, an "interface" can't span the entire DFZ.

Sorry about that; I should have said "VET link". A VET
interface is a VET node's point of attachment to a VET
link, but the VET link that manifests the RON spans the
entire DFZ.

>       Each such EBR peers with a set of neighboring routers on the
>       VET interface, where the set is determined through peering
>       arrangements the same as for the current global BGP.
>
> This makes me think that all of an ISP's DFZ routers must be ITR-ETRs in
> the RON - but not, perhaps the DFZ routers of transit providers,
> since these are not ISP BRs.

The RON routers have to connect the ISP/Enterprise to the
RON. But, they need not be the same routers that connect
the ISP/Enterprise to the existing IPv4 Internet DFZ.

The RON routers need to have a global IPv4 address on an
interface that a VET interface (for connecting to the RON)
can be configured over. But, they need not participate in
global IPv4 BGP routing.

> But rather than use the physical links between the routers, as is
> used for the DFZ traffic packets and BGP conversations, I understand
> that these ITR-ETR routers somehow configure tunnels between each
> other (where the packets go over the physical links anyway).

The tunnels are manifested by VET, and may traverse many
underlying physical links.

> I conceive of these ITR-ETR routers as being physically implemented
> in the same hardware, same route processor etc. as the Cisco, Juniper
> or whatever DFZ routers - but that the "ITR-ETR router" behaves as a
> separate entity.  Its connections to other ITR-ETR routers are all
> via tunnels.  However, since they act as ITRs, they must be able to
> advertise prefixes to the real routing system in the same physical
> router.  Also, their ETR function must somehow connect to edge
> networks.

That is certainly one way to stand up a RON router, but
I do not think the RON routers need to be on the same
physical platform as the DFZ routers. They can be built
up as a separate box instead that connects to the
ISP/Enterprise network on the "inside" and connects to
the public IPv4 Internet (and the RON) on the "outside".
>       Note however that this EID-based overlay BGP instance is seperate
>       and distinct from the current RLOC-based BGP instance; \-- typo
>       therefore, the set of peers used for the EID-based and
>       RLOC-based instances need not be the same.
>
> OK - so the previously quoted paragraph indicates they are the same
> set of routers, "as determined through peering arrangements" and this
> sentence indicates that the connections between them need not follow
> the pattern of the peers in the DFZ system.
>
>                                                             /-- typo
>       Each EBR connected to the VET interface spanning the gobal
>       Internet DFZ maintains a full routing information base (RIB)
>       of EID-based prefixes.  In order to limit scaling, only
>
> Limit "scaling difficulties"?

Limit scaling to a manageable order-of-magnitude number of
VP entries in the RIB.

>       highly-aggregated EID prefixes allocated according to the
>       Virtual Prefix (VP) principles of Virtual Aggregation (VA)
>       [I-D.ietf-grow-va] are included in the RIB.
>
> I understand that each ITR-ETR's RIB and FIB has prefixes which cover
> all the "edge" space.  However, there is not a separate prefix for
> each of the individual end-user "edge" EID prefixes, of which there
> are up to 10 million or so.

The RIB contains only highly-aggregated VPs, but the FIBs
of specific RON routers will contain separate prefixes
for each of the individual end-user "edge" EID prefixes.

Let's say that a RON router A configures a VP "BAA::/16".
A advertises BAA::/16 in the RON BGP instance and discovers
all other VPs used in the RON to build up its RIB. But, A
holds the more-specific edge EID prefixes corresponding to
customer allocations in its FIB. So, in A's RIB will be
only VP prefixes, and A's FIB will hold more-specifics
taken from BAA::/16, e.g.:

  BAA:0:0:0000::/56
  BAA:0:0:0100::/56
  BAA:0:0:0200::/56
  ...

So, if A's FIB gets to be too big because of lots of
allocations to customers, A's enterprise can simply
split its VP into smaller pieces (e.g., BAA::/17,
BAA:8000::/17), and use multiple RON routers to spread
the load.
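The prefix arithmetic in this example can be checked with Python's standard `ipaddress` module. This is a minimal sketch that assumes only the VP BAA::/16 and the three /56 customer allocations listed above; the variable names are illustrative.

```python
import ipaddress

# VP configured on RON router A (from the example above).
vp = ipaddress.ip_network("BAA::/16")

# Customer /56 allocations held as more-specifics in A's FIB.
customers = [
    ipaddress.ip_network("BAA:0:0:0000::/56"),
    ipaddress.ip_network("BAA:0:0:0100::/56"),
    ipaddress.ip_network("BAA:0:0:0200::/56"),
]

# Every customer prefix must be a more-specific of the VP.
assert all(c.subnet_of(vp) for c in customers)

# Number of /56 allocations that fit in one /16 VP: 2**(56 - 16).
print(2 ** (56 - 16))  # prints: 1099511627776

# Splitting the VP into two /17 halves, each of which could be
# advertised by a different RON router to spread the FIB load.
halves = list(vp.subnets(new_prefix=17))
print([str(h) for h in halves])  # prints: ['baa::/17', 'baa:8000::/17']
```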

> ietf-grow-va-01 is intended to be used within an AS and only affects
> the FIB of the VA routers.  Although I haven't read the whole thing,
> I don't see how you can apply this ID to guide people on doing
> something totally different - reducing the contents of the RIB.

Well, the goal is to reduce the contents of the RIB for
all RON routers and require RON routers to carry FIB
entries only for those more-specific prefixes that they
care about. So, if RON router A holds the VP BAA::/16,
then it must consider any active more-specifics taken
from that prefix in its FIB (e.g., BAA::/56). But, if
RON router B is not sending any traffic to hosts that
configure addresses from BAA::/56 then B does not have
to hold that prefix in its FIB.

> I had to read ahead and return, rewriting my interpretation to figure
> out, as best I can, what you are describing here.  I found this part
> particularly hard to follow:
>
>       Specifically, only VP prefixes (e.g., PA prefixes delegated to
>       the top-level of an ISP or enterprise network) are maintained
>       in the RIB while more-specific prefixes (e.g., PI prefixes
>       delegated to small sites) are not.  More-specific prefixes will
>       instead be inserted into selective forwarding information bases
>       (FIBs) on-demand of traffic flow such that only those routers
>       that require the prefixes will insert them into their FIBs.
>
> My best guess is that you mean that the ITR-ETR routers' RIBs contain
> routes for ISPs' prefixes and those of large PI-using end-user
> networks which are not using the scalable "edge" space of the RANGER
> Core-Edge Separation architecture.  I think this is so that each
> ITR-ETR's FIB will forward packets addressed to any of these
> prefixes.  But it must do this via the DFZ - so perhaps I am wrong to
> think of these ITR-ETR routers being separate from the underlying DFZ
> router.

RON routers forward packets by discovering the global
IPv4 addresses of other RON routers that are owners of
the EID VPs that cover the EID destination addresses
of the packet. These global IPv4 addresses are nothing
more than RLOCs per the LISP/Ivip/RANGER nomenclature.

>
> I think if you wrote it up in more detail, with an example, it would
> be helpful.
>
> I don't understand how it can be practical to have a router with a
> partially populated FIB, awaiting packets, and when a packet arrives
> with a destination address which does not match a prefix in the FIB,
> the packet is held and by some magic the FIB causes the RIB to emit
> the precise information needed to alter the FIB so it then has the
> correct packet classification information for packets with this
> destination address, and any other address which match whatever
> prefix the RIB has which covers this address.  Then the FIB would be
> able to forward the packet.

I think this helps me to spot out an issue in my
thinking. The VP's carried in the RON RIB also need
to be populated in each RON router's FIB. Otherwise,
if there was no more-specific route there would be no
matching route in the FIB and the packet would be
dropped. I do not want to require any special magic
here, so I will fix this.
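A minimal sketch of a FIB that holds every VP from the RON RIB plus whatever more-specifics the router has installed, so that a longest-prefix match always finds at least the VP route. It assumes a simple dict-based lookup; the next-hop labels are hypothetical, and the mechanism by which a router such as B comes to install a more-specific on demand is not modelled, since it is not specified here.

```python
import ipaddress

def lookup(fib, dst):
    """Return the next hop for dst by longest-prefix match, or None."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, next_hop in fib.items():
        if addr in prefix and (best is None or prefix.prefixlen > best[0].prefixlen):
            best = (prefix, next_hop)
    return best[1] if best else None

# Router B's FIB: the VPs copied from the RON RIB (owners are hypothetical).
fib_b = {
    ipaddress.ip_network("BAA::/16"): "tunnel to A",
    ipaddress.ip_network("BAB::/16"): "tunnel to D",
}

dst = "BAA:0:0:100::1"
print(lookup(fib_b, dst))  # prints: tunnel to A (VP route, no more-specific yet)

# If B is actively sending to this site it may also hold the more-specific,
# which then wins the longest-prefix match.
fib_b[ipaddress.ip_network("BAA:0:0:100::/56")] = "tunnel to C"
print(lookup(fib_b, dst))  # prints: tunnel to C

# No covering VP at all: no route, so the packet would be dropped.
print(lookup(fib_b, "2001:db8::1"))  # prints: None
```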

> Having RIBs write stuff into FIBs may be costly.  These packets could
> be arriving rapidly, so the FIB would need to buffer many of them
> while the RIB is responding.  My biggest concern, apart from the
> obvious problem of interrupting the RIB according to traffic coming
> into the FIB (and many routers have an FIB for each interface) is
> that the RIB can't necessarily respond quickly or efficiently to a
> request to find the most specific matching prefix for a given IP
> address.  This is what the FIB is supposed to do.
>
>
> > Each participating EID-BGP router will set up peering
> > arrangements with a limited set of neighbors using
> > tunnels according to the NBMA link model.
>
> These tunnels are presumably to function like physical links - to
> carry both traffic packets being forwarded from one ITR-ETR router to
> the next, but also to carry the TCP session for bidirectional BGP
> communications.

Yes.

> So this RON system of ITR-ETR routers is linked entirely by tunnels -
> with a BR of one ISP having tunnels to one or more BRs of other ISPs
> - presumably, usually, not too far away.

Correct.

> > There is no
> > requirement that these EID-BGP routers also participate
> > in the current RLOC-based BGP routing instance, so the
> > EID-BGP routers can be deployed incrementally and
> > without disturbing the existing RLOC-BGP routing system.
>
> So how can these ITR-ETR routers (which you refer to as EID-BGP
> routers) be implemented on DFZ routers but as separate entities?

They don't have to be implemented on existing RLOC-BGP
(aka DFZ) routers - they can be stood up as separate
boxes w/o disturbing the existing deployment.

> > This new EID-BGP instance will be used for carrying a
> > relatively small number of highly-aggregated EID
> > prefixes in keeping with the principles of Virtual
> > Aggregation (VA).
>
> You cite:
>
>   http://tools.ietf.org/html/draft-ietf-grow-va-01
>
> which explicitly does not alter BGP or RIB operations, but only puts a
> subset of routes from the RIB into the FIB, but what you are doing is
> very different.  In the above paragraph you are reducing the number
> of routes in the RIB and the BGP communications.   It is also a
> contradiction with your previous statement that the FIB only has a
> subset of the prefixes, but can get them quickly installed from the
> RIB if a packet arrives which needs a prefix not currently in the FIB.

What I am suggesting reduces both the RIB and FIB sizes.
Maybe I didn't take enough time to fully comprehend VA
and should just drop the VA terminology for the purpose
of what I'm trying to describe here?

> > So, this new EID-BGP instance (which
> > again is completely separate from the existing RLOC-BGP
> > instance)
>
> Here you wrote "is completely separate" but above you wrote "There is
> no requirement that these EID-BGP routers also participate in the
> current RLOC-based BGP routing instance."  It would be better to
> state first that they are always separate.

OK.

> >         will carry only highly-aggregated Virtual
> > Prefixes (VPs) such as 4000::/8, 4100::/8, etc. So, at
> > most there will be perhaps a few thousand of these VPs
> > in the EID-BGP RIB (or perhaps even a few 10's or 100's
> > of thousands) but the RIB size will be kept manageable
> > through VA.
>
> You describe a VA approach to the RIB, which I do not understand - at
> least in terms of the reference you provide, which does not involve
> changes to the RIB.

Well, I can drop the VA comparisons if it would help.

> > Now, RANGER has the EID-based VPs populated throughout
> > the EID-BGP RIB, with all of the EID-BGP routers
> > connecting service provider (SP) networks via the virtual
> > NBMA link configured over the core.
>
> I don't clearly understand what the "virtual NBMA link configured
> over the core" means.
>
> RANGER is full of broad, high-level, statements about how things are
> built, but each of these things typically has multiple options and I
> find it very hard to construct a mental model of a physical
> arrangement which would do what I think you are trying to do.
>
> On one hand, you have millions of separate "EID prefixes" - the space
> advertised by end-user networks using the "edge" subset of the
> address space. These advertisements come from the ETR part of one or
> more ITR-ETR routers.
>
> These ITR-ETR routers don't use the DFZ routers directly, but each
> ITR-ETR router has a tunnel (which typically passes over multiple DFZ
> routers) to multiple other ITR-ETR routers in other ISPs.  Maybe
> these ISPs are over the other side of the world, but I guess most of
> them are not too far away.  There's certainly not a full mesh between
> all these ITR-ETR routers.

Correct - not a full mesh the same as for the existing
RLOC-BGP peering arrangements. Hopefully each RON router
will peer with a limited set of other RON routers that
are close by.

> > Customer Edge (CE)
> > routers within the SP networks will want to use EID-based
> > PI prefixes. Each such CE router "registers" its EID PI
> > prefixes both within the SP network and with the EID-BGP
> > routers
>
> I am calling these "EID-BGP" routers "ITR-ETR routers".
>
> >          that own the VP from which the PI prefix is
> > aggregated.
>
> OK.  So there is an ITR-ETR router in Seattle which aggregates
> 42.0.0.0/16.  (I am using IPv4 addresses for brevity, though I
> understand RANGER is mainly intended for IPv6.)  This is 65,536 IPv4
> addresses of "edge" space.  Let's say this covers ~10,000 EID
> prefixes, one of which you refer to as "the CE's PI prefix".
>
> This means that ~ 10,000 end-user networks which have space
> within this prefix need to register the "core" (RLOC) address of
> their CE routers (and the EID prefix they are responsible for) with
> this Seattle router.  (I will pass over questions of redundancy in
> the case of the Seattle router failing.)
>
> > Once "registered", the CE's PI prefix will
> > be kept only in selected router FIBs, and will not be
> > injected into the EID-BGP RIB.
>
> Which are these "selected" routers?

Both the RON router(s) that "own" the 42/16 prefix and any
RON router that is actively sending traffic to destinations
covered by the 42/16 prefix. But, no other RON routers.

> I understand that the Seattle router now knows the "core" (RLOC)
> addresses of the CE routers of 10,000 or so end-user networks all
> over the world.  These are to be the tunnel end-points for the
> delivery of traffic packets - the equivalent of ETR addresses.
>
> I understand that the Seattle router does not contain entries in its
> RIB for any of these 10,000 or so EID prefixes.  There is presumably
> only its own 42.0.0.0 /16 prefix, which it advertises through the RON
> to all other ITR-ETR routers.

Yes.

> >                                Moreover, only the FIBs
> > of those routers on the paths over which the CE's EID
> > addressed packets will travel need to contain the PI
> > prefix - no other routers need discover the prefix.
>
> I don't understand.  Which paths are you discussing?  There really
> needs to be more complete descriptions, and probably some examples.

OK.

> It is taking me many hours to try to understand what you wrote.

Sorry for that, but thanks for taking the time.

> I assume the "PI prefix" you are referring to is one of the 10,000 or
> so such "EID prefixes" which the Seattle router is responsible for.

Yes.

> > The
> > location of the CE router's EID prefix is tracked through
> > the FIB entries in the EID-BGP router that holds the VP
> > from which the EID prefix is derived.
>
> My attempt at translation:
>
>    "Location" means the "core" (RLOC) address of the CE router which
>    handles a given end-user network with a given "edge" (EID) prefix.

Actually, it means the RLOC address of the RON router(s)
that connect the ISPs/Enterprises in which the CE routers
reside.

>    These CE router "core" addresses for all the ~10,000 end-user
>    prefixes contained within the 42.0.0.0 /16 prefix which the
>    Seattle router advertises in the RON are recorded in the FIB
>    of the Seattle router.
>
> So if there are 10,000 separate EIDs of end-user networks within this
> /16 prefix, then:
>
>     1 - The Seattle router has all these ~10,000 "edge" (EID)
>         prefixes in its FIB, and for each one there is an address
>         of the CE router for this prefix.

No; for each FIB entry there is an RLOC address of another
RON router that sits in front of the CE router.

>     2 - The CE routers of these end-user networks could be in Taiwan,
>         South Africa, Malaysia, Australia, Siberia and the South
>         Island of New Zealand.  It would be convenient for the
>         end-user network if it was located somewhere near the
>         Seattle router, but in general, these CE routers are nowhere
>         near the Seattle router.

But, it doesn't matter where the CE routers are, and they
can travel around the world if they choose to do so. As
long as they continue to tell the Seattle router where they
have moved to by telling the Seattle router about which
RON router(s) they are now in proximity of, the Seattle
router can know the next hop.
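The Seattle router's role can be sketched as a registration table keyed by EID prefix, whose value is the RLOC of the RON router currently in front of the CE router; a re-registration after the CE moves simply overwrites the entry. The VP 42.0.0.0/16 is from Robin's example; the class and method names, the /30 EID prefix and the RLOCs (taken from documentation address ranges) are hypothetical, and no particular registration protocol is assumed.

```python
import ipaddress

class VpOwner:
    """Hypothetical model of a RON router that owns one VP."""

    def __init__(self, vp):
        self.vp = ipaddress.ip_network(vp)
        # EID prefix -> RLOC of the RON router in front of the CE router.
        self.table = {}

    def register(self, eid_prefix, ron_rloc):
        eid = ipaddress.ip_network(eid_prefix)
        if not eid.subnet_of(self.vp):
            raise ValueError(f"{eid} is not inside VP {self.vp}")
        # A later registration for the same prefix (e.g. after the CE
        # moves) replaces the earlier next hop.
        self.table[eid] = ipaddress.ip_address(ron_rloc)

    def next_hop(self, dst):
        """Longest-prefix match over registered EID prefixes, or None."""
        addr = ipaddress.ip_address(dst)
        matches = [p for p in self.table if addr in p]
        if not matches:
            return None
        return self.table[max(matches, key=lambda p: p.prefixlen)]

seattle = VpOwner("42.0.0.0/16")
seattle.register("42.0.56.76/30", "192.0.2.1")      # CE near RON router 192.0.2.1
print(seattle.next_hop("42.0.56.78"))               # prints: 192.0.2.1
seattle.register("42.0.56.76/30", "198.51.100.7")   # CE has moved
print(seattle.next_hop("42.0.56.78"))               # prints: 198.51.100.7
```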

>         This is because EID space is portable all over the world -
>         and also, because at any point in time, the role of being
>         the VA router for this 42.0.0.0 prefix could be given to
>         a router somewhere far from Seattle.

That is true I suppose, but I hadn't really thought about
the portability of VP prefixes like 42/16.

>     3 - The hosts which are sending packets to these end-user
>         networks could be all over the world, quite often in the
>         same area as the end-user networks whose hosts the
>         packets are addressed to.
>
>     4 - The Seattle router advertises this /16 prefix to the RON
>         system of ITR-ETR routers.  I guess the ITR function in
>         each of these ITR-ETR routers will now be able to tunnel
>         packets addressed to any one of these ~10,000 end-user
>         network "edge" EID prefixes to the Seattle router.

Right.

>         This tunneling, as far as I know, is through the tunnels
>         of the RON system, since this is how the ITR-ETR routers
>         use BGP to manage their best paths for each such prefix.

Correct.

>         These ITR-ETR routers tunnel all packets addressed to any
>         one of the ~10,000 "edge" (EID) prefixes because of the
>         42.0.0.0 /16 in their RIB and FIB.  They have nothing in
>         their RIB or FIB about the 10,000 individual prefixes.
>         Only the Seattle router's FIB has this.

Correct.

>         This means that none of these ITR-ETR routers need the
>         full set of millions, or tens of millions, of these "edge"
>         EID prefixes either in their RIB or FIB.
>
>         It also means there is no caching, no mapping lookup and
>         no delays waiting for mapping.  Interesting . . .

Correct.
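Using the figures Robin assumes earlier in this message (about 10 million end-user EID prefixes, and about 10,000 of them under a /16 VP such as 42.0.0.0/16), the back-of-envelope table sizes work out as below. These are his illustrative assumptions, not targets stated in RANGER, and the sketch ignores any on-demand more-specifics held by non-owner routers.

```python
# Robin's illustrative assumptions from this thread, not RANGER requirements.
total_eid_prefixes = 10_000_000  # end-user "edge" prefixes worldwide
eids_per_vp = 10_000             # e.g. ~10,000 prefixes under 42.0.0.0/16

# Every RON router's RIB holds one route per VP.
vps_in_ron_rib = total_eid_prefixes // eids_per_vp
print(vps_in_ron_rib)  # prints: 1000

# The VP owner's FIB holds all VPs plus every EID prefix inside its own VP.
owner_fib_entries = vps_in_ron_rib + eids_per_vp
print(owner_fib_entries)  # prints: 11000

# An IPv4 /16 has 2**16 addresses, so ~10,000 prefixes average only a
# handful of addresses each.
print(2 ** 16 / eids_per_vp)  # prints: 6.5536
```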

>     5 - So a packet sent from an ISP in any location - including
>         those listed above, and from others such as London,
>         the North Island of New Zealand, Chile etc. will be handled
>         like this:
>
>          a - A host in the North Island of New Zealand sends a packet
>              addressed to 42.0.56.78 which is in an "edge" EID of
>              an end-user network of a company located in the
>              town of Fox Glacier, not far from the said Glacier,
>              in the South Island.  (A magnificent part of the world!)
>
>          b - This packet is from a customer of an ISP in Auckland
>              (North Island).  In that ISP, the packet is forwarded
>              to an ITR-ETR router.  (I assume all these ITR-ETR
>              routers advertise the "edge" (EID) prefixes such as
>              42.0.0.0 /16 to their local routing system, though
>              I don't see where you specified this.)

I was actually thinking that the RON routers would only
advertise "default" into the local routing system, but
they could just as well advertise 42/16 if they wanted to.

>          c - This ITR-ETR router has in its FIB a route for
>              42.0.0.0 /16 which forwards the packet towards the
>              Seattle router.
>
>              As best I can tell, this forwarding from the ITR-ETR
>              router means tunneling it to a neighbouring ITR-ETR
>              router.  Each such router has a single (multiple?)
>              "core" (RLOC) address - and the tunnel packets are
>              sent and received from these.  After encapsulation,
>              the packet goes to the FIB of the DFZ router which
>              is in the same device, or to a nearby DFZ router,
>              which forwards it towards the tunnel destination
>              address, via 0 or more DFZ routers.
>
>              When it gets to that second ITR-ETR router, the
>              packet is taken out of the tunnel and its destination
>              address 42.0.56.78 examined by the second router's
>              FIB.  This causes the packet to be tunneled again
>              to another ITR-ETR.  Eventually, it makes its way
>              across the RON to the Seattle ITR-ETR.
>
>          d - The Seattle router detunnels the packet and presents
>              it to its FIB.  This FIB is the only one in the
>              world with an entry for the particular "edge" (EID)
>              prefix of the network which contains the destination
>              host: 42.0.56.76 /30.  This is associated with a
>              "core" (RLOC) address of the CE router which serves
>              this end-user network.
>
>              The CE router is in the office of a tour company in
>              the Fox Glacier township, on a single IPv4 PA address
>              33.22.22.33 which is a stable IP address of a DSL
>              service.
>
>          e - The ETR function of the Seattle router encapsulates
>              the packet with an outer header destination address
>              of 33.22.22.33 and presents the resulting packet to
>              its FIB.
>
>          f - The FIB has a prefix matching 33.22.22.33 - since
>              this is part of a big (short) prefix 33.22.180.0 /17
>              of an ISP on the South Island.
>
>          g - Does the Seattle router's FIB forward this packet
>              to a neighbour on the RON, and therefore via another
>              tunnel?  Or does it forward the packet according
>              to the FIB of the DFZ router - and so out into the
>              DFZ, without any further tunneling?

In my way of thinking, there is a RON router A that connects
the ISP in which the host on the North Island of New Zealand
resides to the RON. A will consult its FIB (not its RIB) to
find that 42/16 points to a tunnel to the Seattle RON
router B, and then tunnels the packet over the RON to B.
The Seattle router (B) now determines that it has a prefix
42.0.56.76 /30 in its FIB that points to a RON router C
that connects the ISP serving the destination address, and
then tunnels the packet over the RON to C.
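
The FIB step described here is ordinary longest-prefix matching: B holds both the 42.0.0.0 /16 VP and the more-specific 42.0.56.76 /30, and the /30 wins for destination 42.0.56.78. A minimal Python sketch, assuming the FIB contents shown (the next-hop strings are hypothetical labels, not real tunnel state):

```python
import ipaddress

# Hypothetical FIBs for RON routers A (near the sending host) and B (the
# Seattle router holding the 42.0.0.0/16 VP). Next hops are text labels.
FIB_A = {
    ipaddress.ip_network("42.0.0.0/16"): "tunnel to RON router B",
}
FIB_B = {
    ipaddress.ip_network("42.0.0.0/16"): "held VP: no registered holder",
    ipaddress.ip_network("42.0.56.76/30"): "tunnel to RON router C",
}


def lookup(fib, dst):
    """Longest-prefix match: return the next hop for dst, or None."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in fib if addr in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    return fib[best]


print(lookup(FIB_A, "42.0.56.78"))  # tunnel to RON router B
print(lookup(FIB_B, "42.0.56.78"))  # tunnel to RON router C
print(lookup(FIB_B, "42.0.99.1"))   # held VP: no registered holder
```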

>              Either way, the Seattle router has a way of getting
>              the traffic packet to the CE router in the office
>              of the tour company, and the CE router decapsulates
>              it and puts it on the LAN, which takes it to the
>              destination host.
>
>     6 - There are arguments for increasing the number of these VP
>         ITR-ETR routers (such as the one in Seattle) - in order to
>         reduce the number of packets each one has to handle.

That would be fine for me.

>     7 - There are arguments for decreasing the number of these VP
>         routers, to reduce the load on the RON control plane -
>         since each one advertises a prefix in the RON BGP
>         system.
>
>         You could however achieve the same goal by putting three
>         other ITR-ETR VP routers in the same data centre, for
>         adjacent prefixes,  and aggregating them in some way into
>         a single shorter prefix there.  Then, from Seattle, you could
>         advertise a single 42.0.0.0 /14.

That would be fine for me, too.
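
The aggregation suggested above works only if the four VPs are adjacent and aligned on a /14 boundary. A minimal sketch using Python's standard ipaddress module, assuming the Seattle VP 42.0.0.0 /16 plus three hypothetical neighbours 42.1.0.0 /16 through 42.3.0.0 /16:

```python
import ipaddress

# Assumed example: the Seattle VP plus three adjacent VPs held by
# routers in the same data centre.
vps = [ipaddress.ip_network(p) for p in
       ("42.0.0.0/16", "42.1.0.0/16", "42.2.0.0/16", "42.3.0.0/16")]

# collapse_addresses merges adjacent, aligned networks into the fewest
# covering prefixes.
aggregate = list(ipaddress.collapse_addresses(vps))
print(aggregate)  # [IPv4Network('42.0.0.0/14')]
```

Four /16s such as 42.1.0.0 through 42.4.0.0 would not collapse into a single /14, so which VPs are co-located matters.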

> It is possible that I have partially or completely misunderstood you.
>  I believe you need to document these things much more extensively,
> ideally with diagrams and definitely with examples.

No, I think you have seen things clearly for the most part.

> Based on the above understanding, here are some observations.
>
> Let's say there are 10 million end-user networks and most of
> them have a single edge (EID) prefix.  So let's say there are 12
> million of these prefixes.
>
> Let's say there are a billion IPv4 addresses in the "edge" subset
> of the global unicast address space.  So the average size of these
> prefixes is around 80 IPv4 addresses.  However, most of them are 1, 2, 4
> or 8 IPv4 addresses.
>
> Let's say these are contained in 2^14 separately advertised prefixes
> in BGP.  These may be of various sizes.  So this is a 16k prefix
> burden on the DFZ - not much of a problem.  (In Ivip, each such
> prefix would be a MAB - Mapped Address Block.)
>
> On average, these prefixes are /16 - and so contain 2^16 IP
> addresses.  With an average of 80 IPv4 addresses per end-user network
>  "edge" (EID) prefix, on average, each such MAB prefix provides 820
> "edge" (EID) end-user prefixes.  This is pretty good routing
> scalability - 820 for 1 DFZ advertised prefix.
>
> On average, each VP router such as the Seattle one mentioned above
> handles a /18.  A /18 has 16,384 IPv4 addresses, and so on average
> each one is used by about 205 end-user network prefixes.  If we
> assume that each end-user prefix is to be tunneled to a different CE
> router, then the average router such as the one in Seattle has 205 of
> these special CE tunneling entries in its FIB.
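
The figures above can be checked with a few lines of Python. These are Robin's illustrative assumptions rather than measurements; the exact ratios come out at about 83 addresses per edge prefix, 819 edge prefixes per /16 MAB and 205 per /18 VP router:

```python
# Illustrative assumptions from the paragraphs above (not measurements).
END_USER_PREFIXES = 12_000_000
EDGE_ADDRESSES = 1_000_000_000
MAB_COUNT = 2 ** 14            # the "16k" separately advertised prefixes
AVG_EDGE_PREFIX = 80           # the rounded figure used in the text

avg_prefix_size = EDGE_ADDRESSES / END_USER_PREFIXES   # about 83 addresses
addresses_per_mab = EDGE_ADDRESSES / MAB_COUNT         # about 61,000, close to a /16
edge_prefixes_per_mab = 2 ** 16 / AVG_EDGE_PREFIX      # 819.2, the "820" above
edge_prefixes_per_vp = 2 ** 14 / AVG_EDGE_PREFIX       # 204.8 per /18 VP router

print(round(avg_prefix_size), round(addresses_per_mab),
      round(edge_prefixes_per_mab), round(edge_prefixes_per_vp))
# 83 61035 819 205
```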
>
>
> There are several problems with the arrangement as I described above.
>
> Firstly, there's a lot of tunneling as the packet makes its way
> across the RON from what I called the ITR function of the ITR-ETR
> routers to the ETR function of the Seattle ITR-ETR router.
>
> In fact, ITR and ETR are not accurate terms.  I used them to give
> something familiar which relates to other CES architectures.
>
> These things are just routers.  It so happens that the multiple hops
> between the RON router in Auckland and the one in Seattle each
> involved a tunnel.  But there was no tunnel between the Auckland
> router and the Seattle router.

Actually, I am expecting a single tunnel between the
Auckland router and the Seattle router - a single IP hop.

> Arguably the Seattle router is really playing the ITR role - and it
> tunnels to the CE router which arguably plays the ETR role.  In this
> model, this is very roughly like Ivip where each MAB has only a
> single DITR in the world, and no ISPs or any other networks have ITRs.
>
> The most obvious problems are:
>
>   1 - The dependence of 205 or so end-user networks on a single
>       router (such as in Seattle) creates a potential bottleneck.
>
>   2 - Also, it creates a single point of failure.

I liked that you enumerated a number of alternatives
for distributing the RON router function above. Wouldn't
they help spread the load and/or remove single points
of failure?

>   3 - Considering the random distribution of sending hosts and
>       destination hosts, this arrangement frequently leads to
>       excessively long paths, back and forth across the world.

This is where the RANGER route optimization comes in.
When RON router A sends a packet to RON router B that
holds the 42/16 prefix, B both forwards the packet to
RON router C and sends a Redirect message back to A.
Thereafter, A sends packets directly to C instead of
going through B (with appropriate locator liveness
checks to make sure that C is reachable).
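
The Redirect-driven route optimization can be sketched as a toy Python model. This is an illustration under stated assumptions, not the RANGER or VET specification: the RonRouter class is hypothetical, Redirects are neither authenticated nor rate-limited, and the locator liveness checks mentioned above are omitted. The first packet from A travels A, B, C; the Redirect installs the /30 in A's FIB, so the second packet goes A, C:

```python
import ipaddress


class RonRouter:
    """Toy RON router: a FIB of RON next hops plus a Redirect handler.

    Hypothetical model: no authentication, no locator liveness checks,
    and "local" prefixes stand in for tunneling on to a CE router.
    """

    def __init__(self, name):
        self.name = name
        self.fib = {}       # IPv4Network -> next-hop RonRouter
        self.local = set()  # prefixes whose CE routers this router serves

    def best_match(self, dst):
        addr = ipaddress.ip_address(dst)
        matches = [net for net in self.fib if addr in net]
        return max(matches, key=lambda net: net.prefixlen, default=None)

    def send(self, dst):
        """Originate a packet and return the names of the routers it visited."""
        net = self.best_match(dst)
        if net is None:
            raise LookupError(f"{self.name}: no route to {dst}")
        return [self.name] + self.fib[net].receive(dst, previous=self)

    def receive(self, dst, previous):
        addr = ipaddress.ip_address(dst)
        if any(addr in net for net in self.local):
            return [self.name]  # last RON hop; would tunnel to the CE router
        net = self.best_match(dst)
        if net is None:
            raise LookupError(f"{self.name}: no route to {dst}")
        nxt = self.fib[net]
        previous.redirect(net, nxt)  # tell the sender about the shorter path
        return [self.name] + nxt.receive(dst, previous=self)

    def redirect(self, net, nxt):
        self.fib[net] = nxt


A, B, C = RonRouter("A"), RonRouter("B"), RonRouter("C")
vp = ipaddress.ip_network("42.0.0.0/16")
edge = ipaddress.ip_network("42.0.56.76/30")
A.fib[vp] = B       # A only knows the VP, held by B in Seattle
B.fib[edge] = C     # B knows the registered /30 is reached via C
C.local.add(edge)   # C connects the ISP serving the destination

print(A.send("42.0.56.78"))  # ['A', 'B', 'C']: first packet goes via B
print(A.send("42.0.56.78"))  # ['A', 'C']: after B's Redirect
```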

> Still, it is an interesting arrangement.  The router which forwards
> the packets to the Seattle router does so without any delay, mapping,
> caching or the like.

OK.

> The Seattle router has only 205 or so entries in its FIB, so it has
> no scaling problems in terms of FIB size.
>
> You haven't specified how CE routers can securely register their
> particular "edge" prefix with routers such as the one in Seattle.

That is spelled out in detail in VET.

> However, assuming they can do this, then multihoming could be done by
> the end-user network having two ISPs, and therefore having CE routers
> (or really the one CE router) appearing on two separate "core" (RLOC)
> addresses.  Then, to do multihoming failure detection and service
> restoration you could take one of several approaches:
>
>   1 - The end-user network itself senses the failure of its use of
>       the "core" address at ISP1, and somehow uses its other ISP2
>       link to securely re-register with the Seattle router the
>       ISP2 "core" address instead.
>
>   2 - The Seattle router is told by the end-user network about both
>       its addresses, the one from ISP1 and the one from ISP2.  The
>       Seattle router is then instructed to do reachability testing
>       of these addresses, or perhaps through these addresses to
>       something in the end-user network itself.  Then the Seattle
>       router would choose which link to use - it would do the
>       multihoming service restoration.

This is the closest to what I had in mind.

>   3 - An Ivip-like approach where the end-user network could tell
>       the Seattle router whether to tunnel packets to the ISP1 or
>       ISP2 address - but instead hires a Multihoming Monitoring
>       company to do reachability testing and to control the
>       Seattle router's tunneling accordingly.
>
> All three approaches could be very fast.
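
Approach 2, which Fred says is closest to what he had in mind, can be sketched as per-prefix state in the VP router that is re-evaluated by reachability probes. A minimal Python sketch, assuming a probe function supplied by the caller (here a stub driven by a set of failed addresses) and a hypothetical ISP2 address from the 198.51.100.0/24 documentation range; EdgeEntry is an illustrative name:

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class EdgeEntry:
    """Hypothetical VP-router state for one end-user prefix (approach 2)."""
    prefix: str
    rlocs: List[str]               # registered "core" addresses, most preferred first
    current: Optional[str] = None  # address the VP router currently tunnels to

    def refresh(self, is_reachable: Callable[[str], bool]) -> Optional[str]:
        """Re-run reachability tests and select the first address that passes."""
        self.current = next((r for r in self.rlocs if is_reachable(r)), None)
        return self.current


# ISP1 address from the example above; the ISP2 address is hypothetical.
entry = EdgeEntry("42.0.56.76/30", ["33.22.22.33", "198.51.100.7"])
down = set()


def probe(rloc):
    """Stub standing in for real reachability testing."""
    return rloc not in down


print(entry.refresh(probe))  # 33.22.22.33
down.add("33.22.22.33")      # the ISP1 link fails
print(entry.refresh(probe))  # 198.51.100.7
```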
>
> To me, the most obvious enhancement of this arrangement would be to
> create multiple routers such as the Seattle one.  Currently there is
> only one of these VP routers for these 205 or whatever end-user networks.
>
> If we had 8 VP routers, each just like the Seattle one, then this
> would spread the load.  If you scattered these around the Net, the
> RON's natural BGP behaviour would be to spread the load and to
> forward packets to the nearest one.  This would generally lead to a
> big reduction in total path length.

Sounds good.

> But then, each VP router would need its own tunnel to the CE router -
> so the CE router would need to handle 8 tunnels.  Also, when it moves
> to a different "core" address, there are 8 VP routers to securely
> register the new address with.
>
> This starts to introduce the concept of "mapping" information into
> the system, whereas before, there was no such thing.
>
> If I have understood this correctly, then what you are suggesting is
> a novel CES architecture without a mapping system, and without
> delays.  If I misunderstood you, then I just partially invented such
> a thing myself!

You are correctly understanding what I was meaning to convey.

> However, I think it still has problems with excessively long paths,
> with concentration of the workload on too few routers - and I think
> forwarding traffic packets across the RON network, with each hop
> involving an encapsulation and decapsulation, with PMTUD management
> for each tunnel . . . I think it is not a good way to solve the
> routing scaling problem.

The piece you may be missing is the route optimization
piece. Again, the holders of VPs like 42/16 (or BAA::/16)
should in the normal case only see a few initial packets.
Thereafter, redirection will result in route optimization
and therefore reduce the load on the VP holders.

> By doing most or all of the work with DFZ routers, there is a
> potential critique that you are not really taking much load from
> them.

Well, if packets have to go over the core then we
are left with no choice but to handle them with the
RON routers. But with the RANGER recursive hierarchy,
we prefer to handle packets out toward the edges
without involving the core whenever possible. The
idea of RANGER is to keep as much of the churn as
possible pushed out to the outermost levels of
the recursive hierarchy and reduce the load for the
innermost levels of the recursive hierarchy.

> However, while you have a second BGP instance, the total
> number of routes the original and new RIB handles in each DFZ router
> is far smaller than the 10 million or similar end-user networks you
> are serving.

Right.

> When end-user networks change the "core" address of their CE
> router, this does not at all affect the DFZ BGP control plane or the
> new RON BGP control plane.

Right.

> Mobility could be done by the MN re-registering each new CoA with the
> VP router.  If the VP router makes the tunnel, then this won't work
> with the MN behind NAT.  If you borrow a little from the TTR Mobility
> architecture, you would have the MN tunnel to the VP and authenticate
> itself.  Then the VP can take the MN's egress packets.  Most
> importantly, this enables the MN to operate behind NAT.  In this
> case, the VP is rather like a Home Agent.
>
> If you took up my suggestion of 8 VPs, instead of one, you could have
> a rather interesting mobility system, with typically much shorter
> paths due to the 8 VPs.  However, then the MN would need to establish
> 8 tunnels to the 8 VPs.
>
>
> While what you described is ostensibly RANGER - I think there is not
> much in the RANGER ID which is relevant to the CES architecture you
> described in your email.  RANGER can be used for many more things
> than a CES architecture.

But, the same organizational and operational principles
apply at all levels of the RANGER hierarchy.

> I think that to progress your proposal, you should write a completely
> fresh documentation of it, specifically for a CES architecture for
> scalable routing.  I suggest you list goals and non-goals, including
> for how many non-mobile end-user networks you expect the system to
> scale to.   There need to be diagrams and examples.
>
> It is not obvious how you do load-sharing inbound TE with this
> system, except with the Ivip approach of splitting the traffic over
> two micronets and mapping one micronet to one ETR and the other
> micronet to the other ETR.
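
The micronet split described here for inbound TE amounts to dividing one edge prefix into two more-specifics mapped to different ETRs. A minimal Python sketch, assuming the tour company's /30 and a hypothetical second ETR address; etr_for is an illustrative helper, not part of any proposal:

```python
import ipaddress

# Hypothetical example: split the tour company's /30 into two /31 micronets
# and map each to a different ETR ("core") address.
edge = ipaddress.ip_network("42.0.56.76/30")
low, high = edge.subnets(prefixlen_diff=1)
mapping = {low: "33.22.22.33", high: "198.51.100.7"}


def etr_for(dst):
    """Return the ETR address that packets for dst would be tunneled to."""
    addr = ipaddress.ip_address(dst)
    return next(etr for net, etr in mapping.items() if addr in net)


for net, etr in mapping.items():
    print(net, "->", etr)
# 42.0.56.76/31 -> 33.22.22.33
# 42.0.56.78/31 -> 198.51.100.7
```

The resulting load split is only as even as the traffic addressed to the hosts in each half.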
>
>
>
> > In summary, the RANGER approach to scalable routing is
> > to create a new BGP instance between tunnel routers for
> > the purpose of keeping a limited set of highly-aggregated
> > EID VPs in the RIB.
>
> OK.
>
> > PI EID prefixes owned by customer
> > routers are added to selected SP router FIB tables on
> > demand, and are never injected into the RIB.
>
> Hmm - this makes no sense to me based on the (mis)understanding
> I just developed.  What do you mean by "selected SP routers" and what
> sort of "demand" is this?

I mean selected other RON routers. "On-demand" means that
the FIB entries get propagated by RON router A sending a
packet to VP-holding RON router B which sends back a
redirect telling A that a more direct path can be taken
through RON router C.

> > The way
> > this works is that CE routers that are holders of PI EID
> > prefixes
>
> OK . . .
>
>           "blow bubbles" that percolate up through a reverse
> > tree ascending through their SP networks until the bubbles
> > reach an EID-BGP router that owns a VP from which the PI
> > prefix is derived.
>
> I have absolutely no idea what this means or how it relates to the
> rest of your email or to anything I read in the RANGER IDs.  If this
> is the case, you really need to explain things much better - because
> I tried very hard to understand what you are proposing and it looks
> like I missed out on at least part of it.

This is well documented in VET, which RANGER points to.
I'm sorry for the fragmentation, but that's just the way
that it unfolded. I will try to clarify the RANGER text
so as to cause less astonishment.

> > In that way, the locations of all PI
> > EID prefix holders are available in EID-BGP router FIBs
> > while only VPs appear in the EID-BGP RIB. This system
> > of knowing where all PI prefix holders are at all times
> > also has clear beneficial properties for supporting
> > mobility and multihoming.
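
The "bubble" registration itself is specified in VET and is not reproduced here. The following toy Python model only illustrates the idea stated in this summary, under loudly stated assumptions: each router knows its parent in the reverse tree, the ascent stops at the first router owning a covering VP, only that router installs a FIB entry, and all security is omitted. TreeRouter and blow_bubble are hypothetical names:

```python
import ipaddress


class TreeRouter:
    """Hypothetical router in an SP's reverse tree (toy model, no security)."""

    def __init__(self, name, parent=None, vps=()):
        self.name = name
        self.parent = parent    # next router up the reverse tree, if any
        self.vps = [ipaddress.ip_network(v) for v in vps]
        self.fib = {}           # registered PI prefix -> CE router locator


def blow_bubble(start, pi_prefix, ce_locator):
    """Ascend from start to the first router owning a covering VP; register there."""
    prefix = ipaddress.ip_network(pi_prefix)
    node = start
    while node is not None:
        if any(prefix.subnet_of(vp) for vp in node.vps):
            node.fib[prefix] = ce_locator   # FIB entry only; the RIB keeps the VP
            return node
        node = node.parent
    return None   # no covering VP anywhere up the tree


vp_router = TreeRouter("EID-BGP router", vps=["42.0.0.0/16"])
sp_core = TreeRouter("SP core router", parent=vp_router)
sp_access = TreeRouter("SP access router", parent=sp_core)

holder = blow_bubble(sp_access, "42.0.56.76/30", "33.22.22.33")
print(holder.name)  # EID-BGP router
```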
> >
> > Finally, in terms of routing scaling, the end state
> > benefit is that both the EID-BGP and RLOC-BGP RIBs
> > remain manageable in size and only those routers that
> > need to know about certain PI EID prefixes have to
> > carry those prefixes in their FIBs.
> >
> > Any thoughts or comments on this?
>
> Please write up complete, standalone documentation of this to save
> people from having to read the RANGER or SEAL IDs - and have
> diagrams, examples and much more detailed descriptions of all the
> network elements.
>
> Please let me know how close I got to understanding what you have in
> mind.

The only pieces you missed were the secure redirection
(for route optimization) and the secure prefix
registration (the piece I labeled as "astonishing"
above). Everything else closely matches what I was
trying to convey.

Thanks - Fred
[email protected]

>   - Robin

_______________________________________________
rrg mailing list
[email protected]
http://www.irtf.org/mailman/listinfo/rrg
