Re: [rrg] IRON-RANGER scalability and support for packets from non-upgraded networks

Robin Whittle Thu, 18 Mar 2010 18:38:35 -0700

Hi Fred,

I will try to use your simpler arrangement for naming the IRON
routers, but I still find it easiest to think and write in terms of
some roles I mentioned, which have no exact name in your arrangement.


When I referred to "BR" I meant a router in an AS which also connects
to other ASes - so I was referring to its status in the interdomain
routing system, not whether it was a "Border" router from the
perspective of the I-R overlay network.  You use "BR" to refer to the
router's status within the I-R network.

I only spent a few minutes looking at VRRP - and without trying to
understand how it uses multicast, I thought maybe multicast of any
kind would be difficult or impossible on the I-R overlay network.  I
wasn't suggesting this was definitely the case.

You wrote, in part:

> Consider that all IRON
> routers are IRON border routers (IBRs), in that they
> connect zero or more EID-based enterprise networks to
> the IRON. Each IBR:
> 
>   - participates in the IRON overlay routing protocol
>   - advertises zero or more VPs into the routing protocol
>   - connects zero or more EID-based enterprise networks to
>     the IRON
>   - may or may not connect the IRON to the DFZ
> 
> I choose to view this latter category as "gateways" from
> the IRON to the DFZ, so I will call these as IBGs. So,
> we now simply have only IBRs and IBGs.

OK.  But those which advertise VPs have many extra responsibilities,
so I think it is important to think of something like a VP Router, or
the VP role or something as a distinct entity which some IBRs or IBGs
have and which the rest don't.


> What I am calling "Border Router (BR)" is any router that
> can be used for getting off the IRON and onto either an EID-
> based enterprise network or onto the DFZ.

"getting off the IRON" implies to me that packets sent along the I-R
overlay would be handled by this IRON router and forwarded to the
EID-based enterprise network (what I referred to as an "EID-using
end-user network") or towards some arbitrary host on the Internet
(IPv4 and/or IPv6?) via forwarding to other routers (the DFZ).

I don't understand this because the I-R overlay network doesn't
handle traffic packets.  It only handles BGP best path communications
between the IRON routers so each IRON router can discover the IP
address (the Internet address and I-R overlay addresses are the same)
of routers which advertise in the I-R overlay some particular prefix.
The only IRON routers in the I-R overlay which advertise anything
into the overlay are those which advertise a VP.  If two or more
advertise a VP, then with BGP, every IRON router will get a best path
to one of these VP routers, and so find out the IP address of one of
them.
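
To illustrate the point about best paths, here is a minimal Python
sketch.  It assumes a single "shortest path wins" rule as a stand-in
for the full BGP decision process.  The Advert class, the addresses
and the path lengths are all hypothetical and are not taken from any
I-R specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Advert:
    vp: str           # virtual prefix being advertised (hypothetical value)
    router_ip: str    # Internet address of the advertising VP router
    path_len: int     # simplified stand-in for BGP path attributes


def best_paths(adverts):
    """Keep one advertisement per VP, preferring the shortest path."""
    best = {}
    for ad in adverts:
        current = best.get(ad.vp)
        if current is None or ad.path_len < current.path_len:
            best[ad.vp] = ad
    return best


# Three VP routers (A, B, C) advertise the same VP in the overlay.
adverts = [
    Advert("203.0.113.0/24", "192.0.2.1", 3),     # A
    Advert("203.0.113.0/24", "198.51.100.1", 1),  # B
    Advert("203.0.113.0/24", "192.0.2.2", 2),     # C
]
table = best_paths(adverts)
```

Although three routers advertise the VP, the resulting table holds
only one entry for it, so the router learns a single VP router
address.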

No traffic packets are sent over the I-R network.  When an IRON
router performing the IBG or IBR role tunnels a packet to an IRON
router which advertises a VP in the overlay, the tunnel is via the
Internet.

So I don't understand what you mean by:

   > getting off the IRON and onto either an EID-
   > based enterprise network or onto the DFZ.

(But see potential explanation a few paragraphs down.)

> In this sense,
> any router that "sinks" EID-addressed packets that do not
> belong to either an EID-based enterprise network nor the
> DFZ is also considered as a BR.

I don't see how this would be needed.  No IRON router receives
traffic packets on the overlay - the overlay is simply a mechanism by
which IRON routers discover the "nearest" IRON router which is
advertising a VP.  As I wrote previously (and as is quoted below in
points 1 and 2), there are two reasons for doing this.  One is to
tunnel a traffic packet to that VP router.  The other is to register
an EID prefix with that VP router and with the other VP routers which
advertise the VP which covers this EID prefix.


>> However, a subset of IRON routers are BRs and are also configured to
>> perform the IDM role.  While a router could perform purely this IDM
>> role and not advertise the edge prefixes locally, I will assume this
>> would not be typical.
> 
> All IRON routers are BRs (IBRs). Some IBRs are also
> gateways for getting off the IRON and onto the DFZ.
> These are called IBGs.

Oh - you mean the IBG advertises all I-R "edge" prefixes, (perhaps as
one or a few prefixes in IPv6, though probably many would be required
for IPv4) and that this means it acts like an Ivip DITR or LISP PTR -
accepting traffic packets sent by other ASes which lack their own
IRON routers and then tunneling them firstly to a VP router, and
then (after the VP router sends back "mapping") to an IRON router
which can deliver the packet to its destination network.

I wouldn't describe this as "getting off the IRON and onto the DFZ" -
except in terms of the flow of advertisements of routes.  I tend to
think more in terms of the flow of packets rather than the flow of
information about routes to particular prefixes.
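
In packet-flow terms, the two-step behaviour described above (first
packets go to a VP router, later packets go directly to the
delivering router once "mapping" arrives) could be sketched as
follows.  This is only a rough illustration: the IronRouter class,
its method names and the addresses are hypothetical, and cache
expiry is ignored.

```python
import ipaddress


class IronRouter:
    """Hypothetical tunneling router with a cache of received mappings."""

    def __init__(self, vp_router_for):
        # vp_router_for: callable giving the VP router address learned
        # from the overlay for a destination (an assumption of this sketch).
        self.vp_router_for = vp_router_for
        self.cache = {}  # EID prefix -> address of the delivering router

    def tunnel_endpoint(self, dst):
        """Return the address to which a traffic packet should be tunneled."""
        addr = ipaddress.ip_address(dst)
        matches = [net for net in self.cache if addr in net]
        if matches:
            # Longest matching cached EID prefix wins.
            longest = max(matches, key=lambda net: net.prefixlen)
            return self.cache[longest]
        return self.vp_router_for(addr)

    def on_mapping(self, eid_prefix, delivering_router):
        """Record the mapping sent back by the VP router."""
        self.cache[ipaddress.ip_network(eid_prefix)] = delivering_router


m = IronRouter(lambda addr: "192.0.2.1")        # VP router address
first = m.tunnel_endpoint("203.0.113.5")        # no mapping cached yet
m.on_mapping("203.0.113.0/28", "198.51.100.7")  # mapping arrives
later = m.tunnel_endpoint("203.0.113.5")        # now goes direct
other = m.tunnel_endpoint("203.0.113.200")      # outside the mapped prefix
```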


>> A VPR need not be a BR.  It need not perform any other roles, but I
>> guess it typically would perform some, such as DEL.
>>
>> Assuming that all IRON routers will, or could, perform the DEL role,
>> here are the various combinations:
>>
>>    LFR?   IDM?   VPR?   BR?
>>
>>  0 -      -      -      Maybe    Just playing the DEL role.
>>
>>  1 -      -      VPR    Maybe    Also playing the VPR role.
>>
>>  2 -      IDM    -      Yes      Just playing DEL and IDM roles  -
>>                                  but for some non-obvious reason not
>>                                  advertising I-R edge space to local
>>                                  routing system.
>>
>>  3 -      IDM    VPR    Yes      As for 2, but also VPR role.
>>
>>
>>  4 LFR    -      -      Maybe    DEL and accepting packets from the
>>                                  local network too.
>>
>>  5 LFR    -      VPR    Maybe    As for 1, but also accepting packets
>>                                  from the local network.
>>
>>  6 LFR    IDM    -      Yes      As for 2, but also accepting packets
>>                                  from the local network.
>>
>>  7 LFR    IDM    VPR    Yes      As for 3, but also accepting packets
>>                                  from the local network.
> 
> This gets way too complex, and I believe is greatly
> simplified by what I said above.

Yes, but now I have to write "an IRON router which advertises a VP"
rather than "a VPR" which means the same thing - or an "IRON router
which delivers packets to an EID-using enterprise network" rather
than a "DEL" router.
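
For what it is worth, the eight rows of the quoted table are just the
three optional roles treated as independent flags, with the BR column
following from the IDM flag.  A small Python sketch (the function
name and the dictionary layout are made up for this example):

```python
from itertools import product

ROLES = ("LFR", "IDM", "VPR")  # column order used in the quoted table


def role_combinations():
    """Enumerate the role combinations in the table's numbering order."""
    rows = []
    for number, flags in enumerate(product((False, True), repeat=len(ROLES))):
        row = dict(zip(ROLES, flags))
        row["n"] = number
        # In the table, a router with the IDM role is always a BR;
        # otherwise it may or may not be one.
        row["BR"] = "Yes" if row["IDM"] else "Maybe"
        rows.append(row)
    return rows


rows = role_combinations()
```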


>>>> Here is my understanding on what you just wrote:
>>>>
>>>>> The more I think about it, the more these specialized
>>>>> VP routers
>>>>
>>>> I think you mean the "DITR-like" routers are VP routers. Later you
>>>> refer to these as "IRON Default Mappers (IDMs)".  I had assumed they
>>>> either were not VP routers, or that they need not be VP routers.
>>>
>>> The latter - IDMs need not also be VP routers, but they
>>> could be.
>>
>> OK.
>>
>>
>>>> However, this part:
>>>>
>>>>> On the IRON, they advertise "default"
>>>>
>>>> makes no sense to me.  I don't recall any IRON router advertising
>>>> "default" on the IRON overlay network.  I understand that a VP router
>>>> advertises its one or more VPs.
>>>
>>> Yes; this is new. By having the IDMs connected to the DFZ
>>> advertise "default" on the IRON, other IRON routers that do
>>> not connect to the DFZ can discover a nearby IDM that can
>>> reach the non-upgraded IPv6 Internet.
>>
>> Assuming all IRON routers are IPv6 routers, why would they need to
>> find another IRON router via the overlay network which could deliver
>> packets to any IPv6 address?
> 
> Because all IBRs have full knowledge of all VPs advertised
> in the IRON, 

Yes, but with BGP in the overlay, they only get a best path to one of
the multiple routers which advertise a VP.


> but only some IBRs have knowledge of prefixes
> advertised within the DFZ. 

I don't think any IRON routers need to know what prefixes are
advertised in the DFZ, since they don't forward packets to DFZ routers.


> This latter class is known as
> IBGs, and they advertise "default" into the IRON.

Yes, but this is for the purpose of being like an Ivip DITR or LISP
PTR, as described above.  They don't forward packets to DFZ routers
so as far as I know, they don't need to know what prefixes their DFZ
router neighbours are advertising best paths for.


>> I think the reasoning for this must come from your mixed IPv4 / IPv6
>> plans, which I have tried to avoid thinking about so far.
>>
>> Can you explain more about your vision for this?
> 
> My reasons for thinking so strictly about mixed IPv4
> and IPv6 was the nice property of stateless address
> mapping when only an IPv6 address is known and not the
> corresponding IPv4 address. However, with a routing
> protocol now in use in the IRON we have state - so, my
> rationale no longer applies. 

OK - but isn't "stateless address mapping" what is contemplated below
for these?:

    IPv6-EID/IPv4-RLOC
    IPv4-EID/IPv6-RLOC


> With this in mind, IRON
> applies equally well for IPv6-EID/IPv6-RLOC, IPv4-EID/
> IPv4-RLOC and IPv6-EID/IPv4-RLOC (however, I need to
> think more about IPv4/IPv6).

IPv6-EID/IPv6-RLOC  This is what would happen if I-R was purely
                    for IPv6.

IPv4-EID/IPv4-RLOC  This is what would happen if I-R was purely
                    for IPv4.

IPv6-EID/IPv4-RLOC  I understand this as the ability of the mapping
                    (which the VP router always has, as developed from
                    the potentially multiple "DEL" router
                    registrations for a given EID prefix, and which
                    it sends as "mapping" - AKA route redirection -
                    to any IRON router which tunnels a traffic packet
                    to this VP router) to tell IRON routers to
                    somehow tunnel IPv6 traffic packets to an IRON
                    router (this is where I want to use my DEL term)
                    which will deliver them to an EID-using
                    enterprise network which is an IPv6 network, but
                    which presumably doesn't have native IPv6
                    connectivity since this delivering role IRON
                    router ("DEL"!) is on an IPv4 address.

                    So this is to support isolated IPv6 networks
                    which use I-R "edge" (EID) space and which
                    receive their incoming packets via an IPv4
                    service.  (Or potentially a multihomed such
                    network where one or more of its "DEL" routers
                    is on IPv4, rather than IPv6.)

IPv4-EID/IPv6-RLOC  This would be an I-R "edge" using (EID)
                    network which lacked IPv4 connectivity (at
                    least for this particular "DEL" router) and
                    so which accepted incoming packets via such
                    a router on an IPv6 address.

                    Maybe a MN doing IPv4 applications when it is
                    physically connected only to an IPv6 network.

                    Or a non-mobile network which for some reason
                    only has an IPv6 service, but wants to run
                    IPv4 space, using I-R "edge" EID space.


>>>>>> They are going to be busy, depending on where they are located, the
>>>>>> traffic patterns, how many of them there are etc.   So they need to
>>>>>> be able to handle the cached mapping of some potentially large number
>>>>>> of I-R end-user network prefixes.
>>>>>
>>>>> In the case of IPv6, I think whether the IRON Default
>>>>> Mappers (IDMs) will be very busy depends on how large
>>>>> the IPv6 DFZ becomes. In my understanding, the IPv6 DFZ
>>>>> is not very big yet. So, if most IPv6 growth occurs in
>>>>> the IRON and not in the IPv6 DFZ the packet forwarding
>>>>> load on the IDMs might not be so great.
>>>>
>>>> This would only be true if you could convince most networks adopting
>>>> IPv6 to adopt I-R at the same time.
>>>
>>> Well, now is the time to put forward the case for
>>> handling new IPv6 growth in the IRON instead of in
>>> the IPv6 DFZ. Otherwise, once growth in the IPv6
>>> DFZ takes off and we start to see significant PI
>>> addressing and multihoming, we will eventually
>>> end up in the same boat we are in with the IPv4
>>> DFZ today.
>>
>> OK.  But I still prefer Ivip for IPv6 since it will be able to give
>> end-user networks, or their appointees, real-time control of
>> tunneling behavior.  This will be advantageous for real-time
>> responsive inbound TE and for quickly getting all traffic packets to
>> the newly selected TTR (Translating Tunnel Router) in TTR Mobility -
>> so the MN can quickly drop the tunnel it made to the previous TTR.
> 
> I will have to finally take the time to understand Ivip.
> I will try to do so soon so I can converse with you on
> more even terms.

OK - I would really appreciate this.


>>>>> The term "bubbles" came from teredo (RFC4380). Maybe we can
>>>>> think of a better term to use for IRON-RANGER?
>>>>
>>>> OK.  I don't think "bubbles" is appropriate for the registration
>>>> methods you have described so far, or that I have suggested.
>>>
>>> OK. How about Channel Queries (CQs)?
>>
>> I don't see any "channels" and it doesn't look like a "query".
>>
>> In my nomenclature, it is a DEL router registering an EID prefix (I
>> think this is the term you use in I-R) with a VPR because this VPR is
>> one of the typically two or more VPRs which handle this VP.
>>
>> What about "EID Registration Message" - ERM?
> 
> I was thinking "Prefix Control Messages (PCMs)", but I like
> yours slightly better. I will give it more thought.

OK.



>> However, I think it is wildly unrealistic to assume that IPv4 will
>> die or become anything but *the* Internet everyone relies upon for a
>> very long time, perhaps forever.  I am not saying this is a good thing.
>>
>> If you can articulate your vision for mixed IPv4 and IPv6 IRON-RANGER
>> operation, I can go along with it.  But I don't believe at all that
>> IPv6 will take over from IPv4 for most end-users before 2020.  As I
>> mentioned, there's still a lot of unused advertised space - and (I
>> assume) unused unadvertised - global unicast IPv4 address space.
>>
>> I can't envisage a situation where it will be better to sell ordinary
>> (non-mobile) users purely an IPv6 service, without even behind-NAT
>> IPv4 connectivity, than to sell them a service which is either a
>> single global unicast IPv4 address or behind-NAT IPv4.
>>
>> Mobile users could be different, since many functions and services
>> suitable for hand-held cellphone-like devices could be done via IPv6
>> - and since there would always be an option to tunnel through IPv6 to
>> an IPv4 NAT box so people can run client-style IPv4 applications on
>> their MN when they want to.
> 
> For many of the reasons you have mentioned, I am going
> to back down and say that IRON-RANGER can be agnostic to
> whatever IPvX/IPvY protocol combination gets used. I
> still believe that the expanded address space of IPv6
> will eventually steer new growth toward IPv6, but I
> won't be so brave as to guess a timeframe for this.
> 
> Still, one of the salient features of IRON-RANGER is
> support for IPv6 transition.

OK.  I understand there is potential value in the two crossover cases
mentioned above:

   IPv6-EID/IPv4-RLOC
   IPv4-EID/IPv6-RLOC

but for now I am imagining Ivip development being completely separate
for IPv4 and IPv6.  Before finalising any protocols, I would be keen
to investigate possible interworking - so I am not saying it is
impossible or a bad idea.  Just that in this stage of development I
am keeping the two systems entirely independent.



>>>> Then there are ways of using space more efficiently, as Ivip, LISP
>>>> and probably IRON-RANGER could do, by slicing and dicing it into much
>>>> smaller chunks than is possible with the /24 limit on prefixes in the
>>>> DFZ.
>>>
>>> OK.
>>
>> So to me, a successful implementation of IRON-RANGER would be as good
>> as Ivip or LISP in enabling really high levels of address utilization
>> in IPv4.  This will considerably extend the ability of IPv4 to handle
>> new users, including new end-user networks which need real global
>> unicast space (not behind-NAT) because they are running servers.
> 
> Can do.

OK.  Ivip will be able to slice and dice IPv4 space into 1, 2, 3, 4,
5, 6, 7 or any integer number of IPv4 addresses in a single micronet.
It is not restricted to "prefixes".

LISP and IRON-RANGER work in prefixes, but can still do 1, 2, 4, 8
IPv4 addresses, which will be pretty much as good for finely slicing
the space and mapping it to wherever it is to be used.
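
As a rough illustration of the difference, the following Python
sketch uses the standard ipaddress module to show how many prefixes
are needed to cover exactly N consecutive addresses.  The function
name and the example addresses are only for illustration.

```python
import ipaddress


def prefixes_for_block(first, count):
    """Return the prefixes which exactly cover `count` addresses from `first`."""
    start = ipaddress.IPv4Address(first)
    end = start + (count - 1)
    return list(ipaddress.summarize_address_range(start, end))


five = prefixes_for_block("192.0.2.0", 5)    # needs a /30 plus a /32
seven = prefixes_for_block("192.0.2.0", 7)   # needs a /30, a /31 and a /32
eight = prefixes_for_block("192.0.2.0", 8)   # a single /29
```

So a 5-address allocation which could be a single Ivip micronet would
take two EID prefixes in a prefix-based system, while power-of-two
sizes on aligned boundaries need only one.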

I predict the vast majority of IPv4 micronets (EID prefixes for LISP
and I-R) will be less than 256 IPv4 addresses.  If you look at the
huge preponderance of /24 currently advertised in the DFZ:

  http://bgp.potaroo.net/as2.0/bgp-active.html

  /19    18191    5.74%
  /20    22214    7.00%
  /21    22356    7.05%
  /22    28914    9.12%
  /23    28732    9.06%
  /24   166028   52.35%

I conclude that the great majority of end-user networks find 256 IP
addresses sufficient.  Since this is the smallest number of addresses
they can advertise in the DFZ (and have the best-paths propagated
through the whole DFZ) I believe it is reasonable to assume that many
would be happy with 128 addresses, 64, 32 or whatever.  Perhaps quite
a few would be happy with 1 or 4 addresses.

At present, these /24s only take up a fraction of a percent of the
IPv4 advertised address space, so arguably they are not wasting much
space now, by being forced to be 256 addresses when less would suit
the needs of the advertising networks.  However, a good scalable
routing solution will be catering to many more end-user networks than
those which currently advertise their own prefixes in the DFZ.
Assuming the new kind of "scalable" "edge" space of LISP, Ivip or I-R
has few or no performance problems, then it could be widely used and
be used in just the right quantities required, without wasting much
space.



>>>> I think that most growth in Internet usage will occur in the IPv4
>>>> Internet for at least the rest of this decade.  The only time it
>>>> would make sense to use IPv6 instead of direct IPv4 or IPv4 behind
>>>> NAT would be for some service where it wasn't important to be able to
>>>> connect to IPv4.  At present, you couldn't sell any such service. I
>>>> guess that it may be possible to do this for large IP cell-phone
>>>> deployments where there are enough IPv6 services available to do a
>>>> reasonable subset of what people want in a hand-held device, and
>>>> where tunneling to a server which provides behind-NAT IPv4
>>>> connectivity would also be possible.
>>>
>>> I agree that the IPv4 Internet is not only not going away
>>> but also continuing to grow. But, I still think that users
>>> will want to have both IPv4 (behind NAT if necessary) and
>>> IPv6 as we move forward from here.
>>
>> At present, there's only one scenario in which I can imagine there
>> being a real demand among non-mobile customers for IPv6.  Let's say
>> that one or more large mobile phone companies decides to make their
>> new, or existing, 3G systems work with each MN having its own global
>> unicast IPv6 address (or perhaps /64).   This would enable direct
>> host-to-host connectivity between any of these MNs.  (Though carriers
>> typically want to avoid this, to stop people running VoIP and instead
>> to use their voice call services, for which they charge more than
>> they can for basic IP connectivity).
>>
>> Now let's say there are hundreds of millions or billions of these
>> MNs, each with its own global unicast IPv6 address.  That address
>> could be stable as long as the MN is in the one carrier network.  If
>> it roams to another network, it would probably get another address.
>> However, the TTR Mobility system would fix this - and give each MN
>> its own permanent /64, no matter how it connected to the Net, as long
>> as it is via IPv6.  (I do not currently plan any connections between
>> Ivip or TTR Mobility for IPv4 and IPv6 - best to keep them as
>> separate systems.)
>>
>> In this situation, people on non-mobile networks would have a genuine
>> reason to get native IPv6 connectivity.  Firstly, they might want to
>> sell or give services to these MN users.  Secondly, from home, they
>> might want to run a web-cam, file sharing, VPN or whatever which the
>> MN could access directly, on a host-to-host basis, without mucking
>> around with IPv4.
>>
>> So I can imagine this trend happening - but only once there are a
>> substantial number of ordinary users with native IPv6 connectivity.
>> I guess this is most likely to occur with cell-phones.
> 
> I honestly don't know what the drivers will be, Robin,
> but I still believe (and I still believe that the *IETF*
> believes) that IPv6 is where we need to go in the long
> run. Again, however, I agree with you that IPv4 will
> still be around for a very long time.

None of us know anything about the future - we have to make do with
educated guesses.

I agree we should plan for widespread IPv6 adoption.  I am only
arguing against assumptions such as:

   IPv6 widespread adoption will begin real soon now.

   IPv4 usage is near its peak - so there's no need to plan for
   it to be more widely used, to solve its scaling problem etc.

For an example of the latter position, and implicitly the first, see
Tony Li's recent message:

  http://www.ietf.org/mail-archive/web/rrg/current/msg06192.html

     IPv4 is done.  Over.  Cooked. Fully toast.  It will either
     enter a black market where we deaggregate and no proposal
     will help, or we shift to v6 and v4 is irrelevant.  In
     either case, we're not in time to do anything significant
     for v4.  And we still need a v6 solution, that's clearly
     higher priority.


>>>>> 3) IPv6 addresses can embed IPv4 addresses such that there
>>>>>    is stateless address mapping between an EID nexthop and
>>>>>    an RLOC.
>>>>
>>>> Can you explain this with an example?  I can't clearly envisage what
>>>> you mean.
>>>
>>> I mean, if the IPv6 EID FIB includes entries with a next-hop
>>> address such as: 'fe80::5efe:V4ADDR' (i.e., an IPv6 address
>>> with embedded IPv4 address), then V4ADDR can be statelessly
>>> extracted as the RLOC address of the ETR.
>>
>> So the "mapping", which the LFR-role and IDR-role routers get from
>> the VP router is actually telling them to tunnel subsequent traffic
>> packets to an IPv4 address?   That would only work if every LFR-role
>> and IDR-role router had IPv4 access - unless you were to establish
>> special routers to act as gateways for delivering to IPv4 addresses,
>> which is not out of the question.
> 
> Public IPv4 RLOCs that are routable within the IPv4 DFZ
> is what I am suggesting.

Yes, but if you are able to specify this in the mapping sent by a VP
router to an IRON router which is accepting traffic packets and
tunneling them (initially to the VP router, and then to whichever
"DEL" role router it decides to from the mapping sent by the VP
router in response to this initially tunneled packet), then it will
only work if all these IRON routers (IBR and IBG in your terminology)
can tunnel packets to any IPv4 "RLOC" address.  Yet these are all
IRON routers which are on IPv6 addresses.

They would need either direct IPv4 connectivity to do this, or a
means of forwarding the packet, in tunneled form, to some other IPv6
router which could send them to the IPv4 address of the "DEL" role
IRON router.

Neither of these things is in the current design, as far as I know.
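
For reference, the stateless extraction itself (taking V4ADDR out of
a next-hop such as 'fe80::5efe:V4ADDR') is simple.  Here is a minimal
Python sketch, assuming the IPv4 address occupies the low 32 bits and
that 5efe sits in the 16 bits above it.  The function name is
hypothetical, and the sketch says nothing about whether the router
can actually reach that IPv4 address.

```python
import ipaddress


def embedded_v4_rloc(next_hop):
    """Extract the IPv4 address embedded in a '...:5efe:V4ADDR' next-hop."""
    value = int(ipaddress.IPv6Address(next_hop))
    # Check for the 5efe marker just above the low 32 bits.
    if (value >> 32) & 0xFFFF != 0x5EFE:
        raise ValueError("next-hop does not carry an embedded IPv4 address")
    return ipaddress.IPv4Address(value & 0xFFFFFFFF)


# c000:0201 is the hexadecimal form of 192.0.2.1 (a documentation address).
rloc = embedded_v4_rloc("fe80::5efe:c000:201")
```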


>> Also, an IPv6 VPR would need to be able to do the same thing - tunnel
>> an IPv6 traffic packet to a DEL-role router which is actually on an
>> IPv4 address, but which is nonetheless delivering packets to an
>> end-user network which uses an IPv6 EID.
>>
>> This could be done, I guess, but there are messy PMTUD problems to
>> solve.  I prefer not to think about such things, but for now can
>> imagine you might want to do this, and that you could devise a way of
>> doing it.
> 
> SEAL should help.

OK.


>>>>>> There are two reasons an IRON router M might need to know about which
>>>>>> other IRON routers A, B and C advertise a given VP:
>>>>>>
>>>>>>  1 - When M has a traffic packet.  (M is either an ordinary IRON
>>>>>>      router and advertises the I-R "edge" space in its own network
>>>>>>      or it is a "DITR-like" router advertising this space in the
>>>>>>      DFZ.)  M needs to tunnel the packet to one of these VP routers.
>>>>>>
>>>>>>      The VP router will tunnel it to the IRON router Z it chooses as
>>>>>>      the best one to deliver the packet to the destination network
>>>>>>      and will send a "mapping" packet to M which will cache this
>>>>>>      information and from then on tunnel packets matching the
>>>>>>      end-user network prefix in the "mapping" to Z (or some other
>>>>>>      IRON router like Z, if there were two or more in the "mapping").
>>>>>>
>>>>>>      In this case, M needs only the address of one of the A, B or C
>>>>>>      routers.  Ideally it would have the address of the closest one -
>>>>>>      but it doesn't matter too much if it has the address of a more
>>>>>>      distant one.  That would involve a somewhat longer trip to the
>>>>>>      VP router, and perhaps a longer or shorter trip from there to Z.
>>>>>>      (This would typically be shorter than the path taken through
>>>>>>      LISP-ALT's overlay network.)
>>>>>>
>>>>>>      After M gets the "mapping", it tunnels traffic packets to Z - so
>>>>>>      the distance to the VP router no longer affects the path of
>>>>>>      traffic packets.
>>>>>>
>>>>>>      In this case, BGP on the overlay would be perfectly good - since
>>>>>>      it provides the best path to one of A, B or C - typically that
>>>>>>      of the "closest" (in BGP terms).
>>>>>>
>>>>>>
>>>>>>  2 - When M is one of potentially multiple IRON routers which
>>>>>>      delivers packets to a given end-user network - packets whose
>>>>>>      destination address matches a given end-user network prefix P.
>>>>>>
>>>>>>      M needs to "blow bubbles" (highly technical term from this
>>>>>>      R&D phase of IRON-RANGER) to A, B and C.  The most obvious
>>>>>>      way to do this is for M to be able to know, via the overlay
>>>>>>      network the addresses of all VP routers which advertise a given
>>>>>>      VP.  There may be two or three or a few more of these.  They
>>>>>>      could be anywhere in the world.
>>>>>>
>>>>>>      BGP does not appear to be a suitable mechanism for this, since
>>>>>>      its "best path" basic functions would only provide M with
>>>>>>      the IP address of one of A, B and C.
>>>>>>
>>>>>>      You could do it with BGP, by having A, B and C all know about
>>>>>>      each other, and with all three sending everything they get to
>>>>>>      the others.  This is not too bad in scaling terms for two,
>>>>>>      three or four such VP routers.
>>>>>>
>>>>>>      Then, M sends its registration to one of them - whichever it
>>>>>>      gets the address of via the BGP of the overlay network - and
>>>>>>      A, B and C compare notes so they all get the registration.
>>>>>>
>>>>>>      I will call this the "VP router flooding system".
>>>>>
>>>>> This is a nice idea. If I get what you are suggesting, each
>>>>> IRON router that advertises the same VP (e.g., VP(x)) would
>>>>> need to engage in a routing protocol instance with one
>>>>> another to track all of the PI prefix registrations. The
>>>>> problem I have with it is that that would make for perhaps
>>>>> 10^5 or more of these little routing protocol instances as
>>>>> well as lots and lots of manually-configured peering
>>>>> arrangements between the IRON routers that advertise VP(x).
>>>>
>>>> Something like this - but I am not sure what you mean by "routing
>>>> protocol instance".  I understand that the two or three VP routers
>>>> for any one VP "P" do need to cooperate and share their various
>>>> registrations.  You could either create a fresh protocol to do this,
>>>> or push into service some existing protocol, including perhaps a
>>>> routing protocol.
>>>
>>> We haven't brought the Virtual Router Redundancy Protocol (VRRP)
>>> into discussion yet [RFC5798], but we might want to consider
>>> looking at this as a way of providing fault tolerance for VP
>>> routers. I'm not sure whether VRRP would also support load
>>> balancing between the multiple routers, but it seems like
>>> fault tolerance is the dominant consideration.
>>
>> I agree - fault tolerance is more important than load balancing at
>> this stage of the design, though some form of load balancing might be
>> possible and desirable too.
> 
> VRRP says that load balancing is possible, but AFAICT
> leaves it out of scope.

OK.

I imagine you would want load balancing with generally the "nearest"
VP router being used, when an IBR or IBG router tunnels the initial
one or more traffic packets (before it gets "mapping" from the VP
router).

The use of both nearest and the load sharing would generally make the
system work better.  Also, when one VP router dies, if you had three,
it would only affect on average 1/3 of the IBR and IBG routers.

Load sharing would be vital for scaling purposes - even if VRRP
somehow handled the robustness problem, there's no way you want all
the initial packets for any set of EID prefixes having to go to just
one physical router in the Net.


>> I don't want to try to read this RFC in order to imagine how it might
>> work with I-R, so if you can describe how it would work, that would
>> be good.
> 
> I touched on this above, which is just about as deep
> as my understanding goes. In a nutshell, with VRRP
> each router shares the same IP address, and each
> router maintains synchronized state. One of the
> routers is chosen as the primary, and the others
> are designated as backups. If the primary fails,
> one of the backups takes over sort of like an
> uninterruptible power supply.

I can't see how you can have this shared IP address arrangement for
IRON routers which are going to be in different places, and therefore
in different parts of the topology.  The diverse placement is
essential for robustness - and for the goals of load-sharing with
generally shorter paths.

In the I-R overlay system, each IRON router which advertises a VP
does so giving its IP address as being the same as the IP address it
uses in the Internet.  This is because IRON routers playing your IBR
or IBG roles tunnel traffic packets directly (via the Internet, not
the I-R overlay network) to the VP router.

So I don't see how you could have multiple such routers, which must
be on different IP addresses on the Internet, behaving on the I-R
overlay as if they all had the one IP address.


>>> Using VRRP also reduces the "fanout" of VP-advertising routers
>>> to just a single RLOC address, and so makes for less complexity
>>> in ferrying CQs around the IRON.
>>
>> But if all VPRs are on the one IP address, this would radically alter
>> the nature of the overlay network.  Also a single router might be VPR
>> for multiple VPs - so I can't see how this would work.
> 
> No, it doesn't alter the overlay network in any way.

I just wrote about how I can't see how it could work.  At some stage,
if you adopt VRRP, I guess you will explain exactly how it will work
in the context of the I-R overlay.


>> A quick look into this RFC:
>>
>>   http://tools.ietf.org/html/rfc5798#section-5.1.1.2
>>
>> indicates that it relies on multicast.  I think VRRP is intended for
>> multiple routers in a single local network, where multicast could be
>> done.  I can't imagine how you could scalably implement multicast on
>> the I-R overlay network.
> 
> No - not multicast over the I-R overlay network;
> link-local multicast on an underlying link.

I haven't read the VRRP RFC, but I don't understand how you could
scalably do any multicast on the I-R overlay network.  It is a bunch
of IRON routers, using their Internet IP addresses, with tunnels
between them purely (at present, as best I understand it) for the
purpose of handling BGP messages in this overlay network.  No other
packets flow in the overlay network itself.  So I am not sure what
"link-local" would mean in this context:

  http://en.wikipedia.org/wiki/Link-local_address

It means within a local, physical, IPv6 network, or within
169.254.0.0/16 for IPv4.

I understand there is an I-R overlay BGP instance for IPv6 and
another one, involving the same (or mainly the same) routers, for
IPv4.  The participating routers use their Internet IP addresses on
these overlay networks, so I can't see how either IPv4 or IPv6
"link-local" addressing or multicast could be done.
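
A small check of this point, using Python's standard ipaddress
module.  The sample addresses are illustrative assumptions (two
link-local addresses and two documentation addresses standing in for
ordinary Internet addresses).  The two multicast groups are the ones
assigned to VRRP, 224.0.0.18 for IPv4 and ff02::12 for IPv6, both of
which are scoped to a single link.

```python
import ipaddress


def is_link_local(text: str) -> bool:
    """True if the address lies in 169.254.0.0/16 or fe80::/10."""
    return ipaddress.ip_address(text).is_link_local


for text in ["169.254.10.1", "fe80::1", "192.0.2.1", "2001:db8::1"]:
    print(text, is_link_local(text))
# 169.254.10.1 True
# fe80::1 True
# 192.0.2.1 False
# 2001:db8::1 False

# VRRP advertisements go to link-scoped multicast groups.
vrrp_v4 = ipaddress.ip_address("224.0.0.18")
vrrp_v6 = ipaddress.ip_address("ff02::12")
v4_on_link = vrrp_v4 in ipaddress.ip_network("224.0.0.0/24")
v6_on_link = vrrp_v6 in ipaddress.ip_network("ff02::/16")
print(vrrp_v4.is_multicast, v4_on_link)   # True True
print(vrrp_v6.is_multicast, v6_on_link)   # True True
```

Routers do not forward packets sent to these groups beyond the local
link, so routers reachable only via their Internet addresses would not
see each other's advertisements.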


>> I think this illustrates our differing design approaches.  I think
>> you tend to view the subsystems from a very high level - and if it
>> looks like one might do the trick, you consider it.  I immediately
>> want to know whether it is possible to do such things, and in this
>> case, it took me a few minutes with a protocol I had never heard of
>> to find a "lower level" detail which seems to preclude its use in the
>> way you intend.
>>
>> I am not suggesting my approach is always the best - because I think
>> it is important to brainstorm ideas and think loosely for a while.
>> Too much "no, it can't be done" thinking too soon results in there
>> being nothing to explore.
> 
> I'm somewhat amazed by this assessment. I am very much
> a "bottom-up" designer by nature, as can be seen in VET
> and SEAL. Higher-level architecture descriptions are not
> my strongest suit, but I guarantee you that everything I
> describe has a path toward something that can actually
> be implemented.

If I was going to suggest VRRP as a solution, I would also point out
how its mechanisms would work in the intended scenario.  Since it is
not at all obvious how you can scalably do any kind of multicast on
the overlay network, and since VRRP apparently relies on multicast,
and on some other things such as the routers sharing the one IP
address, I would have accompanied plans for VRRP with an explanation
for how these things could in fact be done in the overlay network.

But that's fine - I regard this as a brainstorming phase of design,
so it's OK to consider things just because they look like they might
do the job, without assuming that they really can do it.  Even if
they can't, it might lead to a line of thought which turns out to be
of lasting value.


>>>> You haven't specified anything other than manual configuration for
>>>> how an IRON router becomes a VP router.  VP routers have extra
>>>> workload, so whoever runs such a router must have a reason to do
>>>> this, probably involving payment of money in some way from the
>>>> end-user networks whose EID prefixes are covered by this VP.
>>>
>>> Yes. End-users have to pay either a one-time or
>>> recurring cost for their PI prefixes.
>>
>> OK - but what about the costs of running the IDMs, which will handle
>> widely varying traffic loads from one EID to the next, with these
>> loads generally having little correlation with the amount of space in
>> the EID?
> 
> Somehow this cost has to be factored into EID prefix
> registry business sector's cost of doing business.
> After all, if all the EID prefix registries did was
> run VP routers and serve up EID prefixes, then the
> IRON would be detached from the DFZ and kept apart
> from a huge set of content on the Internet. So, it
> seems like each EID prefix registry should also be
> required to stand up an IBG.

Yes - IBGs are the equivalents of Ivip DITRs and LISP PTRs.  I think
they will need to monitor traffic on these and charge the destination
networks accordingly.  Ivip anticipates this, but so far LISP and
IRON-RANGER don't.


>>>> If there are two or three IRON routers acting as VP routers for a
>>>> given VP, then some organisation is responsible for that VP, is
>>>> collecting payments as described above and is therefore the one
>>>> organisation driving the existence of these two or three VP routers.
>>>> So manual configuration seems OK to me - I don't think there needs
>>>> to be a fancy automated system by which one VP router for a given VP
>>>> "P" would auto-discover any other VP router for "P" in the whole I-R
>>>> system.  However, these VP routers for the one VP do need to work
>>>> together to share registrations, and to quickly detect when one or
>>>> more of the set becomes unreachable.
>>>
>>> VRRP maybe?
>>
>> Since it appears to involve multicast, maybe not.
> 
> I'm pretty sure it will work.

OK.


>> It shouldn't be too hard to develop a protocol by which a handful of
>> VPRs work together.  Maybe some existing protocols can be used as
>> part of this.
> 
> I really don't want to require any adjunct protocols
> that aren't already standardized.

I agree, but maybe there's nothing already in existence which will do
the job, or do it as efficiently as a purpose-built protocol.


>>>>> For these reasons, I believe it is better for IRON router
>>>>> M to know about all three of A, B and C and direct bubbles
>>>>> to each of them. I think we can achieve this using OSPF
>>>>> with the NBMA link model in the IRON overlay.
>> A quick look at:
>>
>>   http://en.wikipedia.org/wiki/OSPF
>>
>> and the IPv4 RFC:
>>
>>   http://tools.ietf.org/html/rfc2328#page-19
>>
>> indicates that a large OSPF network is organised into various areas.
>>
>> How would you do this for the IRON-RANGER overlay network?  Don't
>> OSPF and ISIS require more centralised administration, such as to
>> structure the whole system into sub-systems and to give certain
>> routers particular roles, on which other routers depend?
> 
> My understanding is that the set of designated routers
> determines each OSPF area. The name "isatapv2.net" is
> essentially the list of designated routers for the entire
> IRON as a single area. But admittedly, I need to do a
> deeper dive into OSPF to prove that this is feasible.

OK.


>> I haven't read the OSPF article, but my impression is that it is a
>> valuable resource, with Wbenton:
>>
>>   http://en.wikipedia.org/wiki/User:Wbenton-test
>>
>> contributing many things, not least a formidable table and diagram of
>> interdependencies between RFCs.  The diagram looks like it needs its
>> own routing protocol!
> 
> I appreciate all of these links, and will go chase
> them down.

OK.

>>>>> Please note: the EID-based IRON overlay is configured over
>>>>> the DFZ, which is using BGP to disseminate RLOC-based
>>>>> prefix information. So, it is BGP in the underlay and
>>>>> OSPF in the overlay - weird, but I think it works.
>>>>
>>>> Yes the DFZ uses BGP and the overlay uses . . . originally I-R used
>>>> BGP (a separate instance of BGP in each such router).  Also, IRON
>>>> routers don't need to be DFZ routers and in many or most cases are
>>>> not DFZ (BR) routers - but they all communicate via tunnels which are
>>>> carried between networks via the ordinary Internet (using the DFZ).
>>>>
>>>> I guess these tunnels between IRON routers will need to be manually
>>>> configured, since they are typically between physically and
>>>> topologically nearby routers.
>>>
>>> No manual config needed; the IRON is just a gigantic NBMA
>>> link, and can use automatic tunneling the same as for VET
>>> and ISATAP.
>>
>> But it is important for IRON routers to run their new BGP instance
>> with neighbouring IRON routers which are generally physically or
>> topologically close.  Otherwise, the "distance" metrics in the
>> overlay network won't resemble the real "distance" to the other
>> routers, and your routers playing the LFR or IDM role won't
>> automatically discover the address of the "closest" VPR for a given VP.
> 
> Do you mean distance as in hopcount? Because, every IRON
> router is a neighbor on the link - i.e., hopcount is 1
> always.

I meant the "distance" metrics of BGP - each best-path offered by a
neighbouring router is assessed according to the number of ASes it
contains, and then, subject to local policy, the one with the lowest
number of ASes is usually chosen - with this being offered to all the
neighbours, with an additional AS added (or a few, to make it less
attractive, if this is desired).

I assume that in the BGP overlay each IRON router uses as its AS the
AS it is within on the Internet.
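
A minimal sketch of that selection step, assuming AS path length is
the only criterion (real BGP applies local policy and several further
tie-breakers).  The function names, the tie-break on neighbour name
and the AS numbers (taken from the documentation range) are all
assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def select_best_path(offers: Dict[str, List[int]]) -> Tuple[str, List[int]]:
    """Pick the neighbour offering the shortest AS path.

    Ties are broken by neighbour name purely to keep the sketch
    deterministic; this is not how real BGP breaks ties.
    """
    neighbour = min(offers, key=lambda n: (len(offers[n]), n))
    return neighbour, offers[neighbour]


def advertise(path: List[int], local_as: int, prepend: int = 1) -> List[int]:
    """Path offered onward: the local AS added once, or several times
    to make the path less attractive to neighbours."""
    return [local_as] * prepend + path


offers = {
    "n1": [64501, 64510],          # two ASes to the VP router
    "n2": [64502, 64503, 64510],   # three ASes to the same VP router
}
neighbour, best = select_best_path(offers)
print(neighbour, best)                       # n1 [64501, 64510]
print(advertise(best, 64500))                # [64500, 64501, 64510]
print(advertise(best, 64500, prepend=3))     # [64500, 64500, 64500, 64501, 64510]
```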


>> These tunnels surely need to be manually configured - and that
>> defines the membership in the I-R overlay network and its structure
>> for the purposes of its BGP (or OSPF?) control plane.
> 
> Automatic tunneling is the goal I am working toward.

But if you have 100k IRON routers, how does any one IRON router
decide which small subset of these to create tunnels to?

You can't have all 100,000 IRON routers tunneling to the 99,999 other
IRON routers.
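
The arithmetic behind this, as a short sketch.  The figure of 100,000
routers is the one used above; the function name is just for
illustration.

```python
from typing import Tuple


def full_mesh_tunnels(n: int) -> Tuple[int, int]:
    """Return (tunnels per router, distinct router pairs) for a full
    mesh of n routers."""
    per_router = n - 1
    pairs = n * (n - 1) // 2
    return per_router, pairs


per_router, pairs = full_mesh_tunnels(100_000)
print(per_router)   # 99999
print(pairs)        # 4999950000
```

So a full mesh would mean about 5 billion router pairs, each needing
a tunnel and a BGP session.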

If you are going to use BGP to provide each IRON router with a
best-path to the "nearest" VP router of several VP routers
advertising a given VP, then you need these tunnels to be with
topologically nearby routers.  I can't imagine how you could do this
automatically.

Also, if I ran an AS with one or more IRON routers, I would want to
manually configure which IRON routers in other ASes each one tunneled
to, rather than trusting some automagic system to do this.  There
will be real flows of BGP messages over those tunnels, and I would
want most or all of them to be with ASes I had a zero-cost peering
arrangement with.


>>>>>>>> Also, this is just for 10 minute registrations.  I recall that the 10
>>>>>>>> minute time is directly related to the worst-case (10 minute) and
>>>>>>>> average (5 minute) multihoming service restoration time, as per our
>>>>>>>> previous discussions.  I think that these are rather long times.
>>>>>>>
>>>>>>> Well, let's touch on this a moment. The real mechanism
>>>>>>> used for multihoming service restoration is Neighbor
>>>>>>> Unreachability Detection. Neighbor Unreachability
>>>>>>> Detection uses "hints of forward progress" to tell if
>>>>>>> a neighbor has gone unreachable, and uses a default
>>>>>>> staletime of 30sec after which a reachability probe
>>>>>>> must be sent. This staletime can be cranked down even
>>>>>>> further if there needs to be a more timely response to
>>>>>>> path failure. This means that the PI prefix-refreshing
>>>>>>> "bubbles" can be spaced out much longer - perhaps 1 every
>>>>>>> 10hrs instead of 10min. (Maybe even 1 every 10 days!)
>>>>>>
>>>>>> OK, I am not sure if I ever knew the details of "Neighbor
>>>>>> Unreachability Detection" - but shortening the time for these
>>>>>> mechanisms raises its own scaling problems.
>>>>>>
>>>>>> Can you give some examples of how this would work?
>>>>>
>>>>> I want to go back on this notion of extended inter-bubble
>>>>> intervals, and return to something shorter like 600sec
>>>>> or even 60sec. There needs to be a timely flow of bubbles
>>>>> in case one or a few IRON routers goes down and needs to
>>>>> have its PI prefix registrations refreshed.
>>>>
>>>> OK - I will stay tuned for further details.
>>>
>>> Bringing VRRP into the consideration could have a
>>> contributing factor to how long the bubble (er, CQ)
>>> interval needs to be.
>>
>> I regard the whole question of registering EIDs with VPRs as being
>> undecided until you propose an exact mechanism.
> 
> The mechanism is periodic transmission of signed router
> advertisements with credentials that prove ownership of
> the advertised prefixes. These are what I have formerly
> called "bubbles", but as discussed above we should
> probably try for a new name.

OK - but depending on your choice of BGP or OSPF I understand you
will devise a mechanism, potentially using VRRP.


>>>>>> At present, I can see these choices for this registration mechanism:
>>>>>>
>>>>>>   1 - Keep BGP as the overlay protocol and use my proposed "VP router
>>>>>>       flooding system".
>>>>>>
>>>>>>   2 - Retain your current plan of each IRON router like M needing to
>>>>>>       know the addresses of all the routers handling a given VP (A, B
>>>>>>       and C) which BGP can't do.  So you could:
>>>>>>
>>>>>>       2a - keep BGP and add some other mechanism.  Maybe M sends a
>>>>>>            message to the one of A, B or C it has a best path to,
>>>>>>            requesting the full list of all routers A, B and C which
>>>>>>            handle a given VP.  When M gets the list, it sends
>>>>>>            registration "bubbles" to the routers on the list.  This
>>>>>>            needs to be repeated from time-to-time to discover
>>>>>>            new VP routers.
>>>>>>
>>>>>>       2b - use something different from BGP which provides all the
>>>>>>            A, B and C router addresses to every IRON router, such as
>>>>>>            M.  This needs to dynamically change as A, B and C die and
>>>>>>            are restarted, or joined by others.
>>>>>
>>>>> Right - I am still leaning toward OSPF with its NBMA
>>>>> link model capabilities. The good news is that the
>>>>> IRON topology itself should be relatively stable, so
>>>>> not much churn due to dynamic updates.
>>>>
>>>> OK.  Since the IRON routers have their own IP addresses and are
>>>> generally in networks multihomed by existing BGP techniques, then any
>>>> outages don't affect the IRON routers' IP addresses or their
>>>> tunneling arrangements.  There would still be transitory breaks in
>>>> connectivity, before the BGP multihoming arrangements kick in.  If
>>>> you could ignore those by some means in the overlay's routing system
>>>> (BGP or OSPF) then yes, the IRON routers should be pretty stable.
>>>
>>> With VRRP, probably even moreso.
>>
>> Or with your own purpose-designed protocol involving one, two or a
>> few more IRON routers in their DEL-roles registering the one EID with
>> two or maybe a few more VPRs.
> 
> I'd really prefer not to do that if at all possible.
> I think VRRP fits.

OK - if you can make VRRP do the trick, then of course that is
preferable to devising a new protocol.

Thanks for the continuing discussion.

  - Robin

