I received a message from someone who has recently started to follow the RRG list and had trouble understanding some of the terminology in the Ivip Conceptual Summary and Analysis document:
  http://www.firstpr.com.au/ip/ivip/Ivip-summary.pdf

Here is what I wrote, explaining some terms and concepts which are basic to all map-encap schemes, and some others which are specific to Ivip:

  ITR
  ETR
  Query Server
  DFZ
  TE
  RUAS - Root Update Authorisation System
  ITRD - Full Database ITR
  ITRC - Caching ITR
  QSD  - Full Database Query Server
  QSC  - Caching Query Server

 - Robin


ITR
---

An ITR is an Ingress Tunnel Router. This is common to all the map-encap schemes: LISP, APT, Ivip and TRRP - see the RRG wiki page for details:

  http://www3.tools.ietf.org/group/irtf/trac/wiki/RoutingResearchGroup

All these schemes involve some sections of the address space (IPv4 is assumed in the following explanations, but the same applies to IPv6) being devoted to "end-user networks". The idea is that all hosts sending packets to addresses in these ranges will have those packets handled by an ITR.


ETR
---

The ITR tunnels the packet to an Egress Tunnel Router (ETR). The ITR requires some "mapping" information to tell it the address of the ETR which is appropriate for this packet's destination address.

The tunnelled part of the packet's path is between the ITR and the ETR: the original packet is the payload of a larger packet, whose new header is addressed to the ETR, not to the original packet's destination address. The ETR strips off the outer header, reconstituting the original packet, and sends it directly to the end-user's network.

The idea is that an end-user network can be anywhere in the world, such as Madrid, and that ITRs all over the world, close to sending hosts, will tunnel the packets directly to an ETR in Madrid. Later, if the end-user network moves to Hawaii, the mapping is changed and all the world's ITRs tunnel the packets directly to an ETR in Hawaii. This makes the address space portable, and usable with any ISP who provides an ETR. The ITR network, the mapping system and multiple ETRs can be used to provide multihoming and Traffic Engineering.

In Ivip, the mapping information is simple: if a destination address is within a "micronet", then there is one ETR address to which all packets addressed to addresses within that micronet must be sent. (Bill Herrin suggested the term "micronet" and I now use it in Ivip.)

In LISP, the term "EID prefix" means the same thing as "micronet". An EID prefix's mapping information consists of one, two or more ETR addresses and some information about priorities for choosing between them, for multihoming and for Traffic Engineering in the form of load sharing. I think APT and TRRP are similar.

These schemes - LISP, APT, Ivip and TRRP - are "map-encap" schemes, meaning that the packet's destination address is used by an ITR to look up some mapping information, which gives the ITR one or more ETR addresses to which the packet will be tunnelled. "Encapsulation" means putting the original packet in a larger packet and sending it to the ETR - which constitutes a tunnel. These schemes differ considerably in where ITRs and ETRs are located, but the most dramatic differences between them are in how the ITR gains the mapping information.
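To make the map-encap idea concrete, here is a minimal sketch in Python of what an ITR does with each packet. The addresses and names (MAPPING, itr_handle and so on) are invented for illustration - no actual scheme specifies this code:

    # A minimal sketch of the map-encap idea, using Python's
    # ipaddress module. All names are hypothetical, for illustration.

    import ipaddress

    # In Ivip the mapping is simply micronet -> one ETR address.
    # (LISP/APT/TRRP would store several ETRs plus priority/TE data.)
    MAPPING = {
        ipaddress.ip_network("20.1.8.0/24"):
            ipaddress.ip_address("150.1.1.1"),
    }

    def lookup_etr(dest):
        """Return the ETR address for dest, or None if dest is not in
        any micronet (the packet is then forwarded conventionally)."""
        for micronet, etr in MAPPING.items():
            if dest in micronet:
                return etr
        return None

    def itr_handle(dest, payload):
        """What an ITR does with one packet."""
        etr = lookup_etr(dest)
        if etr is None:
            return ("forward-normally", dest, payload)
        # "Encapsulation": the original packet becomes the payload of
        # a larger packet whose outer header is addressed to the ETR.
        return ("tunnel-to-ETR", etr, (dest, payload))

    print(itr_handle(ipaddress.ip_address("20.1.8.5"), b"data"))
    # -> ('tunnel-to-ETR', IPv4Address('150.1.1.1'), ...)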
Example of multihoming service restoration
------------------------------------------

Please see the diagram at the start of:

  http://www.firstpr.com.au/ip/ivip/

A packet from sending host H1 is addressed to receiving host IH9, which has an address in a micronet which is part of a Mapped Address Block managed by Ivip. The packet leaves the network and is forwarded to ITR1, which is an "anycast ITR in the DFZ/core".

Initially, the mapping for this micronet is to tunnel the packet to ETR1, which is in ISP N3, one of the two ISPs used by the multihomed end-user. When any of the following occurs:

  1 - N3's network is unreachable from the rest of the Net.
  2 - ETR1 dies.
  3 - The end-user's link to ETR1 fails.

then something (probably a separate commercial global multihoming monitoring system which the end-user pays for and which controls the mapping of their micronet) changes the mapping to tunnel these packets to ETR2 instead. Then the packets are delivered to IH9 again. Packets from H3 are handled in a similar way, except that its network N2 has its own ITR.


Query Server
------------

"Query Server" is a general term for any server which responds to queries. In the context of the RRG discussions, I use the term specifically for certain elements of Ivip, and also more generally to refer to what I regard as "query servers" in other proposals.

For instance, in LISP-ALT, a query from an ITR about mapping is passed over the ALT network to some device which sends an answer to the ITR. That may be an ETR or something else, but I refer to it, in a general sense, as a "query server". In APT, the "Default Mapper" is a "query server" (as well as being a full database ITR). In TRRP, the authoritative nameservers in the trrp.arpa domain and its subdomains are "query servers". APT's and Ivip's query servers are local, but LISP-ALT's and TRRP's are (in general) located somewhere in a global network.


DFZ
---

"DFZ" means "Default Free Zone". The Internet's inter-domain routing system uses routers which compare notes with each of their peers (other routers they have direct links to) using BGP (Border Gateway Protocol) about the best path on which to send packets, according to which of many prefixes each packet is addressed to. In IPv4, there are currently about 250,000 such prefixes advertised in BGP:

  http://bgp.potaroo.net

Each such prefix is announced (typically) by one or more border routers of ISPs or of end-user networks. In order to participate in the BGP system, and thereby have a direct connection to the "core" of the Internet, each such ISP or end-user network needs an Autonomous System (AS) number (ASN).

The routers talk to their peers about each such prefix, telling each peer an intentionally simplified measure of how hard it is for the router to deliver packets addressed to that prefix. The value given is the number of Autonomous Systems the packet would have to pass through. This value may be boosted above the true value according to the operator's desire not to handle such packets. Routers decide where to send packets according to which peer reports the lowest value, and according to locally programmed policies.

Consider a BGP border router of an ISP or end-user network, where the ISP or end-user network has a single prefix: 50.0.0.0/20. If this network is "single-homed" (the opposite of multihomed) then this router has a single upstream peer - a single BGP router in another AS by which to send and receive packets to and from the rest of the Internet. This is a pretty simple task: if the destination address is within 50.0.0.0/20, the packet needs to be sent somewhere in the local network. If not, it needs to be sent to the upstream link.

This means the single-homed router's FIB (Forwarding Information Base) functions which actually handle each traffic packet can be pretty simple - they only need to test for 50.0.0.0/20 and any smaller ranges of addresses within this (longer prefixes). If the packet doesn't match one of these, then it is sent via the "default route" - which points to the single upstream link.
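As a rough illustration, that single-homed FIB logic amounts to a longest-prefix match over the local prefixes, with everything else falling through to the default route. The prefixes and interface names below are made up:

    # Sketch of a single-homed border router's FIB: match local
    # prefixes, otherwise use the default route.

    import ipaddress

    LOCAL_PREFIXES = {
        ipaddress.ip_network("50.0.0.0/20"): "internal",
        # a longer (more specific) prefix inside the /20:
        ipaddress.ip_network("50.0.1.0/24"): "internal-dmz",
    }

    def fib_lookup(dest):
        """Longest-prefix match; default route if nothing matches."""
        matches = [(net, iface)
                   for net, iface in LOCAL_PREFIXES.items()
                   if dest in net]
        if matches:
            # The longest (most specific) matching prefix wins.
            return max(matches, key=lambda m: m[0].prefixlen)[1]
        return "upstream"   # default route - the single upstream link

    print(fib_lookup(ipaddress.ip_address("50.0.1.9")))      # internal-dmz
    print(fib_lookup(ipaddress.ip_address("198.51.100.1")))  # upstream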
Even if the local network has a few dozen or a few hundred prefixes, this is still a relatively small task compared to what a multihomed BGP router must do. Also, the single-homed router only needs to have its BGP conversations with a single peer - the router on the upstream link. It doesn't really matter what values the peer tells it about the 250,000 or so BGP advertised prefixes, since the border router has no other place to send packets which don't match 50.0.0.0/20.

Now consider a multihomed ISP or end-user network. Its border router has two or more upstream links - two in this example. For every one of the 250,000 prefixes (apart from whichever of those are advertised by its own network), the multihomed border router needs to make a decision about whether it is best to send packets to upstream link A or upstream link B. Generally, the packet would be delivered either way, but one way will typically be "shorter" (by the crude "number of ASes" measure used by BGP) than the other.

So the router's CPU conducts a set of 250,000 conversations with each of its upstream peers - A and B. Then it makes 250,000 decisions about which of these links is the best one on which to send packets for each such prefix. Any time local policy changes sufficiently, or its peers change their reports sufficiently, this router may decide that it should send packets for a prefix to a different peer than the one it currently sends them to. Whenever it makes such a change, it announces this to its peers, with the appropriate number of ASes in the announcement. (A crucial low-level detail is that the announcement contains a list of the ASNs through which the packet would travel, so other routers can avoid using paths which include their own ASN. This is a robust way of preventing routing loops.)

Then all these 250,000 decisions are programmed into the router's FIB to handle packets in this way, which means the FIB section (typically expensive hardware) in the router needs to be able to cope with this many divisions of the address space.

The BGP router of a multihomed network - any BGP router with two or more upstream links, or any "transit" router, which sits between multiple ASes and has no network of its own - must always engage in multiple sets of conversations with its multiple peers. Likewise, its FIB always needs at least 250,000 separate rules by which it can instantly (in less than a microsecond or so) classify incoming packets so as to forward them to whichever of the router's interfaces leads to the best link for those packets.

These multihomed and transit BGP routers cannot depend on the simple arrangement of testing for local prefixes and, if there is no match, sending the packet according to the "default route". Their task is much more demanding. So multihomed and transit BGP routers are said to be in the "Default Free" Zone!

There are something like 200,000 such routers in the DFZ - see the "Routers in DFZ - reliable figures from iPlane" thread last year:

  http://psg.com/lists/rrg/2007/

Also, someone mentioned a similarly rough figure on the RRG list recently.
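For a feel of the per-prefix decision described above, here is a toy sketch of its core: pick the peer advertising the shortest AS path, and reject any path already containing your own ASN (the loop-prevention rule mentioned above). Real BGP decision processes have many more tie-breaking steps; the AS numbers here are made up:

    # Toy sketch of a multihomed router's per-prefix best-path
    # choice: AS-path-length comparison plus the AS-path loop check.

    MY_ASN = 64500   # made-up AS number

    def best_peer(advertisements):
        """advertisements: {peer_name: as_path} for one prefix, where
        as_path is a list of ASNs. Returns the peer offering the
        shortest loop-free AS path, or None."""
        candidates = {
            peer: path for peer, path in advertisements.items()
            if MY_ASN not in path   # loop prevention: reject paths
        }                           # already containing our own ASN
        if not candidates:
            return None
        return min(candidates, key=lambda peer: len(candidates[peer]))

    # Two upstream peers advertise paths for the same prefix:
    adverts = {
        "peer-A": [64501, 64510, 64520],   # 3 ASes
        "peer-B": [64502, 64510],          # 2 ASes - shorter, wins
    }
    print(best_peer(adverts))   # peer-B

    # A DFZ router repeats this for every one of the ~250,000
    # prefixes, and reprograms its FIB whenever an answer changes.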
Problems with the cost of these routers, and with delays and stability problems as they try to figure out the best path for packets, via their 250,000 conversations with each peer, are the main driving force behind the RRG's project of devising a new architectural solution to this routing scaling problem:

  http://tools.ietf.org/html/rfc4984
  http://tools.ietf.org/html/draft-irtf-rrg-design-goals-01
  http://tools.ietf.org/html/draft-narten-radir-problem-statement-01

The primary problem is that the only way a network can gain portable address space, and/or address space which can be used for multihoming and Traffic Engineering, is by getting one or more prefixes of its own from an RIR and advertising them in BGP (or splitting a prefix into longer prefixes, such as a /20 into 16 /24s). Each such prefix represents a further burden on all DFZ routers. Bill Herrin attempts to estimate the cost of every such prefix:

  http://bill.herrin.us/network/bgpcost.html

and arrives at the conservative estimate that every time someone advertises such a prefix, it costs everyone else USD$8,000. Part of his estimate is that the price premium of a router which can handle the DFZ tasks is at least USD$30,000. I think this refers to the difference between a router which can perform the multihomed BGP border router functions and one which can't, but which could perform the single-homed BGP border router function. Perhaps it means the price difference between routers which can't handle BGP (and its 250,000+ prefixes) at all, and those which can, for both single- and multihomed border router scenarios.

Such prefixes are variously known as "BGP advertised prefixes", "DFZ routes" etc. The total set of them may be known as the "global routing table", the "DFZ routing table" or sometimes just the "DFZ". Hence, "injecting a route into the DFZ" means a network advertising a prefix via its BGP border router, adding one more prefix (AKA "route") to the ~250,000 already existing.

Our primary goal is to devise some new architecture which will enable large numbers of end-user networks (not ISPs, who really need full BGP-managed address space) to get address space which is portable and usable for multihoming and Traffic Engineering, without adding further to the bloat in the "DFZ routing table". The Net is not going to stop functioning if the current 250k size grows past some limit, but cost and instability problems will get worse unless something is done. Since there are millions of end-users who will want and arguably need multihomable address space, we clearly need a new way of providing for their needs.

While we are doing this, for instance with a map-encap scheme, some of us also want to enable finer and less expensive divisions of the IPv4 address space, to enable higher rates of utilization - to combat the IPv4 address depletion problem.

Since any global network of ITRs and ETRs is an extraordinarily powerful tool which has not been contemplated before - but which apparently needs to be built to solve the routing scaling problem - some of us want to ensure it supports new approaches to mobility (rapidly moving a device to another physical or topological location, while keeping its IP address or address prefix). Existing approaches to mobility require extensive changes to host operating systems and generally find it a challenge to maintain optimally short path lengths for the packets.
TE
--

"Traffic Engineering" (TE) . . .

Some other RRG folks could probably provide a more comprehensive definition, but for this discussion, TE means the ability of an edge network (and/or its ISP) to control the path of packets over multiple alternative links, usually according to what type of traffic the packets are part of. This might be according to the packets' destination address, or perhaps according to whether a packet is HTTP or VoIP voice.

An example of outbound TE: a multihomed network has two upstream links and programs its border router to send some types of packets out link A and the rest out link B. This may achieve the goal of load sharing, or favour one link because it is faster, cheaper, more reliable etc. than the other.

Outbound TE is easy, and not a problem for any map-encap scheme. The real challenge is inbound TE. How, with a map-encap scheme, can a multihomed end-user edge network control the global ITR system so that some traffic comes in via link A (meaning the packets are tunnelled to an ETR in ISP-A) and the rest via another ETR in ISP-B, arriving over link B?

LISP, APT and TRRP include TE constructs in their mapping information, requiring each ITR to make decisions about which of multiple ETRs to send the packets to. Ivip has no such explicit TE functions. To achieve TE like this, the address space in question must be split into two or more micronets, thereby splitting the traffic (this won't work if all traffic is to one address), and then each micronet is mapped to a different ETR. (There is a sketch of this after the RUAS section below.)


RUAS - Root Update Authorisation System
---------------------------------------

In Ivip, there are multiple BGP advertised prefixes within which the address space is managed by Ivip's mapping system. Each such prefix is called a Mapped Address Block (MAB). Therefore, ITRs find packets addressed to any one of these MABs and tunnel them to an ETR, according to the mapping for the micronet to which each packet is addressed. (All addresses within a micronet have the same mapping information - simply an ETR address.)

Each MAB - such as a /12, /16 or /20 - typically contains many areas called User Address Blocks (UABs), each of which is controlled by a single end-user. The end-user decides how to split each UAB into micronets, and then decides the mapping (ETR address) for each micronet.

There are multiple RUASes. Each one is authoritative for one or more MABs. Therefore, each end-user has a direct or indirect relationship with the RUAS which is responsible for controlling the mapping of the MAB within which its UAB is located.

RUASes work together to create a stream of updates, such as one every second, which are sent out through a cross-linked tree-structured system of "Replicators" to all the world's full database ITRs and full database query servers. The goal is for an end-user's mapping change command, given directly or indirectly to an RUAS, to be received by all the world's ITRs within 5 seconds. This includes caching ITRs which are handling packets - or at least have recently handled packets - addressed to this micronet.
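Here is a small sketch of the MAB / UAB / micronet hierarchy, and of the micronet-splitting approach to inbound TE mentioned in the TE section. The addresses, block sizes and ETRs are invented for illustration; they are not from the Ivip drafts:

    # Sketch of Ivip's address hierarchy and micronet-based inbound
    # TE. Addresses, sizes and names are invented for illustration.

    import ipaddress

    MAB = ipaddress.ip_network("20.0.0.0/12")   # a Mapped Address Block
    UAB = ipaddress.ip_network("20.1.8.0/22")   # one end-user's block
    assert UAB.subnet_of(MAB)

    # The end-user splits the UAB into micronets, each mapped to one
    # ETR. Mapping two micronets to ETRs in two different ISPs steers
    # some inbound traffic over each ISP's link - Ivip's inbound TE.
    micronet_mapping = {
        ipaddress.ip_network("20.1.8.0/23"):
            ipaddress.ip_address("150.1.1.1"),   # ETR in ISP-A
        ipaddress.ip_network("20.1.10.0/23"):
            ipaddress.ip_address("160.2.2.2"),   # ETR in ISP-B
    }

    def map_lookup(dest):
        """What every ITR does for a packet addressed into the MAB."""
        for micronet, etr in micronet_mapping.items():
            if dest in micronet:
                return etr
        return None

    print(map_lookup(ipaddress.ip_address("20.1.9.9")))    # via ISP-A
    print(map_lookup(ipaddress.ip_address("20.1.10.9")))   # via ISP-B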
ITRD - Full Database ITR
------------------------

These ITRs get the full feed of mapping updates, and therefore maintain in real-time a copy of the entire Ivip mapping database. This means that when they get a packet addressed to any MAB, they already have the mapping information for all the micronets in that MAB and therefore know which ETR the packet should be tunnelled to.

In the longer-term future, when there are millions or billions of micronets, it is unlikely that every (or any) ITRD will have its FIB already programmed to handle packets addressed to every possible micronet. More likely, across the whole address range, the FIB will specify something like:

  1 - Forward the packet to an interface, as already determined by
      decisions made in BGP conversations with peers. (Conventional
      BGP-based forwarding as is done today.)

  2 - The packet is addressed to a particular micronet, and the ETR
      address to tunnel it to is therefore: aa.bb.cc.dd.

  3 - The packet is addressed to a MAB, but the FIB does not
      currently have the ETR address for its micronet. So query the
      router's central CPU, find the mapping and then go to step 2.

  4 - Handle the packet in some other way.

  5 - None of the above - drop the packet.

So I think ITRDs of the future will be "caching ITRs" in their FIB packet handling section, with an inbuilt full database query server. I think the same would be true of the full database ITRs of other schemes: all ITRs in LISP-NERD and the Default Mappers of APT.


ITRC - Caching ITR
------------------

An ITR which doesn't have a full copy of the mapping database. It sends a query to another device - a full database query server or a caching query server - and pretty soon (tens of milliseconds at most, unless a query or response packet is lost, or the query server is dead) gets a response back. It holds the packet until it gets the response, and then tunnels it according to the mapping information.

ITRCs clearly need less storage than ITRDs. Also, they don't need to get the continual feed of mapping updates. So ITRCs are much cheaper and can be much more numerous. This spreads the load of ITR work, and by bringing the ITR function closer to the sending host, helps ensure the total path taken by the packet is as short as possible.

In other proposals, all LISP-ALT ITRs are caching ITRs, and so are all ITRs in TRRP. The ITRs in APT which are not Default Mappers are also caching ITRs. Only LISP-NERD has no caching ITRs.

Ivip also has the option of an ITRC function being built into a sending host (ITFH - ITR Function in Host). The sending host's address can be an Ivip-mapped address, or an ordinary BGP-managed ("RLOC" in LISP terms) address, but it cannot be behind NAT. The primary reason for this is that an ITRC needs to be reachable by a query server when the query server sends it a "Notification" that some mapping has changed for a micronet the ITRC recently made a query about. ITFHs should be essentially zero-cost ITRs, with optimal path lengths.


QSD - Full Database Query Server
--------------------------------

Like an ITRD, a QSD gets the full feed of mapping updates and so has a real-time updated copy of the whole mapping database (the sum total of all the mapping for all the MABs from all the RUASes). QSDs are intended to be in ISP and end-user networks, so that nearby ITRCs, ITFHs and QSCs can quickly and reliably get the mapping information they need.

The QSD keeps a record of the queries it answered recently, and of the caching times for the mapping information it sent back, so that if the mapping for one of these micronets changes within a caching time, the QSD will send out a "Notification" with the new mapping information to the querier.

In APT, the Default Mapper is also a "local" full database query server. It has the full database and is within an ISP network, close to the caching ITRs in that network, meaning the replies come quickly and reliably.
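A rough sketch of that Notification bookkeeping, with invented names and data structures (the Ivip drafts do not specify this code): the QSD remembers who asked about which micronet and until when, and pushes new mapping to any querier whose cached answer has not yet expired:

    # Sketch of a QSD's Notification bookkeeping (invented names).

    import time

    CACHE_TIME = 600        # seconds a querier may cache an answer

    mapping = {}            # micronet -> ETR address
    recent_queries = {}     # micronet -> {querier: cache expiry time}

    def handle_query(querier, micronet):
        """Answer a mapping query and remember the querier."""
        recent_queries.setdefault(micronet, {})[querier] = \
            time.time() + CACHE_TIME
        return mapping.get(micronet), CACHE_TIME

    def handle_mapping_update(micronet, new_etr):
        """Apply an update from the full feed; notify any querier
        whose cached answer for this micronet is still live."""
        mapping[micronet] = new_etr
        now = time.time()
        for querier, expiry in recent_queries.get(micronet, {}).items():
            if expiry > now:
                send_notification(querier, micronet, new_etr)

    def send_notification(querier, micronet, new_etr):
        # Stand-in for the real Notification to the ITRC/ITFH/QSC.
        print(f"Notify {querier}: {micronet} -> {new_etr}")

    mapping["20.1.8.0/24"] = "150.1.1.1"
    handle_query("ITRC-1", "20.1.8.0/24")
    handle_mapping_update("20.1.8.0/24", "160.2.2.2")
    # -> Notify ITRC-1: 20.1.8.0/24 -> 160.2.2.2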
QSC - Caching Query Server
--------------------------

Ivip has the option of caching query servers. It may be best for a network to have one or a few QSDs, and to have the ITRCs and ITFHs query some closer, more numerous, cheaper and more lightly loaded QSCs, rather than directly querying the few expensive, busy and probably more distant QSDs. This way, very often (ideally) the local QSC will already have the mapping a particular ITRC/ITFH needs, since it was probably asked about the same micronet recently by some other ITRC/ITFH. This reduces load on the QSDs and speeds the response time for some or many queries. QSCs pass on Notifications to whichever devices queried them about the micronet which has just had its mapping changed.

An ITRC or ITFH could query a QSC or a QSD. Each QSC could query a QSD directly, or ask another QSC. In the latter case, there could be multiple levels of QSC, but eventually the answer would be given by a QSD.
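As a last sketch, here is the QSC lookup logic in miniature - answer from cache if possible, otherwise ask the next server up (another QSC or, ultimately, a QSD). The names and structure are mine, for illustration only:

    # Miniature sketch of the ITRC -> QSC -> ... -> QSD query chain.

    class QSD:
        """Full database query server: always has the answer."""
        def __init__(self, full_database):
            self.db = full_database
        def query(self, micronet):
            return self.db[micronet]

    class QSC:
        """Caching query server: answers from cache, else asks its
        parent (another QSC or a QSD) and caches the result."""
        def __init__(self, parent):
            self.parent = parent
            self.cache = {}
        def query(self, micronet):
            if micronet not in self.cache:
                self.cache[micronet] = self.parent.query(micronet)
            return self.cache[micronet]

    qsd  = QSD({"20.1.8.0/24": "150.1.1.1"})
    qsc1 = QSC(parent=qsd)    # multiple levels of QSC are possible:
    qsc2 = QSC(parent=qsc1)   # an ITRC might query qsc2

    print(qsc2.query("20.1.8.0/24"))  # first query walks up to the QSD
    print(qsc2.query("20.1.8.0/24"))  # second is answered from cache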
