On Tue, Dec 23, 2008 at 10:56 AM, David Meyer <[email protected]> wrote:
> So what you describe is the well-known, standard
> map-and-encap behavior. A packet hits the ITR, the ITR
> looks up the mapping (existing mapping, DNS, APT, ALT,
> static, or whatever), encaps the packet, sends it to the
> destination ETR. Now, at this point if you have to assess
> reachability somehow, or you can create a persistent
> failure by latching on the wrong "route" (or "routes"; a
> route here is analogous to what folks are calling RLOCs
> in other contexts).

Hi Dave,

For clarity's sake, let's say: "Latching on to a currently unreachable
ETR."

TRRP moves the connectedness problem away from the ITR. At the primary
level, the ITR is not responsible for deciding whether it can reach a
particular ETR. Period.

The connectedness problem is moved entirely to the destination
network's external system that selects the priority at which ETRs go
into the map. The scope of the connectedness problem reduces to:

1. Deciding which of my ETRs can currently reach me, and

2. Deciding which of my ETRs are "fully connected" to the Internet
   for some heuristic definition of "fully connected."

The ITR latches on to an unreachable ETR only if the destination
selects an ETR for himself which is not fully connected to the
Internet.

Let's be clear about something with the connectedness problem: It's a
statistics game. The overall unreliability of your Internet link is
additive from all the components which introduce failures. The
probability of latching on to an unreachable ETR when another is
reachable doesn't have to be zero. Unless it always fails for the same
group, it just has to be a small fraction of the overall unreliability
of the system.

> A few comments on your document (and thanks for pushing
> on me to read it).

Thank you!

> " An AS-interior default route leads from the BGP
> routers to the ITR."
>
> Is this saying that there is a default in the
> site's iBGP or IGP that tells it how to get to the
> exit points for the site? If so, is this just an
> implementation detail or a requirement?

If a route for the destination address exists in eBGP, then a packet
with that destination address follows that route across the core to
the machine using the address.

If a specific route for the destination address exists in the IGP, the
packet follows that route within the ISP to the machine using the
address.

Otherwise the packet follows a default route in the IGP. The default
route leads to an ITR.

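To make the three-way decision above concrete, here is a rough Python sketch. It is only an illustration of the lookup order (eBGP route, then specific IGP route, then the IGP default toward an ITR), not how any router implements it. The function names, the table layout, and the next-hop labels ("upstream", "core-1", "itr") are all made up for the example.

```python
import ipaddress

def longest_match(table, dst):
    """Return the next hop of the most specific prefix covering dst, or None.

    `table` maps prefix strings to next-hop labels (hypothetical layout).
    """
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, next_hop in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return best[1] if best else None

def forward(dst, ebgp_routes, igp_routes, itr="itr"):
    """Pick where a packet goes, in the order described in the text."""
    hop = longest_match(ebgp_routes, dst)
    if hop is not None:
        return ("ebgp", hop)       # route exists in eBGP: cross the core
    hop = longest_match(igp_routes, dst)
    if hop is not None:
        return ("igp", hop)        # specific IGP route: stay inside the ISP
    return ("default", itr)        # otherwise the IGP default leads to an ITR

# Hypothetical tables: 192.1.1.0/24 is no longer carried in eBGP here.
ebgp = {"65.160.0.0/13": "upstream"}
igp = {"10.1.0.0/16": "core-1"}
print(forward("65.160.123.45", ebgp, igp))  # ('ebgp', 'upstream')
print(forward("192.1.1.7", ebgp, igp))      # ('default', 'itr')
```

The second call is the interesting one: with no eBGP or IGP route for the destination, the packet ends up at the ITR.
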
The ITR encapsulates the packet inside a tunnel packet with the
destination address of an ETR. The destination address for the ETR
must be selected from the subset of addresses for which routes exist
in eBGP.

> "The ITR finds an Egress Tunnel Router (ETR) for the
> destination IP address via a DNS lookup."
>
> What destination address? The destination address in
> the packet emitted by the host?

Yes.

> "The ITR should look up the route entry when it
> first sees a packet for the destination. If
> possible, it should hold packets for the destination
> in a buffer until a route is found. "
>
> So this is standard map-and-encap behavior (mod
> holding the packet). I see that you inverse address
> query for a TXT record associated with the IP address
> in the destination address in the packet that
> the host emitted (this TXT record encodes information
> about the RLOCs of the ETRs, if you will). Yes?

Yes.

> So at this point if you have to assess reachability
> somehow.

No. The destination network has already made a determination that the
ETR is reachable by all hosts on the Internet. He made this
determination when he set the contents of that TXT record. If he made
the wrong choice then he is unreachable, just as if all possible paths
to him were down.**

** Not strictly true; it may still be possible to reach him. For the
purpose of quantifying the worst case scenario, the destination is
unreachable if the ETR to which he assigned top priority is
unreachable.

> - "The ITR then tunnels the packet via GRE to the "best"
> ETR."
>
> So the ITR got a set of ETR addresses from the
> DNS (TXT records) in the previous step, each of which
> is a candidate site for where it might send the
> (encapsulated?) packet. Correct?

Yes, with a caveat: the expected behavior is that the ITR will select
the one ETR address at the top of the stack. None of the others are
initially candidates.
The others become candidates only under circumstances in which one of
the secondary mechanisms has shown itself to be functioning
successfully. For example, if a packet sent to the ETR at the top of
the stack returns a host unreachable, then the second ETR in the stack
becomes a candidate. But if no packet returns, either because they're
all delivered or because the packets are dropping into a black hole,
the second and later ETRs in the TXT record never become candidates.

> In addition, the ITR gets this list when it inverse
> address queries for whatever was in the destination
> of the packet that the host emitted (this is how I
> understand your use of the DNS). It encapsulates this
> packet over GRE to the destination found in the TXT
> record (with some processing of the options in the
> TXT record). Correct?

Yes.

> If so, how does the ITR know which ETRs it's going to
> talk to (in order to set up the GRE tunnels)? Or does
> the ITR not set up a GRE tunnel between the ITR and
> ETR? Or did you mean some kind of dynamic GRE (i.e.,
> just uses GRE encap, or ?).

It's stateless; there is no setup. I think Scott Brim described the
system as a "funnel" rather than a tunnel, and that's a pretty good
way to put it. The ETR decapsulates packets with its destination
address that are in the GRE format. Unless you tell it otherwise, it
doesn't care what the source address of the GRE packet was.

> I see from this example that it's the ETR that sets up
> a p2mp tunnel:
>
> Cisco IOS configuration example:
>
>   interface Tunnel0
>    description TRRP Egress Tunnel Router
>    no ip address
>    tunnel source FastEthernet0/0
>    tunnel mode gre multipoint
>    tunnel key 1
>
> makes it seem like the ETR will need to know where all
> the other ITRs that might want to talk to it are.

No. Note the absence of a "tunnel destination" parameter. That
particular configuration will accept and decapsulate GRE packets with
key 1 from any source IP address.

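As a rough illustration of that stateless acceptance rule, the Python sketch below checks only the GRE header (version 0, key present, key equal to 1) and never looks at the outer source address. It assumes the standard GRE header layout with the key extension and that the input starts right after the outer IP header. The function name and return shape are made up for the example; this is not a description of what IOS or Linux do internally.

```python
import struct

# Flag bits in the first 16 bits of a GRE header (key/sequence extension).
GRE_C = 0x8000  # checksum present
GRE_K = 0x2000  # key present
GRE_S = 0x1000  # sequence number present

def gre_decapsulate(gre_packet, expected_key=1):
    """Return (protocol_type, inner_payload) if the key matches, else None.

    `gre_packet` is the bytes following the outer IP header. The outer
    source address is deliberately not an input: any sender is accepted.
    """
    if len(gre_packet) < 4:
        return None
    flags, proto = struct.unpack_from("!HH", gre_packet, 0)
    if flags & 0x0007:            # only GRE version 0 is handled here
        return None
    offset = 4
    if flags & GRE_C:             # skip checksum + reserved1
        offset += 4
    if not flags & GRE_K:         # this sketch requires a key
        return None
    if len(gre_packet) < offset + 4:
        return None
    (key,) = struct.unpack_from("!I", gre_packet, offset)
    offset += 4
    if key != expected_key:
        return None
    if flags & GRE_S:             # skip sequence number
        offset += 4
    if len(gre_packet) < offset:
        return None
    return proto, gre_packet[offset:]

# A keyed GRE header carrying an IPv4 payload (0x0800), key 1.
pkt = struct.pack("!HHI", GRE_K, 0x0800, 1) + b"inner-ip-packet"
print(gre_decapsulate(pkt))  # (2048, b'inner-ip-packet')
```

Because nothing about the sender is stored or checked, there is no per-ITR state on the ETR side, which is the "funnel" property.
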
I originally tried to do it without a GRE key too. Linux was happy but
IOS decided that with neither a tunnel destination nor a tunnel key,
it couldn't decide which tunnel interface to associate the incoming
packets with.

> - "The ETR, which must have an IP address within space
> announced via BGP, knows a local-scope route to the
> destination IP address and delivers the packet. "
>
> That makes sense. The IGP (or possibly iBGP) knows how
> to route packets within the domain (that's the
> point of the IGP, AFAIK). Anyway, that makes sense.
>
> - "Longer route prefixes are then withdrawn from BGP
> until a comfortable BGP table size is attained."
>
> I couldn't get that to follow from what you said, or
> how it works.

Let's say you have a 192.1.1.0/24. You announce that with eBGP into
the network core via, let's say, Sprint.

One day you get a call from a Verizon rep. The Verizon rep says:

"We see that you announce 192.1.1.0/24 from Sprint. We're going to
stop accepting /24's from Sprint 90 days from now. The /24 routes from
other ISPs cost us too much money and we don't get paid for them. If
you want to continue talking to Verizon customers, you must do one of
the following:"

"A. Pay us $50/month to continue accepting your 192.1.1.0/24 route
from Sprint."

"B. Deploy TRRP on your network."

You're really pissed off at Verizon for trying to extort money from
you, so you look into this TRRP thing instead.

It turns out that what you have to do is get Sprint to give you two IP
addresses from their 65.160.0.0/13 block. You ask Sprint to give you
two and they assign 65.160.123.45 and 65.160.123.46.

You pull an old Cisco 2501 out of the closet and configure it with a
GRE tunnel on the 65.160.123.45 address.

Next you have to go over to ARIN. They manage the address space for
192.1.1.0/24, and thus the DNS domain for 1.1.192.v4.trrp.arpa. You
tell ARIN: "Please delegate 1.1.192.v4.trrp.arpa to my DNS server at
ns1.1.1.192.v4.trrp.arpa, whose IP address is 65.160.123.46."

Finally, you set up a DNS server on 65.160.123.46 which has just one
entry:

  *.1.1.192.v4.trrp.arpa TXT 80,g4,65.160.123.45

Done.

90 days later, Verizon stops accepting 192.1.1.0/24. Inside their
network, packets follow a default route to an ITR. The ITR queries the
TXT record in your DNS server and encapsulates packets for you in a
GRE packet with a destination address of 65.160.123.45. You don't even
notice.

A few weeks later, an AT&T rep calls and says, "Listen, we're not
going to keep accepting your route for free. Pay us $50 or deploy
TRRP." You respond: "Screw you, I already have TRRP."

> (i). dig couldn't find a nameserver for either
> v4.trrp.arpa or v6.trrp.arpa, or
> v4.trrp.in-addr.arpa or v6.trrp.in-addr.arpa

It doesn't exist. No point requesting resources at that level for an
experimental protocol when any subdomain will do.

> (ii). The term "route" is somehow overloaded in your
> document. In particular, when you're talking
> about the TXT record encoding, you overview the
> encoding as "pp,ii,route pp,ii,route ..." and in
> the encoding you have stuff like

I wrote the document in earlier days before some of the terminology
was sorted out. It could use some cleanup.

> (iii). Host unreachables
>
> You say that the ETR needs to verify unreachables due to
> the insecurity of ICMP. You suggest sending ICMP
> echo-request. This is another form of searching
> (with data probes) for locator liveness.

Bear in mind that this is a *secondary* mechanism. In a subset of
cases, it allows the ITR to try to find an alternate path when the
destination network made a poor choice of ETR.

If the ping succeeds, the ITR must ignore the unreachable. Otherwise
this secondary mechanism could disrupt the primary mechanism and
errantly latch on to an ETR other than the one the destination network
told it to use.

All I'm really saying here is this: the connectivity decision is
supposed to be made by the destination network, but there's no reason
to be excessively dogmatic about it.
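For what it's worth, here is a small Python sketch of how an ITR might combine the TXT record with this secondary mechanism. It rests on several assumptions that are mine for illustration: that "pp" is a numeric priority and "ii" an encapsulation identifier, that the entries are taken in the order listed (the first one being the "top of the stack"), and that `probe` is some hypothetical callable standing in for the ICMP echo-request check. The second address in the example, 65.160.123.99, is invented.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EtrEntry:
    priority: int   # "pp" field (assumed to be numeric)
    encap: str      # "ii" field, e.g. "g4" (meaning assumed, not parsed)
    address: str    # "route" field: the ETR address

def parse_route_txt(txt):
    """Split 'pp,ii,route pp,ii,route ...' into entries, keeping the order."""
    entries = []
    for item in txt.split():
        pp, ii, route = item.split(",", 2)
        entries.append(EtrEntry(int(pp), ii, route))
    return entries

class EtrSelector:
    """Stick to the first ETR unless an unreachable is confirmed by a probe."""

    def __init__(self, entries, probe):
        self.entries = list(entries)
        self.probe = probe   # hypothetical: returns True if the ping succeeds
        self.index = 0       # only the top of the stack is a candidate at first

    def current(self):
        return self.entries[self.index]

    def on_host_unreachable(self):
        """Handle a host unreachable seen for the ETR currently in use."""
        if self.probe(self.current()):
            return self.current()        # ping succeeded: ignore the unreachable
        if self.index + 1 < len(self.entries):
            self.index += 1              # next ETR in the record becomes a candidate
        return self.current()

entries = parse_route_txt("80,g4,65.160.123.45 70,g4,65.160.123.99")
selector = EtrSelector(entries, probe=lambda etr: False)
print(selector.current().address)              # 65.160.123.45
print(selector.on_host_unreachable().address)  # 65.160.123.99
```

Note that silence never moves the selector: if no unreachable comes back, whether packets are delivered or black-holed, the ITR stays on the ETR the destination network put first.
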
When normal operations provide you with "free" information to the
effect that the destination network chose badly, why not use it?

> (iv). BGP with the "Holey Route Authority"
>
> You say:
>
> A central route authority keeps track of all
> holey routes and peers them with all ASes which
> implement an ITR. Each such AS implements a
> route-map to force those prefixes to head the
> same direction as the default route.
>
> All I can say is that hasn't worked to date.

Another approach would be to add a BGP option or community or
something to the supernet announcement which advises all recipients
that the supernet prefix is associated with TRRP use and should be
discarded by systems which support TRRP.

Admittedly, this is a particularly nasty and not truly solved corner
case in TRRP.

> (v). Preemptive Change Notification (PCN)
>
> You say (PCN Type 1):
>
> PCN Type 1: Notification of change by the DNS
> server
>
> The authoritative DNS Route server
> remembers all IP addresses which requested each
> Route entry during the entry's TTL. Immediately
> upon the Route entry's change, the Route Server
> sends a UDP message to port XX of the
> requestor. The first byte of the message is
> 0x01. The remaining bytes are the DNS query
> which should immediately expire.
>
> Can you comment on the scaling properties of this?

PCN doesn't scale. The reality of it is that if you use it you'll pick
a maximum size for the pool of requestors you're going to remember and
when it's full it's full. Everybody else will have to pick up the
change after the TTL expires.

PCN is an optional, secondary mechanism. In a subset of cases it
allows TRRP to fail over to an alternate ETR faster than the baseline
of a few tens of seconds.

Regards,
Bill Herrin

--
William D. Herrin ................ [email protected] [email protected]
3005 Crane Dr. ......................
Web: <http://bill.herrin.us/>
Falls Church, VA 22042-3004
_______________________________________________
rrg mailing list
[email protected]
https://www.irtf.org/mailman/listinfo/rrg
