On Tue, Dec 23, 2008 at 10:56 AM, David Meyer <[email protected]> wrote:
>        So what you describe is the well-known, standard
>        map-and-encap behavior. A packet hits the ITR, the ITR
>        looks up the mapping (existing mapping, DNS, APT, ALT,
>        static, or whatever), encaps the packet, sends it to the
>        destination ETR. Now, at this point if you have to assess
>        reachability somehow, or you can create a persistent
>        failure by latching on the wrong "route" (or "routes"; a
>        route here is analogous to what folks are calling RLOCs
>        in other contexts).

Hi Dave,

For clarity's sake, let's say: "Latching on to a currently unreachable ETR."

TRRP moves the connectedness problem away from the ITR. At the primary
level, the ITR is not responsible for deciding whether it can reach a
particular ETR. Period.

The connectedness problem is moved entirely to the destination
network's external system that selects the priority at which ETRs go
into the map. The scope of the connectedness problem reduces to:

1. Deciding which of my ETRs can currently reach me, and
2. Deciding which of my ETRs are "fully connected" to the Internet for
some heuristic definition of "fully connected."


The ITR latches on to an unreachable ETR only if the destination
selects an ETR for himself which is not fully connected to the
Internet.


Let's be clear about something with the connectedness problem:

It's a statistics game. The overall unreliability of your Internet
link is additive from all the components which introduce failures. The
probability of latching on to an unreachable ETR when another is
reachable doesn't have to be zero. Unless it always fails for the same
group, it just has to be a small fraction of the overall unreliability
of the system.
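
To put rough numbers on that argument, here is a minimal Python sketch.
The component names and failure probabilities are invented purely for
illustration, and the components are assumed to fail independently;
only the arithmetic is the point.

```python
import math

def combined_unreliability(failure_probs):
    """Probability that at least one of several independent components fails."""
    return 1.0 - math.prod(1.0 - p for p in failure_probs)

# Hypothetical per-component failure probabilities (illustrative only).
baseline = {
    "access link": 0.002,
    "upstream ISP": 0.001,
    "remote network": 0.003,
}
# Hypothetical probability of latching on to an unreachable ETR.
latch_failure = 0.0001

without_latch = combined_unreliability(baseline.values())
with_latch = combined_unreliability([*baseline.values(), latch_failure])
simple_sum = sum(baseline.values()) + latch_failure

print(f"without latch risk: {without_latch:.6f}")
print(f"with latch risk:    {with_latch:.6f}")
print(f"simple sum:         {simple_sum:.6f}")
```

For small probabilities the exact figure and the simple sum are nearly
identical, which is why the unreliability can be treated as additive.
With these made-up numbers the latch risk adds under 2% to the total.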



>        A few comments on your document (and thanks for pushing
>        on me to read it).

Thank you!


>          " An AS-interior default route leads from the BGP
>          routers to the ITR."
>
>           Is this saying that there is a default in the
>           site's iBGP or IGP that tells it how to get to the
>           exit points for the site? If so, is this just an
>           implementation detail or a requirement?

If a route for the destination address exists in eBGP, then a packet
with that destination address follows that route across the core to
the machine using the address.

If a specific route for the destination address exists in the IGP, the
packet follows that route within the ISP to the machine using the
address.

Otherwise the packet follows a default route in the IGP. The default
route leads to an ITR. The ITR encapsulates the packet inside a tunnel
packet with the destination address of an ETR. The destination address
for the ETR must be selected from the subset of addresses for which
routes exist in eBGP.
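
The three rules above can be sketched in Python as follows. This is an
illustration of the decision order only, not router code: the function
names are mine, the IGP prefix is hypothetical, and the default route
is modeled as the fall-through case rather than as a table entry.

```python
import ipaddress

def longest_match(table, dst):
    """Return the most specific prefix in `table` containing `dst`, or None."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in table if addr in net]
    return max(matches, key=lambda net: net.prefixlen, default=None)

def forward(dst, ebgp_routes, igp_routes):
    """Apply the three rules in order and report which one matched."""
    if longest_match(ebgp_routes, dst) is not None:
        return "ebgp"            # follow the route across the core
    if longest_match(igp_routes, dst) is not None:
        return "igp"             # follow the specific route inside the ISP
    return "default-to-itr"      # default route: hand the packet to an ITR

# 65.160.0.0/13 is the block used in the example later in this message;
# 10.20.0.0/16 is a made-up ISP-internal prefix.
ebgp = [ipaddress.ip_network("65.160.0.0/13")]
igp = [ipaddress.ip_network("10.20.0.0/16")]

print(forward("65.160.123.45", ebgp, igp))
print(forward("10.20.3.4", ebgp, igp))
print(forward("192.1.1.5", ebgp, igp))
```

Note that the ETR address (65.160.123.45) resolves through eBGP, which
is the property the last sentence above requires.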


>          "The ITR finds an Egress Tunnel Router (ETR) for the
>          destination IP address via a DNS lookup."
>
>            What destination address? The destination address in
>            the packet emitted by the host?

Yes.


>             "The ITR should look up the route entry when it
>             first sees a packet for the destination. If
>             possible, it should hold packets for the destination
>             in a buffer until a route is found. "
>
>           So this is standard map-and-encap behavior (mod
>           holding the packet). I see that you inverse address
>           query for a TXT record associated with the IP address
>           in the destination address in the packet that
>           the host emitted (this TXT record encodes information
>           about the RLOCs of the ETRs, if you will). Yes?

Yes.
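
As a rough sketch of how an ITR might form that query name: the Python
below reverses the octets in the same way in-addr.arpa does. That
per-octet layout is my assumption here, chosen so the name matches a
wildcard such as *.1.1.192.<suffix>. The suffix is a parameter because
no real zone is delegated for it; any subdomain will do for an
experiment.

```python
import ipaddress

def trrp_query_name(dst, suffix="v4.trrp.arpa"):
    """Build the inverse-address name to query for a TXT record.

    `dst` is the destination address from the packet the host emitted.
    The octet-reversed layout is an assumption, not a quote from the spec.
    """
    octets = str(ipaddress.IPv4Address(dst)).split(".")
    return ".".join(reversed(octets)) + "." + suffix

print(trrp_query_name("192.1.1.5"))
```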


>           So at this point if you have to assess reachability
>           somehow.

No. The destination network has already made a determination that the
ETR is reachable by all hosts on the Internet. He made this
determination when he set the contents of that TXT record. If he made
the wrong choice then he is unreachable, just as if all possible paths
to him were down.**

** Not strictly true; it may still be possible to reach him. For the
purpose of quantifying the worst case scenario, the destination is
unreachable if the ETR to which he assigned top priority is
unreachable.




>        - "The ITR then tunnels the packet via GRE to the "best"
>           ETR."
>
>            So the ITR got a set of ETR addresses from the
>            DNS (TXT records) in the previous step, each of which
>            is a candidate site for where it might send the
>            (encapsulated?) packet. Correct?

Yes with a caveat: the expected behavior is that the ITR will select
the one ETR address at the top of the stack. None of the others are
initially candidates. The others become candidates only under
circumstances in which one of the secondary mechanisms has shown
itself to be functioning successfully.

For example, if a packet sent to the ETR at the top of the stack
returns a host unreachable then the second ETR in the stack becomes a
candidate. But if no packet returns, either because they're all
delivered or because the packets are dropping into a black hole, the
second and later ETRs in the TXT record never become candidates.
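
A minimal Python sketch of that selection rule, under my own
assumptions: the class and method names are hypothetical, the second
ETR address is a documentation address rather than anything real, and
staying on the last ETR once the list is exhausted is my choice, not
something the document specifies.

```python
class EtrSelector:
    """Tracks which ETR from the TXT record the ITR is currently using."""

    def __init__(self, etrs):
        if not etrs:
            raise ValueError("need at least one ETR")
        self._etrs = list(etrs)   # in the order given by the destination
        self._index = 0           # start at the top of the stack

    @property
    def current(self):
        return self._etrs[self._index]

    def on_host_unreachable(self, etr):
        """Promote the next ETR only if the error concerns the one in use."""
        if etr == self.current and self._index + 1 < len(self._etrs):
            self._index += 1
        return self.current

    def on_silence(self):
        """No packet came back (delivered or black-holed): change nothing."""
        return self.current

selector = EtrSelector(["65.160.123.45", "192.0.2.1"])
print(selector.on_silence())                          # still the top entry
print(selector.on_host_unreachable("65.160.123.45"))  # second entry promoted
```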


>            In addition, the ITR gets this list when it inverse
>            address queries for whatever was in the destination
>            of the packet that the host emitted (this is how I
>            understand your use of the DNS). It encapsulates this
>            packet over GRE to the destination found in the TXT
>            record (with some processing of the options in the
>            TXT record). Correct?

Yes.


>            If so, how does the ITR know which ETRs it's going to
>            talk to (in order to set up the GRE tunnels)? Or does
>            the ITR not set up a GRE tunnel between the ITR and
>            ETR? Or did you mean some kind of dynamic GRE (i.e.,
>            just uses GRE encap, or ?).

It's stateless; there is no setup. I think Scott Brim described the
system as a "funnel" rather than a tunnel, and that's a pretty good way
to put it. The ETR decapsulates packets with its destination address
that are in the GRE format. Unless you tell it otherwise, it doesn't
care what the source address of the GRE packet was.



>            I see from this example that it's the ETR that sets up
>            a p2mp tunnel:
>
>                Cisco IOS configuration example:
>
>                interface Tunnel0
>                  description TRRP Egress Tunnel Router
>                  no ip address
>                  tunnel source FastEthernet0/0
>                  tunnel mode gre multipoint
>                  tunnel key 1
>
>           makes it seem like the ETR will need to know where all
>           the other ITRs that might want to talk to it are.

No. Note the absence of a "tunnel destination" parameter. That
particular configuration will accept and decapsulate GRE packets with
key 1 from any source IP address.


I originally tried to do it without a gre key too. Linux was happy but
IOS decided that with neither a tunnel destination nor a tunnel key,
it couldn't decide which tunnel interface to associate the incoming
packets with.
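
To make the stateless behavior concrete, here is a Python sketch of the
GRE layer alone. It assumes the keyed GRE header layout from RFC 2890
(flags with the K bit set, protocol type, 32-bit key) and leaves out the
outer IP header entirely. The function names are mine.

```python
import struct

GRE_KEY_PRESENT = 0x2000   # K bit in the GRE flags field (RFC 2890)
ETHERTYPE_IPV4 = 0x0800    # payload is an IPv4 packet

def gre_encap(inner_packet, key):
    """ITR side: prepend a keyed GRE header. No tunnel setup is needed."""
    header = struct.pack("!HHI", GRE_KEY_PRESENT, ETHERTYPE_IPV4, key)
    return header + inner_packet

def gre_decap(gre_packet, expected_key):
    """ETR side: return the inner packet if the key matches, else None.

    The outer source address is never consulted, so any ITR can send.
    """
    if len(gre_packet) < 8:
        return None
    flags, proto, key = struct.unpack("!HHI", gre_packet[:8])
    if flags != GRE_KEY_PRESENT or proto != ETHERTYPE_IPV4:
        return None
    if key != expected_key:
        return None
    return gre_packet[8:]

inner = b"stand-in for the host's original IP packet"
wire = gre_encap(inner, key=1)
print(gre_decap(wire, expected_key=1) == inner)
```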



>        - "The ETR, which must have an IP address within space
>           announced via BGP, knows a local-scope route to the
>           destination IP address and delivers the packet. "
>
>           That makes sense. The IGP (or possibly iBGP) knows how
>           how to route packets within the domain (that's the
>           point of the IGP, AFAIK). Anyway, that makes sense.
>
>        - "Longer route prefixes are then withdrawn from BGP
>           until a comfortable BGP table size is attained."
>
>           I couldn't get that to follow from what you said, or
>           how it works.

Let's say you have a 192.1.1.0/24. You announce that with eBGP into
the network core via, let's say, Sprint.

One day you get a call from a Verizon rep. The Verizon rep says:

"We see that you announce 192.1.1.0/24 from Sprint. We're going to
stop accepting /24's from Sprint 90 days from now. The /24 routes from
other ISPs cost us too much money and we don't get paid for them. If
you want to continue talking to Verizon customers, you must do one of
the following:"

"A. Pay us $50/month to continue accepting your 192.1.1.0/24 route from Sprint."

"B. Deploy TRRP on your network."

You're really pissed off at Verizon for trying to extort money from
you, so you look into this TRRP thing instead.

It turns out that what you have to do is get Sprint to give you two IP
addresses from their 65.160.0.0/13 block. You ask Sprint to give you two
and they assign 65.160.123.45 and 65.160.123.46. You pull an old Cisco
2501 out of the closet and configure it with a GRE tunnel on the
65.160.123.45 address.

Next you have to go over to ARIN. They manage the address space for
192.1.1.0/24, and thus the DNS domain for 1.1.192.v4.trrp.arpa. You
tell ARIN: "Please delegate 1.1.192.v4.trrp.arpa to my DNS server at
ns1.1.1.192.v4.trrp.arpa, whose IP address is 65.160.123.46."

Finally, you set up a DNS server on 65.160.123.46 which has just one
entry: *.1.1.192.v4.trrp.arpa TXT 80,g4,65.160.123.45

Done.

90 days later, Verizon stops accepting 192.1.1.0/24. Inside their
network, packets follow a default route to an ITR. The ITR queries the
TXT record in your DNS server and encapsulates packets for you in a
GRE packet with a destination address of 65.160.123.45. You don't even
notice.
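
For illustration, here is a Python sketch of how an ITR might split
that TXT record into entries. It assumes the "pp,ii,route pp,ii,route
..." layout from the document's overview, with entries separated by
whitespace. The meaning of the pp and ii fields is defined in the TRRP
document, not in this message, so the sketch keeps them as opaque
strings. The type and function names are hypothetical.

```python
from typing import NamedTuple

class RouteEntry(NamedTuple):
    pp: str      # first field, left uninterpreted here
    ii: str      # second field, left uninterpreted here
    route: str   # the ETR address to encapsulate toward

def parse_trrp_txt(txt):
    """Split a TXT record into entries, preserving the record's order."""
    entries = []
    for item in txt.split():
        pp, ii, route = item.split(",", 2)
        entries.append(RouteEntry(pp, ii, route))
    return entries

entries = parse_trrp_txt("80,g4,65.160.123.45")
print(entries[0].route)
```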

A few weeks later, an AT&T rep calls and says, "Listen, we're not
going to keep accepting your route for free. Pay us $50 or deploy
TRRP." You respond: "Screw you, I already have TRRP."



>        (i).    dig couldn't find a nameserver for either
>                v4.trrp.arpa or v6.trrp.arpa, or
>                v4.trrp.in-addr.arpa or v6.trrp.in-addr.arpa

It doesn't exist. No point requesting resources at that level for an
experimental protocol when any subdomain will do.


>        (ii).   The term "route" is somehow overloaded in your
>                document. In particular, when you're talking
>                about the TXT record encoding, you overview the
>                encoding as "pp,ii,route pp,ii,route ..." and in
>                the encoding you have stuff like

I wrote the document in earlier days before some of the terminology
was sorted out. It could use some cleanup.




>        (iii).  Host unreachables
>
>                You say that the ETR needs to verify unreachables due to
>                the insecurity of ICMP. You suggest sending ICMP
>                echo-request. This is another form of searching
>                (with data probes) for locator liveness.

Bear in mind that this is a *secondary* mechanism. In a subset of
cases, it allows the ITR to try to find an alternate path when the
destination network made a poor choice of ETR.

If the ping succeeds, the ITR must ignore the unreachable. Otherwise
this secondary mechanism could disrupt the primary mechanism and
errantly latch on to an ETR other than the one the destination network
told it to use.
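
In code, the check is tiny. The Python sketch below takes the ping and
the fail-over action as caller-supplied callables, since how either is
implemented is outside the scope of this message. Both function names
are hypothetical.

```python
from typing import Callable

def unreachable_is_credible(etr: str, ping: Callable[[str], bool]) -> bool:
    """A host-unreachable is believed only if the ETR also fails a ping."""
    return not ping(etr)

def handle_unreachable(etr: str,
                       ping: Callable[[str], bool],
                       fail_over: Callable[[str], None]) -> bool:
    """Run the secondary mechanism; return True if fail-over was triggered."""
    if unreachable_is_credible(etr, ping):
        fail_over(etr)
        return True
    # The ping succeeded, so the unreachable is ignored and the ITR stays
    # on the ETR the destination network told it to use.
    return False
```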


All I'm really saying here is this: the connectivity decision is
supposed to be made by the destination network, but there's no reason
to be excessively dogmatic about it. When normal operation provides
you with "free" information to the effect that the destination network
chose badly, why not use it?




>        (iv).   BGP with the "Holey Route Authority"
>
>                You say:
>
>                  A central route authority keeps track of all
>                  holey routes and peers them with all ASes which
>                  implement an ITR. Each such AS implements a
>                  route-map to force those prefixes to head the
>                  same direction as the default route.
>
>                All I can say is that hasn't worked to date.

Another approach would be to add a BGP option or community or
something to the supernet announcement which advises all recipients
that the supernet prefix is associated with TRRP use and should be
discarded by systems which support TRRP.

Admittedly, this is a particularly nasty and not truly solved corner
case in TRRP.



>        (v).    Preemptive Change Notification (PCN)
>
>                You say (PCN Type 1):
>
>                  PCN Type 1: Notification of change by the DNS
>                  server. The authoritative DNS Route server
>                  remembers all IP addresses which requested each
>                  Route entry during the entry's TTL. Immediately
>                  upon the Route entry's change, the Route Server
>                  sends a UDP message to port XX of the
>                  requestor. The first byte of the message is
>                  0x01. The remaining bytes are the DNS query
>                  which should immediately expire.
>
>                Can you comment on the scaling properties of this?

PCN doesn't scale. The reality of it is that if you use it you'll pick
a maximum size for the pool of requestors you're going to remember and
when it's full it's full. Everybody else will have to pick up the
change after the TTL expires.
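
A bounded pool of that kind might look like the Python sketch below.
The class and its parameters are hypothetical. It only builds the
notification payloads (0x01 followed by the DNS query bytes); actually
sending them over UDP is left to the caller, since the port is not
given here.

```python
import time

PCN_TYPE_1 = b"\x01"

class RequestorPool:
    """Remembers who asked for which entry, up to a fixed size."""

    def __init__(self, max_size, ttl, clock=time.monotonic):
        self.max_size = max_size
        self.ttl = ttl
        self._clock = clock
        self._expiry = {}   # (requestor_ip, query_bytes) -> expiry time

    def remember(self, requestor, query):
        """Record a requestor; return False if the pool is full."""
        now = self._clock()
        # Drop requestors whose cached copy has already expired.
        self._expiry = {k: t for k, t in self._expiry.items() if t > now}
        key = (requestor, query)
        if key not in self._expiry and len(self._expiry) >= self.max_size:
            return False    # when it's full it's full: wait for the TTL
        self._expiry[key] = now + self.ttl
        return True

    def notifications(self, query):
        """Return (requestor_ip, message) pairs for a changed entry."""
        now = self._clock()
        message = PCN_TYPE_1 + query
        return [(ip, message)
                for (ip, q), t in self._expiry.items()
                if q == query and t > now]
```

Requestors turned away by remember() get no notification and simply
see the change once their cached copy times out.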

PCN is an optional, secondary mechanism. In a subset of cases it
allows TRRP to fail over to an alternate ETR faster than the baseline
of a few tens of seconds.

Regards,
Bill Herrin


-- 
William D. Herrin ................ [email protected]  [email protected]
3005 Crane Dr. ...................... Web: <http://bill.herrin.us/>
Falls Church, VA 22042-3004
_______________________________________________
rrg mailing list
[email protected]
https://www.irtf.org/mailman/listinfo/rrg