Robin,

I find this whole discussion of a separate reachability probing system to be fraught with a number of economic, security and, more importantly, operational problems.

On Jan 21, 2009, at 8:21 PM, Robin Whittle wrote:
A further reduction in probing, and greater flexibility in deciding
which ETR the traffic packets will be tunneled to can be achieved by
the Ivip approach of using a completely separate reachability probing
system - separate from ITRs and from sending hosts.  The end-user
would either do this, or have some company do it for them.  The
probing could be done from multiple points all over the Net, to the
ETRs or more likely through the ETRs to routers and/or hosts in the
end-user network.

Really? There is no way companies are going to allow one of these 3rd-party "probing/reachability companies", even if they have a paid relationship with them, to probe through their ETR's to their internal network. Who's to say that the probing company (or companies) won't get hacked and then be used as a launching point to attack the inside network of a company using them for services?



Most likely, that company's specialised probing system would be
configured to make the decision on how best to map the destination
network's micronets - and would change the mapping within seconds,
according to whatever criteria were specified by the administrators
of the destination network.  Within a few seconds, all ITRs in the
world handling packets addressed to these micronets would be
tunneling these packets to the ETR chosen in the new mapping decision.

This is a generalised DFZ->destination-network probing approach,
since the probing servers would, broadly speaking, be in the DFZ.
This is the only scalable approach, since the same level of probing
would still occur if 100,000 ITRs were sending packets to the
destination network as if one, none or a few were sending packets.

The key to using a separate, dedicated, reachability probing system
(quite outside the Ivip system itself, and so which can be made to
work on any principles, any protocols etc. which suit the destination
network being probed) is Ivip's real-time mapping distribution
system.  This tells all the ITRs which need to know which ETR to
tunnel the packets to.  This greatly simplifies the ITR and ETR
design and separates out reachability testing and the resulting
decision-making from the core-edge-separation system itself. (LISP,
APT, TRRP and Six/One Router monolithically integrate them.)

This won't fly operationally. First, I can't envision an economic model that would cause one or, better yet, multiple probing/reachability companies to be launched as you envision. Ultimately, you're talking about not only a lot of fixed costs for servers and such, but more importantly the OpEx for colocation and for the bandwidth such a company would be burning to send out millions, billions or trillions of "probes" to everyone's ETR on the planet.

Second, these probes from a 3rd-party probing/reachability company will not reliably tell end-user networks that there is, in fact, known "good" connectivity to an ETR, because ECMP and/or LAG are used along certain paths within SP's networks and not others. More specifically, ECMP and LAG are load-balancing mechanisms that SP's widely use to scale the physical BW between nodes in their network (e.g.: to scale BW between a city-pair to 100G and [much] larger by logically bonding together multiple OC-192's, etc. into a single "bundle"). The key part of those technologies is that core routers use IP header information (L3 addresses and/or L4 port information) as input keys to their load-hashing algorithms to determine the particular output link in a LAG or ECMP "bundle" that a particular flow goes on.

The problem observed in operational networks today is that "soft-failures" cause a particular link in a bundle to stop forwarding traffic. This goes unnoticed by IGP's (OSPF or IS-IS) and BGP and, unfortunately, results in blackholing of customer traffic until the problem is isolated and the "bad" link in the bundle is taken out of service. There aren't tools to diagnose this problem today, so it's very difficult and time-consuming to troubleshoot & resolve these problems.

More to the point, here is why your proposal of 3rd-party probing companies will not work:

- When these 3rd-party probing companies are transmitting probes toward an ETR, they're going to be unable to construct *identical* EID within RLOC packets that would cause the probes to be hashed and pushed onto the same links as the traffic bound for that customer's ETR. IOW, the 3rd-party probe company *will* get false results that will either: a) leave the ETR's in service when there is a bad link blackholing traffic on a parallel path unseen by the 3rd-party company; or, b) falsely take an ETR out-of-service because the 3rd-party probing company saw traffic being blackholed on a link that may, in fact, carry very little or no end-user traffic toward that ETR.

- Economically, it's infeasible for this imaginary probing company, or set of probing companies, to deploy servers to every POP in every ISP to ensure full coverage of these LAG & ECMP paths ... or, for that matter, to a decent enough percentage of each operator's network that it can reasonably approximate reachability across all paths in every ISP's network.

- As an operator, I don't have much (any?) faith in trusting others to determine connectivity (or lack thereof) to/from my ETR's. I trust my network and the configuration of my ITR's/ETR's to determine that. Furthermore, when something goes wrong (e.g.: probes aren't returned) I can easily log in to those devices (since I own them) and quickly troubleshoot the problem and restore service how *I* deem appropriate for my network.
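
To make the hashing argument concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: real routers use vendor-specific hash functions, and the CRC32-modulo hash, the 4-link bundle, the addresses and the port numbers below are all made up. It only shows the two failure cases, a) and b), that follow from a probe flow hashing onto one member link while end-user flows are spread across all of them.

```python
import zlib

N_LINKS = 4  # assumed number of member links in the LAG/ECMP bundle


def pick_link(flow, n_links=N_LINKS):
    """Hypothetical load-hash: CRC32 over the header fields, modulo bundle size."""
    key = "|".join(str(field) for field in flow).encode()
    return zlib.crc32(key) % n_links


# A flow is (src_ip, dst_ip, proto, src_port, dst_port); all values are made up.
ETR = "192.0.2.1"
user_flows = [
    (f"198.51.100.{itr}", ETR, "udp", 1024 + port, 4341)
    for itr in range(1, 51)   # 50 hypothetical ITRs
    for port in range(20)     # 20 flows from each
]
probe_flow = ("203.0.113.9", ETR, "udp", 40000, 4341)  # the 3rd-party prober


def probe_verdict(bad_link):
    """True if the probe gets through, i.e. it did not hash onto the failed link."""
    return pick_link(probe_flow) != bad_link


def blackholed_share(bad_link):
    """Fraction of end-user flows that hash onto the failed link."""
    lost = sum(1 for flow in user_flows if pick_link(flow) == bad_link)
    return lost / len(user_flows)


probe_link = pick_link(probe_flow)
other_link = (probe_link + 1) % N_LINKS

# Case a): a link the probe never touches has soft-failed.
print("case a: probe ok =", probe_verdict(other_link),
      "user flows blackholed =", blackholed_share(other_link))

# Case b): only the link the probe happens to use has soft-failed.
print("case b: probe ok =", probe_verdict(probe_link),
      "user flows blackholed =", blackholed_share(probe_link))
```

In case a) the prober reports the ETR as reachable no matter how much end-user traffic is being lost on the failed link; in case b) it reports the ETR as down even though every flow on the other member links is still being delivered.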

So, to summarize, *if* we have to do probing, the only operationally and economically feasible method is either to/from the ITR's/ETR's or the end-systems themselves. (More on this just below).


So having something other than the ITRs doing the probing and making
the decisions does involve some extra complexity - a real-time
mapping system.  I believe this is a small price to pay for the
greater flexibility, more robust probing (all the way to the
destination network, not just to the ETRs), greater simplicity in
ITRs and ETRs, reduction in probing traffic etc.  Also, it enables
real-time control of ETR address for incoming TE.

This discussion does raise an interesting architectural point with respect to all map-and-encap solutions, which depend on reachability probing between ITR's and ETR's. Specifically, do these reachability probes faithfully represent end-user (host-machine originated and terminated) flows, so that they travel along the same path as host-to-host data flows? I would argue that they do not (without substantially increasing the probing traffic load/bandwidth to "search" through all possible paths used by end-user traffic).

OTOH, solutions that either:
a) do proactive probing originating from hosts at something at or just below the TCP/UDP layer; or, better yet,
b) piggyback ***reachability*** detection along with active TCP/UDP traffic flow (cf: TCP ACK's) between hosts
... would faithfully represent that a given path really is working.
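
As a rough illustration of option b), here is a small Python sketch of passive reachability detection driven by ACK progress. It is not taken from any protocol spec: the class name, the 3-second timeout and the idea of feeding it explicit timestamps are all assumptions. The point is only that the "probe" is the data flow itself, so it is hashed onto exactly the links the end-user traffic uses.

```python
class PassiveReachability:
    """Hypothetical per-flow monitor: the path is suspect if data has been
    waiting for an ACK for longer than `timeout` seconds."""

    def __init__(self, timeout=3.0):
        self.timeout = timeout
        self.oldest_unacked = None  # send time of the oldest data awaiting an ACK

    def on_send(self, now):
        # Start the clock only if nothing was already outstanding.
        if self.oldest_unacked is None:
            self.oldest_unacked = now

    def on_ack(self, now, all_acked):
        # Any ACK is forward progress; restart the clock if data is still in flight.
        self.oldest_unacked = None if all_acked else now

    def path_ok(self, now):
        if self.oldest_unacked is None:
            return True  # idle flow: no evidence either way, assume ok
        return (now - self.oldest_unacked) < self.timeout


monitor = PassiveReachability(timeout=3.0)
monitor.on_send(now=0.0)
monitor.on_ack(now=0.2, all_acked=True)
print(monitor.path_ok(now=5.0))    # nothing outstanding

monitor.on_send(now=10.0)
print(monitor.path_ok(now=13.5))   # 3.5 s without an ACK
```

Note that an idle flow tells you nothing, which is where the proactive host-originated probing of option a) would have to fill in.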

FWIW, I don't think there's a straightforward solution for map-and-encap solutions to the aforementioned problem of probing over LAG/ECMP paths. If there is, I'm all ears. OTOH, if there's not a solution, I do not think we should gloss over it. Instead, I would strongly recommend that we point this out very clearly within an "Operational Considerations" section of any/all map-and-encap protocol specs, or in an overall/all-encompassing companion "Operational Considerations" document noting this and other 'challenges' that map-and-encap can't feasibly address. Said document(s) may prove helpful in evaluating the final set of candidate solutions that the RRG will offer its recommendation on.

-shane
_______________________________________________
rrg mailing list
[email protected]
http://www.irtf.org/mailman/listinfo/rrg
