Re: [rrg] Why a host-based solution does not necessarily add signalling load

Shane Amante Wed, 21 Jan 2009 21:28:31 -0800

Robin,

I find this whole discussion of a separate reachability probing system to be fraught with a number of economic, security and, more importantly, operational problems.


On Jan 21, 2009, at 8:21 PM, Robin Whittle wrote:

A further reduction in probing, and greater flexibility in deciding
which ETR the traffic packets will be tunneled to can be achieved by
the Ivip approach of using a completely separate reachability probing
system - separate from ITRs and from sending hosts.  The end-user
would either do this, or have some company do it for them.  The
probing could be done from multiple points all over the Net, to the
ETRs or more likely through the ETRs to routers and/or hosts in the
end-user network.

Really? There is no way companies are going to allow one of these 3rd-party "probing/reachability companies", even if they have a paid relationship with them, to probe through their ETR's to their internal network. Who's to say that the probing company (or companies) doesn't get hacked and then get used as a launching point to attack the inside network of a company using them for services?

Most likely, that company's specialised probing system would be
configured to make the decision on how best to map the destination
network's micronets - and would change the mapping within seconds,
according to whatever criteria were specified by the administrators
of the destination network.  Within a few seconds, all ITRs in the
world handling packets addressed to these micronets would be
tunneling these packets to the ETR chosen in the new mapping decision.

This is a generalised DFZ->destination-network probing approach,
since the probing servers would, broadly speaking, be in the DFZ.
This is the only scalable approach, since the same level of probing
would still occur if 100,000 ITRs were sending packets to the
destination network as if one, none or a few were sending packets.

The key to using a separate, dedicated, reachability probing system
(quite outside the Ivip system itself, and so which can be made to
work on any principles, any protocols etc. which suit the destination
network being probed) is Ivip's real-time mapping distribution
system.  This tells all the ITRs which need to know which ETR to
tunnel the packets to.  This greatly simplifies the ITR and ETR
design and separates out reachability testing and the resulting
decision-making from the core-edge-separation system itself. (LISP,
APT, TRRP and Six/One Router monolithically integrate them.)

This won't fly operationally. First, I can't envision an economic model that would cause someone or, better yet, multiple probing/reachability companies to be launched as you envision. Ultimately, you're talking about not only a lot of fixed costs for servers and such, but more importantly the OpEx for colocation costs and bandwidth that company would be burning to send out millions, billions or trillions of "probes" to everyone's ETR on the planet.

Second, these probes from a 3rd-party probing/reachability company will not reliably tell end-user networks that there is, in fact, known "good" connectivity to an ETR, because of ECMP and/or LAG being used along certain paths within SP's networks and not others. More specifically, ECMP and/or LAG are load-balancing mechanisms that SP's widely use to scale the physical BW between nodes in their network (e.g.: to scale BW between a city-pair to 100G and [much] larger by logically bonding together multiple OC-192's, etc. into a single "bundle"). The key part of those technologies is that core routers use IP header information (L3 addresses and/or L4 port information) as input keys to their load-hashing algorithms to determine the particular output link in a LAG or ECMP "bundle" that a particular flow goes on.

The problem observed in operational networks today is that "soft-failures" cause a particular link in a bundle to stop forwarding traffic, which goes unnoticed by IGP's (OSPF or IS-IS) or BGP and, unfortunately, results in blackholing of customer traffic until the problem is isolated and the "bad" link in the bundle is taken out of service. Unfortunately, there aren't tools to diagnose this problem today, so it's very difficult and time-consuming to troubleshoot & resolve these problems.

More to the point, here is why your proposal of 3rd-party probing companies will not work:

- When these 3rd-party probing companies are transmitting probes toward an ETR, they're going to be unable to construct *identical* EID-within-RLOC packets that would cause them to be hashed and pushed onto the same links as the traffic bound for that customer's ETR. IOW, the 3rd-party probe company *will* get false readings that will either: a) leave the ETR's in service when there is a bad link blackholing traffic on a parallel path unseen by the 3rd-party company; or, b) falsely take an ETR out-of-service because the 3rd-party probing company saw traffic being blackholed on a link that may, in fact, carry very little or no end-user traffic toward that ETR.

- Economically, it's infeasible for this imaginary probing company, or set of probing companies, to deploy servers to every POP in every ISP to ensure full coverage of these LAG & ECMP paths ... or, for that matter, to a decent enough percentage of each operator's network that it can reasonably approximate reachability across all paths in every ISP's network.

- As an operator, I don't have much (any?) faith in trusting others to determine connectivity (or lack thereof) to/from my ETR's. I trust my network and the configuration of my ITR's/ETR's to determine that. Furthermore, when something goes wrong (e.g.: probes aren't returned) I can easily login to those devices (since I own them) and quickly troubleshoot the problem and restore service how *I* deem appropriate for my network.
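To make the hashing problem concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration only: the `pick_link` function is a hypothetical stand-in for a router's load-hashing algorithm (real implementations are vendor-specific and not SHA-256), the addresses come from the documentation ranges, and the port number and bundle size are arbitrary. It only shows that a probe with its own header fields lands on one member link, while customer flows with different header fields spread over all of them, including a soft-failed link the probe never touches.

```python
import hashlib

def pick_link(src_ip, dst_ip, src_port, dst_port, n_links):
    """Hypothetical load-hash: map L3/L4 header fields to one member link."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % n_links

N_LINKS = 8            # assumed number of member links in the bundle
ETR = "192.0.2.1"      # documentation address standing in for the ETR's RLOC
PORT = 4341            # arbitrary UDP port chosen for the illustration

# The 3rd-party prober sends from its own address, so its packets always
# hash to the same single member link.
probe_flow = ("198.51.100.7", ETR, 40000, PORT)
probe_link = pick_link(*probe_flow, N_LINKS)

# Assume a soft failure on a member link the probe does NOT traverse.
failed_link = (probe_link + 1) % N_LINKS

# Encapsulated customer traffic arrives from many ITRs with varied headers.
customer_flows = [
    (f"203.0.113.{i % 250 + 1}", ETR, 1024 + i, PORT) for i in range(1000)
]
blackholed = [f for f in customer_flows if pick_link(*f, N_LINKS) == failed_link]

probe_sees_failure = (probe_link == failed_link)
print(f"probe uses link {probe_link}, failed link is {failed_link}")
print(f"probe sees failure: {probe_sees_failure}")
print(f"customer flows blackholed: {len(blackholed)} of {len(customer_flows)}")
```

In this sketch the probe reports the ETR as reachable while a share of the customer flows is blackholed, which is case a) above. Choosing `failed_link = probe_link` instead gives case b): the probe fails even though most customer flows are unaffected.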

So, to summarize, *if* we have to do probing, the only operationally and economically feasible method is either to/from the ITR's/ETR's or the end-systems themselves. (More on this just below).

So having something other than the ITRs doing the probing and making
the decisions does involve some extra complexity - a real-time
mapping system.  I believe this is a small price to pay for the
greater flexibility, more robust probing (all the way to the
destination network, not just to the ETRs), greater simplicity in
ITRs and ETRs, reduction in probing traffic etc.  Also, it enables
real-time control of ETR address for incoming TE.

This discussion does raise an interesting architectural point with respect to all map-and-encap solutions, which depend on reachability probing between ITR's and ETR's. Specifically, do these reachability probes faithfully represent end-user (host-machine originated and terminated) flows, so that they are travelling along the same path as host-to-host data flows? I would argue that they do not (without substantially increasing the probing traffic load/bandwidth to "search" through all possible paths used by end-user traffic).
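As a rough feel for what "searching" through the paths costs, here is a small Python sketch. It rests on assumptions that are mine and not measured: a single bundle of n member links, a prober that varies its header fields at random, and a hash that spreads those probes uniformly and independently over the links. Under those assumptions this is the classic coupon-collector problem, and the expected number of probes needed to touch every link at least once is n * (1 + 1/2 + ... + 1/n). The function name `expected_probes` is hypothetical.

```python
def expected_probes(n_links):
    """Expected number of uniformly hashed probes needed to hit all n_links
    member links at least once (coupon-collector expectation)."""
    if n_links < 1:
        raise ValueError("n_links must be at least 1")
    harmonic = sum(1.0 / k for k in range(1, n_links + 1))
    return n_links * harmonic

# Assumed bundle sizes, per probing round, per ITR/ETR pair, per bundle.
for n in (2, 8, 32):
    print(f"{n} links: about {expected_probes(n):.1f} probes to cover them all")
```

That count applies to every probing round and every bundle along the way, and real hashes give the prober no guarantee of uniform coverage, so the actual load would be at least this large.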


OTOH, solutions that either:

a) do proactive probing originating from hosts at something at or just below the TCP/UDP layer; or, better yet,

b) piggyback ***reachability*** detection along with active TCP/UDP traffic flow (cf: TCP ACK's) between hosts

... would faithfully represent that a given path really is working.

FWIW, I don't think there's a straightforward solution for map-and-encap solutions to the aforementioned problem of probing over LAG/ECMP paths. If there is, I'm all ears. OTOH, if there's not a solution, I do not think we should gloss over it and, instead, I would strongly recommend that we point this out very clearly within an "Operational Considerations" section of any/all map-and-encap protocol specs, or in an overall/all-encompassing companion "Operational Considerations" document noting this and other 'challenges' that map-and-encap can't feasibly address. Said document(s) may prove helpful in evaluating the final set of candidate solutions that the RRG will offer its recommendation on.


-shane
_______________________________________________
rrg mailing list
[email protected]
http://www.irtf.org/mailman/listinfo/rrg
