Hi Shane,

Thanks for your response to my message in the "Why a host-based solution does not necessarily add signalling load" thread:

  http://www.irtf.org/pipermail/rrg/2009-January/000838.html

I am replying with a more specific Subject line. You wrote:

> I find this whole discussion of a separate reachability probing
> system to be fraught with a number of economic, security, more
> importantly, operational problems.
>
>> A further reduction in probing, and greater flexibility in
>> deciding which ETR the traffic packets will be tunneled to can
>> be achieved by the Ivip approach of using a completely separate
>> reachability probing system - separate from ITRs and from
>> sending hosts. The end-user would either do this, or have some
>> company do it for them. The probing could be done from multiple
>> points all over the Net, to the ETRs or more likely through the
>> ETRs to routers and/or hosts in the end-user network.
>
> Really? There is no way companies are going to allow one of these
> 3rd-party "probing/reachability companies", even if they have a
> paid relationship with them, to probe through their ETR's to their
> internal network. Who's to say that the probing company (or
> companies) doesn't get hacked and they're used as a launching
> point to attack the inside network of a company using them for
> services?

I can't see any inherent security problem. The probing could be by some established protocol, or a new one. In either case, there is no need for the device which responds to do anything more than respond. So there is no need for the protocol to be capable of altering the responding device, running code on it etc.

Maybe some fancy protocol might be required, but ordinary ping should do the trick. Several (or perhaps dozens of) separately located probe servers send an ordinary ping packet to the CE router, or to some router or host behind the CE router - whatever the customer (the administrators of the end-user network being probed) wants.
The ping packet would contain a nonce, and the echo reply would contain the nonce too - so it is easy to make it secure against an off-path attacker sending forged echo replies.

The probing server would act in some ways as an ITR. It generates a ping packet and tunnels it to the device to be pinged. ("Tunnel" in this case means encapsulate, or for Ivip's forwarding modes, altering the IP header and passing the packet to routers which recognise this new header format and forward according to it. See links at end.) However, while an ITR always looks up the current mapping from a local full database query server (assuming all ITRs are caching ITRs), the probing server has its own arrangements for which ETR to tunnel the packet to, in order to reach the probed device.

For instance, multiple probing servers would each send probes to one or perhaps more devices in the destination network. Each such device is on a micronet (EID in LISP terminology) address and could be reached via two or more ETRs. (I am assuming two or more upstream ISPs, since this is a multihomed network. Whether the ETRs are physically at the ISPs, or physically at the destination network, doesn't matter.)

So Probing Server A would periodically generate two packets with the destination address of Device N in the destination network. One would be tunneled to some ETR X and the other to some ETR Y in another ISP. Each packet would have a different nonce so the probing server could keep track of reachability from itself to Device N via the two ETRs. The echo replies could come back via any one of the destination network's upstream ISPs.

The purpose of this system is to test reachability of the destination multihomed end-user network via incoming packets (coming into this network) via multiple ETRs.
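
As a rough illustration of the nonce bookkeeping described above, here is a minimal Python sketch. It sends no real ICMP packets and does no tunneling. The names `ProbeTracker`, `send_probe`, `handle_reply` and `expire`, and the 2 second timeout, are my own assumptions for illustration and are not part of Ivip or any existing tool.

```python
import secrets
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ProbeTracker:
    """Outstanding probes from one probing server, keyed by nonce (hypothetical)."""
    timeout: float = 2.0                          # assumed reply deadline, seconds
    pending: dict = field(default_factory=dict)   # nonce -> (device, etr, sent_at)
    results: dict = field(default_factory=dict)   # (device, etr) -> last outcome

    def send_probe(self, device: str, etr: str, now: Optional[float] = None) -> bytes:
        """Record a probe to `device` tunneled via `etr` and return its nonce."""
        nonce = secrets.token_bytes(8)            # not guessable by an off-path attacker
        sent_at = time.monotonic() if now is None else now
        self.pending[nonce] = (device, etr, sent_at)
        return nonce

    def handle_reply(self, nonce: bytes, now: Optional[float] = None) -> bool:
        """Accept an echo reply only if its nonce matches a pending probe."""
        entry = self.pending.pop(nonce, None)
        if entry is None:
            return False                          # forged, replayed or already expired
        device, etr, sent_at = entry
        now = time.monotonic() if now is None else now
        ok = (now - sent_at) <= self.timeout
        self.results[(device, etr)] = ok
        return ok

    def expire(self, now: Optional[float] = None) -> None:
        """Mark probes that got no reply within the timeout as failures."""
        now = time.monotonic() if now is None else now
        for nonce, (device, etr, sent_at) in list(self.pending.items()):
            if now - sent_at > self.timeout:
                del self.pending[nonce]
                self.results[(device, etr)] = False
```

Because each (device, ETR) pair gets its own nonce, a reply arriving via either upstream ISP can still be attributed to the ETR the probe was tunneled to.
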

{If it was desired to test outgoing reachability from the destination network to various parts of the DFZ - such as the various locations of probing servers - then this would be a separate process. The result of that process would be to better inform the CE router's decision about where to send outgoing packets (via one upstream ISP or another) - including perhaps by sensing packet loss due to congestion in or beyond the various upstream ISPs. However, that is a completely separate project from the need to test inbound reachability in order to change the mapping so traffic uses an ETR which is currently working better than one which is not working properly.}

Since all this could be done via ordinary ping, I don't see how there are any security problems in this. It would be up to the customer which routers or hosts in their network were probed, how often etc. If the ETRs are in the upstream ISPs, probably all that needs to be tested is reachability to the CE router.

Even if there were security problems, how could Ivip's approach be less secure than the alternatives: to have any number of ITRs or sending hosts probing reachability in some way? At least with Ivip's approach, the customer can know, or specify, which probing servers probe its network. Then, if there were any security problems, the responding devices could be configured to ignore probes from any other IP addresses.

>> Most likely, that company's specialised probing system would be
>> configured to make the decision on how best to map the
>> destination network's micronets - and would change the mapping
>> within seconds, according to whatever criteria were specified by
>> the administrators of the destination network. Within a few
>> seconds, all ITRs in the world handling packets addressed to
>> these micronets would be tunneling these packets to the ETR
>> chosen in the new mapping decision.
>>
>> This is a generalised DFZ->destination-network probing approach,
>> since the probing servers would, broadly speaking, be in the
>> DFZ. This is the only scalable approach, since the same level of
>> probing would still occur if 100,000 ITRs were sending packets
>> to the destination network as if one, none or a few were sending
>> packets.
>>
>> The key to using a separate, dedicated, reachability probing
>> system (quite outside the Ivip system itself, and so which can
>> be made to work on any principles, any protocols etc. which suit
>> the destination network being probed) is Ivip's real-time
>> mapping distribution system. This tells all the ITRs which need
>> to know which ETR to tunnel the packets to. This greatly
>> simplifies the ITR and ETR design and separates out reachability
>> testing and the resulting decision-making from the
>> core-edge-separation system itself. (LISP, APT, TRRP and Six/One
>> Router monolithically integrate them.)
>
> This won't fly operationally. First, I can't envision an economic
> model that would cause someone or, better yet, multiple
> probing/reachability companies to be launched as you envision.

Depending on the sophistication of the probing service, it shouldn't be too hard to write the code, run the servers etc. The servers need to be securely linked together and to form a redundant, distributed network. It is not trivial, but it is not inordinately expensive. The servers can be bog-standard COTS devices, which are perfectly fast and reliable. They can be sitting in data centres in various locations - including simply by means of renting a few servers in various hosting companies.

No doubt there would be some open-source code for a probing system, and then various folks might set up cooperative sets of servers to do the work for themselves. Small-time operators could use the open-source code to make their own probing networks just by renting servers in a bunch of data centres.

> Ultimately, you're talking about not only a lot of fixed costs for
> servers and such, but more importantly the OpEx for colocation
> costs and bandwidth that company would be burning to send out
> millions, billions or trillions of "probes" to everyone's ETR on
> the planet.

Sure, but the bandwidth is not high - the packets can be short. Bandwidth is cheap, especially for continual rates of usage like this. A couple of hundred dollars a month per server would do the trick, unless business was so brisk that the probe volume grew to something really big - but by then, there would be plenty of cash to spend more.

A good service would offer all sorts of flexible options in terms of the probe rate, the number of probing servers which do the probing, fancy algorithms for ignoring short outages if the customer wanted them ignored etc. However, a more straightforward reachability probing service would take its marching orders something like this:

   One or more IP addresses to probe (e.g. CE router, internal routers, hosts etc.)

   List of two or more ETRs to probe these by.

   Frequency of probing.

   Success rate above which reachability will be considered good.

   How long to wait if probes from one or more probe servers are not acknowledged before changing the mapping to use another ETR.

   How long to wait once connectivity via the original ETR is found to be restored before changing the mapping back to the original ETR.

   List of the current micronets.

   Preference for each micronet for which ETR to use if both are working.

   Username and password by which the probing system could change the mapping of all these micronets, via whatever RUAS (Root Update Authority System) company, or intermediary to an RUAS, the destination network uses to send its real-time mapping changes.
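
To make the marching orders above a little more concrete, here is a minimal Python sketch of how such settings and the two waiting periods might drive a mapping decision. Everything here is an assumption for illustration: the names `ProbeOrders` and `FailoverDecider`, the field names, and the simplification to a single ETR preference order shared by all micronets. The credentials and the actual mapping update sent via the RUAS are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class ProbeOrders:
    """Hypothetical record of one customer's probing instructions."""
    targets: list           # IP addresses to probe (CE router, hosts etc.)
    etrs: list              # ETRs to probe via, most preferred first
    interval_s: float       # how often to probe
    good_threshold: float   # success rate at or above which an ETR is "good"
    fail_hold_s: float      # wait this long before mapping away from a bad ETR
    restore_hold_s: float   # wait this long before mapping back to a restored ETR


class FailoverDecider:
    """Chooses which ETR the micronets should currently be mapped to."""

    def __init__(self, orders: ProbeOrders):
        self.orders = orders
        self.current = orders.etrs[0]
        self.good = {etr: True for etr in orders.etrs}
        self.changed_at = {etr: 0.0 for etr in orders.etrs}  # last good/bad flip

    def report(self, etr: str, success_rate: float, now: float) -> None:
        """Feed in the latest aggregated success rate for probes via `etr`."""
        is_good = success_rate >= self.orders.good_threshold
        if is_good != self.good[etr]:
            self.good[etr] = is_good
            self.changed_at[etr] = now

    def decide(self, now: float) -> str:
        """Return the ETR to map to, applying both hold-down periods."""
        o = self.orders
        bad_for = now - self.changed_at[self.current]
        if not self.good[self.current] and bad_for >= o.fail_hold_s:
            # Current ETR has been failing long enough: use the best good one.
            for etr in o.etrs:
                if self.good[etr]:
                    self.current = etr
                    break
        else:
            # Move back up only to an ETR that has been good for long enough.
            for etr in o.etrs:
                if etr == self.current:
                    break
                if self.good[etr] and now - self.changed_at[etr] >= o.restore_hold_s:
                    self.current = etr
                    break
        return self.current
```

The two hold-down periods are what stop a brief outage, or a brief recovery, from causing the mapping to flap back and forth.
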

The end-user network retains control of their mapping by being able to generate such usernames and passwords for their mapped address space - so the end-user network can easily change these settings at the RUAS or whatever company to prevent the current probing service changing their mapping if the current company is abandoned in favour of some other company, or if the destination network wants to directly control its own mapping.

Any such service would also have to be secure. I am assuming the service itself would be given the credentials to alter the mapping of the destination network's micronets. If the probing service simply conveyed the results to the destination network, then this is less of a security risk - but the communication still has to be done securely. I think most end-user networks would prefer some purpose built, secure, distributed, external system to execute the decisions and make the mapping changes - from outside their own network.

Probing companies would develop their probing and decision-making algorithms in response to demand. None of this probing stuff needs to be specified as part of Ivip or any other such real-time mapping core-edge separation system. The probing arrangements are completely separate from the core-edge separation system itself and so it is up to the probing companies and their customers exactly how it would be done. IETF standards would be good, of course, but they would be entirely modular and separate from Ivip or whatever.

Such modularity and open-ended flexibility is impossible with LISP, APT, TRRP (or Six/One Router I guess) because these all build the probing and decision-making functionality into the ITRs. Host-based systems are equally monolithic.

> Second, these probes from a 3rd-party probing/reachability company
> will not reliably tell end-user networks that there is, in fact,
> known "good" connectivity to an ETR, because of ECMP and/or LAG
> being used along certain paths within SP's networks and not
> others. More specifically, ECMP and/or LAG are load-balancing
> mechanisms that SP's widely use to scale the physical BW between
> nodes in their network, (e.g.: to scale BW between a city-pair to
> 100G and [much] larger by logically bonding together multiple
> OC-192's, etc. into a single "bundle"). The key part of those
> technologies is core routers use IP header information, (L3
> addresses and/or L4 port information), as input keys to their
> load-hashing algorithms to determine the particular output link in
> a LAG or ECMP "bundle" that a particular flow goes on. The
> problem observed in operational networks today is that
> "soft-failures" cause a particular link in a bundle to stop
> forwarding traffic, which goes unnoticed to IGP's (OSPF or IS-IS)
> or BGP and, unfortunately, results in blackholing of customer
> traffic until the problem is isolated and "bad" link in the bundle
> is taken out of service. Unfortunately, there aren't tools to
> diagnose this problem today so it's very difficult and
> time-consuming to troubleshoot & resolve these problems. More to
> the point of why your proposal of 3rd-party probing companies will
> not work is: - When these 3rd-party probing companies are
> transmitting probes toward an ETR they're going to be unable to
> construct *identical* EID within RLOC packets that would cause
> them to be hashed and push on the same links that are bound for
> that customer's ETR.
> IOW, the 3rd-party probe company *will* get
> false-positives that will either: a) leave the ETR's in service
> when there is a bad link blackholing traffic on a parallel path
> unseen by the 3rd-party company; or, b) falsely take an ETR
> out-of-service because the 3rd-party probing company saw traffic
> being blackholed on a link that may, in fact, carry very little or
> know end-user traffic toward that ETR. - Economically, it's
> infeasible for this imaginary probing company, or set of probing
> companies, to be able to deploy servers to every POP in every ISP
> to ensure full coverage of these LAG & ECMP paths ... or, for that
> matter, a decent percentage of each operator's network that it can
> reasonably approximate reachability across all paths in that
> every ISP's networks.

OK - but how is the scheme I propose going to be any more affected (false detection of outages not actually affecting traffic, or not detecting failures which do affect traffic) by the flaky nature of these outages than a system in which ITRs probe reachability (to ETRs, which is not enough - or to CE routers or beyond)?

Googling ECMP and LAG finds only 2.3k pages with a lot of stuff from one vendor (Force 10), a US patent 7190696 (Rajeev Manur et al.) and your own "draft-amante-oam-ng-requirements". Without further research, I can't easily tell how widely these techniques are used.

Assuming such techniques are widely used, what do the routers hash on? Can you be sure that for every host-to-host session, the resulting tunneled packets from an ITR will hash to the same value and so share the same physical path? Can you be sure that the ITR's attempt to probe reachability to the ETR, or through the ETR to the CE router in the destination network, will hash to the same value? If not, then your critique probably applies to ITR-based probing too.

What about host-based probing?
Depending on how a host-based system works, how can you be sure these hash algorithms will send all traffic packets for a particular host session on the same physical path as whatever probe packets the host sends?

Likewise for all these approaches, including Ivip, what about similar problems in the path by which the probe replies travel?

While I agree that the use of fancy techniques which lead to flakier outage behaviour does pose a difficulty for Ivip's approach to reachability testing, I would want to see a lot more detail before I concluded these problems were significantly worse for Ivip than for LISP etc. core-edge separation schemes or some as-yet undesigned host-based core-edge elimination schemes.

One major general advantage of Ivip's modular approach is that if the destination network doesn't like how reachability testing is done, and/or doesn't like how this leads to decisions about real-time changes to its mapping, then they can easily hire another company which does a better job. That is impossible for all proposals and classes of proposals discussed in the RRG other than Ivip, since they all build the reachability testing and decision making monolithically into the scalable routing solution.

> - As an operator, I don't have much (any?)
> faith in trusting others to determine connectivity (or lack
> thereof) to/from my ETR's.

But if you don't like my suggested Ivip approach, you can always run your own system as you wish. I can't see that running your own system is going to be any less secure, any less reliable etc. than the non-Ivip alternatives, in which any Tom, Dick or Harriet ITR in the world (too many to know about) expects to be able to probe reachability, and will do something you don't want if your network fails to respond to pretty much every one of their probe packets.

Worse still, a host-based solution: you are leaving reachability testing to every host in the world which might want to send a packet to your network.
Your network needs to work well with all of them, including potentially badly written host implementations which are likely to be more numerous, complex and flaky than the smaller number of ITR implementations in the LISP etc. model.

With Ivip, you don't have to worry about any of this. You either hire a company which probes your network and makes mapping decisions the way you like, or you do it yourself.

Ivip's approach is perfectly scalable. Probe traffic is continual and low-level. It doesn't matter whether there are zero hosts or ITRs sending packets to your network now, or a hundred million. All the other architectures suffer grave scaling problems. They are all monolithic and restricted to being very inflexible, since their probing arrangements need to be set in stone and implemented on every ITR (and probably every ETR, with some method of testing reachability all the way to the destination network), or worse still, every host in the Net.

> I trust my network and the
> configuration of my ITR's/ETR's to determine that.

I am assuming that "you" are the administrator of a multihomed end-user network using Ivip micronet space, or alternatively LISP etc. EID space, or its equivalent in APT, TRRP or Six/One Router - or its equivalent in a host-based core-edge elimination scheme.

Your own servers cannot on their own determine that they are reachable from the hosts which currently want to send you packets. Likewise, they can't determine on their own the reachability from all corners of the Net to the ETRs your multihomed end-user network depends on for incoming packets.

In principle sending hosts or ITRs can determine it, by probing your ETR, routers, destination hosts or whatever. They could also communicate this back to your network, but it is they - the non-Ivip ITRs or sending hosts (in a host-based system) - which need to make the decision about how to send traffic packets to your network. However, those approaches have insurmountable scaling problems.
Ivip's approach scales well and enables you to determine reachability, and to control mapping in real-time, however you like. I was just suggesting that for most end-user networks, the best approach would be to hire a company to do this for them.

Having the probing and data collection done from within your network would be possible - but you would need to rely on external servers to reflect back your probes. (It does not scale if you want to probe every currently active ITR or sending host.) You could collect the information and make your own decisions about changed mapping within your own network. However, you would need to have a fancy arrangement to make this work robustly when you need it most - when there is network instability and one or more outages on your upstream ISP links, or between those ISPs and the rest of the Net.

A well-engineered, distributed network of probing servers run by a specialist reachability probing company would always be able to communicate reliably with your mapping company and make the changes you specified they should make, in response to whatever they have just determined about reachability.

> Furthermore,
> when someone goes wrong (e.g.: probes aren't returned) I can
> easily login to those devices (since I own them) and quickly
> troubleshoot the problem and restore service how *I* deem
> appropriate for my network.

I am not sure how this relates to an objection to Ivip's approach. If you hire a probing company, it is pretty obvious that they will be able to automatically communicate to you (wherever you are, outside or inside your own network - assuming it is reachable by some means) that something has gone wrong. It is up to you to instruct that company how to probe, how to decide about mapping changes, how to report trouble to you etc.
You could roll your own system if you really want, but I figure most multihomed end-user networks would prefer to pay a modest annual fee to some specialist company which is motivated to serve them reliably, and provide whatever probing, decision making etc. services you and other such companies want.

> So, to summarize, *if* we have to do probing, the only
> operationally and economically feasible method is either to/from
> the ITR's/ETR's or the end-systems themselves. (More on this just
> below).

You haven't convinced me of this so far.

>> So having something other than the ITRs doing the probing and
>> making the decisions does involve some extra complexity - a
>> real-time mapping system. I believe this is a small price to
>> pay for the greater flexibility, more robust probing (all the
>> way to the destination network, not just to the ETRs), greater
>> simplicity in ITRs and ETRs, reduction in probing traffic etc.
>> Also, it enables real-time control of ETR address for incoming
>> TE.
>
> This discussion does raise an interesting architectural point with
> respect to all map-and-encap solutions, which depend on
> reachability probing between ITR's and ETR's. Specifically, do
> these reachability probes faithfully represent end-user
> (host-machine originated and terminated) flows so they are
> travelling along the same path as host-to-host data flows? I
> would argue that they do not, (without substantially increasing
> the probing traffic load/bandwidth to "search" through all
> possible paths used by end-user traffic).

OK - this is much the same concern as I raised above about LISP etc. ITRs being able to reliably probe reachability via the ECMP and LAG systems you mentioned.

> OTOH, solutions that either: a) do proactive probing originating
> from hosts at something at or just below the TCP/UDP layer; or,
> better yet, b) piggyback ***reachability*** detection along with
> active TCP/UDP traffic flow (cf: TCP ACK's) between hosts ...
> would faithfully represent that a given path really is working.

I think that reachability probing at the host or ITR level can't depend on traffic packets. Firstly, there are PMTUD and efficiency problems with bloating some or all traffic packets with whatever is required to elicit a response indicating reachability. Secondly, if an ITR is currently sending packets to ETR-A, since this is currently preferred, it also arguably needs to be probing reachability to (really through) ETR-B, ETR-C or however many other ETRs the network could be reached by, so it can make a quick change-over to one of these in the event that ETR-A fails. The probing of those other ETRs - ETR-B and ETR-C etc. - can't involve traffic packets because all traffic is currently being tunneled via ETR-A.

> FWIW, I don't think there's a straightforward solution for
> map-and-encap solutions to the aforementioned probing over
> LAG/ECMP paths problem. If there is, I'm all ears.

I agree, these techniques do pose problems - for any reachability probing system. I am not yet convinced that Ivip's approach would fare significantly worse than LISP's. LISP must use the ITRs to do the probing. Ivip can't use the ITRs for this (because that doesn't scale, and in order to keep ITRs simple) but you can organise probing from special probing servers all around the world if you like, and do anything you like with the results when deciding how to change the mapping.

> OTOH, if
> there's not a solution, I do not think we should gloss over it
> and, instead, I would strongly recommend we should point this out
> out very clearly within an "Operational Considerations" section of
> any/all map-and-encaps protocol specs or an
> overall/all-encompassing companion "Operational Considerations"
> document noting this and other 'challenges' that map-and-encaps
> can't feasibly address. Said document(s) may prove helpful in
> evaluating the final set of candidate solutions that the RRG will
> offer its recommendation on.

I agree entirely.
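
On the earlier question of what the routers hash on, here is a toy Python model of per-flow link selection in a LAG or ECMP bundle. It is only an illustration under assumptions of my own: real routers use vendor-specific hash functions and field selections, and the CRC32 over a 5-tuple used here, along with the names `pick_link` and `links_used`, is hypothetical. The point is simply that any packet whose hashed fields differ from those of the traffic packets, whether it comes from a probing server, an ITR or a host, may be placed on a different member link.

```python
import zlib


def pick_link(src_ip, dst_ip, proto, src_port, dst_port, n_links):
    """Toy ECMP/LAG selection: hash the header fields, modulo the link count."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    return zlib.crc32(key) % n_links


def links_used(flows, n_links):
    """Return the set of member links that a collection of flows lands on."""
    return {pick_link(*flow, n_links) for flow in flows}


# Hypothetical flows from one ITR to one ETR over a 4-link bundle, assuming
# UDP encapsulation where only the outer source port varies between flows.
traffic = [("192.0.2.1", "198.51.100.7", 17, port, 4341)
           for port in range(40000, 40064)]

# A hypothetical ICMP probe between the same two addresses (no ports to hash).
probe_link = pick_link("192.0.2.1", "198.51.100.7", 1, 0, 0, 4)

print("links carrying traffic:", sorted(links_used(traffic, 4)))
print("link carrying the probe:", probe_link)
```

A single probe exercises exactly one member link, so a soft failure on any other link carrying traffic would go unseen, whoever sends the probe.
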

Is your draft:

  http://tools.ietf.org/html/draft-amante-oam-ng-requirements-01

the best place to understand ECMP, LAG and the like? If not, can you point to better references?

  - Robin


Ivip's forwarding modes, with modified IP headers and likewise modified DFZ and other routers, are described in:

  ETR Address Forwarding (EAF) - for IPv4
  http://tools.ietf.org/html/draft-whittle-ivip4-etr-addr-forw

  Prefix Label Forwarding (PLF) - for IPv6
  http://www.firstpr.com.au/ip/ivip/ivip6/
