Excerpts from Pekka Nikander on Thu, Jan 15, 2009 10:37:16PM +0200:
>> It's clear that pure endpoint-based multipathing, a la Shim6 REAP,
>> cannot scale.
>
> Would you please be more specific. It is not at all clear to me.
> AFAIK, SCTP is doing essentially the same thing. (I can see the
> signalling load there, causing potential packet storms; is that the
> "scalability" problem you are referring to?)

I should have done the math first, but ... Let's assume, conservatively,
on the order of ten billion communicating endpoint pairs and 2 locators
per endpoint. How often do you test locator pairs that you are not
using? Let's be very conservative and assume once per minute (endpoints
can get away with a rate this low). Even so, that's on the order of
10^10 extra packets per minute across the core ... when things are going
perfectly, just for steady-state maintenance. I can't characterize what
percentage of traffic that is because, for example, there may be plenty
of HTTP sessions that are idle but being kept open. We could reduce that
maintenance overhead.

Then we add in what happens when there is a network problem. I suspect
that because there are active sessions and data is flowing, you would
try all alternative paths at once, with fast probes and duplicate data
packets. I really don't know what would happen, but I'm awed at the
thought.

>> Putting path failure detection and recovery in ITRs instead of in
>> the endpoints can greatly reduce the number of messages flying
>> around, but a simple approach there doesn't scale either, and
>> without information they don't have much choice.
>
> AFAICS, the fundamental problem here is that the network today (BGP)
> cannot detect failures at the desired speed, leading to the
> situation where the hosts (or your funnel points) need to overcome
> the deficiency.

Suppose I am at locators A and B, you are talking to me at locator A,
and A becomes unreachable. Regardless of how fast routing detects that,
you (an endpoint) are going to need to detect it yourself, verify that
B is reachable, and switch to B, so you need your own fault detection
and recovery protocol.

>>> 8) When RLOC-EID is done properly (e.g., like HIP where each
>>> concept appears on a different layer of the protocol stack), there
>>> is no liveness problem (nor can there be one).
>>
>> I don't see it.
>> HIP properly removes dependency on location-based
>> names from identification functions, but it does nothing to solve
>> the multipath problem. You still need to find viable paths,
>> ascertain that multiple locators refer to the same entity
>> (difficult when one or both are moving -- rendezvous servers may be
>> required, tsk), detect path failure and switch to another one.
>
> I mostly agree. In the HIP WG, years ago the decision was to leave
> the problem of path failure detection to the SHIM6 WG, with the
> intention to integrate REAP to HIP. However, with HIP (even LHIP)
> verifying that the locators refer to the same entity is relatively
> easy, as long as at least one of the parties retains at least one
> stable locator. Rendezvous servers are a nice way to introduce
> stable locator to an otherwise dynamic system... And then, under
> those assumptions, finding paths and switching over is relatively
> easy. Works very well in practise. We've multiple times tested it
> with both proxies and individual hosts, using 2-3 different radio
> interfaces at each mobile host/proxy, and moving at typical vehicle
> speeds in difficult terrain. But I digress.

That's not a digression, that's real evidence :-). My concern is not
that "finding paths and switching over is relatively easy", but what
happens when everyone does it.

Scott

_______________________________________________
rrg mailing list
[email protected]
http://www.irtf.org/mailman/listinfo/rrg
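
As a rough check on the steady-state probe estimate earlier in this
message, the arithmetic can be sketched as below. The endpoint-pair
count, locators per endpoint, and one-probe-per-minute rate are the
assumptions stated in the message. The function names, the choice to
count one probe per idle locator pair, and the idea that exactly one
pair is in use are illustrative simplifications, not taken from REAP
or any other protocol.

```python
# Back-of-envelope estimate of steady-state liveness-probe traffic.
# Assumption (hypothetical model): each endpoint pair uses exactly one
# locator pair for data and probes every other locator pair at a fixed rate.


def unused_locator_pairs(locators_a: int, locators_b: int) -> int:
    """Locator pairs between two endpoints, minus the one pair in use."""
    return locators_a * locators_b - 1


def probes_per_minute(endpoint_pairs: int,
                      locators_per_endpoint: int,
                      probes_per_pair_per_minute: float = 1.0) -> float:
    """Total probes per minute, counting one packet per idle locator pair."""
    idle = unused_locator_pairs(locators_per_endpoint, locators_per_endpoint)
    return endpoint_pairs * idle * probes_per_pair_per_minute


if __name__ == "__main__":
    ENDPOINT_PAIRS = 10**10  # "on the order of ten billion" endpoint pairs
    LOCATORS = 2             # locators per endpoint
    load = probes_per_minute(ENDPOINT_PAIRS, LOCATORS)
    print(f"{load:.1e} probes/minute, {load / 60:.1e} probes/second")
```

With 2 locators on each side there are 4 locator pairs and 3 of them
are idle, so this model gives about 3 x 10^10 probes per minute before
counting any replies. Changing LOCATORS or the probe rate shows how
quickly the total grows.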
