On Fri, Dec 4, 2009 at 7:34 PM, Noel Chiappa <[email protected]> wrote:
>    > From: Patrick Frejborg <[email protected]>
>
>    > here I do get headache
>
> Sorry... :-)
>

Not your fault - the pain is caused by myself :-)

>    > I think this is an architectural question.
>
> The point you raise immediately below, very definitely yes! :-)
>
>    > In order to get this working you need to combine two architectures -
>    > the routing architecture and a mapping database architecture.
>
> Well, not necessarily. Here is how I look at the problem (sorry if this is a
> bit of a diversion, but I want to explain the framework in which I am
> thinking about this problem - it helps me avoid headaches :-).
>
> To me, designs which encapsulate and decapsulate packets which are sent
> across some existing substrate are best thought of as a new packet-switching
> layer, one built on top of an existing PS layer. (This is a circumstance we
> have seen before, e.g. in IP over X.25 networks, or over the ARPANET, etc,
> etc.)
>

As with Frame Relay, ATM, etc.

> Now, packet-switching systems all have common problems they have to solve:
> selecting the next hop, making sure that next-hop is up and reachable (i.e.
> packets from the first box can successfully get to the second box across
> whatever is linking the two), etc.
>

Yes, and most of them have been static overlay solutions; only a few were dynamic.

> So, the new encapsulating layer system has to solve all these classical
> packet-switching problems. To do so, it can either build its own 'native'
> mechanisms to do so (i.e. direct inter-device exchanges among the set of
> encapsulating and/or decapsulating devices), or it can try and 'tap into'
> existing mechanisms at the layer below to perform these functions.
>
> Which way to go is a complex question, which includes questions like 'how
> expensive is a native mechanism' (a particular concern for overlay systems,
> which may have thousands of 'direct' neighbours at their level, unlike
> lower-level systems built directly on the hardware, which likely have much
> smaller numbers of direct neighbours); 'is the information in the lower layer
> even accessible' (in some cases it is not, such as the ARPANET); 'can the
> lower-layer system really do what I need' (and sometimes the answer is 'no' -
> or at least not 'yes, with a high enough level of reliability for my
> purposes'); etc.
>
> So, in this framework, your question can be recast as a series of questions,
> such as 'is the information in the lower layer's routing system accessible to the
> higher layer' (yes); 'does the lower-level's routing architecture provide all
> the information I need at the higher level' (depending on the level of
> reliability you need for things like reachability, perhaps not); etc.
>
>
>    > You have a mapping database, but that database has no information about
>    > an ETR's availability ... unless you integrate the current routing
>    > architecture with the mapping database. If the link between the ETR and
>    > the DFZ is lost, the mapping database needs to know that, i.e. the routing
>    > protocol must inform the database.
>
> There are two _separate_ functions happening at the encapsulating layer:
> path-selection, and neighbour liveness. Many (most?) routing systems do not
> clearly separate these functions, but this all becomes easier to think about
> - and engineer for - if you do.
>

Ah, I have never understood the "neighbour liveness" concept - but I
think I have seen the problem. In order to get the number of prefixes
in the DFZ down, you must compress the prefixes, and then you lose
prefix visibility - which is very much the same as the "neighbour
liveness" problem: you don't know if the last hop to the destination
is reachable.
You are right, it becomes easier to think about - and the headache is less severe :-)


> Now, clearly, there is a feedback from the second to the first: there is no
> point in selecting a neighbour as a next hop if you cannot reach it. However,
> particularly in an encapsulating system (which has a mapping subsystem which
> might not be as dynamic as one would like for path selection), there is value
> to separating the two.
>
> The mapping output produces a set of 'plausible' next hops for that ultimate
> destination. Each member of that set is tested to see if it is reachable. If not, it
> is discarded as a plausible next hop for that (in fact, for all) ultimate
> destinations. This test can either use some existing lower-level mechanism
> (e.g. the routing, at the layer below), or a 'native' mechanism at the
> encapsulation system layer - e.g. a direct 'ping'.
>
> There is no absolute need to update the mapping system's data to indicate
> that one of the 'next hops' it lists is down, provided that there is a
> liveness check on actually using one of those listed potential next hops.
>
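To make sure I follow, a minimal sketch of that separation (all names and data are invented, purely illustrative): the mapping lookup yields a set of 'plausible' next hops, and a separate liveness check filters them at use time, without updating the mapping data itself:

```python
# Hypothetical sketch: mapping gives candidate ETRs; a separate
# liveness test decides which one is actually usable as next hop.

def select_next_hop(eid, mapping_db, is_reachable):
    """Pick a live ETR for this EID, or None if none is reachable."""
    candidates = mapping_db.get(eid, [])            # 'plausible' next hops
    live = [etr for etr in candidates if is_reachable(etr)]
    return live[0] if live else None                # mapping data untouched

# Example: the primary ETR is down, so the secondary is used.
mapping_db = {"10.1.0.0/16": ["etr-primary", "etr-secondary"]}
reachable = {"etr-secondary"}
next_hop = select_next_hop("10.1.0.0/16", mapping_db, lambda e: e in reachable)
```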

True.
But could you still use a routing protocol to expose the neighbour liveness?
What if there were two types of RIBs:
- one that installs the prefixes in the FIB - this is a very
compressed RIB in the DFZ
- one "liveness" RIB, containing only host routes for every ETR in
the Internet, but the prefixes in this RIB are never installed in the FIB
The ITR needs to do two lookups per packet: first check the liveness
RIB - if ok, send the packet via the FIB. If not ok, send the packet to
the closest mapping server. Other routers don't use the liveness RIB
for forwarding packets.
The liveness RIB could be a new address family under BGP; the RIB
would be huge, but could we then avoid the tunnels (which could create
scalability issues, as we have seen with ATM and Frame Relay)?
Another drawback is that the core routers would need to be upgraded -
though only the control plane, provided there are enough memory and
CPU resources on the routers (to carry the liveness RIB).
Most likely you have looked into this already; trade-offs need to be
made, and this approach is less appealing, I guess.


>
>    > after that the database members need to inform their ITRs, so that the
>    > ITRs that have ongoing sessions to the affected ETR will flush their
>    > cache and replace the entry with the secondary ETR. So you will have
>    > redistribution from routing protocol -> mapping database -> routing
>    > protocol; instead of BGP churn you could have cache churn.
>
> It is indeed the case that if a mapping is updated, entities which have
> cached copies of that mapping need to find out that the mapping has been
> updated. This is not an insoluble engineering problem, and the exact details
> of the solution will depend on where copies may be cached: e.g. only in
> devices which are directly communicating with the entities named in the
> mapping, e.g. ITRs; or perhaps in other places as well.
>
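One illustrative way a cached copy could discover an update (purely a sketch, not anything proposed in this thread): version the mappings, and flush a cached entry whose version is stale so the caller re-fetches it:

```python
# Illustrative only: cached mappings carry a version number; when the
# authoritative version moves on, the stale entry is flushed.

class MappingCache:
    def __init__(self):
        self._entries = {}                        # eid -> (version, etrs)

    def store(self, eid, version, etrs):
        self._entries[eid] = (version, etrs)

    def lookup(self, eid, authoritative_version):
        cached = self._entries.get(eid)
        if cached and cached[0] == authoritative_version:
            return cached[1]                      # still current
        self._entries.pop(eid, None)              # stale: flush it
        return None                               # caller re-fetches

cache = MappingCache()
cache.store("eid-x", 1, ["etr-a"])
hit = cache.lookup("eid-x", 1)                    # current version
miss = cache.lookup("eid-x", 2)                   # version bumped: flushed
```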
>
>    > If you prefer to avoid the redistribution you could let the routing
>    > architecture take care to inform the ITR of ETRs availability.
>
> See above comments about two separate functions, etc.
>
>    > But that will have negative impact on the DFZ. Today the EID in a
>    > multi-homing solution usually creates a /20 prefix - regardless of how
>    > many ISP connections it uses. But by using ETRs each attachment point
>    > will create a /32 entry in the DFZ
>
> You seem to be assuming that the only way for encapsulating-layer devices to
> perform the kind of control functionality they need at their layer is by
> using lower-layer mechanisms (e.g. routing). This is not necessary. There
> does not need to be a separate entry for each ETR in the routing (as would
> indeed be true if the lower-layer routing were the _only_ mechanism available
> to the higher layer).
>
>    > So it seems that you have to do a redistribution of routing protocol ->
>    > mapping database -> routing protocol
>
> At an abstract architectural level, when one sees/proposes such complex
> interactions (particularly involving real-time feedback) between various
> subsystems, one's 'architectural bad idea alarm bell' should go off... That
> kind of thing is prone to problems, as it's hard to model how it will
> operate, particularly in complex configurations.
>
> It's better to adopt a design philosophy in which the interactions are
> simpler, and do not involve dependency loops. The approach above (the mapping
> provides a set of potential next hops, and another mechanism selects which
> ones are actually reachable) does that.
>
>
>    > And the current DNS system is quite slow to update
>
> Which would indicate that keeping the _mappings_ in the DNS would not be
> good, if you want to be able to change them, and have the changes propagated
> quickly.

Agree

>
> There has been discussion of hybrid systems in which the DNS instead stores
> information about the entities which are authoritative for a given mapping,
> not the mappings themselves; the distribution of mappings themselves is part
> of the new encapsulation system. This does mean that authoritative mapping
> servers cannot be quickly added/removed, but their dynamicity is likely to be
> fairly low.
>
> This allows questions such as the cache updating problem you raised above,
> etc to be handled; since that part of the overall mechanism is in the new
> subsystem, it can be designed accordingly, to meet whatever performance goals
> are desirable.
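A sketch of that indirection (all names invented): the slow-changing DNS-like layer only answers "who is authoritative for this prefix"; the mapping itself is then fetched from that server:

```python
# Hypothetical hybrid resolution: DNS layer maps an EID prefix to its
# authoritative mapping server; the (possibly more dynamic) mapping
# itself is fetched from that server in a second step.

DNS_AUTHORITY = {"10.1.0.0/16": "ms.example.net"}           # rarely changes
MAP_SERVERS = {"ms.example.net": {"10.1.0.0/16": ["etr-a", "etr-b"]}}

def resolve_mapping(prefix):
    server = DNS_AUTHORITY[prefix]        # step 1: look up the authority
    return MAP_SERVERS[server][prefix]    # step 2: fetch the mapping itself

etrs = resolve_mapping("10.1.0.0/16")
```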

I'm not really a friend of caches; sooner or later you get into
trouble with them, because they are unpredictable and housekeeping of
cache entries is hard. In the 90s CEF replaced the route cache, ATM
LANE... uh-oh... during the DSL boom some ISPs tried to do NAT on the
B-RAS, and peer-to-peer detectors had limits on how many flows they
could handle. I'm skeptical of caches, but I'll try to keep my mind
open.

Noel, thanks for your patience and guidance - this has been very
useful for me. Thanks!

-- patte

>
>
>    > what happened in Sweden was an engineering issue but to get the
>    > services restored was due to the architecture of the system
>
> I am unfamiliar with this case, but will review it - study of history is
> perhaps the single most important tool for a system architect.
>
>    > one hour or longer is really not what multi-homed enterprises are
>    > expecting in failure cases
>
> Indeed, and understandably. However, a correct combination of architecture
> and engineering should provide much better performance at a 'reasonable'
> cost.
>
>        Noel
>
_______________________________________________
rrg mailing list
[email protected]
http://www.irtf.org/mailman/listinfo/rrg
