Alia, Thanks for your quick feedback.
Let me check with my co-authors on whether we should change the doc to answer your comments and come back for a discussion based on ink-on-paper, or answer on the list. I am afraid of a never-ending thread for the latter :)

Cheers,
Pierre.

On May 20, 2013, at 5:04 PM, Alia Atlas <[email protected]> wrote:

> Hi Pierre,
>
> Thank you for starting the conversation and a quick intro on the differences.
>
> When I look at this draft and PLSN, what I see is that the PLR is definitionally either a type B router (since it has an alternate that is safe for forwarding traffic or, for link up, its old primary) and that the PLR is then the only router to apply the basic procedure. However, the PLR may not have an alternate available, unless MRT is used.
>
> As draft-ietf-rtgwg-microloop-analysis-01 says in Sec 3.3:
>
> "Another distinct situation is when the router does not support IPFRR or could not repair the failure, the new primary next-hops do not satisfy the safety condition, and there's no other neighbor that does, i.e. a type-C situation. Unlike other routers in the network, the router directly connected to the network does not have the old next-hop any more, and cannot continue using it. Immediately switching to the new next-hops, on the other hand, may result in a micro-loop. In this situation, the router MUST discard traffic forwarded along the affected route for the duration of DELAY_TYPEC, and then update the routes. Implementations MAY have a configuration option to allow switching immediately to the new next-hops for situations where this type of a micro-loop is not a concern. If implemented, this option MUST be disabled by default."
>
> Granted, this discarding becomes the default behavior for draft-litkowski-rtgwg-uloop-delay-00, but the reasoning and trade-offs are not discussed.
>
> In the analysis given in draft-litkowski-rtgwg-uloop-delay-00, the benefit discussed is only in terms of local microloops and completely ignores non-local microloops. I know that this particular technique is not solving the remote microloops problem - but those are a real problem and without even attempting to characterize that, there's little way of telling whether the local microloops are 1% of the problem or 99%.
>
> That the technique can apply when only the PLR does it is not as interesting as having a more general technique that works for traffic from routers that implement it and does not cause problems.
>
> Obviously, the WG debated this issue quite some time ago and was willing to go for a simpler partial solution (PLSN) over OFIB that gave similar coverage to RLFA.
>
> Is your current argument that this even simpler and more partial a solution might gain some traction? Or is it that this was simpler to implement and provides some mitigation?
>
> In addition to lacking any guidance on the scale of the total problem that it solves, the draft also lacks details to handle the cases where the network hasn't been stable. Granted, the latter is not deeply complex - but the solution isn't safely usable without it.
>
> I think that we as a WG need to do 4 things:
> a) Understand the scope of the total microloop problem and what fraction of this that draft-litkowski-rtgwg-uloop-delay-00 actually can solve. Does it handle asymmetric link-costs and multi-hop micro-loops? Better examples of what types of local microloops are handled and why other types aren't protected would be useful. How would an operator be certain as to what protection would be provided or how to engineer a network to obtain it?
> b) Have a draft that fully describes the problem, the trade-offs, and the solution in detail rather than just a brief conceptual overview.
> c) Understand the computation and complexity trade-offs between the different solutions - given that LFA is already assumed for it to be useful.
> d) Discuss how partial a solution is desirable to standardize and the pros/cons of having a worse solution standardized. Implementations aren't free - and by standardizing a more partial solution, this can delay implementations of a better solution.
>
> I understand the desire to standardize something and to take something that seems straightforward and is likely useful to at least one network, but given the WG track record, at a minimum, I think we must have a more complete draft that fully documents the solution in detail and compares it fairly.
>
> Regards,
> Alia
>
> On Mon, May 20, 2013 at 7:57 AM, Pierre Francois <[email protected]> wrote:
>
> Dear rtgwg list members,
>
> I would like to know your opinion about what we should do with http://tools.ietf.org/html/draft-litkowski-rtgwg-uloop-delay-00 , which we presented in Orlando.
>
> The idea was to avoid microloops occurring in the direct neighbourhood of a node shutting down or bringing up a link in an IGP topology, by introducing some fixed delay in the update of the FIB in the down case, and introducing a fixed delay in the propagation of the LSP describing the link as up in the up case.
>
> The solution is simple, will be released by some in the upcoming months, and the Orlando audience seemed to find it interesting to work on.
>
> Alia mentioned the interest of comparing this solution with the state of the art before going further with the doc, so here it comes.
>
> Generally, compared to other solutions, local-delay does not provide full coverage, as it only avoids all (but only) microloops occurring locally to the affected node. However, in many networks, as shown by Stephane's analysis, it is already highly beneficial to have loop avoidance there.
> Considering the simplicity of the approach, this looks like a low-hanging fruit.
>
> Alia was considering a comparison with PLSN. (described in http://tools.ietf.org/html/draft-ietf-rtgwg-microloop-analysis-01, expired 7 years ago ;) )
>
> The differences with the PLSN approach are the following:
>
> PLSN lets all routers that have to converge for some destinations try to assess the safety of their new next hops, for each destination.
> Based on this assessment, they either
>
> 1. Transiently use a safe, non-post-convergence set of next hops, to finally converge to the post-convergence one, or
> 2. Transiently use old next-hops, to finally converge to the post-convergence ones.
>
> Local delay can be defined as a subset of this approach:
> Only the node local to the event applies the procedure.
> Step 1 in PLSN is not applied; we only suggest that the node wait for a fixed time, with no transient FIB state.
>
> I was considering a comparison with oFIB, draft-ietf-rtgwg-ordered-fib , submitted to IESG as informational.
> local-delay can be defined as a subset of this approach:
>
> While oFIB defines an ordering among all the nodes of the network, telling which node should wait for which neighbours to be done with their update before performing their own, local-delay tells the local node to wait until fast convergence has happened in the rest of the network.
>
> I think that despite the close relationships between these approaches, local-delay is worth being documented on its own because:
>
> It's simple, on its way to being supported, and provides loop avoidance where loops happen to be the most annoying.
>
> Cheers,
>
> Pierre.
>
> _______________________________________________
> rtgwg mailing list
> [email protected]
> https://www.ietf.org/mailman/listinfo/rtgwg
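
The local-delay behaviour described in the quoted proposal can be sketched roughly as follows. This is only an illustrative Python model under stated assumptions, not the draft's specification: the names `Action` and `local_delay_schedule`, the action labels, and the 2000 ms default are hypothetical. Flooding the LSP immediately in the link-down case is also an assumption (the thread only says the local FIB update is delayed), and the local FIB handling in the link-up case is left out because the thread does not describe it.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Action:
    """One step a router takes, at an offset (ms) from the link event."""
    at_ms: int
    what: str


def local_delay_schedule(event: str, is_local: bool,
                         delay_ms: int = 2000) -> List[Action]:
    """Return the ordered actions a router takes for a link event.

    event    -- "down" or "up"
    is_local -- True if this router is attached to the affected link
    delay_ms -- the fixed delay (hypothetical default, not from the draft)
    """
    if event not in ("down", "up"):
        raise ValueError("event must be 'down' or 'up'")
    if delay_ms < 0:
        raise ValueError("delay_ms must be non-negative")

    if not is_local:
        # Only the node local to the event applies the procedure;
        # every other router converges as usual.
        return [Action(0, "update_fib")]

    if event == "down":
        # Assumption: the LSP is flooded right away so the rest of the
        # network converges first; the local FIB update is held back
        # for the fixed delay.
        return [Action(0, "flood_lsp"), Action(delay_ms, "update_fib")]

    # Link up: the LSP describing the link as up is held back for the
    # fixed delay. Local FIB handling is not modelled here.
    return [Action(delay_ms, "flood_lsp")]
```

Calling `local_delay_schedule("down", True)` yields the flood step at offset 0 followed by the delayed FIB update, while a remote router (`is_local=False`) gets a single immediate FIB update. The sketch says nothing about remote microloops, which is the coverage gap raised in the reply above.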
