Pierre,

The end result should be an updated draft. The list is good for
discussing what should go in and why :-)

Regards,
Alia

On Mon, May 20, 2013 at 11:27 AM, Pierre Francois <[email protected]> wrote:

> Alia,
>
> Thanks for your quick feedback.
>
> Let me check with my co-authors on whether we should change the doc to
> answer your comments and come back for a discussion based on
> ink-on-paper, or answer on the list. I am afraid of a never-ending
> thread for the latter :)
>
> Cheers,
>
> Pierre.
>
> On May 20, 2013, at 5:04 PM, Alia Atlas <[email protected]> wrote:
>
> Hi Pierre,
>
> Thank you for starting the conversation and a quick intro on the
> differences.
>
> When I look at this draft and PLSN, what I see is that the PLR is
> definitionally either a type B router (since it has an alternate that
> is safe for forwarding traffic or for link up its old primary) and
> that the PLR is then the only router to apply the basic procedure.
> However, the PLR may not have an alternate available, unless MRT is
> used.
>
> As draft-ietf-rtgwg-microloop-analysis-01 says in Sec 3.3:
>
>    "Another distinct situation is when the router does not support
>    IPFRR or could not repair the failure, the new primary next-hops
>    do not satisfy the safety condition, and there's no other neighbor
>    that does, i.e. a type-C situation. Unlike other routers in the
>    network, the router directly connected to the network does not
>    have the old next-hop any more, and cannot continue using it.
>    Immediately switching to the new next-hops, on the other hand, may
>    result in a micro-loop. In this situation, the router MUST discard
>    traffic forwarded along the affected route for the duration of
>    DELAY_TYPEC, and then update the routes. Implementations MAY have
>    a configuration option to allow switching immediately to the new
>    next-hops for situations where this type of a micro-loop is not a
>    concern. If implemented, this option MUST be disabled by default."
>
> Granted, this discarding becomes the default behavior for
> draft-litkowski-rtgwg-uloop-delay-00, but the reasoning and trade-offs
> are not discussed.
>
> In the analysis given in draft-litkowski-rtgwg-uloop-delay-00, the
> benefit discussed is only in terms of local microloops and completely
> ignores non-local microloops. I know that this particular technique is
> not solving the remote microloops problem - but those are a real
> problem and without even attempting to characterize that, there's
> little way of telling whether the local microloops are 1% of the
> problem or 99%.
>
> That the technique can apply when only the PLR does it is not as
> interesting as having a more general technique that works for traffic
> from routers that implement it and does not cause problems.
>
> Obviously, the WG debated this issue quite some time ago and was
> willing to go for a simpler partial solution (PLSN) over OFIB that
> gave similar coverage to RLFA.
>
> Is your current argument that this even simpler and more partial a
> solution might gain some traction? Or is it that this was simpler to
> implement and provides some mitigation?
>
> In addition to lacking any guidance on the scale of the total problem
> that it solves, the draft also lacks details to handle the cases where
> the network hasn't been stable. Granted, the latter is not deeply
> complex - but the solution isn't safely usable without it.
>
> I think that we as a WG need to do 4 things:
>    a) Understand the scope of the total microloop problem and what
>       fraction of this that draft-litkowski-rtgwg-uloop-delay-00
>       actually can solve. Does it handle asymmetric link-costs and
>       multi-hop micro-loops? Better examples of what types of local
>       microloops are handled and why other types aren't protected
>       would be useful. How would an operator be certain as to what
>       protection would be provided or how to engineer a network to
>       obtain it?
>    b) Have a draft that fully describes the problem, the trade-offs,
>       and the solution in detail rather than just a brief conceptual
>       overview.
>    c) Understand the computation and complexity trade-offs between the
>       different solutions - given that LFA is already assumed for it
>       to be useful.
>    d) Discuss how partial a solution is desirable to standardize and
>       the pros/cons of having a worse solution standardized.
>       Implementations aren't free - and by standardizing a more
>       partial solution, this can delay implementations of a better
>       solution.
>
> I understand the desire to standardize something and to take something
> that seems straightforward and is likely useful to at least one
> network, but given the WG track record, at a minimum, I think we must
> have a more complete draft that fully documents the solution in detail
> and compares it fairly.
>
> Regards,
> Alia
>
> On Mon, May 20, 2013 at 7:57 AM, Pierre Francois <[email protected]> wrote:
>
>> Dear rtgwg list members,
>>
>> I would like to know your opinion about what we should do with
>> http://tools.ietf.org/html/draft-litkowski-rtgwg-uloop-delay-00 ,
>> that we presented in Orlando.
>>
>> The idea was to avoid microloops occurring in the direct
>> neighbourhood of a node shutting down or bringing up a link in an IGP
>> topology, by introducing some fixed delay in the update of the FIB in
>> the down case, and introducing a fixed delay in the propagation of
>> the LSP describing the link as up in the up case.
>>
>> The solution is simple, will be released by some in the upcoming
>> months, and the Orlando audience seemed to find it interesting to
>> work on.
>>
>> Alia mentioned the interest of comparing this solution with the state
>> of the art before going further with the doc, so here it comes.
>>
>> Generally, compared to other solutions, local-delay does not provide
>> full coverage, as it only avoids all (but only) microloops occurring
>> locally to the affected node.
>> However, in many networks, as shown by Stephane's analysis, it is
>> already highly beneficial to have loop avoidance there. Considering
>> the simplicity of the approach, this looks like a low-hanging fruit.
>>
>> Alia was considering a comparison with PLSN (described in
>> http://tools.ietf.org/html/draft-ietf-rtgwg-microloop-analysis-01,
>> expired 7 years ago ;) ).
>>
>> The differences with the PLSN approach are the following:
>>
>> PLSN lets all routers that have to converge for some destinations
>> try to understand the safety of their new next hops, for each
>> destination. Based on this assessment, they either
>>
>> 1. Transiently use a safe, non post-convergence, set of next hops, to
>>    finally converge to the post-convergence one, or
>> 2. Transiently use old next-hops, to finally converge to the
>>    post-convergence ones.
>>
>> Local delay can be defined as a subset of this approach:
>> Only the node local to the event applies the procedure.
>> Step 1 in PLSN is not applied; we only suggest that the node wait for
>> a fixed time, with no transient FIB state.
>>
>> I was considering a comparison with oFIB,
>> draft-ietf-rtgwg-ordered-fib , submitted to IESG as informational.
>> local-delay can be defined as a subset of this approach:
>>
>> While oFIB defines an ordering among all the nodes of the network,
>> telling which node should wait for which neighbours to be done with
>> their update before performing their own, local-delay tells the local
>> node to wait until fast convergence has happened in the rest of the
>> network.
>>
>> I think that despite the close relationships between these
>> approaches, local-delay is worth being documented on its own because:
>>
>> It's simple, on its way to be supported, and provides loop avoidance
>> where they happen to be the most annoying.
>>
>> Cheers,
>>
>> Pierre.
>>
>> _______________________________________________
>> rtgwg mailing list
>> [email protected]
>> https://www.ietf.org/mailman/listinfo/rtgwg
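
The local-delay behaviour described in Pierre's original message (a fixed delay before the FIB update in the link-down case, and a fixed delay before propagating the LSP in the link-up case) can be pictured as a small event schedule for the node local to the event. What follows is only an illustrative sketch, not the procedure from draft-litkowski-rtgwg-uloop-delay-00: the names `LinkEvent`, `Action` and `local_delay_schedule`, the 2-second default, and the immediate LSP flooding at t=0 in the down case are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class LinkEvent(Enum):
    """Hypothetical event types seen by the node local to the link."""
    DOWN = "down"
    UP = "up"


@dataclass(frozen=True)
class Action:
    at: float   # seconds after the local node detects the event
    what: str   # human-readable description of the step


def local_delay_schedule(event: LinkEvent, delay: float = 2.0) -> list[Action]:
    """Return the local node's actions in time order.

    Only the fixed delay itself is taken from the thread; the value of
    `delay` and the t=0 flooding step are assumptions for illustration.
    """
    if delay < 0:
        raise ValueError("delay must be non-negative")
    if event is LinkEvent.DOWN:
        return [
            # Assumption: the LSP is flooded at once so that the rest of
            # the network can converge while the local node waits.
            Action(0.0, "flood LSP without the link"),
            # From the thread: the local FIB update is delayed.
            Action(delay, "update local FIB"),
        ]
    # From the thread: propagation of the LSP describing the link as up
    # is delayed by a fixed time.
    return [Action(delay, "flood LSP describing the link as up")]


if __name__ == "__main__":
    for ev in LinkEvent:
        for action in local_delay_schedule(ev):
            print(f"{ev.value:>4}  t+{action.at:.1f}s  {action.what}")
```

Unlike PLSN or oFIB, nothing in this schedule depends on the topology or on other routers' state; only the local node changes its timing, which is the simplicity argument made in the thread.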
