Re: Progressing with draft-litkowski-rtgwg-uloop-delay-00 ?

Pierre Francois Mon, 20 May 2013 08:27:51 -0700

Alia, 

Thanks for your quick feedback.


Let me check with my co-authors on whether we should change the doc to answer 
your comments and come back for a discussion based on ink-on-paper, or answer 
on the list. I am afraid of a never-ending thread for the latter :)

Cheers,

Pierre.

On May 20, 2013, at 5:04 PM, Alia Atlas <[email protected]> wrote:

> Hi Pierre,
> 
> Thank you for starting the conversation and a quick intro on the differences.
> 
> When I look at this draft and PLSN, what I see is that the PLR is by 
> definition a type B router (since it has an alternate that is safe for 
> forwarding traffic or, for link up, its old primary) and that the PLR is 
> then the only router to apply the basic procedure.  However, the PLR may 
> not have an alternate available, unless MRT is used.
> 
> As draft-ietf-rtgwg-microloop-analysis-01 says in Sec 3.3:
> 
> " Another distinct situation is when the router does not support IPFRR or 
> could not repair the failure, the new primary next-hops do not satisfy the 
> safety condition, and there's no other neighbor that does, i.e. a type-C 
> situation. Unlike other routers in the network, the router directly connected 
> to the network does not have the old next-hop any more, and cannot continue 
> using it. Immediately switching to the new next-hops, on the other hand, may 
> result in a micro-loop. In this situation, the router MUST discard traffic 
> forwarded along the affected route for the duration of DELAY_TYPEC, and then 
> update the routes. Implementations MAY have a configuration option to allow 
> switching immediately to the new next-hops for situations where this type of 
> a micro-loop is not a concern. If implemented, this option MUST be disabled 
> by default."
> 
> Granted, this discarding becomes the default behavior for 
> draft-litkowski-rtgwg-uloop-delay-00, but the reasoning and trade-offs are 
> not discussed.
> 
> In the analysis given in draft-litkowski-rtgwg-uloop-delay-00, the benefit 
> discussed is only in terms of local microloops and completely ignores 
> non-local microloops.  I know that this particular technique is not solving 
> the remote microloops problem - but those are a real problem and without 
> even attempting to characterize that, there's little way of telling whether 
> the local microloops are 1% of the problem or 99%.
> 
> That the technique can apply when only the PLR does it is not as interesting 
> as having a more general technique 
> that works for traffic from routers that implement it and does not cause 
> problems.
> 
> Obviously, the WG debated this issue quite some time ago and was willing to 
> go for a simpler partial solution (PLSN)
> over OFIB that gave similar coverage to RLFA.
> 
> Is your current argument that this even simpler and more partial solution 
> might gain some traction?  Or is it that this is simpler to implement and 
> provides some mitigation?
> 
> In addition to lacking any guidance on the scale of the total problem that it 
> solves, the draft also lacks details to handle
> the cases where the network hasn't been stable.  Granted, the latter is not 
> deeply complex - but the solution isn't safely
> usable without it.
> 
> I think that we as a WG need to do 4 things:
>     a) Understand the scope of the total microloop problem and what fraction 
> of it draft-litkowski-rtgwg-uloop-delay-00 can actually solve.  Does it 
> handle asymmetric link-costs and multi-hop micro-loops?  Better examples of 
> what types of local microloops are handled and why other types aren't 
> protected would be useful.  How would an operator be certain as to what 
> protection would be provided or how to engineer a network to obtain it?
>     b) Have a draft that fully describes the problem, the trade-offs, and the 
> solution in detail rather than just a brief conceptual overview.
>     c) Understand the computation and complexity trade-offs between the 
> different solutions - given that LFA is already assumed for it to be useful.
>     d) Discuss how partial a solution is desirable to standardize and the 
> pros/cons of having a worse solution standardized.   Implementations aren't 
> free - and by standardizing a more partial solution, this can delay 
> implementations of a better solution.
> 
> I understand the desire to standardize something and to take something that 
> seems straightforward and is likely useful to at least one network, but given 
> the WG track record, at a minimum, I think we must have a more complete draft 
> that fully documents the solution in detail and compares it fairly. 
> 
> Regards,
> Alia
> 
> 
> On Mon, May 20, 2013 at 7:57 AM, Pierre Francois <[email protected]> 
> wrote:
> 
> 
> Dear rtgwg list members,
> 
> I would like to know your opinion about what we should do with 
> http://tools.ietf.org/html/draft-litkowski-rtgwg-uloop-delay-00 , which we 
> presented in Orlando.
> 
> The idea was to avoid microloops occurring in the direct neighbourhood of a 
> node shutting down or bringing up a link in an IGP topology, by introducing 
> a fixed delay in the update of the FIB in the down case, and a fixed delay 
> in the propagation of the LSP describing the link as up in the up case.
> 
> The solution is simple, will be released by some in the coming months, and 
> the Orlando audience seemed to find it interesting to work on.
> 
> Alia pointed out the value of comparing this solution with the state of the 
> art before going further with the doc, so here it comes.
> 
> Generally, compared to other solutions, local-delay does not provide full 
> coverage: it avoids all microloops occurring locally to the affected node, 
> but only those. However, in many networks, as shown by Stephane's analysis, 
> it is already highly beneficial to have loop avoidance there. Considering 
> the simplicity of the approach, this looks like low-hanging fruit.
> 
> Alia was considering a comparison with PLSN (described in 
> http://tools.ietf.org/html/draft-ietf-rtgwg-microloop-analysis-01, expired 7 
> years ago ;) ).
> 
> The differences with the PLSN approach are the following:
> 
> PLSN has every router that must converge for some destinations assess the 
> safety of its new next hops, for each destination.
> Based on this assessment, the router either
> 
> 1. Transiently uses a safe, non post-convergence, set of next hops, to 
> finally converge to the post-convergence ones, or
> 2. Transiently uses its old next hops, to finally converge to the 
> post-convergence ones.
> 
> Local delay can be defined as a subset of this approach:
> Only the node local to the event applies the procedure.
> Step 1 of PLSN is not applied: we only suggest that the node wait for a 
> fixed time, with no transient FIB state.
> 
> I was considering a comparison with oFIB (draft-ietf-rtgwg-ordered-fib, 
> submitted to the IESG as Informational).
> local-delay can also be defined as a subset of this approach:
> 
> While oFIB defines an ordering among all the nodes of the network, telling 
> each node which neighbours it should wait for before performing its own 
> update, local-delay only tells the local node to wait until fast 
> convergence has happened in the rest of the network.
> 
> I think that despite the close relationships between these approaches, 
> local-delay is worth documenting on its own because:
> 
> It's simple, on its way to being supported, and provides loop avoidance 
> where microloops happen to be the most annoying.
> 
> Cheers,
> 
> Pierre.
> 
> 
> _______________________________________________
> rtgwg mailing list
> [email protected]
> https://www.ietf.org/mailman/listinfo/rtgwg
>
