Re: Progressing with draft-litkowski-rtgwg-uloop-delay-00 ?

Alia Atlas Mon, 20 May 2013 08:37:16 -0700

Pierre,

The end result should be an updated draft.  The list is good for discussing
what should go in and why :-)


Regards,
Alia


On Mon, May 20, 2013 at 11:27 AM, Pierre Francois <[email protected]> wrote:

>
> Alia,
>
> Thanks for your quick feedback.
>
> Let me check with my co-authors on whether we should change the doc to
> address your comments and come back
> for a discussion based on ink-on-paper, or answer on the list. I am afraid
> of a never-ending thread for the latter :)
>
> Cheers,
>
> Pierre.
>
> On May 20, 2013, at 5:04 PM, Alia Atlas <[email protected]> wrote:
>
> Hi Pierre,
>
> Thank you for starting the conversation and a quick intro on the
> differences.
>
> When I look at this draft and PLSN, what I see is that the PLR is
> definitionally a type B router (since it has an alternate that is safe
> for forwarding traffic or, for link up, its old primary) and that the
> PLR is then the only router to apply the basic procedure.  However, the
> PLR may not have an alternate available, unless MRT is used.
>
> As draft-ietf-rtgwg-microloop-analysis-01 says in Sec 3.3:
>
> " Another distinct situation is when the router does not support IPFRR or
> could not repair the failure, the new primary next-hops do not satisfy
> the safety condition, and there's no other neighbor that does, i.e. a
> type-C situation. Unlike other routers in the network, the router
> directly connected to the network does not have the old next-hop any more,
> and cannot continue using it. Immediately switching to the new next-hops,
> on the other hand, may result in a micro-loop. In this situation, the
> router MUST discard traffic forwarded along the affected route for the
> duration of DELAY_TYPEC, and then update the routes. Implementations MAY
> have a configuration option to allow switching immediately to the new
> next-hops for situations where this type of a micro-loop is not a concern.
> If implemented, this option MUST be disabled by default."
>
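The type-C rule quoted above can be sketched roughly as follows. This is only an illustration of the quoted text, not code from either draft: the function name, the action strings, and the parameter names are hypothetical, and no value for DELAY_TYPEC is assumed.

```python
def type_c_action(elapsed_s: float, delay_typec_s: float,
                  switch_immediately: bool = False) -> str:
    """Decide what a type-C router does with traffic on the affected route.

    elapsed_s          -- seconds since the router detected the change
    delay_typec_s      -- the DELAY_TYPEC timer (value not given in the thread)
    switch_immediately -- the optional knob from the quoted text; it is off
                          by default, as the quoted text requires
    """
    if switch_immediately:
        # Operator accepted the risk of a micro-loop: use new next-hops at once.
        return "forward_new_next_hops"
    if elapsed_s < delay_typec_s:
        # Default behaviour: drop traffic while DELAY_TYPEC is running.
        return "discard"
    # Timer expired: routes are updated and traffic follows the new next-hops.
    return "forward_new_next_hops"
```

For example, with a hypothetical 2-second timer, `type_c_action(0.5, 2.0)` yields `"discard"`, while the same call with `switch_immediately=True` yields `"forward_new_next_hops"`.
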
> Granted, this discarding becomes the default behavior
> for draft-litkowski-rtgwg-uloop-delay-00, but the reasoning and trade-offs
> are not discussed.
>
> In the analysis given in draft-litkowski-rtgwg-uloop-delay-00, the benefit
> discussed is only in terms of local
> microloops and completely ignores non-local microloops.  I know that this
> particular technique is not solving
> the remote microloops problem - but those are a real problem and without
> even attempting to characterize that,
> there's little way of telling whether the local microloops are 1% of the
> problem or 99%.
>
> That the technique can apply when only the PLR does it is not as
> interesting as having a more general technique
> that works for traffic from routers that implement it and does not cause
> problems.
>
> Obviously, the WG debated this issue quite some time ago and was willing
> to go for a simpler partial solution (PLSN)
> over OFIB that gave similar coverage to RLFA.
>
> Is your current argument that this even simpler and more partial
> solution might gain some traction?  Or is it that this
> is simpler to implement and provides some mitigation?
>
> In addition to lacking any guidance on the scale of the total problem that
> it solves, the draft also lacks details to handle
> the cases where the network hasn't been stable.  Granted, the latter is
> not deeply complex - but the solution isn't safely
> usable without it.
>
> I think that we as a WG need to do 4 things:
>     a) Understand the scope of the total microloop problem and what
> fraction of it draft-litkowski-rtgwg-uloop-delay-00 can actually
> solve.  Does it handle asymmetric link-costs and multi-hop micro-loops?
> Better examples of what types of local microloops are handled and why
> other types aren't protected would be useful.  How would an operator be
> certain as to what protection would be provided or how to engineer a
> network to obtain it?
>     b) Have a draft that fully describes the problem, the trade-offs, and
> the solution in detail rather than just a brief conceptual overview.
>     c) Understand the computation and complexity trade-offs between the
> different solutions - given that LFA is already assumed for it to be useful.
>     d) Discuss how partial a solution is desirable to standardize and the
> pros/cons of having a worse solution standardized.   Implementations aren't
> free - and by standardizing a more partial solution, this can delay
> implementations of a better solution.
>
> I understand the desire to standardize something and to take something
> that seems straightforward and is likely useful to at least one network,
> but given the WG track record, at a minimum, I think we must have a more
> complete draft that fully documents the solution in detail and compares it
> fairly.
>
> Regards,
> Alia
>
>
> On Mon, May 20, 2013 at 7:57 AM, Pierre Francois <[email protected]> wrote:
>
>>
>>
>> Dear rtgwg list members,
>>
>> I would like to know your opinion about what we should do with
>> http://tools.ietf.org/html/draft-litkowski-rtgwg-uloop-delay-00 , that
>> we presented in Orlando.
>>
>> The idea was to avoid microloops occurring in the direct neighbourhood of
>> a node shutting down or bringing up a link in an IGP topology, by
>> introducing some
>> fixed delay in the update of the FIB in the down case, and introducing a
>> fixed delay in the propagation of the LSP describing the link as up in the
>> up case.
>>
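The local-delay idea described above can be sketched roughly as follows. This is a minimal illustration rather than anything taken from the draft: the step names, the function name, and the delay value passed in are hypothetical, and only the choice of which step is delayed in each case comes from the paragraph above.

```python
from enum import Enum
from typing import Tuple


class LinkEvent(Enum):
    DOWN = "down"
    UP = "up"


def delayed_step(event: LinkEvent, delay_s: float) -> Tuple[str, float]:
    """Return which local step is held back, and for how long.

    delay_s is the fixed delay in seconds; the thread does not give a value.
    """
    if delay_s < 0:
        raise ValueError("delay_s must be non-negative")
    if event is LinkEvent.DOWN:
        # Down case: the local node delays the update of its own FIB.
        return ("update_fib", delay_s)
    # Up case: the local node delays propagating the LSP that
    # describes the link as up.
    return ("flood_lsp", delay_s)
```

With a hypothetical 2-second delay, `delayed_step(LinkEvent.DOWN, 2.0)` gives `("update_fib", 2.0)` and `delayed_step(LinkEvent.UP, 2.0)` gives `("flood_lsp", 2.0)`. What happens to the non-delayed step in each case is left out on purpose, since the message does not spell it out.
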
>> The solution is simple, will be released by some in the upcoming months,
>> and the Orlando audience seemed to find it interesting to work on.
>>
>> Alia mentioned the value of comparing this solution with the state of
>> the art before going further with the doc, so here it comes.
>>
>> Generally, compared to other solutions, local-delay does not provide full
>> coverage: it avoids all microloops occurring locally to the affected
>> node, but only those. However, in many networks, as shown by Stephane's
>> analysis, it is already highly beneficial to have loop avoidance there.
>> Considering the simplicity of the approach, this looks like
>> low-hanging fruit.
>>
>> Alia was considering a comparison with PLSN (described in
>> http://tools.ietf.org/html/draft-ietf-rtgwg-microloop-analysis-01,
>> expired 7 years ago ;) ).
>>
>> The differences with the PLSN approach are the following:
>>
>> PLSN has all routers that must converge for some destinations assess
>> the safety of their new next hops, for each destination.
>> Based on this assessment, they either
>>
>> 1. Transiently use a safe, non post-convergence, set of next hops, to
>> finally converge to the post-convergence one, or
>> 2. Transiently use old next-hops, to finally converge to the
>> post-convergence ones.
>>
>> Local delay can be defined as a subset of this approach:
>> Only the node local to the event applies the procedure.
>> Step 1 in PLSN is not applied; we only suggest that the node wait for a
>> fixed time, with no transient FIB state.
>>
>> I was considering a comparison with oFIB, draft-ietf-rtgwg-ordered-fib ,
>> submitted to IESG as informational.
>> local-delay can be defined as a subset of this approach:
>>
>> While oFIB defines an ordering among all the nodes of the network,
>> telling which node should wait for which neighbours to be done with their
>> update, before performing their own, local-delay tells the local node to
>> wait until fast convergence has happened in the rest of the network.
>>
>> I think that despite the close relationships between these approaches,
>> local-delay is worth being documented on its own because:
>>
>> It's simple, on its way to being supported, and provides loop avoidance
>> where microloops happen to be the most annoying.
>>
>> Cheers,
>>
>> Pierre.
>>
>>
>> _______________________________________________
>> rtgwg mailing list
>> [email protected]
>> https://www.ietf.org/mailman/listinfo/rtgwg
>>
>
>
>

