Hi Bruno

I was thinking about this some more. It is something that was recognised in the 
early days, but somewhat swept aside.

The case that Gyan brought up was an ECMP case, but I fear that the problem is more 
common, and I think we should characterise it as part of the text rather than 
giving the impression it is unusual.
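The ECMP case from Gyan's review (the fragment quoted further down this thread) can be checked mechanically. The following is a hypothetical Python sketch, not taken from the draft or from any tool. It assumes that every link costs 1, which reproduces the "cost 4" ECMP at R4 described in the review. It computes ECMP next-hop sets with Dijkstra and then looks for a forwarding loop when R1 has converged but R4 has not.

```python
import heapq
from collections import defaultdict

# Gyan's fragment; link costs are ASSUMED to be 1 (gives the cost-4 ECMP at R4).
LINKS = [("CE1", "R1"), ("R1", "R2"), ("R2", "R3"), ("R3", "CE2"),
         ("R1", "R4"), ("R4", "R5"), ("R5", "R6"), ("R6", "R3")]


def ecmp_next_hops(links, dest):
    """Return {node: set of equal-cost next hops toward dest}.

    Runs Dijkstra outward from dest (links are symmetric), then keeps every
    neighbour that lies on some shortest path as an ECMP next hop.
    """
    adj = defaultdict(dict)
    for a, b in links:
        adj[a][b] = adj[b][a] = 1
    dist = {dest: 0}
    heap = [(0, dest)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, cost in adj[u].items():
            if d + cost < dist.get(v, float("inf")):
                dist[v] = d + cost
                heapq.heappush(heap, (d + cost, v))
    return {u: {v for v, c in adj[u].items()
                if dist.get(v, float("inf")) + c == dist[u]}
            for u in dist if u != dest}


def can_loop(pre, post, converged, start, dest):
    """True if some choice of ECMP next hops brings a packet back to a node.

    Nodes in `converged` forward using `post`; all others still use `pre`.
    """
    def hops(node):
        return (post if node in converged else pre).get(node, set())

    def dfs(node, path):
        if node == dest:
            return False
        if node in path:
            return True
        return any(dfs(nxt, path | {node}) for nxt in hops(node))

    return dfs(start, frozenset())


pre = ecmp_next_hops(LINKS, "CE2")
post = ecmp_next_hops([link for link in LINKS if set(link) != {"R2", "R3"}], "CE2")
print(sorted(pre["R4"]), sorted(post["R1"]))           # ['R1', 'R5'] ['R4']
print(can_loop(pre, post, {"R1"}, "R1", "CE2"))        # True: R1 -> R4 -> R1
print(can_loop(pre, post, {"R1", "R4"}, "R1", "CE2"))  # False
```

Because ECMP hashing is per flow, a True result means that some flows may loop. That matches the review's wording that R4 "may" send packets back to R1.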

I think the problem occurs whenever there are two or more nodes between the 
point of packet entry and the failure.

CE1 - R1 - R2 - R3 - R4 -/- R5 - CE2
      |                     |
      R6 - R7 - R8 - R9 - R10

The normal path from CE1 to CE2 is via R2.

When R4-R5 fails, it is trivial to see how the repair works, with R7 as the entry 
into Q space.

However, unless R1, R2 and R3 converge in that order, there will be microloops for 
traffic entering via any of those three nodes.
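To make the ordering constraint concrete, here is a hypothetical Python sketch, not taken from the draft. It tries every convergence order of R1, R2 and R3 on the ring above and checks every intermediate FIB state for a forwarding loop. It makes three assumptions. First, single next hops are read off the diagram. Second, R6 always forwards towards R5 via R7 to R10. Third, R4, the PLR, keeps delivering traffic via its TI-LFA repair through R7 while the other nodes converge. R4 and R6 are therefore modelled as exits.

```python
from itertools import permutations

# Next hops toward R5 (and so CE2), read off the ring diagram. ASSUMED:
# R6 always forwards via R7..R10, and R4 (the PLR) delivers via its TI-LFA
# repair, so both act as exits. Only R1, R2 and R3 change next hop.
PRE = {"R1": "R2", "R2": "R3", "R3": "R4"}
POST = {"R1": "R6", "R2": "R1", "R3": "R2"}
EXITS = {"R4", "R6"}


def loops(converged, start):
    """Follow single next hops from start; True if a node is revisited."""
    seen, node = set(), start
    while node not in EXITS:
        if node in seen:
            return True
        seen.add(node)
        node = (POST if node in converged else PRE)[node]
    return False


def order_is_safe(order):
    """Safe if no intermediate FIB state loops for traffic entering anywhere."""
    for step in range(len(order) + 1):
        converged = set(order[:step])
        if any(loops(converged, ingress) for ingress in PRE):
            return False
    return True


safe = [order for order in permutations(PRE) if order_is_safe(order)]
print(safe)  # [('R1', 'R2', 'R3')]
```

Only the order R1, R2, R3 survives. That is roughly the intuition behind ordered FIB updates [RFC6976], where routers further upstream of a failure update before routers nearer to it.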

So I think we can say that unless the PLR receives the traffic to be protected 
only directly or from its immediate neighbour, there is no guarantee that micro 
loops will not form that cannot be addressed by the proposed strategy of 
aligning the repair path with the post-convergence path.

Now, thinking about the text you have below, I think we need to write it in 
terms of: unless the operator is certain that no micro loops will form over 
any path the protected traffic will traverse between entry to the network and 
arrival at the PLR, a micro loop avoidance method MUST be deployed. Of course, I 
think that it would be helpful to the operator community for the text to 
provide some guidance on how to ascertain whether there is a danger of 
micro loops forming.
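As one possible starting point for that guidance, here is a hypothetical Python sketch, not taken from the draft or from any tool. For a single destination, it reports which ingress nodes could see a loop under any mix of old and new FIB entries, since the operator generally cannot control the convergence order. It uses the ring above with single next hops towards R5. It assumes that R6 to R10 always forward via R10, that R5 is the destination, and that R4, the PLR, delivers everything it receives via its TI-LFA repair.

```python
from itertools import chain, combinations

# Hypothetical next hops toward R5 on the ring. ASSUMED: R6..R10 always use
# R10, R5 is the destination, and R4 (the PLR) delivers via its repair path.
PRE = {"R1": "R2", "R2": "R3", "R3": "R4",
       "R6": "R7", "R7": "R8", "R8": "R9", "R9": "R10", "R10": "R5"}
POST = dict(PRE, R1="R6", R2="R1", R3="R2")  # only R1..R3 change next hop
EXITS = {"R4", "R5"}


def walk_loops(fib, start):
    """True if following fib from start revisits a node before an exit."""
    seen, node = set(), start
    while node not in EXITS:
        if node in seen:
            return True
        seen.add(node)
        node = fib[node]
    return False


def ingresses_at_risk(pre, post):
    """Ingress nodes that loop under SOME mix of old and new FIB entries.

    Only nodes whose next hop changes matter, so every subset of them is
    tried as the 'already converged' set (2**n states, fine for small n).
    """
    changed = [n for n in pre if pre[n] != post[n]]
    states = chain.from_iterable(
        combinations(changed, k) for k in range(len(changed) + 1))
    risky = set()
    for converged in states:
        fib = {n: (post[n] if n in converged else pre[n]) for n in pre}
        risky.update(n for n in pre if walk_loops(fib, n))
    return sorted(risky)


print(ingresses_at_risk(PRE, POST))  # ['R1', 'R2', 'R3']
```

The check is deliberately conservative. It ignores timing and ECMP, so it reports where micro loops are possible rather than certain.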

I would note that the long chains of nodes shown in the example above were 
probably not present in the test topologies, which as I remember were all 
national-scale provider networks. However, unless we provide guidance otherwise, 
TI-LFA could reasonably be deployed in edge networks, and in the case of cellular 
systems these are often ring topologies.

So I think we need to agree (as a WG) on the constraints that we are prepared to 
specify in the text and the degree of warning we need to provide to the 
operator community, and then we can polish the text below.

Best regards

Stewart




> On 16 Oct 2023, at 17:25, [email protected] wrote:
> 
> Hi Stewart,
>  
> Please see inline
>  
>  
> From: Stewart Bryant <[email protected]>
> Sent: Monday, October 16, 2023 2:08 PM
> To: [email protected]; rtgwg-chairs <[email protected]>;
> [email protected]
> Cc: Stewart Bryant <[email protected]>
> Subject: draft-ietf-rtgwg-segment-routing-ti-lfa : A simple pathological 
> network fragment
>  
> During the operations directorate early review of 
> draft-ietf-rtgwg-segment-routing-ti-lfa, Gyan Mishra pointed to a simple 
> pathological network fragment that I think deserves wider discussion.
>  
> https://datatracker.ietf.org/doc/review-ietf-rtgwg-segment-routing-ti-lfa-11-opsdir-early-mishra-2023-08-25/
>  
> I am not aware of any response to the RTGWG by the draft authors concerning 
> the review comment and I cannot see obvious new text addressing this concern.
> 
> The fragment is as follows:
> 
> CE1 -R1- R2-/-R3-CE2
>      |         |
>      R4 - R5 -R6
> 
> In the pre-converged network, R4 has ECMP paths to CE2 via R5 (cost 4) and 
> via R1 (also cost 4).
> 
> We can easily build a TI-LFA repair path from R2 under link failure to CE2 
> (so long as we remember that R4 is an ECMP path to CE2), but the problem 
> occurs during convergence. If R1 converges before R4, R4 may ECMP packets 
> addressed to CE2 back to R1 in a micro loop. Meanwhile, since no packets for 
> R3 are reaching R2, the TI-LFA repair is not doing anything useful.
> 
> The TI-LFA text leads the reader to conclude that it is a loop-free solution, 
> but gives no guidance on how to determine when this assumption breaks down. 
> There is an informative reference to 
> draft-bashandy-rtgwg-segment-routing-uloop, but this short individual draft 
> does little to help the reader determine when a loop avoidance 
> strategy needs to be deployed, and the loop-free approach it describes does 
> not seem to be fully developed.
>  
> I am worried that proceeding with the TI-LFA draft without noting that there 
> is a real risk that simple network fragments can microloop, and without 
> providing a fully formed mitigation strategy, is a disservice to the operator 
> community, given the industry interest in TI-LFA and the insidious nature of 
> unexpected micro loop network transients. I am wondering what the view of 
> the working group is on how to proceed.
>  
> One approach would be for the Ti-LFA draft to incorporate detailed guidance 
> on how to determine the risk of a micro loop in a specific operator network, 
> and to provide specific mitigation advice. Another approach would be to 
> reference a developed loop avoidance strategy and recommend its preemptive 
> deployment. Another approach would be to make 
> draft-bashandy-rtgwg-segment-routing-uloop a normative reference and tie the 
> fate of the two drafts. Another approach would be to elaborate on the risks 
> and their manifestations but declare it a currently unsolved problem. I am 
> sure there are other options that the WG may formulate.
>  
> What is the opinion of the working group on how we should proceed with 
> draft-ietf-rtgwg-segment-routing-ti-lfa when considering the possible 
> formation of micro loops?
>  
> FRR takes place between the failure (detection) and the IGP reconvergence. 
> Those are two consecutive steps that the WG has so far addressed with 
> different solutions and documents.
> That’s not new and that’s not specific to TI-LFA. E.g., that’s applicable to 
> RLFA.
>  
> Would the below text, taken verbatim from RFC 7490 (RLFA), work for you? Or 
> would you say that the text is not good enough?
> “When the network reconverges, micro-loops [RFC5715] can form due to
>    transient inconsistencies in the forwarding tables of different
>    routers.  If it is determined that micro-loops are a significant
>    issue in the deployment, then a suitable loop-free convergence
>    method, such as one of those described in [RFC5715], [RFC6976], or
>    [ULOOP-DELAY], should be implemented.”
>  
> https://datatracker.ietf.org/doc/html/rfc7490#section-10
>  
> Of course, we could update the list of informative references.
> E.g., by adding another informative reference to 
> draft-bashandy-rtgwg-segment-routing-uloop and by removing informative 
> references to [RFC6976] and [ULOOP-DELAY] which are probably outdated.
>  
> --Bruno
>  
>  
> - Stewart

_______________________________________________
rtgwg mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/rtgwg
