Thanks for the thorough review

I am in the process of uploading version 15 of the document. So whenever I say "fixed", "changed", "modified",..., etc, I am referring to version 15

See inline comments at #Ahmed


Ahmed


On 4/16/24 1:14 PM, John Scudder via Datatracker wrote:
John Scudder has entered the following ballot position for
draft-ietf-rtgwg-segment-routing-ti-lfa-13: Discuss

When responding, please keep the subject line intact and reply to all
email addresses included in the To and CC lines. (Feel free to cut this
introductory paragraph, however.)


Please refer tohttps://www.ietf.org/about/groups/iesg/statements/handling-ballot-positions/ for more information about how to handle DISCUSS and COMMENT positions.


The document, along with other ballot positions, can be found here:
https://datatracker.ietf.org/doc/draft-ietf-rtgwg-segment-routing-ti-lfa/



----------------------------------------------------------------------
DISCUSS:
----------------------------------------------------------------------

# John Scudder, RTG AD, comments for draft-ietf-rtgwg-segment-routing-ti-lfa-13
CC @jgscudder

Thanks for this document. The technology is valuable, and the underlying
techniques sound. Despite what you might guess from my abundance of comments, I
like this document. I suspect it has suffered from having been edited
repeatedly by the same set of experts, it becomes hard to see it as a new
reader might. So, please accept my comments in the spirit of a new set of eyes
looking over some long-established text.

Although most of my remarks are non-blocking COMMENTs, I am putting two in as
DISCUSS points. Some of my other comments are of a similar nature, but I think
they're less serious for the overall coherence of the document.

## DISCUSS

### Whole document, is post-convergence of the essence, or not?
#Ahmed No. THere are many aspects in the documents that indicates that the whole document is not post convergence, although post-covergence aspect is a very important aspect of the document. In fact the sentence starting with "Although not a Ti-LFA requirement or constraint," that you quoted in the comment after the next one explicitly mentions that post-convergence is not a constrain or a requiremehnt

The document seems to be arguing with itself about whether following the
post-convergence path is, or is not an essential/required feature. In the
Abstract:

    A key aspect of
    TI-LFA is the FRR path selection approach establishing protection
    over the expected post-convergence paths from the point of local
    repair

It's a *key* aspect! OK! But then, in the Introduction:

    Although not a Ti-LFA requirement or constraint, TI-LFA also brings
    the benefit of the ability to provide a backup path that follows the
    expected post-convergence path
#Ahmed: I do not understand why you think that the work "key" implies "only" or "must".

Wait, it's "key" but "not a requirement or constraint"? Moving on to Section 6,

#Ahmed: Again I do not understand how you came to the conclusion that "key" implies "requirement" or "constraint" even though the document explicitly mentions that post-convergence is neither a requirement nor a contraint


    The repair list encodes the
    explicit post-convergence path to the destination

So it "encodes the explicit post-convergence path". "Encodes", not "might
encode" or "can encode". So the Abstract is right and Section 2 is wrong. But
wait there's more! Later in Section 11,

#Ahmed: I really do not understand why you want to push "must be post convergence" or "only post-convergence" down the throat of this draft.

Does the sentence say "the repair list ****only**** encodes"??.


    traffic can
    be steered by the PLR onto its expected post-convergence path during
    the FRR phase

So it "can"... which implies "doesn't have to be".

#Ahmed: Of course it implies "doesn't have to be". If there is any sentence in the draft that says that a ti-lfa repair path must be on post convergence, I would be more than happy to re-word it.


There's more, for example, all of Section 5 talks about post-convergence paths,
and there are many more mentions in Section 6 too.
#Ahmed: Of course. It is a "key" attribute of the post convergence path and big advantage of ti-lfa, but not a "must-have", "a constraint", nor a "requirement" of ti-lfa

Given that Sections 5-8 seem to come closest to being the normative ones
(though the document is sadly not very precise in this regard) I'm left with
the impression that the Abstract is right ("key"), and the quoted passages of
Sections 2 and 11 are wrong. In any case, I think this needs to be resolved in
some way.
#Ahmed: IMO is no problem to solve as I mentioned in the replies to your above comments, which are hinged on a the **very big** assumption that "key" implies "constraint", "requirement" and/or "must-have"

### Section 10, multiple unrelated failures

    Implementations of TI-LFA should deal with the
    occurence of multiple unrelated failures in accordance to the IP Fast
    Reroute Framework [RFC5714].

(Nit, you misspelled "occurrence".)
#Ahmed: Fixed in version 15 (coming soon)

Can you explain what you mean by this sentence? I haven't reviewed it carefully
but RFC 5714 is a framework, and I don’t know what it means to be "in
accordance to" it. The only relevant text I was able to find in it was RFC 5714
Section 5.2.6,

    However, it is important that the occurrence of
    a second failure while one failure is undergoing repair should not
    result in a level of service which is significantly worse than that
    which would have been achieved in the absence of any repair strategy.

Putting that together with the quote from your specification, I come up with an
interpretation like "it's important to behave reasonably in the face of
multiple failures, we aren't going tell you how to do it, this other document
we are citing isn't going to tell you how to do it either, it's just going to
tell you that our specification was supposed to cover this, but we didn't.”

#Ahmed: The paragraph starting with "for each destination" near the bottom of page 5 indicates clearly mentioned that this document covers only a single failure. The above sentence refers the implementer to the framework of FRR.

I do not understand what is your concern?



----------------------------------------------------------------------
COMMENT:
----------------------------------------------------------------------

## COMMENTS

### General Gripe, Y U no XML?

It looks like you must have uploaded the TXT rendering instead of the XML
source. Whenever it is you upload your next revision, please consider uploading
the XML source instead. The set of renderings available when you upload text is
inferior to the renderings available when you upload source (notably, the
modern HTML rendering is not available). If there's some reason uploading
source is difficult or impossible for you, disregard this request of course.
(The only reason I can think of for it to be hard is if you used a less-common
tool to produce your draft, for example, nroff or the Microsoft Word template.
If you use the more mainstream XML or MD workflows, it should be easy enough.)
#Ahmed: I used https://author-tools.ietf.org/

### Abstract, for want of a hyphen the meaning was lost

    It extends these concepts to provide guaranteed coverage in
    any two connected networks using a link-state IGP.

The only way I can make sense of this is to add a hyphen:

    It extends these concepts to provide guaranteed coverage in
    any two-connected networks using a link-state IGP.
#Ahmed: Added the hyphen in version 15. Thanks for the comment

Is that what you meant? I would also suggest changing "using" to "that uses",
as in,

    It extends these concepts to provide guaranteed coverage in
    any two-connected network that uses a link-state IGP.

I'm not sure the concept of two-connectedness is universally understood enough
that it's suitable for use in an Abstract, so I'd support further editing to
get rid of the graph theory term entirely, but at least this edit makes it
correct.
#Ahmed: 2-connectness is an important requirement for ti-lfa to be successful. I am 100% sure that all network engineers know what "two-connected means, and, if not are quite capable of looking up the term "two-connected graph"

### Section 1, orphan definitions

- RSPT is defined, but never used. Delete?
#Ahmed: Deleted

- SLA is defined, only used once (and in a comment below I suggest deleting
that use!). Delete and just expand on first use, if the first-and-only use is
kept at all?
#Ahmed: deleted

- SPT is defined, but never used. Delete?
#Ahmed: deleted

- SRGB is defined, only used once. Delete and just expand on first use?
#Ahmed: Deleted and expanded the first use

- TLDP is defined, only used once. Delete and just expand on first use? (If you
keep it, please correct the nit from Ben Niven-Jenkins’ RTGAREA review.)

#Ahmed: Deleted the definition. The first use is already expanded


### Section 2, "aims at"

    Segment Routing aims at supporting services with tight SLA guarantees
    [RFC8402].

This seems to be rewriting history. That is to say, sure SR “aims at supporting
services with tight SLA guarantees”, but only in the sense that every
general-purpose packet transport does; without specifics this isn’t very
meaningful. In any case, the citation doesn’t supply evidence for the
statement, indeed the string “SLA” or “service level” never occurs in RFC 8402.
Maybe just remove this sentence? It doesn’t appear essential in any case.
#Ahmed: the sentence has been removed in the latest version as of today (version 14)

### Section 2, can't parse this sentence

    By relying on SR this document provides a local repair
    mechanism for standard link-state IGP shortest path capable of
    restoring end-to-end connectivity in the case of a sudden directly
    connected failure of a network component.

I can’t parse this sentence. I am *guessing* that what you mean is something
like,

NEW:
    This document leverages Segment Routing (SR) to provide a local
    repair mechanism for a shortest path computed by a standard
    link-state IGP. The local repair is capable of restoring end-to-end
    connectivity in the case of a failure of a directly connected
    network component.

If this is what you mean, you’re welcome to use this text if you want. If it’s
not what you mean, please explain?
#Ahmed: The text has been changed to be very close to what you are suggesting in version 14. I hope the latest text is satisfactory

Note I removed “sudden” in my NEW text because I’m guessing you don’t mean to
exclude a gradual or a foreseeable failure. A failure is a failure, after all.

### Section 2, When the network reconverges

    When the network reconverges

I suggest,

NEW:
    When the network reconverges after a failure
#Ahmed: thanks for the text. I used it

See also the next comment.

### Section 2, microloops

Much of Section 2 is a discussion of microloops and exactly how TI-LFA relates
to them. This doesn’t seem like introductory material, especially because the
rest of the specification doesn't talk about microloops at all. I suggest
moving that material to a new (sub)section for that purpose and just mentioning
it in the Introduction, as in something like "microloops are not addressed by
TI-LFA and can be a concern in some deployments. This is discussed in <xref>."

I don't insist that you make this change, the document is still usable without
it. However, I think that the Introduction as it stands is not very useful *as
an introduction* because of the abundance of non-introductory material. (Some
of my subsequent comments fall under the same heading.)
#Ahmed: The discussion about microloops have been quite long and not very pleasant. It took us a great deal of effort get consensus on the current text. So I would rather not touch it

### Section 2, “primary link”

You mention the “primary link” in a few places here (and nowhere else). What is
the “primary link”? Please clarify or re-word. For example, maybe you mean “the
link whose failure is detected”.
#Ahmed: I added definitions to the terms "primary link" and "primary interface" to the terminology section

### Section 2, what value does comparing to older FRR techniques add to an
intro?

    By using SR, TI-LFA does not require the establishment of TLDP
    sessions (Targeted Label Distribution Protocol) with remote nodes in
    order to take advantage of the applicability of remote LFAs (RLFA)
    [RFC7490][RFC7916] or remote LFAs with directed forwarding
    (DLFA)[RFC5714].  All the Segment Identifiers (SIDs) are available in
    the link state database (LSDB) of the IGP.  As a result, preferring
    LFAs over RLFAs or DLFAs, as well as minimizing the number of RLFA or
    DLFA repair nodes is not required anymore.
#Ahmed: I tend to disagree. For an alternative technique that solves that the same problem to be implemented and used,it has to provide additional value. An implementer has to see this value in order to invest money and effort in the implementation. A user has to see value in order to adopt the alternative technique.

I see why you wanted this paragraph during the process of developing this spec
and persuading the WG of its value. I don’t see how it contributes any value to
the final spec. Similarly,

    By using SR, there is no need to create state in the network in order
    to enforce an explicit FRR path.  This relieves the nodes themselves
    from having to maintain extra state, and it relieves the operator
    from having to deploy an extra protocol or extra protocol sessions
    just to enhance the protection coverage.

Appears to just be a restatement of the value proposition of SR itself. I don’t
see value in restating it in this document.
#Ahmed: Not really, it is not a restatement. It is a clarification on why SR is used in this technique

I think you could remove both these paragraphs without harm, and it would make
the document a quicker and clearer read.
#Ahmed: IMO, value is important to persuade investment in implementation as well as user adoption.

### Section 2, encoding challenges

    One of the challenges of TI-
    LFA is to encode the expected post-convergence path by combining
    adjacency segments and node segments.

Do you mean “compactly encode”, “efficiently encode”, or similar? Is encoding,
per se, the challenge?
#Ahmed: encoding "the expected post-convergence path by combiningadjacency segments and node segments" Is the challenge that is intended by this sentence. It is not the word "encoding" only.

### Section 2, roadmap should be a roadmap

I agree with Ben Niven-Jenkins that the omission of Sections 8-10 in the
overview list at the end of the Introduction is a bit jarring to the reader. (I
would add Section 13, too.)
#Ahmed: Corrected.

### Section 3, exclude from that set

I think this is wrong:

    Exclude from that set of neighbors that are reachable from R using X.

Did you mean,

NEW:
    Exclude from that set, the neighbors that are reachable from R using X.
#Ahmed: Thanks for catching that.  I fixed it

### Section 3, defined but not used

    A symmetric network is a network such that the IGP metric of each
    link is the same in both directions of the link.

This definition is never used.
#Ahmed: removed

### Section 6, you can't guarantee that

    is guaranteed to be loop-
    free irrespective of the state of FIBs along the nodes belonging to
    the explicit path.

As written, there’s no way to guarantee that. (Trivial proof, one possible
state of a FIB is to point back to the preceding node along the path. That
might not be an *expected* state, but it is *a* state.) I think this is just a
case of overly casual writing, and you mean that the loop-free property will
exist regardless of whether the nodes belonging to the explicit path have
converged to recognize failure X or not. Consider rewriting along those lines?

#Ahmed: The abstract and few other places explicitly mentions that the draft addresses link-state IGPs. However to avoid confusion, I added at the end of the sentence the phrase

"as long as the states of the FIBs are programmed according to a link-state IGP"


### Section 6 and others

Please supply definitions for “P node” and “Q node”.
#Ahmed. Added sentences to the end of the definitions of P-space and Q-space to define a P node and a Q node, respectivel.
### Section 6, terminology is inconsistent and unclear

I can understand NodeSID(R1) well enough even though you haven't supplied a
definition, it’s the node SID for router R1. But what is “Node_SID(P)”? P isn’t
a router, it’s a set of routers, or a space. Please clarify, whether with a
definition or otherwise. Probably your clarification will fix both this and the
previous point.
#Ahmed:The sentence where "Node_SID(P)" says right before using the term "Node-SID(P)" says "from P to Q when these nodes". Pay special attention to "these nodes". AFAIK a "these" refers to last nouns and the last nouns are "P" and "Q". In addition I added definitions to "P node" and "Q node" to the terminology section based on your comment


While you’re at it, you might as well make your terminology consistent. In one
of the node SID cases above you use an underscore, and in the other, you don’t.
Also, your node SID notation is inconsistent with your adjacency SID notation
which looks like AdjSID_R1R2 — so in one case an underscore and parentheses, in
another case parentheses with no underscore, and in the final case an
underscore with no parentheses. Pick one.

(And then there's Section 12 with "node-SID"...)
#Ahmed: I changed all of them to "node-sid". I'll correct anyone that is missed.

### Sections 6.1, 6.2, 6.3, SHOULD NOT use SHOULD

What work are the SHOULDs doing here? Considering that in other parts of the
document you leave the computation of the repair path up to the implementation,
why are you mandating it here? And, if you’re mandating it, why not mandate it
all the way with a MUST?

#Ahmed: Every time the terms "should" or "should not" is/are used, it is clear what these terms are referring to. If there is a place where the terms  "should" or "should not" are not clean, please point these places out so that I can correct them.

Second, we are not mandating anything in these sections because, as you mentioned, computation are part of the implementation. Hence we did not use "MUST" or "MUST NOT"


It seems to me that if you don’t want to mandate implementation, it would be
sensible to take the RFC 2119 language out altogether. If you do want to
mandate implementation, I don’t see why you wouldn’t make this a MUST. If you
really do want it to be SHOULD, Please explain your reasoning.
#Ahmed: Disagree. There are certain properties that should exist in an implementation. However the implementation itself is out of scope.

### Section 7, primary outgoing interface

The existence of a "primary outgoing interface" seems to imply the existence of
a secondary outgoing interface, tertiary outgoing interface, etc. Please define
primary outgoing interface, or if this isn't an important distinction, consider
whether you can simplify to just say "outgoing interface".

#Ahmed: I added definitions to the terms "primary link" and "primary interface" to the terminology section

### Section 7.1, first

    The active segment becomes the first
    segment of the repair list.

By “first” do you mean “first to be pushed, last to be processed”? If so, I
suggest clarifying that in the text, because the plain English reading of
"first" is the opposite.

#Ahmed: Modified the sentence to be

The active segment becomes the first
        segment after the repair list.


### Section 7.2, "as stated"... where?

    As stated in Section 2, when SR
    policies are involved and a strict compliance of the policy is
    required, an end-to-end protection should be preferred over a local
    repair mechanism.

I don’t see this in section 2 (I searched for “end-to-end”, “prefer”, “policy”
and “policies”). Can you help me understand what text in section 2 you’re
talking about?
#Ahmed: Probably something forgotten from a long time, I removed it

### Section 7.2.1, what's an Adj()?

Please define Adj(). Is this yet another terminology variation? (c.f. Section 6
comment on AdjSID_R1R2)
#Ahmed: Modified all of them to be adj-sid() and added a definition of adj-sid to the terminology section

### Section 8.1, what's the tail end of a node segment?

    1.  If the active segment is a node segment that has been signaled
        with penultimate hop popping and the repair list ends with an
        adjacency segment terminating on the tail-end of the active
        segment, then the active segment MUST be popped before pushing
        the repair list.

What is the tail end of a node segment? I can’t figure out what that means.

I think I know what you’re trying to say, but please find a way to reword it
that doesn’t end up making me try to parse the above with my "what did the
authors actually *mean*?" glasses on.

#Ahmed Modified the text to say

"terminating on a node that advertised NEXT operation of the active segment"


### Section 8.2 and others, description of SRv6 behaviors

RFC 8754 describes forwarding behaviors using a kind of line-numbered
pseudocode, and later documents that modify forwarding behaviors specify
updates to the pseudocode. (Examples: RFC 8986,
draft-ietf-spring-srv6-srh-compression-15,
draft-ietf-rtgwg-srv6-egress-protection-16,
draft-ietf-spring-sr-redundancy-protection-03,
draft-ietf-spring-srv6-path-segment-07)

You don’t do this, you use a descriptive approach instead. I'm ok with this in
isolation, but I’d like to know if you made an affirmative decision to diverge
from the usual SRv6 way of doing things, and if so, why, and if the SPRING
working group specifically considered this and is OK with it.
#Ahmed: When it comes to implementation specifics, we explicitly avoid to mandate anything. Pseudo code may appear to be  a mandate
### Section 8.2, shorter than what?

    In such case, there is no need for a preceding
    Prefix SID and the resulting repair list is likely shorter.

Shorter than what? This is the first place in this document the string “Prefix
SID” occurs, so I’m confused.

#Ahmed: shorter than preceding the adjacency SID with a prefix SID.

For the term prefix-SID, it is defined in the SR architecture RFC [RFC8402]. I added a reference to RFC8402 beside the 1st mention of prefix-SID


### Section 11, limit the implementation of local FRR policies

    Based on this assumption, in order to facilitate the operation of
    FRR, and limit the implementation of local FRR policies

Do you mean "limit the need for implementation of local FRR policies"? (And you
could drop “implementation of” for that matter.)

### Section 11, TI-LFA and SR policies don't mix?

The last paragraph, regarding the use of SR policies, and also Section 9,
leaves me wondering whether a simpler statement would be that TI-LFA is
inappropriate for use in a network that makes use of SR policies. Is this a
fair characterization?
#Ahmed; No it is not. Another overloading similar to overloading the word "key" above :)

### Section 12, can't this be an appendix?
#Ahmed Already moved to an appendix in version 14 as suggested by Gunter

Shouldn’t this be an Appendix? In general, that seems like a common (and good!)
practice for inessential information like this, especially when it has a
potentially limited shelf-life.

Also, although I appreciate that you provided some rudimentary parameterization
of the topologies in Table 1, I think it would be helpful to at least say what
time period the topologies reflect —
draft-francois-rtgwg-segment-routing-ti-lfa-00 dates to summer of 2015; are we
talking about the topologies that were in vogue in 2015? Those of 2023? Etc.

### Section 12, granularity wut

       We do not cover the
       case for 2 SIDs (Section 6.3) separately because there was no
       granularity in the result.

I don’t understand what this means. Can you rephrase it? Generally, I find the
words “granular”. “granularity”, and related have almost zero descriptive
power. :-(

### Section 12, "2 or more" or "2", "3", and no more?

In your description of the table, you say,

The convention that we use is as follows

    *  0 SIDs: the calculated repair path starts with a directly
...
    *  1 SIDs: the repair node is a PQ node, in which case only 1 SID is
...
    *  2 or more SIDs: The repair path consists of 2 or more SIDs as
       described in Section 6.3 and Section 6.4.  We do not cover the
       case for 2 SIDs (Section 6.3) separately
...

But the table headers show:

    +-------------+------------+------------+------------+------------+
    |   Network   |    0 SIDs  |    1 SID   |   2 SIDs   |   3 SIDs   |
    +-------------+------------+------------+------------+------------+

I.e. the table headers don’t show “2 or more” they show 2, and 3, broken out
distinctly, and no "or more" case. Seems like these need to be reconciled.

### Section 13, guaranteed upper bound

    The techniques described in this document are internal
    functionalities to a router that result in the ability to guarantee
    an upper bound on the time taken to restore traffic flow upon the
    failure of a directly connected link or node.

This is the only place in the document where you talk about guaranteed upper
bound. This is a fairly strong promise to make, I think you shouldn't be
mentioning it unless you provide some kind of support for how the guarantee is
provided. Note, I don't question that TI-LFA can be part of the machinery
providing such a guarantee, but without showing your work I don't think you can
make this claim.
#Ahmed: Changed to wording to "can guarantee"

## Notes

This review is in the ["IETF Comments" Markdown format][ICMF], You can use the
[`ietf-comments` tool][ICT] to automatically convert this review into
individual GitHub issues.

[ICMF]:https://github.com/mnot/ietf-comments/blob/main/format.md
[ICT]:https://github.com/mnot/ietf-comments


_______________________________________________
rtgwg mailing list -- [email protected]
To unsubscribe send an email to [email protected]

Reply via email to