Thanks for the thorough review
I am in the process of uploading version 15 of the document. So whenever
I say "fixed", "changed", "modified",..., etc, I am referring to version 15
See inline comments at #Ahmed
Ahmed
On 4/16/24 1:14 PM, John Scudder via Datatracker wrote:
John Scudder has entered the following ballot position for
draft-ietf-rtgwg-segment-routing-ti-lfa-13: Discuss
When responding, please keep the subject line intact and reply to all
email addresses included in the To and CC lines. (Feel free to cut this
introductory paragraph, however.)
Please refer tohttps://www.ietf.org/about/groups/iesg/statements/handling-ballot-positions/
for more information about how to handle DISCUSS and COMMENT positions.
The document, along with other ballot positions, can be found here:
https://datatracker.ietf.org/doc/draft-ietf-rtgwg-segment-routing-ti-lfa/
----------------------------------------------------------------------
DISCUSS:
----------------------------------------------------------------------
# John Scudder, RTG AD, comments for draft-ietf-rtgwg-segment-routing-ti-lfa-13
CC @jgscudder
Thanks for this document. The technology is valuable, and the underlying
techniques sound. Despite what you might guess from my abundance of comments, I
like this document. I suspect it has suffered from having been edited
repeatedly by the same set of experts, it becomes hard to see it as a new
reader might. So, please accept my comments in the spirit of a new set of eyes
looking over some long-established text.
Although most of my remarks are non-blocking COMMENTs, I am putting two in as
DISCUSS points. Some of my other comments are of a similar nature, but I think
they're less serious for the overall coherence of the document.
## DISCUSS
### Whole document, is post-convergence of the essence, or not?
#Ahmed No. THere are many aspects in the documents that indicates that
the whole document is not post convergence, although post-covergence
aspect is a very important aspect of the document. In fact the
sentence starting with "Although not a Ti-LFA requirement or
constraint," that you quoted in the comment after the next one
explicitly mentions that post-convergence is not a constrain or a
requiremehnt
The document seems to be arguing with itself about whether following the
post-convergence path is, or is not an essential/required feature. In the
Abstract:
A key aspect of
TI-LFA is the FRR path selection approach establishing protection
over the expected post-convergence paths from the point of local
repair
It's a *key* aspect! OK! But then, in the Introduction:
Although not a Ti-LFA requirement or constraint, TI-LFA also brings
the benefit of the ability to provide a backup path that follows the
expected post-convergence path
#Ahmed: I do not understand why you think that the work "key" implies
"only" or "must".
Wait, it's "key" but "not a requirement or constraint"? Moving on to Section 6,
#Ahmed: Again I do not understand how you came to the conclusion that
"key" implies "requirement" or "constraint" even though the document
explicitly mentions that post-convergence is neither a requirement nor a
contraint
The repair list encodes the
explicit post-convergence path to the destination
So it "encodes the explicit post-convergence path". "Encodes", not "might
encode" or "can encode". So the Abstract is right and Section 2 is wrong. But
wait there's more! Later in Section 11,
#Ahmed: I really do not understand why you want to push "must be post
convergence" or "only post-convergence" down the throat of this draft.
Does the sentence say "the repair list ****only**** encodes"??.
traffic can
be steered by the PLR onto its expected post-convergence path during
the FRR phase
So it "can"... which implies "doesn't have to be".
#Ahmed: Of course it implies "doesn't have to be". If there is any
sentence in the draft that says that a ti-lfa repair path must be on
post convergence, I would be more than happy to re-word it.
There's more, for example, all of Section 5 talks about post-convergence paths,
and there are many more mentions in Section 6 too.
#Ahmed: Of course. It is a "key" attribute of the post convergence path
and big advantage of ti-lfa, but not a "must-have", "a constraint", nor
a "requirement" of ti-lfa
Given that Sections 5-8 seem to come closest to being the normative ones
(though the document is sadly not very precise in this regard) I'm left with
the impression that the Abstract is right ("key"), and the quoted passages of
Sections 2 and 11 are wrong. In any case, I think this needs to be resolved in
some way.
#Ahmed: IMO is no problem to solve as I mentioned in the replies to your
above comments, which are hinged on a the **very big** assumption that
"key" implies "constraint", "requirement" and/or "must-have"
### Section 10, multiple unrelated failures
Implementations of TI-LFA should deal with the
occurence of multiple unrelated failures in accordance to the IP Fast
Reroute Framework [RFC5714].
(Nit, you misspelled "occurrence".)
#Ahmed: Fixed in version 15 (coming soon)
Can you explain what you mean by this sentence? I haven't reviewed it carefully
but RFC 5714 is a framework, and I don’t know what it means to be "in
accordance to" it. The only relevant text I was able to find in it was RFC 5714
Section 5.2.6,
However, it is important that the occurrence of
a second failure while one failure is undergoing repair should not
result in a level of service which is significantly worse than that
which would have been achieved in the absence of any repair strategy.
Putting that together with the quote from your specification, I come up with an
interpretation like "it's important to behave reasonably in the face of
multiple failures, we aren't going tell you how to do it, this other document
we are citing isn't going to tell you how to do it either, it's just going to
tell you that our specification was supposed to cover this, but we didn't.”
#Ahmed: The paragraph starting with "for each destination" near the
bottom of page 5 indicates clearly mentioned that this document covers
only a single failure. The above sentence refers the implementer to the
framework of FRR.
I do not understand what is your concern?
----------------------------------------------------------------------
COMMENT:
----------------------------------------------------------------------
## COMMENTS
### General Gripe, Y U no XML?
It looks like you must have uploaded the TXT rendering instead of the XML
source. Whenever it is you upload your next revision, please consider uploading
the XML source instead. The set of renderings available when you upload text is
inferior to the renderings available when you upload source (notably, the
modern HTML rendering is not available). If there's some reason uploading
source is difficult or impossible for you, disregard this request of course.
(The only reason I can think of for it to be hard is if you used a less-common
tool to produce your draft, for example, nroff or the Microsoft Word template.
If you use the more mainstream XML or MD workflows, it should be easy enough.)
#Ahmed: I used https://author-tools.ietf.org/
### Abstract, for want of a hyphen the meaning was lost
It extends these concepts to provide guaranteed coverage in
any two connected networks using a link-state IGP.
The only way I can make sense of this is to add a hyphen:
It extends these concepts to provide guaranteed coverage in
any two-connected networks using a link-state IGP.
#Ahmed: Added the hyphen in version 15. Thanks for the comment
Is that what you meant? I would also suggest changing "using" to "that uses",
as in,
It extends these concepts to provide guaranteed coverage in
any two-connected network that uses a link-state IGP.
I'm not sure the concept of two-connectedness is universally understood enough
that it's suitable for use in an Abstract, so I'd support further editing to
get rid of the graph theory term entirely, but at least this edit makes it
correct.
#Ahmed: 2-connectness is an important requirement for ti-lfa to be
successful. I am 100% sure that all network engineers know what
"two-connected means, and, if not are quite capable of looking up the
term "two-connected graph"
### Section 1, orphan definitions
- RSPT is defined, but never used. Delete?
#Ahmed: Deleted
- SLA is defined, only used once (and in a comment below I suggest deleting
that use!). Delete and just expand on first use, if the first-and-only use is
kept at all?
#Ahmed: deleted
- SPT is defined, but never used. Delete?
#Ahmed: deleted
- SRGB is defined, only used once. Delete and just expand on first use?
#Ahmed: Deleted and expanded the first use
- TLDP is defined, only used once. Delete and just expand on first use? (If you
keep it, please correct the nit from Ben Niven-Jenkins’ RTGAREA review.)
#Ahmed: Deleted the definition. The first use is already expanded
### Section 2, "aims at"
Segment Routing aims at supporting services with tight SLA guarantees
[RFC8402].
This seems to be rewriting history. That is to say, sure SR “aims at supporting
services with tight SLA guarantees”, but only in the sense that every
general-purpose packet transport does; without specifics this isn’t very
meaningful. In any case, the citation doesn’t supply evidence for the
statement, indeed the string “SLA” or “service level” never occurs in RFC 8402.
Maybe just remove this sentence? It doesn’t appear essential in any case.
#Ahmed: the sentence has been removed in the latest version as of today
(version 14)
### Section 2, can't parse this sentence
By relying on SR this document provides a local repair
mechanism for standard link-state IGP shortest path capable of
restoring end-to-end connectivity in the case of a sudden directly
connected failure of a network component.
I can’t parse this sentence. I am *guessing* that what you mean is something
like,
NEW:
This document leverages Segment Routing (SR) to provide a local
repair mechanism for a shortest path computed by a standard
link-state IGP. The local repair is capable of restoring end-to-end
connectivity in the case of a failure of a directly connected
network component.
If this is what you mean, you’re welcome to use this text if you want. If it’s
not what you mean, please explain?
#Ahmed: The text has been changed to be very close to what you are
suggesting in version 14. I hope the latest text is satisfactory
Note I removed “sudden” in my NEW text because I’m guessing you don’t mean to
exclude a gradual or a foreseeable failure. A failure is a failure, after all.
### Section 2, When the network reconverges
When the network reconverges
I suggest,
NEW:
When the network reconverges after a failure
#Ahmed: thanks for the text. I used it
See also the next comment.
### Section 2, microloops
Much of Section 2 is a discussion of microloops and exactly how TI-LFA relates
to them. This doesn’t seem like introductory material, especially because the
rest of the specification doesn't talk about microloops at all. I suggest
moving that material to a new (sub)section for that purpose and just mentioning
it in the Introduction, as in something like "microloops are not addressed by
TI-LFA and can be a concern in some deployments. This is discussed in <xref>."
I don't insist that you make this change, the document is still usable without
it. However, I think that the Introduction as it stands is not very useful *as
an introduction* because of the abundance of non-introductory material. (Some
of my subsequent comments fall under the same heading.)
#Ahmed: The discussion about microloops have been quite long and not
very pleasant. It took us a great deal of effort get consensus on the
current text. So I would rather not touch it
### Section 2, “primary link”
You mention the “primary link” in a few places here (and nowhere else). What is
the “primary link”? Please clarify or re-word. For example, maybe you mean “the
link whose failure is detected”.
#Ahmed: I added definitions to the terms "primary link" and "primary
interface" to the terminology section
### Section 2, what value does comparing to older FRR techniques add to an
intro?
By using SR, TI-LFA does not require the establishment of TLDP
sessions (Targeted Label Distribution Protocol) with remote nodes in
order to take advantage of the applicability of remote LFAs (RLFA)
[RFC7490][RFC7916] or remote LFAs with directed forwarding
(DLFA)[RFC5714]. All the Segment Identifiers (SIDs) are available in
the link state database (LSDB) of the IGP. As a result, preferring
LFAs over RLFAs or DLFAs, as well as minimizing the number of RLFA or
DLFA repair nodes is not required anymore.
#Ahmed: I tend to disagree. For an alternative technique that solves
that the same problem to be implemented and used,it has to provide
additional value. An implementer has to see this value in order to
invest money and effort in the implementation. A user has to see value
in order to adopt the alternative technique.
I see why you wanted this paragraph during the process of developing this spec
and persuading the WG of its value. I don’t see how it contributes any value to
the final spec. Similarly,
By using SR, there is no need to create state in the network in order
to enforce an explicit FRR path. This relieves the nodes themselves
from having to maintain extra state, and it relieves the operator
from having to deploy an extra protocol or extra protocol sessions
just to enhance the protection coverage.
Appears to just be a restatement of the value proposition of SR itself. I don’t
see value in restating it in this document.
#Ahmed: Not really, it is not a restatement. It is a clarification on
why SR is used in this technique
I think you could remove both these paragraphs without harm, and it would make
the document a quicker and clearer read.
#Ahmed: IMO, value is important to persuade investment in implementation
as well as user adoption.
### Section 2, encoding challenges
One of the challenges of TI-
LFA is to encode the expected post-convergence path by combining
adjacency segments and node segments.
Do you mean “compactly encode”, “efficiently encode”, or similar? Is encoding,
per se, the challenge?
#Ahmed: encoding "the expected post-convergence path by
combiningadjacency segments and node segments" Is the challenge that is
intended by this sentence. It is not the word "encoding" only.
### Section 2, roadmap should be a roadmap
I agree with Ben Niven-Jenkins that the omission of Sections 8-10 in the
overview list at the end of the Introduction is a bit jarring to the reader. (I
would add Section 13, too.)
#Ahmed: Corrected.
### Section 3, exclude from that set
I think this is wrong:
Exclude from that set of neighbors that are reachable from R using X.
Did you mean,
NEW:
Exclude from that set, the neighbors that are reachable from R using X.
#Ahmed: Thanks for catching that. I fixed it
### Section 3, defined but not used
A symmetric network is a network such that the IGP metric of each
link is the same in both directions of the link.
This definition is never used.
#Ahmed: removed
### Section 6, you can't guarantee that
is guaranteed to be loop-
free irrespective of the state of FIBs along the nodes belonging to
the explicit path.
As written, there’s no way to guarantee that. (Trivial proof, one possible
state of a FIB is to point back to the preceding node along the path. That
might not be an *expected* state, but it is *a* state.) I think this is just a
case of overly casual writing, and you mean that the loop-free property will
exist regardless of whether the nodes belonging to the explicit path have
converged to recognize failure X or not. Consider rewriting along those lines?
#Ahmed: The abstract and few other places explicitly mentions that the
draft addresses link-state IGPs. However to avoid confusion, I added at
the end of the sentence the phrase
"as long as the states of the FIBs are programmed according to a
link-state IGP"
### Section 6 and others
Please supply definitions for “P node” and “Q node”.
#Ahmed. Added sentences to the end of the definitions of P-space and
Q-space to define a P node and a Q node, respectivel.
### Section 6, terminology is inconsistent and unclear
I can understand NodeSID(R1) well enough even though you haven't supplied a
definition, it’s the node SID for router R1. But what is “Node_SID(P)”? P isn’t
a router, it’s a set of routers, or a space. Please clarify, whether with a
definition or otherwise. Probably your clarification will fix both this and the
previous point.
#Ahmed:The sentence where "Node_SID(P)" says right before using the term
"Node-SID(P)" says "from P to Q when these nodes". Pay special attention
to "these nodes". AFAIK a "these" refers to last nouns and the last
nouns are "P" and "Q". In addition I added definitions to "P node" and
"Q node" to the terminology section based on your comment
While you’re at it, you might as well make your terminology consistent. In one
of the node SID cases above you use an underscore, and in the other, you don’t.
Also, your node SID notation is inconsistent with your adjacency SID notation
which looks like AdjSID_R1R2 — so in one case an underscore and parentheses, in
another case parentheses with no underscore, and in the final case an
underscore with no parentheses. Pick one.
(And then there's Section 12 with "node-SID"...)
#Ahmed: I changed all of them to "node-sid". I'll correct anyone that is
missed.
### Sections 6.1, 6.2, 6.3, SHOULD NOT use SHOULD
What work are the SHOULDs doing here? Considering that in other parts of the
document you leave the computation of the repair path up to the implementation,
why are you mandating it here? And, if you’re mandating it, why not mandate it
all the way with a MUST?
#Ahmed: Every time the terms "should" or "should not" is/are used, it is
clear what these terms are referring to. If there is a place where the
terms "should" or "should not" are not clean, please point these places
out so that I can correct them.
Second, we are not mandating anything in these sections because, as you
mentioned, computation are part of the implementation. Hence we did not
use "MUST" or "MUST NOT"
It seems to me that if you don’t want to mandate implementation, it would be
sensible to take the RFC 2119 language out altogether. If you do want to
mandate implementation, I don’t see why you wouldn’t make this a MUST. If you
really do want it to be SHOULD, Please explain your reasoning.
#Ahmed: Disagree. There are certain properties that should exist in an
implementation. However the implementation itself is out of scope.
### Section 7, primary outgoing interface
The existence of a "primary outgoing interface" seems to imply the existence of
a secondary outgoing interface, tertiary outgoing interface, etc. Please define
primary outgoing interface, or if this isn't an important distinction, consider
whether you can simplify to just say "outgoing interface".
#Ahmed: I added definitions to the terms "primary link" and "primary
interface" to the terminology section
### Section 7.1, first
The active segment becomes the first
segment of the repair list.
By “first” do you mean “first to be pushed, last to be processed”? If so, I
suggest clarifying that in the text, because the plain English reading of
"first" is the opposite.
#Ahmed: Modified the sentence to be
The active segment becomes the first
segment after the repair list.
### Section 7.2, "as stated"... where?
As stated in Section 2, when SR
policies are involved and a strict compliance of the policy is
required, an end-to-end protection should be preferred over a local
repair mechanism.
I don’t see this in section 2 (I searched for “end-to-end”, “prefer”, “policy”
and “policies”). Can you help me understand what text in section 2 you’re
talking about?
#Ahmed: Probably something forgotten from a long time, I removed it
### Section 7.2.1, what's an Adj()?
Please define Adj(). Is this yet another terminology variation? (c.f. Section 6
comment on AdjSID_R1R2)
#Ahmed: Modified all of them to be adj-sid() and added a definition of
adj-sid to the terminology section
### Section 8.1, what's the tail end of a node segment?
1. If the active segment is a node segment that has been signaled
with penultimate hop popping and the repair list ends with an
adjacency segment terminating on the tail-end of the active
segment, then the active segment MUST be popped before pushing
the repair list.
What is the tail end of a node segment? I can’t figure out what that means.
I think I know what you’re trying to say, but please find a way to reword it
that doesn’t end up making me try to parse the above with my "what did the
authors actually *mean*?" glasses on.
#Ahmed Modified the text to say
"terminating on a node that advertised NEXT operation of the active segment"
### Section 8.2 and others, description of SRv6 behaviors
RFC 8754 describes forwarding behaviors using a kind of line-numbered
pseudocode, and later documents that modify forwarding behaviors specify
updates to the pseudocode. (Examples: RFC 8986,
draft-ietf-spring-srv6-srh-compression-15,
draft-ietf-rtgwg-srv6-egress-protection-16,
draft-ietf-spring-sr-redundancy-protection-03,
draft-ietf-spring-srv6-path-segment-07)
You don’t do this, you use a descriptive approach instead. I'm ok with this in
isolation, but I’d like to know if you made an affirmative decision to diverge
from the usual SRv6 way of doing things, and if so, why, and if the SPRING
working group specifically considered this and is OK with it.
#Ahmed: When it comes to implementation specifics, we explicitly avoid
to mandate anything. Pseudo code may appear to be a mandate
### Section 8.2, shorter than what?
In such case, there is no need for a preceding
Prefix SID and the resulting repair list is likely shorter.
Shorter than what? This is the first place in this document the string “Prefix
SID” occurs, so I’m confused.
#Ahmed: shorter than preceding the adjacency SID with a prefix SID.
For the term prefix-SID, it is defined in the SR architecture RFC
[RFC8402]. I added a reference to RFC8402 beside the 1st mention of
prefix-SID
### Section 11, limit the implementation of local FRR policies
Based on this assumption, in order to facilitate the operation of
FRR, and limit the implementation of local FRR policies
Do you mean "limit the need for implementation of local FRR policies"? (And you
could drop “implementation of” for that matter.)
### Section 11, TI-LFA and SR policies don't mix?
The last paragraph, regarding the use of SR policies, and also Section 9,
leaves me wondering whether a simpler statement would be that TI-LFA is
inappropriate for use in a network that makes use of SR policies. Is this a
fair characterization?
#Ahmed; No it is not. Another overloading similar to overloading the
word "key" above :)
### Section 12, can't this be an appendix?
#Ahmed Already moved to an appendix in version 14 as suggested by Gunter
Shouldn’t this be an Appendix? In general, that seems like a common (and good!)
practice for inessential information like this, especially when it has a
potentially limited shelf-life.
Also, although I appreciate that you provided some rudimentary parameterization
of the topologies in Table 1, I think it would be helpful to at least say what
time period the topologies reflect —
draft-francois-rtgwg-segment-routing-ti-lfa-00 dates to summer of 2015; are we
talking about the topologies that were in vogue in 2015? Those of 2023? Etc.
### Section 12, granularity wut
We do not cover the
case for 2 SIDs (Section 6.3) separately because there was no
granularity in the result.
I don’t understand what this means. Can you rephrase it? Generally, I find the
words “granular”. “granularity”, and related have almost zero descriptive
power. :-(
### Section 12, "2 or more" or "2", "3", and no more?
In your description of the table, you say,
The convention that we use is as follows
* 0 SIDs: the calculated repair path starts with a directly
...
* 1 SIDs: the repair node is a PQ node, in which case only 1 SID is
...
* 2 or more SIDs: The repair path consists of 2 or more SIDs as
described in Section 6.3 and Section 6.4. We do not cover the
case for 2 SIDs (Section 6.3) separately
...
But the table headers show:
+-------------+------------+------------+------------+------------+
| Network | 0 SIDs | 1 SID | 2 SIDs | 3 SIDs |
+-------------+------------+------------+------------+------------+
I.e. the table headers don’t show “2 or more” they show 2, and 3, broken out
distinctly, and no "or more" case. Seems like these need to be reconciled.
### Section 13, guaranteed upper bound
The techniques described in this document are internal
functionalities to a router that result in the ability to guarantee
an upper bound on the time taken to restore traffic flow upon the
failure of a directly connected link or node.
This is the only place in the document where you talk about guaranteed upper
bound. This is a fairly strong promise to make, I think you shouldn't be
mentioning it unless you provide some kind of support for how the guarantee is
provided. Note, I don't question that TI-LFA can be part of the machinery
providing such a guarantee, but without showing your work I don't think you can
make this claim.
#Ahmed: Changed to wording to "can guarantee"
## Notes
This review is in the ["IETF Comments" Markdown format][ICMF], You can use the
[`ietf-comments` tool][ICT] to automatically convert this review into
individual GitHub issues.
[ICMF]:https://github.com/mnot/ietf-comments/blob/main/format.md
[ICT]:https://github.com/mnot/ietf-comments
_______________________________________________
rtgwg mailing list -- [email protected]
To unsubscribe send an email to [email protected]