Alvaro Retana has entered the following ballot position for draft-ietf-rtgwg-backoff-algo-07: Discuss

When responding, please keep the subject line intact and reply to all email addresses included in the To and CC lines. (Feel free to cut this introductory paragraph, however.)

Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html for more information about IESG DISCUSS and COMMENT positions.

The document, along with other ballot positions, can be found here:
https://datatracker.ietf.org/doc/draft-ietf-rtgwg-backoff-algo/

----------------------------------------------------------------------
DISCUSS:
----------------------------------------------------------------------

I am balloting DISCUSS because I believe that this document presents an incomplete and vague description of a specification, which (as is) won't result in consistent implementations. Consistency, through the specification of a standard algorithm, is used as the basis to justify this work: "To allow multi-vendor networks to have all routers delay their SPF computations for the same duration, this document specifies a standard algorithm."

I am specifically concerned about the fact that there are no defaults or even suggested values:

   This document does not propose default values for the parameters because these values are expected to be context dependent. Implementations are free to propose their own default values.

If the whole purpose of standardizing an algorithm is for different implementations to behave the same way and (specifically) "to have all routers delay their SPF computations for the same duration", then not defining defaults (and not being clear in the recommendations -- more on this below) makes the specification incomplete and vague!

Section 6 tries to provide guidelines about defaults, but it falls short!

   In order to satisfy the goals stated in Section 2, operators are RECOMMENDED to configure delay intervals such that SPF_INITIAL_DELAY <= SPF_SHORT_DELAY and SPF_SHORT_DELAY <= SPF_LONG_DELAY.

Why are the operators not REQUIRED to meet that relationship?
Are there cases when it is ok not to follow those guidelines? Would (for example) the SPF_LONG_DELAY ever be less than SPF_INITIAL_DELAY?

The other Normative Language in this section can't really be enforced, and provides (at best) very weak guidance.

   When setting (default) values, one SHOULD consider the customers and their application requirements, the computational power of the routers, the size of the network, and, in particular, the number of IP prefixes advertised in the IGP, the frequency and number of IGP events, the number of protocols reactions/computations triggered by IGP SPF (e.g., BGP, PCEP, Traffic Engineering CSPF, Fast ReRoute computations).

"SHOULD consider..." How can this statement be Normatively enforced? Using "SHOULD" implies that it is ok to only partially consider the list you provided, or even a different set of criteria.

Based on the suggestions above, I can't imagine how a vendor can set default values (even if "free to propose their own")...or how the average network operator could configure anything beyond the numbers that you mentioned as examples in the text. For example, the average network operator might ask: under the same circumstances, should my bigger routers (ones with presumably more computational power) have lower or higher delays with respect to my smaller routers?

   ...

   Note that some or all of these factors may change over the life of the network. In case of doubt, it's RECOMMENDED to play it safe and start with safe, i.e., longer timers.

How can "playing it safe" be Normatively enforced?

   For the standard algorithm to be effective in mitigating micro-loops, it is RECOMMENDED that all routers in the IGP domain, or at least all the routers in the same area/level, have exactly the same configured values.

[A similar statement is made in Section 7.]

If it is so important, why is consistency not mandatory? IOW, why is it only "RECOMMENDED" and not "REQUIRED"? When is it ok to not do it?
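
To make the two relationships quoted above concrete, here is a minimal sketch of how they could be checked mechanically. It is not taken from the draft: the class name, the function names, and the choice of milliseconds as the unit are assumptions made for illustration only, and the sketch covers just the ordering of the three delays and the "exactly the same configured values" condition.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class SpfDelayConfig:
    """Hypothetical container for the three delays (milliseconds assumed)."""
    initial_delay_ms: int
    short_delay_ms: int
    long_delay_ms: int


def ordering_ok(cfg: SpfDelayConfig) -> bool:
    """True if SPF_INITIAL_DELAY <= SPF_SHORT_DELAY <= SPF_LONG_DELAY."""
    return cfg.initial_delay_ms <= cfg.short_delay_ms <= cfg.long_delay_ms


def area_consistent(configs: Mapping[str, SpfDelayConfig]) -> bool:
    """True if every router in the mapping carries identical values.

    The mapping (router name -> config) stands in for "all the routers
    in the same area/level"; how that set is obtained is out of scope.
    """
    return len(set(configs.values())) <= 1
```

A check like this would reject a configuration where the long delay is below the initial delay, and would flag an area in which one router carries different values from the rest. It says nothing about which values are "safe", which is the gap this DISCUSS is about.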

Back to the point of this DISCUSS, the importance of consistent values is clear! Based on the experience of existing implementations, please specify "safe" default values.

----------------------------------------------------------------------
COMMENT:
----------------------------------------------------------------------

[I know that some of these comments have been brought up in the SecDir and GenArt reviews, but I have not seen an update yet.]

(1) Besides the lack of guidance (see above), there are several other inconsistencies throughout the document:

(1.1) Section 3: "The HOLDDOWN_INTERVAL MUST be defaulted or configured to be longer than the TIME_TO_LEARN_INTERVAL." Which one, defaulted OR configured? Is it ok for the implementation to provide a default value that doesn't comply, with the expectation that the operator will configure the correct value? It seems to me that the definition of MUST doesn't fit with an option.

(1.2) Section 4: "If subsequent IGP events are received in a short period of time (TIME_TO_LEARN_INTERVAL)...In this situation, we delay the routing computation by SHORT_SPF_DELAY." Note that Section 3 provided example values for TIME_TO_LEARN_INTERVAL and SHORT_SPF_DELAY as 1 sec and 50-100 ms, respectively. If IGP events are received within the TIME_TO_LEARN_INTERVAL window, then the SPF_DELAY ("delay between the first IGP event...and the start of that routing table computation") set to SHORT_SPF_DELAY will be triggered before TIME_TO_LEARN_INTERVAL...which means that the SPF run after SHORT_SPF_DELAY won't cover all the changes. Is that what you meant, or are you assuming that the SPF_DELAY will start *after* the TIME_TO_LEARN_INTERVAL?

(1.3) Section 5.1: "LONG_WAIT: State reached after TIME_TO_LEARN_INTERVAL. In other words, state reached after TIME_TO_LEARN_INTERVAL in state SHORT_WAIT."
But Section 3 defines TIME_TO_LEARN_INTERVAL as "the maximum duration typically needed to learn all the IGP events related to a single component failure" -- why don't the events from that single failure start while in QUIET state? Or are you saying (in 5.1) that the TIME_TO_LEARN_INTERVAL is not measured from the initial IGP Event?

(1.4) What is the relationship between HOLDDOWN_INTERVAL and the *_SPF_DELAY? I would assume that *_SPF_DELAY is always less than HOLDDOWN_INTERVAL, but the document doesn't specify that relationship anywhere.

(1.5) Section 6: "All the parameters MUST be configurable [I-D.ietf-isis-yang-isis-cfg] [I-D.ietf-ospf-yang] at the protocol instance granularity." Given that the references to the YANG models are listed as Informative, what does that statement mean? Is it a directive to what must be included in the models? What about implementations that don't use YANG (yet)?

(2) Section 5.4 (FSM Events)

(2.1) When will "Transition 7: SPF_TIMER expiration, while in QUIET" happen? Because the FSM moves to SHORT_WAIT when an IGP Event occurs in QUIET state, the SPF_TIMER should never expire in QUIET state.

(2.2) "Transition 3: LEARN_TIMER expiration." is defined between SHORT_WAIT and LONG_WAIT, which (at first glance) seems to match how 5.1 defines "LONG_WAIT:...state reached after TIME_TO_LEARN_INTERVAL in state SHORT_WAIT." However, the LEARN_TIMER is only started when an IGP event happens in QUIET_STATE (transition 1).

(2.3) For completeness, the HOLDDOWN_TIMER expiration events (5 and 6) should include resetting all the timers, just in case...and to be consistent with the initialization description.

(3) From Section 3: "Routing table computation: Computation of the routing table..." This is a circular definition [1]. I'm sure the authors can figure out a clear way to explain the meaning without using the terms being defined...

(4) "Note that previously implemented SPF delay algorithms counted the number of SPF computations." References?
Knowing that the references may not be stable (pointing to a vendor's website), you might want to simply remove this sentence and make the point in the paragraph as to why a time interval is used. Note that the point of this document is not to compare the specification to "previously implemented algorithms".

(5) I am surprised that no other documents "must be read to understand or implement the technology" [2], resulting in no Normative References (beyond rfc2119). I would think that at least the OSPF and ISIS specs should be Normative.

(6) For the rtgwg-chairs/Shepherd: A quick scan of the mail archive shows that this document wasn't reviewed by the ospf/isis WGs. Given that what is specified here affects the protocols directly, I would think that formal review is needed. [I note that a couple of the Chairs of the ospf/isis WGs are co-authors of this document, and that a note was indeed sent when the -00 version of the individual draft was published -- still, it would have been nice to at least explicitly inform of the progress.]

(7) Nit: "(e.g. Loop Free Alternates..." The closing parenthesis is missing.

(8) Nit: Please put a forward reference to 5.1 when the QUIET state is mentioned in Section 4.

(9) Nit: s/QUIET_STATE/QUIET state.

[1] https://en.wikipedia.org/wiki/Circular_definition
[2] https://www.ietf.org/iesg/statement/normative-informative.html