As part of doing the shepherd write-up for this document, I did a review of the draft.
My comments are shown below as a diff on
draft-ietf-rtgwg-spf-uloop-pb-statement-06.txt. They can also be viewed at:
https://github.com/cbowers/outgoing-feedback-on-ietf-drafts-2018/commit/c1c5018f857e9c7c0f4123c3de1e87041178e387

Thanks,
Chris

=============

diff --git a/draft-ietf-rtgwg-spf-uloop-pb-statement-06.txt b/draft-ietf-rtgwg-spf-uloop-pb-statement-06.txt
index 353ce3c..3dff746 100644
--- a/draft-ietf-rtgwg-spf-uloop-pb-statement-06.txt
+++ b/draft-ietf-rtgwg-spf-uloop-pb-statement-06.txt
@@ -21,7 +21,16 @@ Abstract

    In this document, we are trying to analyze the impact of using
    different Link State IGP implementations in a single network in
-   regards of micro-loops. The analysis is focused on the SPF triggers
+   regards of micro-loops.
+
+=======
+[CB]
+   In this document, we are trying to analyze the impact of using
+   different Link State IGP implementations in a single network, with
+   respect to micro-loops.
+
+========
+   The analysis is focused on the SPF triggers
    and SPF delay algorithm.

 Requirements Language
@@ -95,13 +104,39 @@ Table of Contents
    Link State IGP protocols are based on a topology database on which an
    SPF (Shortest Path First) algorithm like Dijkstra is implemented to
    find the optimal routing paths.
-
+
+ =====
+ [CB] proposed modified text since the Shortest Path First algorithm and
+ Dijkstra algorithm are essentially synonymous. Also propose to use
+ "consistent set of non-looping routing paths", since shortest path routing
+ is often not optimal from a traffic engineering perspective.
+
+ [proposed text]
+   Link State IGP protocols are based on a topology database on which the
+   SPF (Shortest Path First) algorithm is run to
+   find a consistent set of non-looping routing paths.
+
+ =====
+
    Specifications like IS-IS ([RFC1195]) propose some optimizations of
    the route computation (See Appendix C.1) but not all the
    implementations are following those not mandatory optimizations.
+============
+[CB] [proposed text]
+but not all implementations follow those non-mandatory
+optimizations.
+=============
+
    We will call "SPF trigger", the events that would lead to a new SPF
    computation based on the topology.
+
+============
+[CB] [proposed text]
+   We will call "SPF triggers", the events that would lead to a new SPF
+   computation based on the topology.
+=============
+
    Link State IGP protocols, like OSPF ([RFC2328]) and IS-IS
    ([RFC1195]), are using multiple timers to control the router behavior
@@ -118,11 +153,27 @@ Internet-Draft spf-microloop January 2018

    Some of those timers are standardized in protocol specification, some
    are not especially the SPF computation related timers.
+
+============
+[CB] [proposed text]
+   Some of those timers are standardized in protocol specification, while some
+   are not. The SPF computation related timers have generally remained
+   unspecified.
+=============

    For non standardized timers, implementations are free to implement it
    in any way. For some standardized timer, we can also see that rather
    than using static configurable values for such timer, implementations
    may offer dynamically adjusted timers to help controlling the churn.
+
+============
+[CB] In the discussion above, it is unclear about what the meaning of "timer" is.
+Is it the numerical value of a timer? Is it the trigger conditions and logic
+for a timer to start or be reset? Is it the action taken when the timer expires?
+Perhaps the text could be clarified by referring to "timer behavior" and "timer values".
+
+=============
+
    We will call "SPF delay", the timer that exists in most
    implementations that specifies the required delay before running SPF
@@ -138,6 +189,17 @@ Internet-Draft spf-microloop January 2018
    Some micro-loop mitigation techniques have been defined by IETF (e.g.
    [RFC6976], [I-D.ietf-rtgwg-uloop-delay]) but are not implemented due
    to complexity or are not providing a complete mitigation.
+
+==========
+[CB]
+This paragraph needs to be clearer.
+[proposed text]
+   Two micro-loop mitigation techniques have been defined by the IETF.
+   [RFC6976] has not been widely implemented, presumably due to the complexity
+   of the technique. [I-D.ietf-rtgwg-uloop-delay] has been implemented.
+   However, it does not prevent all micro-loops that can occur
+   for a given topology and failure scenario.
+==========

    In multi-vendor networks, using different implementations of a link
    state protocol may favor micro-loops creation during the convergence
@@ -185,17 +247,24 @@ Internet-Draft spf-microloop January 2018
    will forward the traffic to C through B, but as B as not converged
    yet, B will loop back traffic to A, leading to a micro-loop.

+========
+[CB]
+Figure 1 and figure 4 are essentially the same topology, but the nodes
+have different names. I think it would be much better for the reader of this
+document to consolidate the two figures into a single figure.
+========
+
    The micro-loop appears due to the asynchronous convergence of nodes
    in a network when an event occurs.

-   Multiple factors (and combination of these factors) may increase the
+   Multiple factors (or a combination of these factors) may increase the
    probability for a micro-loop to appear:

    o  the delay of failure notification: the more B is advised of the
       failure later than A, the more a micro-loop may have a chance to
       appear.

-   o  the SPF delay: most of the implementations supports a delay for
+   o  the SPF delay: most implementations support a delay for
       the SPF computation to try to catch as many events as possible.
       If A uses an SPF delay timer of x msec and B uses an SPF delay
       timer of y msec and x < y, B would start converging after A
@@ -204,8 +273,8 @@ Internet-Draft spf-microloop January 2018
    o  the SPF computation time: mostly a matter of CPU power and
       optimizations like incremental SPF. If A computes its SPF faster
       than B, there is a chance for a micro-loop to appear. CPUs are
-      today faster enough to consider SPF computation time as
-      negligeable (order of msec in a large network).
+      today fast enough to consider SPF computation time as
+      negligible (on the order of milliseconds in a large network).

    o  the SPF computation order: an SPF trigger can be common to
       multiple IGP areas or levels (e.g., IS-IS Level1/Level2) or for
@@ -215,8 +284,8 @@ Internet-Draft spf-microloop January 2018
       done in A and B for each area/level/topology/SPF-algorithm is
       different, there is a possibility for a micro-loop to appear.

-   o  the RIB and FIB prefix insertion speed or ordering: highly
-      implementation dependant.
+   o  the RIB and FIB prefix insertion speed or ordering. This is highly
+      dependent on the implementation.



@@ -225,22 +294,21 @@ Litkowski, et al. Expires July 28, 2018 [Page 4]
 Internet-Draft spf-microloop January 2018


-   This document will focus on analysis SPF delay (and associated
-   triggers).
+   This document will focus on analysis of the SPF delay behavior and the associated
+   triggers.

 3. SPF trigger strategies

-   Depending of the change advertised in LSP/LSA, the topology may be
+   Depending on the change advertised in an LSPDU or LSA, the topology may be
    affected or not. An implementation may avoid running the SPF
    computation (and may only run IP reachability computation instead) if
-   the advertised change is not affecting topology.
+   the advertised change does not affect the topology.

    Different strategies exists to trigger the SPF computation:

-   1.  An implementation may always run a full SPF whatever the change
-       to process.
+   1.  An implementation may always run a full SPF for any type of change.

-   2.  An implementation may run a full SPF only when required: e.g. if
+   2.  An implementation may run a full SPF only when required. For example, if
        a link fails, a local node will run an SPF for its local LSP
        update. If the LSP from the neighbor (describing the same
        failure) is received after SPF has started, the local node can
@@ -250,26 +318,28 @@ Internet-Draft spf-microloop January 2018
    3.  If the topology does not change, an implementation may only
        recompute the IP reachability.

-   As pointed in Section 1, SPF optimizations are not mandatory in
-   specifications, leading to multiple strategies to be implemented.
+   As noted in Section 1, SPF optimizations are not mandatory in
+   specifications. This has led to the implementation of
+   different strategies.

 4. SPF delay strategies

    Implementations of link state routing protocols use different
-   strategies to delay the SPF computation. We usually see the
-   following:
+   strategies to delay the SPF computation. The two most
+   common SPF delay behaviors are the following.

-   1.  Two steps delay.
+   1.  Two phase delay.

    2.  Exponential backoff delay.

-   Those behavior will be explained in the next sections.
+   These behaviors are described in the following sections.

-4.1. Two steps SPF delay
+4.1. Two phase SPF delay

-   The SPF delay is managed by four parameters:
+   For the two phase SPF delay, the SPF delay is managed by four parameters:

-   o  Rapid delay: amount of time to wait before running SPF.
+   o  Rapid delay: amount of time to wait before running SPF, after the
+      initial SPF trigger event.



@@ -281,13 +351,13 @@ Litkowski, et al. Expires July 28, 2018 [Page 5]
 Internet-Draft spf-microloop January 2018


-   o  Rapid runs: amount of consecutive SPF runs that can use the rapid
-      delay. When the amount is exceeded the delay moves to the slow
+   o  Rapid runs: the number of consecutive SPF runs that can use the rapid
+      delay. When the number is exceeded, the delay moves to the slow
       delay value .

    o  Slow delay: amount of time to wait before running SPF.

-   o  Wait time: amount of time to wait without events before going back
+   o  Wait time: amount of time to wait without receiving SPF trigger events before going back
       to the rapid delay.

    Example: Rapid delay = 50msec, Rapid runs = 3, Slow delay = 1sec,
@@ -308,7 +378,9 @@ Internet-Draft spf-microloop January 2018
    | | | | || | |
    < wait time >

-   Figure 2 - Two steps delay algorithm
+   Figure 2 - Two phase delay algorithm
+
+

 4.2. Exponential backoff

@@ -394,13 +466,20 @@ Internet-Draft spf-microloop January 2018
    for delaying PRC.

    We consider that E is using a SPF trigger strategy
-   that always compute Full SPF and exponential backoff strategy for SPF
+   that always computes a Full SPF for any change, and uses the exponential backoff strategy for SPF
    delay (start=150ms, inc=150ms, max=1s)

    We also consider the following sequence of events (note : the time
    scale does not intend to represent a real router time scale where
    jitters are introduced to all timers) :

+==========
+[CB]
+This note about jitter and time scale (or timeline) is not clear. I suggest describing
+it in more detail or deleting it.
+==========
+
+
    o  t0=0 ms: a prefix is declared down in the network. We consider
       this event to happen at time=0.

@@ -487,12 +566,12 @@ Internet-Draft spf-microloop January 2018
    Route computation event time scale

    In the table above, we can see that due to discrepancies in the SPF
-   management, after multiple events (of a different type), the values
-   of the SPF delay are completely misaligned between nodes leading to
-   long micro-loops creation.
+   management, after multiple events of a different type, the values
+   of the SPF delay are completely misaligned between node S and node E,
+   leading to the creation of micro-loops.

-   The same issue can also appear with only single type of events as
-   displayed below:
+   The same issue can also appear with only a single type of event as
+   shown below:

    +--------+--------------------+------------------+------------------+
    | Time   | Network Event      | Router S events  | Router E events  |
@@ -587,6 +666,28 @@ Internet-Draft spf-microloop January 2018

 6. Proposed work items

+===============
+[CB]
+Since we are publishing this document after the SPF backoff algorithm
+draft is published, I think the list of three proposed work items below will be
+confusing. Someone reading this RFC will wonder why the
+SPF backoff algorithm RFC (which will have an earlier RFC number)
+doesn't satisfy the list of proposed work items.
+
+Perhaps this section should be renamed something like
+"Benefits of standardized SPF delay behavior", and the list of proposed
+work items should be removed.
+
+It may also make sense to explicitly say that the
+SPF backoff algorithm draft/RFC is a solution that
+satisfies this problem statement.
+And that we are publishing the document in order to
+capture the reasoning that led to that draft. Text to this
+effect should probably go in the introduction, instead
+of this section.
+
+===============
+
    In order to enhance the current Link State IGP behavior, authors
    would encourage working on standardization of some behaviours.

@@ -603,14 +704,23 @@ Internet-Draft spf-microloop January 2018

    Using the same event sequence as in figure 2, we may expect fewer
    and/or shorter micro-loops using standardized implementations.
+
+===========
+[CB] I think the text should refer to one of the previous tables and not Figure 2.
+Figure 2 shows the two step delay algorithm.
+===========

    +--------+--------------------+------------------+------------------+
    | Time   | Network Event      | Router S events  | Router E events  |
    +--------+--------------------+------------------+------------------+
    | t0=0   | Prefix DOWN        |                  |                  |
    | 10ms   |                    | Schedule PRC (in | Schedule SPF (in |
-
-
+
+===========
+[CB]
+It seems like there is a typo here. Presumably router E should schedule a
+PRC (not an SPF) at 10ms in this table.
+===========

 Litkowski, et al. Expires July 28, 2018 [Page 11]
 ^L
@@ -677,13 +787,48 @@ Internet-Draft spf-microloop January 2018
    +--------+--------------------+------------------+------------------+

    Route computation event time scale
-
+
+=============
+[CB]
+I think the term "time scale" throughout this document is not the right one.
+Perhaps the term "timeline" would be better or the phrase "sequence of events".
+=============
+[CB]
+There are several different tables with the same caption
+"Route computation event time scale".
+Regardless of the replacement term for "time scale", it would be helpful to make a
+distinction between the tables with each caption. For example, this last
+table could have a caption like "Route computation when S and E use the
+same standardized behavior".
+
+==========
    As displayed above, there could be some other parameters like router
    computation power, flooding timers that may also influence micro-
    loops. In Figure 4, we consider E to be a bit slower than S, leading
-   to micro-loop creation. Despite of this, we expect that by aligning
+   to micro-loop creation.
+
+=================
+[CB]
+There is nothing in Figure 4 that shows that E is slower than S.
+Perhaps it would be clearer to say something like:
+"In all of the
+examples in this document comparing the SPF timer behavior of
+router S and router E, we have made router E a bit slower than
+router S. This can lead to microloops even when both S and E use
+a common standardized SPF behavior."
+=================
+
+
+   Despite of this, we expect that by aligning
    implementations at least on SPF trigger and SPF delay, service
    provider may reduce the number and the duration of micro-loops.
+===================
+[CB]
+"Despite of this" should read "In spite of this" or "Despite this".
+Or in this case "However" might be better.
+
+s/service provider/service providers/
+==================

 7. Security Considerations
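
As a side note, the two phase SPF delay described in section 4.1 of the diff
above (rapid delay, rapid runs, slow delay, wait time) can be sketched in a few
lines of Python. This is only an illustration under stated assumptions, not text
from the draft: the class and method names are hypothetical, the default wait
time of 2 sec is an assumed value (the other defaults are the example values
quoted in section 4.1), and every SPF trigger is assumed to lead to its own SPF
run, with no coalescing of triggers that arrive while a run is pending.

```python
class TwoPhaseSpfDelay:
    """Sketch of the two phase SPF delay (hypothetical names, not from the draft)."""

    def __init__(self, rapid_delay_ms=50, rapid_runs=3,
                 slow_delay_ms=1000, wait_time_ms=2000):
        # wait_time_ms default is an assumption made for this illustration.
        self.rapid_delay_ms = rapid_delay_ms
        self.rapid_runs = rapid_runs
        self.slow_delay_ms = slow_delay_ms
        self.wait_time_ms = wait_time_ms
        self.runs_used = 0          # consecutive SPF runs that used the rapid delay
        self.last_event_ms = None   # arrival time of the previous SPF trigger

    def delay_for_trigger(self, now_ms):
        """Return the delay (in ms) applied to an SPF trigger arriving at now_ms."""
        quiet = (self.last_event_ms is not None
                 and now_ms - self.last_event_ms >= self.wait_time_ms)
        if quiet:
            # No trigger seen for the whole wait time: go back to the rapid delay.
            self.runs_used = 0
        self.last_event_ms = now_ms
        if self.runs_used < self.rapid_runs:
            self.runs_used += 1
            return self.rapid_delay_ms
        # Rapid runs exhausted: use the slow delay.
        return self.slow_delay_ms


timer = TwoPhaseSpfDelay()
delays = [timer.delay_for_trigger(t) for t in (0, 100, 200, 300, 2500)]
print(delays)  # prints [50, 50, 50, 1000, 50]
```

In this run the first three triggers use the rapid delay, the fourth falls back
to the slow delay, and the trigger at 2500 ms arrives after a quiet period longer
than the wait time, so the rapid delay applies again.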