Hi Mahesh,

Thanks for posting the update to this draft.
I'll now wait for the response to the review of the sequence number draft and the updates. Once we close on that, I will do a quick round of checks across the three documents before progressing them together.

Thanks,
Ketan

On Tue, Jun 10, 2025 at 11:37 AM Ketan Talaulikar wrote:
> Hi Mahesh,
>
> Thanks for sharing the updates. It looks good to me. Just some minor
> suggestions:
>
> 1) Please see if you can leave the "on the box" part out of the text
> related to further analysis of the provided BFD packet loss statistics.
> This will leave the door open for both on-box and off-box (as in
> telemetry-based) solutions. Also, consider if you would like to add the
> sentence about leveraging this new reported stat along with the received
> packet stats from RFC 9314 for determination of session stability, perhaps
> the same one that is there in Appendix A?
>
> 2) You could also leave out the BFD Sequence Number reference from
> Section 2. The reference to the sequence numbers draft in Section 6 is
> correct and entirely informative.
>
> 3) In Appendix A: s/experimental/experiment
>
> Please go ahead and post the update. That way the WG will also have some
> time to review while the authors work on the changes to the sequence
> numbers draft.
>
> Thanks,
> Ketan
>
>
> On Tue, Jun 10, 2025 at 3:54 AM Mahesh Jethanandani wrote:
>
>> Hi Ketan,
>>
>> Please find enclosed the proposed changes to the draft.
>>
>>
>>
>> On Jun 9, 2025, at 6:43 AM, Ketan Talaulikar wrote:
>>
>> Hi Mahesh (and also Jeff and Ashesh),
>>
>> Thanks for your responses and clarifications. I've gone through them and
>> they have been helpful. I am choosing to respond on this thread only so
>> that my comments are in one place and easy for the authors to process.
>>
>> Please check inline below for responses.
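As a rough illustration of the suggestion to combine the new lost-packet statistic with the received-packet statistics from RFC 9314, the Python sketch below computes a loss ratio between two polls of a session's counters. It is a hypothetical example, not text from the draft: the `CounterSnapshot` type and its field names are made up for illustration, and it assumes that over a polling interval the number of expected packets equals received plus lost, and that no counter reset (for example, after a session flap) happened between the two polls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CounterSnapshot:
    """One poll of a BFD session's counters (hypothetical field names)."""
    lost: int      # lost-packet count from the stability extension (assumed name)
    received: int  # received-packet counter from the RFC 9314 model


def loss_ratio(prev: CounterSnapshot, curr: CounterSnapshot) -> float:
    """Fraction of expected BFD packets lost between two polls.

    Assumes expected = received + lost over the polling interval and that
    neither counter was reset (for example, by a session flap) in between.
    """
    d_lost = curr.lost - prev.lost
    d_recv = curr.received - prev.received
    if d_lost < 0 or d_recv < 0:
        raise ValueError("counter went backwards; was the session reset?")
    expected = d_lost + d_recv
    return d_lost / expected if expected else 0.0
```

For example, going from 2 lost / 1000 received to 7 lost / 1995 received gives 5 lost out of 1000 expected, a ratio of 0.005. Whether such a ratio is a useful stability signal is what the experiment discussed in this thread would need to establish.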
>>
>>
>> On Sat, Jun 7, 2025 at 1:38 AM Mahesh Jethanandani wrote:
>>
>>> Hi Ketan,
>>>
>>> On May 15, 2025, at 4:05 AM, Ketan Talaulikar wrote:
>>>
>>> Hello Authors/WG,
>>>
>>> Thanks for the work put into this document. It has been in the works for
>>> a long time in an on/off mode. There is some more work needed before it can
>>> be taken up for IESG evaluation.
>>>
>>> I would like to share my review of v18 of this document.
>>>
>>> General Comment/Suggestion:
>>> This is about the contents of this document and its relationship with
>>> draft-ietf-bfd-optimizing-authentication and
>>> draft-ietf-bfd-secure-sequence-numbers. I believe this document does not
>>> depend on those other two, at least not normatively as indicated today.
>>> This proposal is self-sufficient with the new null auth type and the two
>>> existing BFD auth types that use meticulous incrementing sequence numbers.
>>> As such, for smooth progression of this work, I would strongly recommend
>>> removing all references to those drafts and the ISAAC-based auth types or
>>> the Optimized Auth from this document. The
>>> draft-ietf-bfd-secure-sequence-numbers that actually specifies the two
>>> ISAAC-based auth types can instead refer to draft-ietf-bfd-stability to
>>> indicate that those new auth types are suitable for measuring BFD
>>> packet loss. This way, this document becomes independent of the other two
>>> for its further processing.
>>>
>>>
>>> This draft does refer to draft-ietf-bfd-secure-sequence-numbers, but
>>> that reference can be informative instead of normative. And you are right,
>>> there is no normative dependency on draft-ietf-bfd-secure-sequence-numbers
>>> from this document, and we can drop it from Section 12, Normative
>>> References.
>>>
>>
>> KT> Thanks.
>>
>>
>>>
>>>
>>> Please find below my comments in the idnits output of v18 and look for
>>> <EoRv18> at the very end of the review.
>>> If you don't see that, then likely the email has been truncated by your
>>> email client and you should look at the BFD WG email archive for the full
>>> version.
>>>
>>> Thanks,
>>> Ketan
>>>
>>>
>>> 14 BFD Stability
>>> 15 draft-ietf-bfd-stability-18
>>>
>>> 17 Abstract
>>>
>>> 19 This document describes extensions to the Bidirectional Forwarding
>>> 20 Detection (BFD) protocol to measure BFD stability. Specifically, it
>>> 21 describes a mechanism for detection of BFD packet loss.
>>>
>>> <major> The title/name of "BFD Stability" is misleading to me. It gives an
>>> impression of how stable the BFD session is, as in: is it flapping a lot,
>>> or is it staying up and stable for a long interval? Why not call this BFD
>>> Packet Loss Monitoring, or something like that, which is a simple term and
>>> yet perhaps gives the true picture of what this feature is about?
>>>
>>>
>>> As we discussed, counting of lost packets is just a method. What is
>>> missing in today's implementations is the quality of the session, as in,
>>> whether the session is Up while dropping packets or is Up and not dropping
>>> any packets: something that can predict whether the session is stable. I am
>>> open to a suggestion that reflects that sentiment, something more than
>>> "this draft counts lost packets" 😜
>>>
>>
>> KT> Thanks for the context and discussions from Jeff, Mahesh and Ashesh.
>> I don't have a better technical term to offer and so let us go with what
>> the WG has come up with. Please see if you could add some explanatory text
>> that paraphrases what you all (I especially found the way Ashesh put it to
>> be helpful) have said to provide context to the reader (i.e., those
>> reviewing during the IETF LC, the IESG, and readers after publication).
>>
>>
>>>
>>>
>>> 98 This document does not propose any BFD extension to measure data
>>> 99 traffic loss or delay on a link or tunnel and the scope is limited to
>>> 100 BFD packets.
>>>
>>> <major> Please provide some text justifying the Experimental status;
>>> something along similar lines to the other two documents will work just
>>> as well.
>>>
>>>
>>> Ok. Taking a cue from the other drafts, here is what I am suggesting as
>>> text (in the Appendix):
>>>
>>> This document describes an experiment that will present a candidate
>>> solution to predict whether a given BFD session will continue to be
>>> stable. The experiment will use the lost packet count and the
>>> ‘received-packet-count’ defined in [RFC 9314] to determine how stable
>>> the session is. This document is on the Experimental track because
>>> there are no known implementations or proofs of concept. As a
>>> result, the authors are not clear whether a simple lost count is enough to
>>> predict stability or whether a more granular count will be needed.
>>>
>>> This document is classified as Experimental and is not part of the IETF
>>> Standards Track.
>>>
>>>
>> KT> Thanks.
>>
>>
>>>
>>>
>>> 120 The reader is expected to be familiar with the BFD [RFC5880],
>>> 121 Optimizing BFD Authentication
>>> 122 [I-D.ietf-bfd-optimizing-authentication] and Meticulous Keyed ISAAC
>>> 123 for BFD Authentication [I-D.ietf-bfd-secure-sequence-numbers].
>>>
>>> <major> I see no reason for the above two references or dependencies in
>>> this document. They seem unnecessary to me. What is the normative (must
>>> have) dependency that I am missing? And why is even an informative
>>> reference really necessary?
>>>
>>>
>>> See above.
>>>
>>
>> KT> Ack
>>
>>
>>>
>>>
>>> 139 In a faulty datapath scenario, an operator can use BFD health
>>> 140 information to trigger delay and loss measurement OAM protocol
>>> 141 (Connectivity Fault Management (CFM) or Loss Measurement (LM)-Delay
>>> 142 Measurement (DM)) to further isolate the issue.
>>>
>>> <minor> Please provide informative references for the CFM and DM
>>> technologies.
>>>
>>>
>>> Ok.
>>> I am going to reference Y.1731 as:
>>>
>>> [Y.1731] ITU-T, "OAM Functions and Mechanisms for Ethernet-based
>>> Networks", Recommendation G.8013/Y.1731, November 2013.
>>>
>>>
>>> and DM as described in RFC 6374.
>>>
>>
>> KT> Ack
>>
>>>
>>>
>>> 150 5. NULL Auth Type
>>>
>>> <question> Why is a null auth type, or even a sequence number, necessary
>>> for BFD packet loss calculation? Is it not OK to expect that the other
>>> endpoint is going to send X number of packets every interval? And if we
>>> don't get those X packets at every interval, then we have a packet loss?
>>> Perhaps I am missing something obvious and if so, it would be good to
>>> capture the rationale that really needs these sequence numbers for this
>>> measurement.
>>>
>>> 179 Auth Key ID: The authentication key ID in use for this packet. Must
>>> 180 be set to zero and ignored on receipt.
>>>
>>> <minor> s/must/MUST
>>>
>>>
>>> Ok.
>>>
>>
>> KT> Thanks
>>
>>
>>>
>>>
>>> 216 6.1. Loss Measurement
>>>
>>> 218 Loss measurement counts the number of BFD control packets missed at
>>> 219 the receiver during any Detection Time period. The loss is detected
>>> 220 by comparing the Sequence Number field in successive BFD control
>>> 221 packets. The Sequence Number in each successive control packet
>>> 222 generated on a BFD session by the transmitter is incremented by one.
>>> 223 This loss count can then be exposed using the YANG module defined in
>>> 224 the subsequent section.
>>>
>>> <major> Packets may be reordered and arrive with different delays. Let us
>>> say that the packet that was supposed to arrive in interval I was delayed
>>> to arrive in interval I+1, i.e., we get one extra packet in interval I+1.
>>> This does not indicate a packet loss in interval I, but the procedure above
>>> seems to log it as a packet loss?
>>>
>>>
>>> This issue is discussed later in Section 6.2, titled "Out of Order Packets".
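The strict comparison quoted from Section 6.1 of the draft (bootstrap on the first non-zero sequence number, then count every skipped sequence number as lost) can be sketched in Python as below. This is an illustrative sketch, not the draft's specified algorithm: `StrictLossCounter` and `on_packet` are hypothetical names, and the sketch assumes the 32-bit Sequence Number field of RFC 5880 with serial-number style wraparound handling. It also reproduces the reordering concern raised in the review: a packet that arrives late is simply ignored, so the gap it left behind stays counted as a loss.

```python
from typing import Optional

SEQ_MOD = 1 << 32  # the BFD auth Sequence Number is a 32-bit field (RFC 5880)


class StrictLossCounter:
    """Hypothetical strict loss counter modelled on the draft's Section 6.1 text."""

    def __init__(self) -> None:
        self.last_seq: Optional[int] = None  # None until bootstrapped
        self.lost = 0

    def on_packet(self, seq: int) -> None:
        if self.last_seq is None:
            # Bootstrap on the first non-zero sequence number received.
            if seq != 0:
                self.last_seq = seq
            return
        # Forward distance from the last sequence number, modulo 2**32.
        gap = (seq - self.last_seq) % SEQ_MOD
        if 0 < gap < SEQ_MOD // 2:
            # Newer packet: every sequence number skipped is counted as lost.
            self.lost += gap - 1
            self.last_seq = seq
        # gap == 0 (duplicate) or gap >= 2**31 (an older, reordered packet) is
        # ignored, so a late arrival never reduces the loss already counted.
```

Feeding the sequence numbers 0, 5, 6, 8, 7, 9 bootstraps on 5 and reports one lost packet, even though packet 7 did arrive, only out of order. Going from 0xFFFFFFFF to 1 across the wrap is likewise counted as one lost packet (the skipped 0).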
>>>
>>
>> KT> Please see if you can put a forward reference.
>>
>>
>>>
>>>
>>> 226 The first BFD authentication section with a non-zero sequence number,
>>> 227 in a valid BFD control packet, processed by the receiver is used for
>>> 228 bootstrapping the logic.
>>>
>>> <major> Is the loss counter reset when the BFD session goes down? Is there
>>> a notion of a time period that is tracked/reported here? Is there a notion
>>> of a percentage of BFD packets lost that is being reported? How useful is
>>> it to simply report the lost packet count without any of this other
>>> context? Looking at the model, the history of this data for the previous
>>> uptime is also not being tracked. Have these aspects been considered by
>>> the WG?
>>>
>>>
>>> As stated above, a section will describe the experiment that this
>>> document is planning to conduct. Other implementations can go further and
>>> do on-box mapping of packet loss to the time interval in which the loss
>>> happened, and do further analytics. But that is outside the scope of this
>>> draft.
>>>
>>
>> KT> In view of all of your responses, I would strongly recommend adding
>> some text that at least touches upon the use of telemetry, or more
>> generally an external monitoring mechanism, being able to leverage this
>> data along with existing counters to get better insight into the stability
>> of the BFD session. And, of course, say that such mechanisms are outside
>> the scope of this document. This will help those reviewing/reading down
>> the publication path and pre-empt some of the same questions that I asked.
>>
>>
>>>
>>>
>>> 239 Implementations MAY provide mechanisms wherein all expected packets
>>> 240 received across an expected interval but delivered out of order are
>>> 241 not considered lost packets.
>>>
>>> <major> Why is this not a MUST? How is it OK to do incorrect and
>>> inaccurate reporting of BFD packet loss? Please see my previous comment.
>>>
>>>
>>> Good question.
>>> I am going to let other BFD experts pitch in. A quick
>>> look at RFC 5880 tells me it is silent on out-of-order packets, and keeping
>>> track of out-of-order packets will require a modification to the protocol.
>>>
>>
>> KT> There wasn't a problem accepting out-of-order packets in base BFD
>> (w/o auth). With proper auth, they would be dropped. Here, there is really
>> no auth and the null auth is only for measuring packet loss. So, I still
>> feel that the implementation at least SHOULD (if not MUST) consider and
>> factor in these out-of-order packets, i.e., not consider them as loss. The
>> document does not say that out-of-order delivery is an error condition
>> that is being measured/monitored.
>>
>>
>>>
>>>
>>> 243 7. Stability YANG Module
>>>
>>> <question> I am not an IETF YANG expert. I would like to check if there
>>> are any issues with an experimental RFC augmenting a standards track YANG
>>> model.
>>>
>>>
>>> I do not believe there is an issue, as the recent discussion on the netmod
>>> mailing list reveals.
>>>
>>
>> KT> Ack, we are good here.
>>
>>
>>>
>>>
>>> 599 9. Security Consideration
>>>
>>> 601 9.1. YANG Security Considerations
>>>
>>> <minor> Please reorder the sections. I know some of the authors are YANG
>>> champs, but let us not put the cart before the horse :-)
>>>
>>>
>>> Do you mean discussing BFD NULL Auth Security Considerations before YANG
>>> Security Considerations? I can do that, but they are discussing two very
>>> different aspects of the draft. One is talking about Security
>>> Considerations of the protocol, what can happen when a malicious user
>>> injects packets, etc., while the other one is talking about security
>>> considerations as they relate to managing the feature on the box.
>>>
>>
>> KT> Yes. Please see if you could split them into sub-sections as done for
>> the optimizing auth document. We'll need to cover both aspects anyway.
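One possible way to act on the point that out-of-order packets should not be counted as loss is to hold each gap as pending for a short reorder window before declaring it lost. The Python sketch below is hypothetical: the class name, the `window` parameter (measured in packets) and its default of 3 are assumptions for illustration, not values from the draft or RFC 5880. Incoming 32-bit sequence numbers are unwrapped into an ever-increasing integer so that wraparound is handled.

```python
from typing import Optional, Set

SEQ_MOD = 1 << 32   # 32-bit Sequence Number space (RFC 5880)
HALF = SEQ_MOD // 2


class ReorderTolerantLossCounter:
    """Hypothetical loss counter that forgives packets delivered out of order.

    A skipped sequence number is held as a pending gap and only counted as
    lost once `window` newer sequence numbers have been seen without it.
    """

    def __init__(self, window: int = 3) -> None:
        self.window = window                 # reorder tolerance, in packets (assumed)
        self.highest: Optional[int] = None   # highest sequence seen, unwrapped
        self.pending: Set[int] = set()       # gaps not yet declared lost
        self.lost = 0

    def _unwrap(self, seq: int) -> int:
        # Pick the unwrapped value closest to self.highest (within +/- 2**31).
        delta = (seq - self.highest + HALF) % SEQ_MOD - HALF
        return self.highest + delta

    def on_packet(self, seq: int) -> None:
        if self.highest is None:
            if seq != 0:  # bootstrap on the first non-zero sequence number
                self.highest = seq
            return
        ext = self._unwrap(seq)
        if ext > self.highest:
            # Newer packet: every skipped number becomes a pending gap.
            self.pending.update(range(self.highest + 1, ext))
            self.highest = ext
        else:
            # Older or duplicate packet: if it fills a pending gap, it is not a loss.
            self.pending.discard(ext)
        # Gaps that have stayed open for `window` newer packets are declared lost.
        expired = {s for s in self.pending if s <= self.highest - self.window}
        self.lost += len(expired)
        self.pending -= expired
```

With the input 0, 5, 6, 8, 7, 9, the late packet 7 fills its gap and no loss is reported, whereas a gap that stays open for `window` newer packets is counted as lost. Gaps still pending when the session ends are not counted, and a very large jump in sequence numbers creates a correspondingly large pending set, so a real implementation would likely need to bound it.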
>>
>>
>>>
>>>
>>> 626 addition, and as stated in Out of Order Packets (Section 6.2), on
>>> 627 links such as LAG or ECMP, there is a possibility of packets being
>>> 628 delivered out of order. A strict comparison of increasing sequence
>>> 629 numbers may result in classifying those out of order packets as
>>> 630 packet loss.
>>>
>>> <minor> Does this text blob not belong to the Null Auth or a separate BFD
>>> Packet Loss Monitoring sub-section?
>>>
>>>
>>> Ok. This text already appears in Section 6.2. Therefore, we can drop the
>>> last sentence.
>>>
>>
>> KT> I think it was important in the security considerations; I was just
>> checking if it should be in its own sub-section focused on null-auth itself
>> and not the YANG part.
>>
>>
>>>
>>>
>>> 652 When the NULL Authentication type is used for BFD Stability purposes,
>>> 653 maliciously injected packets that do not reset the BFD session can
>>> 654 resemble high packet loss. Sessions such as, multi-hop routed paths,
>>> 655 tunnels without authentication, or MPLS LSP, therefore, have security
>>> 656 guarantees that are identical to situations where BFD is run without
>>> 657 authentication.
>>>
>>> <minor> What if someone manipulated the sequence numbers to give a
>>> wrong idea of packet loss, possibly raising false alarms?
>>>
>>>
>>> The NULL authentication mechanism uses the Meticulous Keyed ISAAC for
>>> generating and inserting a sequence number in the packet. On the wire, the
>>> sequence number is not meticulous and therefore it is very hard for anybody
>>> other than the sender and the receiver to guess what that sequence number
>>> should be on the wire.
>>>
>>
>> KT> OK. Let us see what we get from the security folks.
>>
>> Thanks,
>> Ketan
>>
>>
>>>
>>> Thanks.
>>>
>>>
>>> <EoRv18>
>>>
>>>
>>> Mahesh Jethanandani
>>>
>>
>> Mahesh Jethanandani
