Hi Mahesh,

Thanks for posting the update to this draft.

I'll now wait for the response to the review of the sequence number draft
and the updates. Once we close on that, I will do a quick round of checks
across the three documents before progressing them together.

Thanks,
Ketan


On Tue, Jun 10, 2025 at 11:37 AM Ketan Talaulikar <[email protected]>
wrote:

> Hi Mahesh,
>
> Thanks for sharing the updates. It looks good to me. Just some minor
> suggestions:
>
> 1) Please see if you can leave the "on the box" part out of the text
> related to further analysis of the provided BFD packet loss statistics.
> This will leave the door open for both on/off box (as in telemetry based)
> solutions. Also, consider if you would like to add the sentence about
> leveraging this new reported stat along with the received packet stats from
> RFC 9314 for determination of session stability - perhaps the same one that
> is there in Appendix A?
>
> 2) You could also leave out the BFD Sequence Number reference from Section
> 2. The reference to the sequence numbers draft in section 6 is correct and
> entirely informative.
>
> 3) In Appendix A: s/experimental/experiment
>
> Please go ahead and post the update. That way the WG will also have some
> time to review while the authors work on the changes to the sequence
> numbers draft.
>
> Thanks,
> Ketan
>
>
> On Tue, Jun 10, 2025 at 3:54 AM Mahesh Jethanandani <
> [email protected]> wrote:
>
>> Hi Ketan,
>>
>> Please find enclosed the proposed changes to the draft.
>>
>>
>>
>> On Jun 9, 2025, at 6:43 AM, Ketan Talaulikar <[email protected]>
>> wrote:
>>
>> Hi Mahesh (and also Jeff and Ashesh),
>>
>> Thanks for your responses and clarifications. I've gone through them and
>> it has been helpful. I am choosing to respond on this thread only so that
>> my comments are in one place and easy for the authors to process.
>>
>> Please check inline below for responses.
>>
>>
>> On Sat, Jun 7, 2025 at 1:38 AM Mahesh Jethanandani <
>> [email protected]> wrote:
>>
>>> Hi Ketan,
>>>
>>> On May 15, 2025, at 4:05 AM, Ketan Talaulikar <[email protected]>
>>> wrote:
>>>
>>> Hello Authors/WG,
>>>
>>> Thanks for the work put into this document. It has been in the works for
>>> a long time in an on/off mode. There is some more work needed before it can
>>> be taken up for IESG evaluation.
>>>
>>> I would like to share my review of the v18 of this document.
>>>
>>> General Comment/Suggestion:
>>> This is about the contents of this document and its relationship with
>>> draft-ietf-bfd-optimizing-authentication and
>>> draft-ietf-bfd-secure-sequence-numbers. I believe this document does not
>>> depend on those other two, at least not normatively as indicated today.
>>> This proposal is self sufficient with the new null auth type and the two
>>> existing BFD auth types that use meticulous incrementing sequence numbers.
>>> As such, for smooth progression of this work, I would strongly recommend
>>> removing all references to those drafts and the ISAAC-based auth types or
>>> the Optimized Auth from this document. The
>>> draft-ietf-bfd-secure-sequence-numbers that actually specifies the two
>>> ISAAC-based auth types can instead refer to the draft-ietf-bfd-stability to
>>> indicate that those new auth types are suitable for use for measuring BFD
>>> packet loss. This way, this document becomes independent of the other two
>>> for its further processing.
>>>
>>>
>>> This draft does refer to draft-ietf-bfd-secure-sequence-numbers, but
>>> that reference can be informative instead of normative. And you are right,
>>> there is no reference to draft-ietf-bfd-secure-sequence-numbers from this
>>> document, and we can drop it being mentioned in Section 12, Normative
>>> References.
>>>
>>
>> KT> Thanks.
>>
>>
>>>
>>>
>>> Please find below my comments in the idnits output of v18 and look for
>>> <EoRv18> at the very end of the review. If you don't see that, then likely
>>> the email has been truncated by your email client and you should look at
>>> the BFD WG email archive for the full version.
>>>
>>> Thanks,
>>> Ketan
>>>
>>>
>>> 14                             BFD Stability
>>> 15                      draft-ietf-bfd-stability-18
>>>
>>> 17 Abstract
>>>
>>> 19   This document describes extensions to the Bidirectional Forwarding
>>> 20   Detection (BFD) protocol to measure BFD stability.  Specifically, it
>>> 21   describes a mechanism for detection of BFD packet loss.
>>>
>>> <major> The title/name of "BFD Stability" is misleading to me. It gives an
>>> impression of how stable the BFD session is, as in - is it flapping a lot
>>> or is it staying up and stable for a long interval? Why not call this BFD
>>> Packet Loss Monitoring ... or something like that which is a simple term
>>> and yet perhaps gives the true picture of what this feature is about?
>>>
>>>
>>> As we discussed, counting of lost packets is just a method. What is
>>> missing in today's implementations is the quality of the session, as in,
>>> whether the session is Up while dropping packets or is Up and not dropping
>>> any packets. Something that can predict whether the session is stable. I am
>>> open to a suggestion that reflects that sentiment. Something more than
>>> "this draft counts lost packets" 😜
>>>
>>
>> KT> Thanks for the context and discussions from Jeff, Mahesh and Ashesh.
>> I don't have a better technical term to offer and so let us go with what
>> the WG has come up with. Please see if you could add some explanatory text
>> that paraphrases what you all (I especially found the way Ashesh put it to
>> be helpful) have said to provide a context to the reader (i.e., those
>> reviewing during the IETF LC, the IESG, and readers after publication).
>>
>>
>>>
>>>
>>> 98   This document does not propose any BFD extension to measure data
>>> 99   traffic loss or delay on a link or tunnel and the scope is limited to
>>> 100   BFD packets.
>>>
>>> <major> Please provide some text for justification for the experimental
>>> status - something on similar lines as the other two documents will work
>>> just as well.
>>>
>>>
>>> Ok. Taking a cue from the other drafts here is what I am suggesting as
>>> text (in the Appendix):
>>>
>>> This document describes an experiment that will present a candidate
>>> solution to predict whether a given BFD session will continue to be
>>> stable. The experiment will use the lost packet count and the
>>> ‘received-packet-count’ defined in [RFC 9314] to determine how stable
>>> the session is. This document is on the Experimental track because there
>>> are no known implementations or proofs of concept. As a result, the
>>> authors are not clear whether a simple lost count is enough to predict
>>> the stability or whether there will be a need for a more granular count.
>>>
>>> This document is classified as Experimental and is not part of the IETF
>>> Standards Track.
>>>
>>>
>> KT> Thanks.
>>
>>
>>>
>>>
>>> 120   The reader is expected to be familiar with the BFD [RFC5880],
>>> 121   Optimizing BFD Authentication
>>> 122   [I-D.ietf-bfd-optimizing-authentication] and Meticulous Keyed ISAAC
>>> 123   for BFD Authentication [I-D.ietf-bfd-secure-sequence-numbers].
>>>
>>> <major> I see no reason for the above two references or dependencies in
>>> this document. They seem unnecessary to me. What is the normative (must
>>> have) dependency that I am missing? And why is even an informative
>>> reference really necessary?
>>>
>>>
>>> See above.
>>>
>>
>> KT> Ack
>>
>>
>>>
>>>
>>> 139   In a faulty datapath scenario, an operator can use BFD health
>>> 140   information to trigger delay and loss measurement OAM protocol
>>> 141   (Connectivity Fault Management (CFM) or Loss Measurement (LM)-Delay
>>> 142   Measurement (DM)) to further isolate the issue.
>>>
>>> <minor> Please provide informative references for the CFM and DM
>>> technologies.
>>>
>>>
>>> Ok. I am going to reference Y.1731 as:
>>>
>>>    [Y.1731]  ITU-T, "OAM Functions and Mechanisms for Ethernet-based
>>>              Networks", Recommendation G.8013/Y.1731, November 2013.
>>>
>>>
>>> and DM as described in RFC 6374.
>>>
>>>
>> KT> Ack
>>
>>>
>>>
>>>
>>>
>>> 150 5.  NULL Auth Type
>>>
>>> <question> Why is a null auth type, or even a sequence number necessary
>>> for BFD packet loss calculation? Is it not OK to expect that the other
>>> endpoint is going to send X number of packets every interval? And if we
>>> don't get those X packets at every interval, then we have a packet loss?
>>> Perhaps I am missing something obvious and if so, it would be good to
>>> capture the rationale that really needs these sequence numbers for this
>>> measurement.
>>>
>>> 179   Auth Key ID: The authentication key ID in use for this packet. Must
>>> 180   be set to zero and ignored on receipt.
>>>
>>> <minor> s/must/MUST
>>>
>>>
>>> Ok.
>>>
>>
>> KT> Thanks
>>
>>
>>>
>>>
>>> 216 6.1.  Loss Measurement
>>>
>>> 218   Loss measurement counts the number of BFD control packets missed at
>>> 219   the receiver during any Detection Time period.  The loss is detected
>>> 220   by comparing the Sequence Number field in successive BFD control
>>> 221   packets.  The Sequence Number in each successive control packet
>>> 222   generated on a BFD session by the transmitter is incremented by one.
>>> 223   This loss count can then be exposed using the YANG module defined in
>>> 224   the subsequent section.
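
For illustration only, the comparison of successive Sequence Numbers
described in the quoted text could be sketched roughly as below. This is not
taken from the draft: the function name, the use of the first packet seen
for bootstrapping, and the choice to silently ignore older or duplicate
packets are all assumptions. The only protocol fact relied on is that the
BFD authentication Sequence Number is a 32-bit field that wraps.

```python
SEQ_MOD = 2 ** 32   # BFD auth Sequence Number is 32 bits and wraps around
HALF = 2 ** 31      # deltas at or above this are treated as "older" packets


def naive_lost_count(seqs):
    """Count gaps between successive sequence numbers, in arrival order.

    Hypothetical helper: a strict comparison that does NOT credit packets
    that arrive late, so reordering shows up as loss.
    """
    lost = 0
    last = None
    for seq in seqs:
        if last is None:
            last = seq          # first packet seen bootstraps the logic
            continue
        delta = (seq - last) % SEQ_MOD
        if delta == 0 or delta >= HALF:
            continue            # duplicate or older packet: ignored here
        lost += delta - 1       # delta == 1 means nothing was skipped
        last = seq
    return lost
```

With this strict comparison, an arrival order such as 1, 3, 2, 4 is counted
as one lost packet even though every packet arrived, which is the situation
the comment below raises.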
>>>
>>> <major> Packets may be reordered and arrive with different delays. Let
>>> us say that the packet that was supposed to arrive in interval I was
>>> delayed to arrive in interval I+1, i.e., we get one extra packet in the
>>> interval I+1. This does not indicate a packet loss in interval I, but the
>>> procedure above seems to log it as a packet loss?
>>>
>>>
>>> This issue is discussed later in Section 6.2 titled Out of Order Packets.
>>>
>>
>> KT> Please see if you can put a forward reference.
>>
>>
>>>
>>>
>>> 226   The first BFD authentication section with a non-zero sequence number,
>>> 227   in a valid BFD control packet, processed by the receiver is used for
>>> 228   bootstrapping the logic.
>>>
>>> <major> Is the loss counter reset when the BFD session goes down? Is
>>> there a notion of time period that is tracked/reported here? Is there a
>>> notion of a percentage of BFD packets lost that is being reported? How
>>> useful is it to simply report the lost packet count without any of these
>>> other contexts? Looking at the model, the history of this data for the
>>> previous uptime is also not being tracked. Have these aspects been
>>> considered by the WG?
>>>
>>>
>>> As stated above, a section will describe the experiment that this
>>> document is planning to conduct. Other implementations can go further
>>> and, on the box, map packet loss to a time interval, record when the loss
>>> happened, and do further analytics. But that is outside the scope of this
>>> draft.
>>>
>>
>> KT> In view of all of your responses, I would strongly recommend adding
>> some text that at least touches upon the use of telemetry or even in
>> general an external monitoring mechanism being able to leverage this data
>> along with existing counters to get a better insight into the stability of
>> the BFD session. And, of course, say that such mechanisms are outside the
>> scope of this document. This will help those reviewing/reading down the
>> publication path and pre-empt some of the same questions that I asked.
>>
>>
>>>
>>>
>>> 239   Implementations MAY provide mechanisms wherein all expected packets
>>> 240   received across an expected interval but delivered out of order are
>>> 241   not considered lost packets.
>>>
>>> <major> Why is this not a MUST? How is it ok to do incorrect and
>>> inaccurate reporting of BFD packet loss? Please see my previous comment.
>>>
>>>
>>> Good question. I am going to let other BFD experts pitch in. A quick
>>> look at RFC 5880 tells me it is silent on out of order packets, and keeping
>>> track of out of order packets will require a modification to the protocol.
>>>
>>
>> KT> There wasn't a problem accepting out of order packets in base BFD
>> (w/o auth). With proper auth, they would be dropped. Here, there is really
>> no auth and the null auth is only for measuring packet loss. So, I still
>> feel that the implementation at least SHOULD (if not MUST) consider and
>> factor in these out of order packets i.e., not consider them as loss. The
>> document does not say that out of order delivery is an error condition that
>> is being measured/monitored.
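
As a rough sketch of what "factoring in" out of order packets might look
like, the receiver could hold skipped sequence numbers as pending and only
count them as lost once they fall outside a small reorder window. Nothing
here comes from the draft or RFC 5880: the function name, the `window`
parameter and its default of 8 are hypothetical, and a real implementation
would also need to bound the work done on a very large forward jump.

```python
SEQ_MOD = 2 ** 32   # BFD auth Sequence Number is 32 bits and wraps around
HALF = 2 ** 31      # deltas at or above this are treated as "older" packets


def reorder_tolerant_lost_count(seqs, window=8):
    """Count lost packets while crediting packets that merely arrive late.

    Hypothetical helper: `window` is an assumed reorder tolerance, expressed
    in sequence numbers behind the highest one seen so far.
    """
    pending = set()     # skipped numbers that may still show up
    lost = 0
    highest = None
    for seq in seqs:
        if highest is None:
            highest = seq                   # bootstrap on first packet
            continue
        delta = (seq - highest) % SEQ_MOD
        if delta == 0:
            continue                        # duplicate of newest packet
        if delta >= HALF:
            pending.discard(seq)            # late arrival: no longer missing
            continue
        for i in range(1, delta):           # remember every skipped number
            pending.add((highest + i) % SEQ_MOD)
        highest = seq
        # Anything further behind than the window is declared lost.
        expired = {p for p in pending if (highest - p) % SEQ_MOD > window}
        lost += len(expired)
        pending -= expired
    return lost + len(pending)              # still-missing numbers count too
```

Under these assumptions the arrival order 1, 3, 2, 4 yields a loss count of
zero, while 1, 2, 5 still yields two.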
>>
>>
>>>
>>>
>>> 243 7.  Stability YANG Module
>>>
>>> <question> I am not an IETF YANG expert. I would like to check if there
>>> are any issues with an experimental RFC augmenting a standards track YANG
>>> model.
>>>
>>>
>>> I do not believe there is an issue, as the recent discussion on the
>>> netmod mailing list reveals.
>>>
>>
>> KT> Ack - we are good here.
>>
>>
>>>
>>>
>>> 599 9.  Security Consideration
>>>
>>> 601 9.1.  YANG Security Considerations
>>>
>>> <minor> Please reorder the sections. I know some of the authors are YANG
>>> champs, but let us not put the cart before the horse :-)
>>>
>>>
>>> Do you mean discussing BFD NULL Auth Security Considerations before YANG
>>> Security Considerations? I can do that, but they are discussing two very
>>> different aspects of the draft. One is talking about Security
>>> Considerations of the protocol, what can happen when a malicious user
>>> injects packets etc., while the other one is talking about security
>>> considerations as it relates to managing the feature on the box.
>>>
>>
>> KT> Yes. Please see if you could split them into sub-sections as done for
>> the optimizing auth document. We'll need to cover both aspects anyway.
>>
>>
>>>
>>>
>>> 626   addition, and as stated in Out of Order Packets (Section 6.2), on
>>> 627   links such as LAG or ECMP, there is a possibility of packets being
>>> 628   delivered out of order.  A strict comparison of increasing sequence
>>> 629   numbers may result in classifying those out of order packets as
>>> 630   packet loss.
>>>
>>> <minor> Does this text blob not belong to the Null Auth or a separate BFD
>>> Packet loss monitoring sub-section?
>>>
>>>
>>> Ok. This text already appears in Section 6.2. Therefore, we can drop the
>>> last sentence.
>>>
>>
>> KT> I think it was important in the security consideration - was just
>> checking if it should be in its own sub-section focused on null-auth itself
>> and not the YANG part.
>>
>>
>>>
>>>
>>> 652   When the NULL Authentication type is used for BFD Stability purposes,
>>> 653   maliciously injected packets that do not reset the BFD session can
>>> 654   resemble high packet loss.  Sessions such as, multi-hop routed paths,
>>> 655   tunnels without authentication, or MPLS LSP, therefore, have security
>>> 656   guarantees that are identical to situations where BFD is run without
>>> 657   authentication.
>>>
>>> <minor> How about someone could manipulate the sequence numbers and give
>>> a wrong idea of packet loss? Possibly raise false alarms?
>>>
>>>
>>> The NULL authentication mechanism uses the Meticulous Keyed ISAAC for
>>> generating and inserting a sequence number in the packet. On the wire, the
>>> sequence number is not meticulous and therefore it is very hard for anybody
>>> other than the sender and the receiver to guess what that sequence number
>>> should be on the wire.
>>>
>>
>> KT> OK. Let us see what we get from the security folks.
>>
>> Thanks,
>> Ketan
>>
>>
>>>
>>> Thanks.
>>>
>>>
>>> <EoRv18>
>>>
>>>
>>> Mahesh Jethanandani
>>> [email protected]
>>>
>>>
>>>
>>>
>>
>> Mahesh Jethanandani
>> [email protected]
>>