Re: AD Evaluation Review of draft-ietf-bfd-stability-18

Ketan Talaulikar Wed, 11 Jun 2025 19:52:11 -0700

Hi Mahesh,

Thanks for posting the update to this draft.


I'll now wait for the response to the review of the sequence number draft
and the updates. Once we close on that, I will do a quick round of checks
across the three documents before progressing them together.

Thanks,
Ketan


On Tue, Jun 10, 2025 at 11:37 AM Ketan Talaulikar <[email protected]>
wrote:

> Hi Mahesh,
>
> Thanks for sharing the updates. It looks good to me. Just some minor
> suggestions:
>
> 1) Please see if you can leave the "on the box" part out of the text
> related to further analysis of the provided BFD packet loss statistics.
> This will leave the door open for both on/off box (as in telemetry based)
> solutions. Also, consider if you would like to add the sentence about
> leveraging this new reported stat along with the received packet stats from
> RFC 9314 for determination of session stability - perhaps the same one that
> is there in Appendix A?
>
> 2) You could also leave out the BFD Sequence Number reference from Section
> 2. The reference to the sequence numbers draft in section 6 is correct and
> entirely informative.
>
> 3) In Appendix A: s/experimental/experiment
>
> Please go ahead and post the update. That way the WG will also have some
> time to review while the authors work on the changes to the sequence
> numbers draft.
>
> Thanks,
> Ketan
>
>
> On Tue, Jun 10, 2025 at 3:54 AM Mahesh Jethanandani <
> [email protected]> wrote:
>
>> Hi Ketan,
>>
>> Please find enclosed the proposed changes to the draft.
>>
>>
>>
>> On Jun 9, 2025, at 6:43 AM, Ketan Talaulikar <[email protected]>
>> wrote:
>>
>> Hi Mahesh (and also Jeff and Ashesh),
>>
>> Thanks for your responses and clarifications. I've gone through them and
>> it has been helpful. I am choosing to respond on this thread only so that
>> my comments are in one place and easy for the authors to process.
>>
>> Please check inline below for responses.
>>
>>
>> On Sat, Jun 7, 2025 at 1:38 AM Mahesh Jethanandani <
>> [email protected]> wrote:
>>
>>> Hi Ketan,
>>>
>>> On May 15, 2025, at 4:05 AM, Ketan Talaulikar <[email protected]>
>>> wrote:
>>>
>>> Hello Authors/WG,
>>>
>>> Thanks for the work put into this document. It has been in the works for
>>> a long time in an on/off mode. There is some more work needed before it can
>>> be taken up for IESG evaluation.
>>>
>>> I would like to share my review of the v18 of this document.
>>>
>>> General Comment/Suggestion:
>>> This is about the contents of this document and its relationship with
>>> draft-ietf-bfd-optimizing-authentication and
>>> draft-ietf-bfd-secure-sequence-numbers. I believe this document does not
>>> depend on those other two, at least not normatively as indicated today.
>>> This proposal is self sufficient with the new null auth type and the two
>>> existing BFD auth types that use meticulous incrementing sequence numbers.
>>> As such, for smooth progression of this work, I would strongly recommend
>>> removing all references to those drafts and the ISAAC-based auth types or
>>> the Optimized Auth from this document. The
>>> draft-ietf-bfd-secure-sequence-numbers that actually specifies the two
>>> ISAAC-based auth types can instead refer to the draft-ietf-bfd-stability to
>>> indicate that those new auth types are suitable for use for measuring BFD
>>> packet loss. This way, this document becomes independent of the other two
>>> for its further processing.
>>>
>>>
>>> This draft does refer to draft-ietf-bfd-secure-sequence-numbers, but
>>> that reference can be informative instead of normative. And you are right,
>>> there is no reference to draft-ietf-bfd-secure-sequence-numbers from this
>>> document, and we can drop it being mentioned in Section 12, Normative
>>> References.
>>>
>>
>> KT> Thanks.
>>
>>
>>>
>>>
>>> Please find below my comments in the idnits output of v18 and look for
>>> <EoRv18> at the very end of the review. If you don't see that, then likely
>>> the email has been truncated by your email client and you should look at
>>> the BFD WG email archive for the full version.
>>>
>>> Thanks,
>>> Ketan
>>>
>>>
>>> 14                             BFD Stability
>>> 15                      draft-ietf-bfd-stability-18
>>>
>>> 17 Abstract
>>>
>>> 19   This document describes extensions to the Bidirectional Forwarding
>>> 20   Detection (BFD) protocol to measure BFD stability.  Specifically, it
>>> 21   describes a mechanism for detection of BFD packet loss.
>>>
>>> <major> The title/name of "BFD Stability" is misleading to me. It gives an
>>> impression of how stable the BFD session is, as in: is it flapping a lot or
>>> is it staying up and stable for a long interval? Why not call this BFD
>>> Packet Loss Monitoring, or something like that, which is a simple term and
>>> yet perhaps gives the true picture of what this feature is about?
>>>
>>>
>>> As we discussed, counting of lost packets is just a method. What is
>>> missing in today's implementations is the quality of the session, as in,
>>> whether the session is Up while dropping packets or is Up and not dropping
>>> any packets. Something that can predict whether the session is stable. I am
>>> open to a suggestion that reflects that sentiment. Something more than
>>> "this draft counts lost packets" 😜
>>>
>>
>> KT> Thanks for the context and discussions from Jeff, Mahesh and Ashesh.
>> I don't have a better technical term to offer and so let us go with what
>> the WG has come up with. Please see if you could add some explanatory text
>> that paraphrases what you all (I especially found the way Ashesh put it to
>> be helpful) have said to provide a context to the reader (i.e., those
>> reviewing during the IETF LC, the IESG, and readers after publication).
>>
>>
>>>
>>>
>>> 98   This document does not propose any BFD extension to measure data
>>> 99   traffic loss or delay on a link or tunnel and the scope is limited to
>>> 100   BFD packets.
>>>
>>> <major> Please provide some text for justification for the experimental
>>> status - something on similar lines as the other two documents will work
>>> just as well.
>>>
>>>
>>> Ok. Taking a cue from the other drafts here is what I am suggesting as
>>> text (in the Appendix):
>>>
>>> This document describes an experiment that will present a candidate
>>> solution to predict whether a given BFD session will continue to be
>>> stable. The experiment will use the packet lost count and the
>>> ‘received-packet-count’ defined in [RFC 9314] to determine how stable the
>>> session is. This document is on the Experimental track because there are
>>> no known implementations or proofs of concept. As a result, the authors
>>> are not clear whether a simple lost count is enough to predict the
>>> stability or whether a more granular count will be needed.
>>>
>>> This document is classified as Experimental and is not part of the IETF
>>> Standards Track.
>>>
>>>
>> KT> Thanks.
>>
>>
>>>
>>>
>>> 120   The reader is expected to be familiar with the BFD [RFC5880],
>>> 121   Optimizing BFD Authentication
>>> 122   [I-D.ietf-bfd-optimizing-authentication] and Meticulous Keyed ISAAC
>>> 123   for BFD Authentication [I-D.ietf-bfd-secure-sequence-numbers].
>>>
>>> <major> I see no reason for the above two references or dependencies in
>>> this document. They seem unnecessary to me. What is the normative (must
>>> have) dependency that I am missing? And why is even an informative
>>> reference really necessary?
>>>
>>>
>>> See above.
>>>
>>
>> KT> Ack
>>
>>
>>>
>>>
>>> 139   In a faulty datapath scenario, an operator can use BFD health
>>> 140   information to trigger delay and loss measurement OAM protocol
>>> 141   (Connectivity Fault Management (CFM) or Loss Measurement (LM)-Delay
>>> 142   Measurement (DM)) to further isolate the issue.
>>>
>>> <minor> Please provide informative references for the CFM and DM
>>> technologies.
>>>
>>>
>>> Ok. I am going to reference Y.1731 as:
>>>
>>>    [Y.1731]  ITU-T, "OAM Functions and Mechanisms for Ethernet-based
>>>              Networks", Recommendation G.8013/Y.1731, November 2013.
>>>
>>>
>>> and DM as described in RFC 6374.
>>>
>>>
>> KT> Ack
>>
>>>
>>>
>>>
>>>
>>> 150 5.  NULL Auth Type
>>>
>>> <question> Why is a null auth type, or even a sequence number, necessary
>>> for BFD packet loss calculation? Is it not OK to expect that the other
>>> endpoint is going to send X packets every interval? And if we don't get
>>> those X packets in every interval, then we have a packet loss? Perhaps I am
>>> missing something obvious and, if so, it would be good to capture the
>>> rationale for why this measurement really needs these sequence numbers.
>>>
>>> 179   Auth Key ID: The authentication key ID in use for this packet.  Must
>>> 180   be set to zero and ignored on receipt.
>>>
>>> <minor> s/must/MUST
>>>
>>>
>>> Ok.
>>>
>>
>> KT> Thanks
>>
>>
>>>
>>>
>>> 216 6.1.  Loss Measurement
>>>
>>> 218   Loss measurement counts the number of BFD control packets missed at
>>> 219   the receiver during any Detection Time period.  The loss is detected
>>> 220   by comparing the Sequence Number field in successive BFD control
>>> 221   packets.  The Sequence Number in each successive control packet
>>> 222   generated on a BFD session by the transmitter is incremented by one.
>>> 223   This loss count can then be exposed using the YANG module defined in
>>> 224   the subsequent section.
>>>
>>> <major> Packets may be reordered and arrive with different delays. Let us
>>> say that a packet that was supposed to arrive in interval I was delayed and
>>> arrived in interval I+1, i.e., we get one extra packet in interval I+1.
>>> This does not indicate a packet loss in interval I, but the procedure above
>>> seems to log it as a packet loss?
>>>
>>>
>>> This issue is discussed later in Section 6.2 titled Out of Order Packets.
>>>
>>
>> KT> Please see if you can put a forward reference.
>>
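For readers following the Section 6.1 exchange above, here is a minimal sketch of the strict sequence-number comparison it describes. It is not taken from the draft: the class and function names are hypothetical, and ignoring late or duplicate packets is an assumption of this sketch. The 32-bit width of the Sequence Number field comes from RFC 5880.

```python
SEQ_MOD = 2**32  # the BFD authentication Sequence Number field is 32 bits wide


class StrictLossCounter:
    """Hypothetical strict counter: every gap in sequence numbers is a loss."""

    def __init__(self):
        self.last_seq = None  # last in-order sequence number accepted
        self.lost = 0         # running count of packets presumed lost

    def on_packet(self, seq):
        if self.last_seq is None:
            # Bootstrap on the first non-zero sequence number, as in the quoted text.
            if seq != 0:
                self.last_seq = seq
            return
        delta = (seq - self.last_seq) % SEQ_MOD
        if 0 < delta < SEQ_MOD // 2:
            # Packet is ahead of the last one: anything skipped counts as lost.
            self.lost += delta - 1
            self.last_seq = seq
        # Otherwise the packet is a duplicate or arrived late. This sketch
        # ignores it, so a gap that was already counted stays counted.


def count_lost(seqs):
    """Feed a list of received sequence numbers and return the loss count."""
    counter = StrictLossCounter()
    for seq in seqs:
        counter.on_packet(seq)
    return counter.lost
```

With this logic, `count_lost([10, 12, 11, 13])` reports one lost packet even though sequence number 11 did arrive, which is the reordering concern raised in the review comment.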
>>
>>>
>>>
>>> 226   The first BFD authentication section with a non-zero sequence number,
>>> 227   in a valid BFD control packet, processed by the receiver is used for
>>> 228   bootstrapping the logic.
>>>
>>> <major> Is the loss counter reset when the BFD session goes down? Is there
>>> a notion of time period that is tracked/reported here? Is there a notion of
>>> a percentage of BFD packets lost that is being reported? How useful is it
>>> to simply report the lost packet count without any of these other contexts?
>>> Looking at the model, the history of this data for the previous uptime is
>>> also not being tracked. Have these aspects been considered by the WG?
>>>
>>>
>>> As stated above, a section will describe the experiment that this
>>> document is planning to conduct. Other implementations can go further and,
>>> on the box, map packet loss to a time interval, record when the loss
>>> happened, and do further analytics. But that is outside the scope of this
>>> draft.
>>>
>>
>> KT> In view of all of your responses, I would strongly recommend adding
>> some text that at least touches upon the use of telemetry or even in
>> general an external monitoring mechanism being able to leverage this data
>> along with existing counters to get a better insight into the stability of
>> the BFD session. And, of course, say that such mechanisms are outside the
>> scope of this document. This will help those reviewing/reading down the
>> publication path and pre-empt some of the same questions that I asked.
>>
>>
>>>
>>>
>>> 239   Implementations MAY provide mechanisms wherein all expected packets
>>> 240   received across an expected interval but delivered out of order are
>>> 241   not considered lost packets.
>>>
>>> <major> Why is this not a MUST? How is it ok to do incorrect and inaccurate
>>> reporting of BFD packet loss? Please see my previous comment.
>>>
>>>
>>> Good question. I am going to let other BFD experts pitch in. A quick
>>> look at RFC 5880 tells me it is silent on out of order packets, and keeping
>>> track of out of order packets will require a modification to the protocol.
>>>
>>
>> KT> There wasn't a problem accepting out of order packets in base BFD
>> (w/o auth). With proper auth, they would be dropped. Here, there is really
>> no auth and the null auth is only for measuring packet loss. So, I still
>> feel that the implementation at least SHOULD (if not MUST) consider and
>> factor in these out of order packets i.e., not consider them as loss. The
>> document does not say that out of order delivery is an error condition that
>> is being measured/monitored.
>>
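To illustrate the point about not treating out-of-order packets as loss, a receiver could hold gaps as provisional until the end of an interval. This is only a sketch under assumptions: the class name, the `end_interval` hook, and the choice to finalize losses per interval are hypothetical and are not specified by the draft.

```python
SEQ_MOD = 2**32  # the BFD authentication Sequence Number field is 32 bits wide


class ReorderTolerantLossCounter:
    """Hypothetical counter that lets late packets cancel a provisional loss."""

    def __init__(self):
        self.highest = None   # highest sequence number seen so far
        self.missing = set()  # sequence numbers skipped but not yet declared lost

    def on_packet(self, seq):
        if self.highest is None:
            # Bootstrap on the first non-zero sequence number.
            if seq != 0:
                self.highest = seq
            return
        delta = (seq - self.highest) % SEQ_MOD
        if delta == 0:
            return  # duplicate of the newest packet
        if delta < SEQ_MOD // 2:
            # Packet is ahead: remember every skipped number as provisionally missing.
            for i in range(1, delta):
                self.missing.add((self.highest + i) % SEQ_MOD)
            self.highest = seq
        else:
            # Packet is behind: it arrived out of order, so it is no longer missing.
            self.missing.discard(seq)

    def end_interval(self):
        """Declare whatever is still missing as lost and start a new interval."""
        lost = len(self.missing)
        self.missing.clear()
        return lost
```

Feeding sequence numbers 10, 12, 11, 13 and then calling `end_interval()` yields zero losses. A packet delayed past the interval boundary would still be counted as lost here, and a real implementation would also need to bound the size of a forward jump it is willing to track.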
>>
>>>
>>>
>>> 243 7.  Stability YANG Module
>>>
>>> <question> I am not an IETF YANG expert. I would like to check if there are
>>> any issues with an experimental RFC augmenting a standards track YANG model.
>>>
>>>
>>> I do not believe there is an issue, as the recent discussion on the netmod
>>> mailing list reveals.
>>>
>>
>> KT> Ack - we are good here.
>>
>>
>>>
>>>
>>> 599 9.  Security Consideration
>>>
>>> 601 9.1.  YANG Security Considerations
>>>
>>> <minor> Please reorder the sections. I know some of the authors are YANG
>>> champs, but let us not put the cart before the horse :-)
>>>
>>>
>>> Do you mean discussing BFD NULL Auth Security Considerations before YANG
>>> Security Considerations? I can do that, but they are discussing two very
>>> different aspects of the draft. One is talking about Security
>>> Considerations of the protocol, what can happen when a malicious user
>>> injects packets etc., while the other one is talking about security
>>> considerations as it relates to managing the feature on the box.
>>>
>>
>> KT> Yes. Please see if you could split them into sub-sections as done for
>> the optimizing auth document. We'll need to cover both aspects anyway.
>>
>>
>>>
>>>
>>> 626   addition, and as stated in Out of Order Packets (Section 6.2), on
>>> 627   links such as LAG or ECMP, there is a possibility of packets being
>>> 628   delivered out of order.  A strict comparison of increasing sequence
>>> 629   numbers may result in classifying those out of order packets as
>>> 630   packet loss.
>>>
>>> <minor> Does this text blob not belong to the Null Auth or a separate BFD
>>> Packet loss monitoring sub-section?
>>>
>>>
>>> Ok. This text already appears in Section 6.2. Therefore, we can drop the
>>> last sentence.
>>>
>>
>> KT> I think it was important in the security consideration - was just
>> checking if it should be in its own sub-section focused on null-auth itself
>> and not the YANG part.
>>
>>
>>>
>>>
>>> 652   When the NULL Authentication type is used for BFD Stability purposes,
>>> 653   maliciously injected packets that do not reset the BFD session can
>>> 654   resemble high packet loss.  Sessions such as, multi-hop routed paths,
>>> 655   tunnels without authentication, or MPLS LSP, therefore, have security
>>> 656   guarantees that are identical to situations where BFD is run without
>>> 657   authentication.
>>>
>>> <minor> Could someone manipulate the sequence numbers and give a wrong
>>> idea of packet loss, possibly raising false alarms?
>>>
>>>
>>> The NULL authentication mechanism uses the Meticulous Keyed ISAAC for
>>> generating and inserting a sequence number in the packet. On the wire, the
>>> sequence number is not meticulous and therefore it is very hard for anybody
>>> other than the sender and the receiver to guess what that sequence number
>>> should be on the wire.
>>>
>>
>> KT> OK. Let us see what we get from the security folks.
>>
>> Thanks,
>> Ketan
>>
>>
>>>
>>> Thanks.
>>>
>>>
>>> <EoRv18>
>>>
>>>
>>> Mahesh Jethanandani
>>> [email protected]
>>>
>>>
>>>
>>>
>>
>> Mahesh Jethanandani
>> [email protected]
>>
>>
>>
>>
>>
>>
>>
