I guess the reality is that the diag field is a weak and ambiguous feature that I spent five minutes thinking about, and is ultimately (and happily) not subject to normative verification because it is write-only.
The received value is edge-triggerable if you grab the value during the next session establishment phase (you get at least one packet from the other guy prior to the session coming Up), so I suppose one could change the language to zero the local value on the Up transition as suggested without losing the (mis)feature altogether. If that’s the case, I would add a sentence in the field description saying that the received diag value refers to the previous session when received in a packet bearing a state other than Up. Of course this means that the value received with Up is ambiguous, because it could be referring to the previous session or the current one, depending on how it was implemented, but that’s already the case.

Or is it possible to simply log a permanent erratum that explains the ambiguity and encourages people not to worry about it? It’s already taken up more time than it is worth, considering that it cannot affect interoperability (and so I suppose never should have been normative, but we’d probably still be discussing it anyhow).

Thanks,
-- Dave

On Feb 22, 2023, at 8:14 AM, Jeffrey Haas <[email protected]> wrote:

Dave,

Just as a reminder, the context for why this errata is being discussed is this inquiry:
https://mailarchive.ietf.org/arch/msg/rtg-bfd/YIeCo-nQicI_OIcVncYaJM5Zz6c/

More below:

On Feb 17, 2023, at 12:04 PM, Dave Katz <[email protected]> wrote:

On Feb 17, 2023, at 8:47 AM, Reshad Rahman <[email protected]> wrote:

Having the diag field as breadcrumb has been extremely useful indeed. But both ends can store last diag field sent/received, we don't have to keep sending the diag field after the failure has cleared.
It seems odd to be sending a diag field which happened e.g. a year ago.

That property helped me when debugging my implementation, as it survives the restart/reboot of the far end. There is also no timeout that would make sense; “forever, for small values of ‘forever’” is semantically consistent and the only thing that makes sense (to me, at least). Resetting it to zero only deletes information (albeit a tiny amount of it) and doesn’t help anything; you know that the session is up, so the diagnostic for its most recent transition to non-upness is disambiguated. Debugging broken things is a scramble for bits of data; leaving a breadcrumb is a polite gift.

From my perspective, the breadcrumb is useful to note during the transitions, and not simply the transitions for the state. Examples have been given where the diag is updated as part of a state transition (governed by normative text in 5880), or transitions that may happen while the session remains up (e.g. concatenated path down, echo, etc.). The RFC isn't great about saying how you clear such things when the state is still Up; intuitively it's to return to "No Diagnostic". However, your own leaning, Dave, is "leave it set forever". Using the above examples for diag signaling an event while leaving the state up, I don't think you mean that.

So, again, the interesting breadcrumbs are when Things Change. Each of these items is an edge transition of note. If I care about the event, I care about it when it happens and will remember it. I'm not going to look at diag to reflect this forever.

Also the text in 6.8.1 says "The diagnostic code specifying the reason for the most recent change in the local session state." To me that means resetting bfd.LocalDiag to 0 when the state changes to Up.

Thus the language that needs fixing (I know, I wrote it...)

AFAIK IOS-XR and JunOS reset LocalDiag. It'd be good to hear from other implementations.

Probably.
I might have even coded it that way 20 years ago, or someone else did later, thus underscoring the largely-irrelevant nature of this discussion...

I did confirm with Juniper's BFD developers that it's reset to 0 when we transition to Up. Given the maturity of the feature, I'd suggest sticking to the reality on the ground.

-- Jeff
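
For anyone following the thread, the reset-on-Up behaviour discussed here might be sketched roughly as below. This is only an illustration under stated assumptions, not text from RFC 5880 or any vendor's code: the `Session` class, its `transition` method, and the `describes_previous_session` helper are hypothetical names. Only the numeric state and diagnostic code values are taken from RFC 5880.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class State(IntEnum):
    # Session state values as assigned in RFC 5880.
    ADMIN_DOWN = 0
    DOWN = 1
    INIT = 2
    UP = 3


class Diag(IntEnum):
    # A subset of the RFC 5880 diagnostic codes.
    NO_DIAGNOSTIC = 0
    CONTROL_DETECTION_TIME_EXPIRED = 1
    NEIGHBOR_SIGNALED_SESSION_DOWN = 3


@dataclass
class Session:
    """Hypothetical local session holding only state and bfd.LocalDiag."""
    state: State = State.DOWN
    local_diag: Diag = Diag.NO_DIAGNOSTIC

    def transition(self, new_state: State, diag: Optional[Diag] = None) -> None:
        if new_state == State.UP:
            # Assumed behaviour: clear the breadcrumb on the Up transition.
            self.local_diag = Diag.NO_DIAGNOSTIC
        elif diag is not None:
            # Record the reason for this change; otherwise keep the old value.
            self.local_diag = diag
        self.state = new_state


def describes_previous_session(rx_state: State) -> bool:
    """Proposed reading: a diag received with a state other than Up
    refers to the previous session. With Up it stays ambiguous."""
    return rx_state != State.UP


if __name__ == "__main__":
    s = Session()
    s.transition(State.DOWN, Diag.CONTROL_DETECTION_TIME_EXPIRED)
    s.transition(State.INIT)   # breadcrumb still carried in Init packets
    print(s.state.name, s.local_diag.name)
    s.transition(State.UP)     # breadcrumb cleared here
    print(s.state.name, s.local_diag.name)
```

Under this sketch the far end still sees the old diag in at least one Down or Init packet before the session comes Up, which is the edge-triggered capture described earlier in the thread.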
