On Thursday, March 23, 2023, 02:36:53 PM EDT, Jeffrey Haas <[email protected]>
wrote:
On Mar 23, 2023, at 2:17 PM, Reshad Rahman <[email protected]>
wrote:
Hi all,
+1 to Jeff's comment on not wanting to pretend that everything is fine.
And if we're running BFD single-hop and BFDoLAG where needed, this is a
non-issue right?
Not quite.
In theory, if we had a full set of link tests from A..Z, including exercising
each LAG member, one would think everything should be fine. This is an ideal
basis case.
In practice, what's often seen is that, even with full coverage of the paths,
there are still end-to-end forwarding faults for various reasons. In at least
some of these cases it's because BFD is implemented in a layer that isn't
exercising the full data path. To pick a somewhat vendor neutral example,
consider BFD implemented directly on the line card but not participating in the
layer 3 ECMP load balancer, or at the LAG level not participating in the layer
2 equivalent.
<RR> That does seem to be a correct implementation, but that's beside the
point...
It's for reasons like this that we have discussions about whether it makes
sense to run single-hop BFD in addition to BFD-on-LAG covering the same
link.
<RR> Right. This is what I meant by "running BFD SH and BFDoLAG where
needed". I'd think that running BFD SH on top of LAG, even if BFDoLAG is
enabled, would be needed anyway just because they exercise different layers.
(It's also worth reminding the Working Group that these types of discussions
were a motivation for the LIME Working Group we had some years ago. It very
much covered this space, but didn't come to successful outcomes.)
Going back to Abhinav's original question, here are my own observations:
RFC 5880 tells us that once a session is Up, we should demultiplex solely based
on the Discriminators. (RFC 5880, §6.3)
RFC 5881, used by RFC 5883, tells us that we MUST NOT change the source ports.
However, it doesn't provide a lot of justification for the WHY of that. Given
the prior point, what is the harm? Some speculation:
- Even if you MUST demux based on Discriminators, I wouldn't place wagers on
  there being no implementations that look at the full layer-4 signature as
  part of the procedures. In particular, middlebox steering may get in the way.
- It's often necessary for hardware-based BFD implementations to put in
  exceptions to rate policers to permit BFD to work.
Speculation aside, changing the source port most likely would work.
Is it a good idea? Probably not.
Is it a great tool to try to exercise specific legs of an ECMP? Almost
certainly not at high rates. It'd also be clumsy.
Could you do this with some level of success? Probably.
Would I want to support debugging issues with this as a vendor? No.
<RR> +1 to all the above.
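
To make the "clumsy" point about steering probes onto specific ECMP legs concrete, here is a toy Python sketch. The hash (CRC32 over the packed 5-tuple, modulo the number of legs) is purely an assumption for illustration; real forwarding planes use vendor-specific hash inputs and functions that the sender generally cannot see. The function names are hypothetical. The source port range 49152 to 65535 is the one RFC 5881 specifies, and 4784 is the multihop destination port from RFC 5883.

```python
import ipaddress
import struct
import zlib

UDP = 17


def ecmp_leg(src_ip, dst_ip, proto, src_port, dst_port, n_legs):
    """Toy 5-tuple hash (an assumption): CRC32 of the packed tuple, mod n_legs."""
    key = (
        ipaddress.ip_address(src_ip).packed
        + ipaddress.ip_address(dst_ip).packed
        + struct.pack("!BHH", proto, src_port, dst_port)
    )
    return zlib.crc32(key) % n_legs


def ports_covering_all_legs(src_ip, dst_ip, dst_port, n_legs,
                            port_range=range(49152, 65536)):
    """Search for one source port per leg under the toy hash above.

    Returns a dict {leg_index: source_port}; it may be incomplete if the
    port range is exhausted before every leg is hit.
    """
    chosen = {}
    for port in port_range:
        leg = ecmp_leg(src_ip, dst_ip, UDP, port, dst_port, n_legs)
        chosen.setdefault(leg, port)  # keep the first port found for each leg
        if len(chosen) == n_legs:
            break
    return chosen


# Example with documentation addresses and four hypothetical ECMP legs.
legs = ports_covering_all_legs("192.0.2.1", "198.51.100.2", 4784, 4)
```

Even in this toy model the sender has to search for a port per leg, and the mapping only holds for one hash function and one hop; with several ECMP stages in the path, each with its own unknown hash, the search has to be done empirically.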
Regards,
Reshad.
-- Jeff